Automating Domain Recon in a Pentest

Great, a new penetration test task just landed in your inbox. Your boss or customer, however, ordered a black-box pentest and gave you little more information than a domain name.

Typically, when moving into a penetration test, you start with a recon phase. Before you hack and slash your way into the WordPresses of this world (and the Jekylls, ouch!), you gather intelligence. The first step could be simply browsing the site to get an idea of what you are about to assess; the second step could already be enumerating the domain for subdomains and its hosts. As always, more than one road leads to root (Rome).

Here is how I do domain enumeration. You may find this applicable to your workflow.

Recently, I've come to like dnsrecon. The tool can be considered a Swiss Army knife for domain reconnaissance. Essentially, it helps to discover hosts, subdomains, and related domains from an initial set.

Herein, I will use my own domain to demonstrate the process. I use a script similar to the following one. It serves three purposes:

  1. Apply an ordered folder structure: there is no sense in having the best results if you can't find them again.
  2. Run all dnsrecon scans on all targets. I will explain shortly what the scan types stand for; for now, just note that every scan type is run against every domain, one after another, each spawned with 16 threads. Sometimes it may be required to set the lifetime parameter higher than the default of 3; I'm using 10.
  3. Lastly, export the results as DOMAIN/dnsrecon-TYPE.xml. This makes them easier to find and parse later. You might want to switch the output to JSON or CSV, as you like!
#!/bin/bash
PROJECT=chmey
TYPES=(std rvl brt srv axfr bing yand crt snoop tld zonewalk)
DOMAINS=(chmey.com)
for d in "${DOMAINS[@]}"; do
    mkdir -p ~/Documents/Pentest/"$PROJECT"/Recon/"$d"
    for t in "${TYPES[@]}"; do
        python3 dnsrecon.py -d "$d" -D ~/Downloads/subdomains.txt --lifetime 10 --threads 16 -t "$t" --xml ~/Documents/Pentest/"$PROJECT"/Recon/"$d"/dnsrecon-"$t".xml
    done
done

Now, you may have noticed I've included the parameter -D subdomains.txt. To explain this, let's dive into the scan types.

  1. The std scan performs all the regular DNS requests you might issue when running a regular dig with options against the domain. You will receive a set of SOA, NS, A, AAAA, MX and SRV records. Essentially, these identify servers hosted directly under the domain, or tell you where to find them: mail servers, application and web servers, and even external servers that are merely referenced. Additionally, you will receive interesting TXT and SPF records, which may point to validated domains. If you are not familiar with the DNS record types, refer to Wikipedia and/or their corresponding RFCs.
  2. Secondly, the rvl scan performs a reverse lookup of a given IP range. It is not of much use unless you pass a -r IP_RANGE parameter. I haven't used the option much; it may be worth a try once you have identified an IP range supposedly owned by the target. At the initial stage, you can probably skip it.
  3. The brt scan, however, is very useful to us. Remember the subdomain list? This scan type tries each line of that list as a subdomain of the given domain. In this example, I've used the subdomains.txt from here. Essentially, dnsrecon will try whisky.chmey.com, flores.chmey.com, pclab.chmey.com and 19997 others, each a combination of a line from the subdomain file and the target domain, and report those that resolve to an address. This step is very important for finding other applications and servers hosted under the same domain.
  4. The axfr scan asks all linked name servers for a zone transfer. If a server is not protected and hardened, a zone transfer hands out a copy of the whole zone, so we might get lucky and learn a lot of hosts at once.
  5. The crt scan is really interesting, as it searches the certificate transparency database crt.sh for mentions of the domain and its subdomains. This is very useful and can quickly yield a large set of subdomains, much faster than brt (see the sketch after this list).
  6. bing and yand query the Bing and Yandex search engines for subdomains and hosts.
  7. Cache snooping via snoop may reveal (sub)domains recently queried through the linked name servers.
  8. In the tld scan, the top-level domain is swapped for every TLD registered with IANA. This can surface a lot of new domains, but may also produce many false positives.
  9. Lastly, the zonewalk scan attempts to enumerate the zone by walking its DNSSEC NSEC records.
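
To illustrate what the crt scan does under the hood, here is a minimal standalone sketch that queries crt.sh directly. It relies on crt.sh's public JSON interface (the output=json parameter and the name_value field) as I understand it; that interface is an assumption on my part, not anything dnsrecon-specific.

import requests

def crtsh_subdomains(domain):
    """Collect names seen in certificates for the domain via crt.sh (assumed JSON interface)."""
    resp = requests.get('https://crt.sh/',
                        params={'q': f'%.{domain}', 'output': 'json'},
                        timeout=30)
    resp.raise_for_status()
    names = set()
    for entry in resp.json():
        # name_value may hold several names separated by newlines
        for name in entry.get('name_value', '').splitlines():
            names.add(name.lstrip('*.'))  # strip wildcard prefixes
    return sorted(names)

print(crtsh_subdomains('chmey.com'))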

Quite the powerful tool, aye?

Handling records

Now you can either browse through the created record files by hand and look for more information - or automate it. 😉

I'm using a Python script to parse out the relevant information. At this stage, it is important to remove duplicates from the results: because a multitude of scans runs against the same targets, the same IP address or host can appear in several reports.

In the script below, I essentially loop over all dnsrecon XML reports and extract the relevant information from the records. The data is split into lists by record type and can eventually be pushed into persistent storage or used in the next steps.

Lastly, I also perform an ASN lookup on the discovered IP addresses. ASN resolution can be useful to distinguish on-premise from externally hosted appliances. For IP-to-ASN resolution, I am using a local IP2ASN resolver API: every IP is passed to the API, and the ASN information is returned as JSON.

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import requests
import os
import logging
logging.basicConfig(level=logging.INFO)

listIPs = []
listDomains = []
listInfoASN = []
listRecordsNS = []
listRecordsTXT = []
listRecordsSRV = []
listRecordsMX = []


def parseFile(filename):
    """Parse a dnsrecon result xml file and add findings to global lists."""
    global listIPs, listRecordsMX, listRecordsSRV, listDomains
    root = ET.parse(filename).getroot()
    records = root.findall('record')
    for r in records:
        t = r.get('type')
        if t in ['A', 'AAAA']:
            listIPs += [r.get('address')]
            listDomains += [r.get('name')]
        elif t == 'MX':
            a = r.get('address')
            if a:
                listIPs += [r.get('address')]
            listRecordsMX += [r.get('exchange')]
        elif t == 'SRV':
            a = r.get('address')
            if a:  # guard against records without an address, as for MX above
                listIPs += [a]
            listDomains += [r.get('name')]
            listRecordsSRV += [(r.get('target'), r.get('port'))]
        elif t == 'CNAME':
            listDomains += [r.get('name')]
            listDomains += [r.get('target')]


def resolveASN(IP):
    """Resolve the IP to its AS information by using github.com/chmey/py-iptoasn"""
    result = {}
    try:
        result = requests.get(f'http://localhost:8080/api/ip/{IP}').json()
    except Exception:
        logging.error(f"Failed resolving {IP}")
    return result


logging.info(f"Started in {os.getcwd()}")
for file in os.listdir(os.getcwd()):
    # Process all dnsrecon*.xml files in the current working directory.
    if file.startswith('dnsrecon') and file.endswith('.xml'):
        logging.info(f"Processing: {os.path.join(os.getcwd(), file)}")
        parseFile(file)  # file is relative to the current working directory
uniqueIPs = list(set(listIPs))  # remove duplicates
for IP in uniqueIPs:
    listInfoASN += [resolveASN(IP)]  # now uniqueIPs[i] corresponds to listInfoASN[i]
# Now store / reuse lists!

Ready for the next step?

Once the gathered data is nice and tidy, it is a good idea to save it to permanent storage. You could keep it locally, for example as a CSV, or shuffle it into a NoSQL DB and share it with your coworkers. Whatever you do in this step depends on what works best for your workflow and what you want to use the data for.
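
For the CSV route, a minimal sketch could look like the following, continuing from the lists built by the parsing script above. The file name and column layout are my own choice, and the asn and owner keys are assumptions about the resolver's JSON response, so adjust them to whatever your IP2ASN API actually returns.

import csv

# Continuation of the parsing script: uniqueIPs and listInfoASN come from above.
with open('recon-ips.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['ip', 'asn', 'owner'])  # hypothetical column layout
    for ip, info in zip(uniqueIPs, listInfoASN):
        # 'asn' and 'owner' are assumed keys in the resolver's JSON response
        writer.writerow([ip, info.get('asn', ''), info.get('owner', '')])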