Date: 2025-04-07

No matter your pentesting experience, you’ve heard this countless times: Reconnaissance is the most important stage of your attack methodology. This sentence has deep implications in our pentesting work. It gives us hints on how to manage our engagement time, gives us the fallback step when we can’t find anything, and reminds us we need to understand our target to achieve impact on it.

Without mapping your target’s attack surface to find attack vectors specific to that target, you can’t go further: If you don’t find any vulnerabilities, it’s because your recon was not thorough enough or deep enough, so it’s time to get back to it again.

Remember the meaning of “to hack” and you get the point. ;)

Sun Tzu already explained this essential fact centuries ago:

Why is recon so important?

Well, it’s important for at least three good reasons:

Getting a strategic attack surface view

Well-done reconnaissance allows ethical hackers to develop a comprehensive, strategic view of the target organization or assets. This gives us a good lay of the land, shows us potential avenues (and where we'll hit roadblocks), and helps us use our pentesting experience and intuition to pinpoint where we can make the biggest impact.

Focusing effort for maximum impact

As pentesters or red teamers, we're always short on time and our focus has limits, no matter how driven we are. That's why having a detailed map of the target's attack surface and potential attack paths is crucial. It lets us concentrate our efforts on what matters to deliver real impact. We avoid wasting time on dead-end exploits, pointless brute-forcing, or falling into a rabbit hole because we missed a key attack vector.

Keeping OPSEC

When conducting reconnaissance, another parameter to consider is maintaining OPSEC (operations security), especially during a red team engagement. We don’t want our IPs blacklisted for probing the target too much, or the blue team detecting our attack infrastructure when we’re on the very first step of our cyber killchain:

The pentesting framework methodologies (OSCP, CEH, PTES, etc.) usually divide recon into passive and active steps, with most activities falling under the “footprinting” concept.

Passive footprinting means we gather information without directly interacting with the target, so we don't get detected or leave any log entries. Instead, we rely on third-party OSINT sources (Google dorks, LinkedIn job offers, TLS certificate records, and the like) to get info on our target.

Active footprinting involves direct interaction with target assets, creating network traffic and leaving traces in logs and defensive equipment, clearly indicating we're preparing for an attack. For this, we use Nmap service scans, web component scanning, CMS scanners, and other vulnerability scanners suited to our scope.

This may be common knowledge and how most people learn recon, but nowadays those definitions are a little outdated (the last update to the PTES framework dates from 2017). Things have changed. We can now do “passive fingerprinting” by querying services like Shodan and Censys, which scan the internet for us. Also, 'active' TCP SYN scans, once a clear sign of attack preparation, are now so common that they're no longer a reliable indicator. So, just labeling reconnaissance as 'active' or 'passive' isn't accurate anymore.

These changes mean we need to update our reconnaissance methods and go back to clearly separating footprinting and fingerprinting, with both 'passive' and 'active' steps within each. This is what we’ll explore in depth in this guide. For the moment, let’s just acknowledge this is how things work now.

I prefer to make this distinction between footprinting and fingerprinting:

FOOTPRINTING involves discovering the target organization's exposed attack surface through passive and stealthy active techniques - actions that don't alert the target to potential attacks. The goal is to gather enough information to get a broad understanding of their infrastructure and identify valuable targets on the attack surface for further investigation, without alerting the target to the attack preparation.

FINGERPRINTING identifies specific attack vectors through passive steps and also by actively probing the target to determine the versions and types of its exposed services and applications. Essentially, it reveals which potentially vulnerable services are worth trying to exploit. At this stage of our attack process, we accept the risk of being detected and alerting the target's blue team. We'll interact directly with the organization's systems, using tools and methods that clearly indicate we're preparing for an attack.

When the scope is small, it's easy to combine footprinting and fingerprinting because the amount of data is manageable. But for large targets, such as ISPs or multinational companies with thousands of exposed systems, it's essential to keep them distinct. This is especially true in red team engagements (like those following TIBER-EU guidelines) or wildcard bug bounty programs, where the scope is inherently massive.

A bug bounty case study: recon of an organization's exposed network infrastructure

To show how important it is to have a strong reconnaissance methodology, we'll perform footprinting and fingerprinting steps on SpaceX/Starlink, which has a large-scale bug bounty program on Bugcrowd.

Let's go! Starlink's bug bounty program on Bugcrowd covers a wide range of targets, including a large number of IP addresses.

Figure out the target’s ASN

We'll begin with this IP list, and our first goal is to find Starlink's Autonomous System Number (ASN). Save the IPs excluded from scope in a file named ‘exclude.txt’; we will use it later with MASSCAN.

What is an ASN?

An Autonomous System Number (ASN) is assigned to each organization that manages its own ranges of internet addresses (IPv4 and IPv6 prefixes) under its own routing policy. These Autonomous Systems are interconnected via BGP (Border Gateway Protocol). This is how the internet functions: a vast network of routing paths connecting these individual systems.

Dig through WHOIS IP data

You're most likely familiar with WHOIS, which stores ownership details for domain names. But did you know there's a similar registry for IP addresses?

This IP registry will tell us: The IP range that a specific IP belongs to

The organization that owns that range

The Autonomous System Number (ASN) associated with that range.
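As a rough sketch of such a lookup, the helper below filters the output of a WHOIS query for an IP down to the three pieces of information listed above. The function name `summarize_ip_whois` is hypothetical, and the field names (`NetRange`, `CIDR`, `OriginAS`, `inetnum`, and so on) are assumptions: they differ between registries, and some records leave the origin AS empty, so the filter may need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the WHOIS lines that describe the IP range,
# its owner, and its origin AS. Field names are assumptions and vary by registry.
summarize_ip_whois() {
    grep -Ei '^(NetRange|CIDR|NetName|OrgName|OriginAS|inetnum|origin):'
}

# Usage (performs a network query), with one of the in-scope addresses:
#   whois 192.31.242.1 | summarize_ip_whois
```

If the origin AS field comes back empty, the range and organization name are usually still enough to continue with a routing registry query.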

This is exactly the information we need to map out all the IP ranges belonging to an organization. Keep in mind, some organizations might have multiple ASNs. To get started, let's grab one of the IP addresses listed on Bugcrowd, like 192.31.242.1. By looking at WHOIS data, we can see that SpaceX has its own AS number: 27277.

Determine the target’s ASN ranges

Now let's look for all the IP ranges of the target organization. To do this, we will query the RADb WHOIS server, one of the Internet Routing Registries. But first let’s see what the RADb actually is.

The Routing Assets Database (RADb)

Example: BGP routes visualization

The Routing Assets Database (RADb), formerly known as the Routing Arbiter Database, is a public database in which the operators of Internet networks publish authoritative declarations of routing policy for their Autonomous System (AS) which are, in turn, used by the operators of other Internet networks to configure their inbound routing policy filters. The RADb, operated by the University of Michigan's Merit Network, was the first such database, but others followed in its wake, forming a loose confederation of Internet routing registries, containing sometimes-overlapping, and sometimes-conflicting, routing policy data, expressed in Routing Policy Specification Language (RPSL) syntax.

Source: Wikipedia

Using a reverse WHOIS query

For this, we make a reverse (inverse) WHOIS request to look for entries in the RADb that correspond to our AS number.

whois -h whois.radb.net -- '-i origin AS27277' | grep -Eo "([0-9.]+){4}/[0-9]+" | head

Command breakdown:

whois → This is the WHOIS command used to query databases for information on domains, IP addresses, and autonomous systems (AS).

-h whois.radb.net → Specifies that the WHOIS query should be sent to the whois.radb.net server.

-- → This separates the command options from the WHOIS query itself. It ensures that -i is passed to the server as part of the query rather than interpreted as an option of the local whois command.

'-i origin AS27277' → This performs an inverse lookup for the given Autonomous System Number (AS27277). The -i origin flag tells the WHOIS server to search for IP prefixes (routes) announced by AS27277.

We now have the list of IP ranges that belong to SpaceX. Redirect the output of this command (without the trailing | head, which only shows the first ten lines) or copy and paste it into a file named asn_ranges.txt. Let's see what we can get from it!
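One way to build asn_ranges.txt is sketched below. It is not the article's exact command: instead of a regular expression over the whole response, it reads only the `route:` attribute of each RPSL object, so prefixes mentioned in descriptions or remarks are ignored, and it removes duplicates. The function name `extract_prefixes` is hypothetical, and IPv6 `route6:` objects are deliberately skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read RADb (RPSL) output on stdin and print the unique
# IPv4 prefixes found in "route:" attributes. "route6:" lines are skipped.
extract_prefixes() {
    awk '$1 == "route:" { print $2 }' | sort -u
}

# Usage (performs a network query):
#   whois -h whois.radb.net -- '-i origin AS27277' | extract_prefixes > asn_ranges.txt
```

Deduplicating matters because the same prefix can be registered more than once, and each duplicate would otherwise be scanned again.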

Now, we'll use Robert Graham's excellent tool, MASSCAN, to perform a TCP SYN scan on every IP address within SpaceX's IP ranges.

I know some of you are thinking, 'Wait a minute, this isn't footprinting! You're directly interacting with their systems!' And you're right.

However…

Hiding in the Internet noise

Typically, a TCP SYN scan is considered a fingerprinting step (or, as some call it, active footprinting). But let's hear what Andrew Morris, the GreyNoise founder, says about it:

There are three main types of actors and activity: known benign mass scanners such as Shodan, Censys, and Sonar; malicious mass scanners such as Trojans, worms, and botnets; and unknown mass scanners, which is everything else. Most scanning activity falls into this unknown category. Of the millions of IP addresses scanning the Internet, 27 IP addresses are associated with Shodan, 334 IPs with Censys, 56 for Sonar, 145 for NetCraft, 228 for Shadowserver, and 253 IPs for BinaryEdge. In comparison, Grey Noise has tracked 249,000 IP addresses associated with Mirai botnet, 92,000 for SSH worms, and 590,000 compromised residential routers attacking other people. As for the unknown bucket, it isn't always clear what those scanners are doing.

Source: Mapping the Internet, Who’s Who? (Part Three) By Fahmida Y. Rashid

TCP scanners activity visualization (source: Radware)

Thousands of IP addresses scan the internet every day for countless purposes, many of which are legitimate. While TCP SYN scans get logged, organizations simply don't have the resources to distinguish between actual attacks and normal internet traffic. That said, these logs are crucial for attribution after a successful attack.

For this step, we're going to use MASSCAN to scan every IP range we identified. To do this:

Check that your asn_ranges.txt file is in the current directory.

Populate your exclusion file (exclude.txt) with the out-of-scope IPs from the Bugcrowd scope.

Use a FOR loop to execute MASSCAN on each IP range listed in asn_ranges.txt.

Direct the output of each MASSCAN scan to a single file named largescan.dat.

Note: The file name largescan.dat is crucial for compatibility with the BIGF00t.sh script we'll use later.

for ip in $(cat asn_ranges.txt); do masscan -p 1-65535 --rate 10000 "$ip" --excludefile exclude.txt >> largescan.dat; done
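Before launching a scan of this size, it can help to estimate how many probes it represents and how long it will run. The sketch below does that arithmetic from a list of CIDR ranges; the function name `estimate_scan` is hypothetical, and the estimate is only a ceiling, since it ignores the addresses removed by exclude.txt and any retransmissions. For this scan, the figure of 218,100,480 probes corresponds to 3,328 addresses × 65,535 ports, which at 10,000 packets per second is roughly six hours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate probe count and duration for a full port scan.
#   $1 = number of ports per host, $2 = packets per second
#   stdin = one IPv4 CIDR range per line (e.g. the content of asn_ranges.txt)
estimate_scan() {
    awk -F/ -v ports="$1" -v rate="$2" '
        NF == 2 { hosts += 2 ^ (32 - $2) }   # a /N range holds 2^(32-N) addresses
        END {
            probes = hosts * ports
            printf "%.0f hosts, %.0f probes, about %.1f hours at %d pps\n",
                   hosts, probes, probes / rate / 3600, rate
        }'
}

# Usage:
#   estimate_scan 65535 10000 < asn_ranges.txt
```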

Then go for a walk. A long walk. Masscan will probe 218,100,480 TCP ports ;).

About the Masscan rate, remember two things:

Masscan uses a fixed packet sending frequency, unlike Nmap, which adapts its scanning speed to its network environment. Masscan's speed can have undesirable effects if run from within a corporate network. It is therefore recommended to run Masscan from a VPS directly exposed to the Internet.

Using a high scanning speed degrades the quality of the results, as some open ports may appear closed. Here we set it to 10,000 packets per second, which is still faster than Nmap's “aggressive mode” and should avoid false negatives.

I developed BIGF00t.sh while doing red team recon on an entire ISP, so please consider it a PoC.

I needed a quick and easy way to process and triage the MASSCAN output from scanning hundreds of thousands of IPs to find specific exposed services. While other solutions exist, I wanted something that relied solely on shell scripting, without the complexity of setting up a web server or other infrastructure.

When scanning a large organization’s exposed network, you can find interesting targets by looking at the Top Port Rank list from BIGF00t.sh. Critical network nodes within an organization often expose a different number of TCP ports compared to regular servers. Plus, IP ranges are frequently organized logically, grouping server types based on their function within the organization. You can see this pattern in our Top Port Ranking list from BIGF00t.sh.

We've identified six distinct asset groups within AS27277:

1 server with a lot of open ports: 192.31.242.93

1 server with 4 unusual ports open (1935,10023,18100,18255): 192.31.242.94

1 server with only TCP 8443 open: 192.168.242.240

3 servers on the range 66.9.191.0/22 with the same two unusual ports open: 10022, 18255

3 servers on the range 192.31.243.0/24 with only TCP 443 open.

15 servers on the range 192.31.242.0/23 with only web TCP ports open - 443 and 80.
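The ranking idea behind this list can be sketched in a few lines of shell. This is not BIGF00t.sh itself, only a minimal approximation: the function name `rank_hosts_by_open_ports` is hypothetical, and it assumes largescan.dat contains masscan's default console lines of the form `Discovered open port 443/tcp on 192.0.2.10`. It prints the number of open ports, the host, and the list of ports, so hosts sharing the same unusual port set stand out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank hosts by number of open ports found by masscan.
# Assumes input lines like: "Discovered open port 443/tcp on 192.0.2.10"
rank_hosts_by_open_ports() {
    awk '$1 == "Discovered" && $2 == "open" {
             split($4, p, "/")                      # "443/tcp" -> p[1] = "443"
             count[$6]++                            # $6 is the host address
             ports[$6] = (ports[$6] == "" ? p[1] : ports[$6] "," p[1])
         }
         END { for (ip in count) print count[ip], ip, ports[ip] }' "$@" |
        sort -k1,1nr -k2,2                          # most open ports first
}

# Usage:
#   rank_hosts_by_open_ports largescan.dat | head -n 20
```

On a host with tens of thousands of open ports the third column becomes very long, so piping through `cut -c1-120` may keep the listing readable.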

By looking at how the open ports are distributed, we can see the IP addresses are logically organized based on server function. They're also following security best practices by only exposing necessary services and their corresponding TCP ports. This allows us to develop several hypotheses:

192.31.242.93: this server likely handles core infrastructure, as it has a large number of services exposed.

192.31.242.94: this is also probably a core infrastructure server, but with a limited set of specialized and unusual TCP ports. Two of these ports, 18100 and 18255, expose a web server. From observing the 66.9.191.0/22 hosts, we assume that 18255 is linked to TCP port 10023.

192.168.242.240: this appears to be a single-purpose server, exposing only one service. The use of TCP port 8443 ("alt-http") suggests it's an HTTP-related node, possibly a load balancer, proxy, or web administration interface for a Tomcat server or an appliance.

192.31.243.0/24: these servers only expose HTTPS (TCP 443), suggesting a secured web interface dedicated to a device. The absence of insecure HTTP (TCP 80) indicates that the required communication security is higher than for a standard web application, which would prioritize accessibility for end users.

66.9.191.0/22: the three servers in this range share the same unusual TCP ports, suggesting they form a redundancy cluster.

192.31.242.0/23: the 15 servers in this range only expose standard web ports (TCP 80 and 443), clearly indicating they are web servers.

Let's start by investigating 192.31.242.93, which has a staggering 65,453 open ports.

That's a huge number of open ports, right? Of course, it could be a cyber deception tool like Portspoof, but in that case, all 65,535 TCP ports would appear open. For the purpose of this demonstration, let's perform some active fingerprinting and see what's running on port 443, for example.

Tadaaaa: a Palo Alto GlobalProtect authentication page! :) It seems that this server is indeed an important node in Starlink's network. ;)

If we further investigate the unusual ports on 66.9.191.0/22 and 192.31.242.94, we find that those servers are associated with certificates for expired domains and expose a Forbidden homepage. It is a mystery to solve ;).

This footprinting phase has given us a clear picture of how the organization organizes its assets using IP ranges. Now, we have the information we need to identify the most promising targets for finding attack vectors.

Step 3: Passive fingerprinting

With services like Shodan or Censys, we can do passive fingerprinting, since they conduct the scans and take responsibility for the activity.

Using Shodan for active scans while still flying under the radar

Created in 2009 by John Matherly, Shodan is, as its creator describes it, a 'search engine for internet-connected devices.' Shodan continuously scans the IPv4 address space, performing banner grabbing on the services exposed by internet servers. This data is then made searchable for users on shodan.io. Shodan essentially does the same TCP scanning and banner grabbing that we do during the fingerprinting stage. However, using Shodan allows us to remain in 'passive' mode while benefiting from the 'active' scanning of a third party.

Using Smap to get Nmap-like output from Shodan

Somdev Sangwan (s0mD3v) developed Smap, a tool that replicates Nmap results by using cold data from Shodan.io. Using the output from our BIGF00t.sh script, we now have a list of live hosts in the 'ports/' directory. Let's use a FOR loop to feed this list into Smap:



for ip in $(ls ../ports); do ./smap -sV "$ip"; done