Beyond passive and active - a modern approach to network reconnaissance

No matter your pentesting experience, you’ve heard this countless times:

Reconnaissance is the most important stage of your attack methodology.

This sentence has deep implications in our pentesting work. It gives us hints on how to manage our engagement time, gives us the fallback step when we can’t find anything, and reminds us we need to understand our target to achieve impact on it.

Without mapping your target’s attack surface to find attack vectors specific to that target, you can’t go further:

If you don’t find any vulnerabilities, it’s because your recon was not thorough enough or deep enough, so it’s time to get back to it again.

Remember the meaning of “to hack” and you get the point. ;)

Sun Tzu already explained this essential fact centuries ago:

Why is recon so important?

Well, it’s important for at least three good reasons:

Getting a strategic attack surface view

Well-executed reconnaissance allows ethical hackers to develop a comprehensive, strategic view of the target organization or assets.

This gives us a good lay of the land, shows us potential avenues (and where we'll hit roadblocks), and helps us use our pentesting experience and intuition to pinpoint where we can make the biggest impact.

Focusing effort for maximum impact

As pentesters or red teamers, we're always short on time and our focus has limits, no matter how driven we are. That's why having a detailed map of the target's attack surface and potential attack paths is crucial.

It lets us concentrate our efforts on what matters to deliver real impact. We avoid wasting time on dead-end exploits, pointless brute-forcing, or falling into a rabbit hole because we missed a key attack vector.

Keeping OPSEC

When conducting reconnaissance, we also need to maintain OPSEC (operations security), especially during a red team engagement.

We don’t want our IPs blacklisted for probing the target too much, or the blue team detecting our attack infrastructure while we’re still on the very first step of our cyber kill chain:

Why the standard approach to network recon is outdated

The pentesting frameworks and methodologies (OSCP, CEH, PTES, etc.) usually divide recon into passive and active steps, with most activities falling under the “footprinting” concept.

Passive footprinting means we gather information without directly interacting with the target, so we don't get detected or leave any log entries.

Instead, we rely on third-party OSINT sources (Google dorks, LinkedIn job offers, TLS certificate records, and the like) to get info on our target.

Active footprinting involves direct interaction with target assets, creating network traffic and leaving traces in logs and defensive equipment, clearly indicating we're preparing for an attack.

For this, we use Nmap service scans, web component scanning, CMS scanners, and other vulnerability scanners that fit our scope.

This may be common knowledge and how most people learn recon, but nowadays those definitions are a little outdated (the last update to the PTES framework dates from 2017).

Things have changed. We can now do “passive fingerprinting” by querying services like Shodan and Censys, which scan the internet for us. Also, 'active' TCP SYN scans, once a clear sign of attack preparation, are now so common that they're no longer a reliable indicator.

So, just labeling reconnaissance as 'active' or 'passive' isn't accurate anymore. These changes mean we need to update our reconnaissance methods and go back to clearly separating footprinting and fingerprinting, with both 'passive' and 'active' steps within each.

This is what we’ll explore in depth in this guide. For the moment let’s just acknowledge this is how things work now.

A more accurate definition of footprinting and fingerprinting

I prefer to make this distinction between footprinting and fingerprinting:

FOOTPRINTING involves discovering the target organization's exposed attack surface through passive and stealthy active techniques - actions that don't alert the target to potential attacks.

The goal is to gather enough information to get a broad understanding of their infrastructure and identify valuable targets on the attack surface for further investigation, without alerting the target to the attack preparation.

FINGERPRINTING identifies specific attack vectors through passive steps and also by actively probing the target to determine the versions and types of its exposed services and applications. Essentially, it reveals which potentially vulnerable services are worth trying to exploit.

At this stage of our attack process, we accept the risk of being detected and alerting the target's blue team. We'll interact directly with the organization's systems, using tools and methods that clearly indicate we're preparing for an attack.

When the scope is small, it's easy to combine footprinting and fingerprinting because the amount of data is manageable. But for large targets, such as ISPs or multinational companies with thousands of exposed systems, it's essential to keep them distinct. This is especially true in red team engagements (like those following TIBER EU guidelines) or wildcard bug bounty programs, where the scope is inherently massive.

A bug bounty case study: recon of an organization's exposed network infrastructure

To show how important it is to have a strong reconnaissance methodology, we'll perform footprinting and fingerprinting steps on SpaceX/Starlink, which has a large-scale bug bounty program on Bugcrowd.

Let's go!

Step 1: Passive network footprinting

Starlink's bug bounty program on Bugcrowd covers a wide range of targets, including a large number of IP addresses.

Figure out the target’s ASN

We'll begin with this IP list, and our first goal is to find Starlink's Autonomous System Number (ASN).

Note the IPs excluded from scope in a file named ‘exclude.txt’. We will use it later with MASSCAN.

What is an ASN?

An Autonomous System Number (ASN) is assigned to each organization that manages its own ranges of internet addresses (IPv4 and IPv6). These Autonomous Systems are interconnected via BGP (the Border Gateway Protocol). This is how the internet functions: a vast network of routing paths connecting these individual systems.

Dig through WHOIS IP data

You're most likely familiar with WHOIS, which stores ownership details for domain names. But did you know there's a similar registry for IP addresses?

This IP registry will tell us:

The IP range that a specific IP belongs to
The organization that owns that range
The Autonomous System Number (ASN) associated with that range.

This is exactly the information we need to map out all the IP ranges belonging to an organization. Keep in mind, some organizations might have multiple ASNs.

To get started, let's grab one of the IP addresses listed on Bugcrowd, like 192.31.242.1.

By looking at WHOIS data, we can see that SpaceX has its own AS number: 27277.
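As a minimal sketch of this lookup, the helper below pulls the origin AS out of IP WHOIS output. The function name extract_origin_as is hypothetical, and the field names it looks for (OriginAS: or origin:) are assumptions: registries format their records differently, and some leave this field empty, in which case nothing is printed.

```shell
# Hypothetical helper: read IP WHOIS output on stdin and print the
# origin AS numbers it mentions, one per line.
# Assumption: the registry exposes the ASN in an "OriginAS:" or
# "origin:" field; records without such a field print nothing.
extract_origin_as() {
  grep -Ei '^(originas|origin):' |
    grep -Eo 'AS[0-9]+' |
    LC_ALL=C sort -u
}

# Usage (needs network access, so it is left commented out):
# whois 192.31.242.1 | extract_origin_as
```

Filtering on the field name first avoids picking up AS numbers that only appear in comments or remarks.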

Determine the target’s ASN ranges

Now let's look for all the IP ranges of the target organization.

To do this, we will query the RADb WHOIS server, which is part of the Internet Routing Registry.

But first let’s see what the RADb actually is.

The Routing Assets Database (RADb)

Example: BGP routes visualization

The Routing Assets Database (RADb), formerly known as the Routing Arbiter Database is a public database in which the operators of Internet networks publish authoritative declarations of routing policy for their Autonomous System (AS) which are, in turn, used by the operators of other Internet networks to configure their inbound routing policy filters.
The RADb, operated by the University of Michigan's Merit Network, was the first such database, but others followed in its wake, forming a loose confederation of Internet routing registries, containing sometimes-overlapping, and sometimes-conflicting, routing policy data, expressed in Routing Policy Specification Language (RPSL) syntax.

Source: Wikipedia

Using a reverse Whois query

For this, we make an inverse WHOIS query that looks for RADb entries matching our AS number.

whois -h whois.radb.net -- '-i origin AS27277' | grep -Eo "([0-9.]+){4}/[0-9]+" | head

Command breakdown:

whois → This is the WHOIS command used to query databases for information on domains, IP addresses, and autonomous systems (AS).

-h whois.radb.net → Specifies that the WHOIS query should be sent to the whois.radb.net server.

-- → This marks the end of the options for the local whois command. It ensures that -i is passed to the WHOIS server as part of the query rather than interpreted as an option of the whois client itself.

'-i origin AS27277' → This performs an inverse lookup for the given Autonomous System Number (ASN) (AS27277).

The -i origin flag tells the WHOIS server to search for IP prefixes (routes) announced by AS27277.

We now have the list of IP ranges that belong to SpaceX.

Redirect the output of this command to a file named asn_ranges.txt (or copy and paste it there). Drop the trailing | head first, since it only keeps the first ten lines.
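Here is a slightly stricter way to build that file, offered as a sketch rather than the method used for this article. The helper name extract_ipv4_prefixes is hypothetical, and it assumes RADb answers with RPSL objects whose IPv4 prefixes sit in route: attributes.

```shell
# Hypothetical helper: read RADb (RPSL) output on stdin and print the
# unique IPv4 prefixes found in its "route:" attributes, one per line.
# Assumption: each IPv4 prefix appears as "route:   a.b.c.d/len".
extract_ipv4_prefixes() {
  awk '$1 == "route:" { print $2 }' |
    grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' |
    LC_ALL=C sort -u
}

# Usage (needs network access, so it is left commented out):
# whois -h whois.radb.net -- '-i origin AS27277' | extract_ipv4_prefixes > asn_ranges.txt
```

Matching on the route: attribute skips prefix-like strings in free-text fields, and sort -u removes the duplicates that overlapping registry entries can produce, so MASSCAN does not scan the same range twice.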

Let's see what we can get from it!

Step 2: Active network footprinting

Now, we'll use Robert Graham's excellent tool, MASSCAN, to perform a TCP SYN scan on every IP address within SpaceX's IP ranges.

I know some of you are thinking, 'Wait a minute, this isn't footprinting! You're directly interacting with their systems!' And you're right.

However…

Hiding into the Internet noise

Typically, a TCP SYN scan is considered a fingerprinting step (or, as some call it, Active footprinting). But let's hear what Andrew Morris, the GreyNoise founder, says about it:

There are three main types of actors and activity: known benign mass scanners such as Shodan, Censys, and Sonar; malicious mass scanners such as Trojans, worms, and botnets; and unknown mass scanners, which is everything else.
Most scanning activity falls into this unknown category.
Of the millions of IP addresses scanning the Internet, 27 IP addresses are associated with Shodan, 334 IPs with Censys, 56 for Sonar, 145 for NetCraft, 228 for Shadowserver, and 253 IPs for BinaryEdge. In comparison, Grey Noise has tracked 249,000 IP addresses associated with Mirai botnet, 92,000 for SSH worms, and 590,000 compromised residential routers attacking other people. As for the unknown bucket, it isn't always clear what those scanners are doing.

Source: Mapping the Internet, Who’s Who? (Part Three) By Fahmida Y. Rashid

Source: Radware TCP scanners activity visualization

Thousands of IP addresses scan the internet every day for countless purposes, many of which are legitimate. While TCP SYN scans get logged, organizations simply don't have the resources to distinguish between actual attacks and normal internet traffic. That said, these logs are crucial for attribution after a successful attack.

Active footprinting with MASSCAN

For this step, we're going to use MASSCAN to scan every IP range we identified. To do this:

Check that your asn_ranges.txt file is in the current directory.
Populate your exclusion file (exclude.txt) with the out-of-scope IPs from the Bugcrowd scope.
Use a FOR loop to execute MASSCAN on each IP range listed in asn_ranges.txt.
Direct the output of each MASSCAN scan to a single file named largescan.dat.

Note: The file name largescan.dat is crucial for compatibility with the BIGF00t.sh script we'll use later.

for range in $(cat asn_ranges.txt); do masscan -p 1-65535 --rate 10000 "$range" --excludefile exclude.txt >> largescan.dat; done

Then go for a walk. A long walk. Masscan will probe 218100480 TCP ports ;).
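That figure works out to 3,328 addresses times 65,535 ports. To estimate the workload for your own ranges before launching a scan, a small sketch like the one below can help. The helper name count_tcp_probes is hypothetical, and it assumes the ranges do not overlap and that nothing is excluded.

```shell
# Hypothetical helper: read IPv4 CIDR ranges on stdin (one per line)
# and print how many TCP probes a full 1-65535 scan implies.
# Assumption: ranges do not overlap and no address is excluded.
count_tcp_probes() {
  total=0
  while IFS=/ read -r _network prefix; do
    # Skip blank lines or lines without a prefix length.
    [ -n "$prefix" ] || continue
    # A /N range holds 2^(32-N) addresses, each probed on 65535 ports.
    total=$((total + (1 << (32 - prefix)) * 65535))
  done
  echo "$total"
}

# Usage:
# count_tcp_probes < asn_ranges.txt
```

Dividing the result by the packet rate gives a rough lower bound on the scan duration: 218,100,480 probes at 10,000 packets per second is about 21,810 seconds, or roughly six hours.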

About the Masscan rate, remember two things:

Masscan uses a fixed packet sending frequency, unlike Nmap, which adapts its scanning speed based on its network environment. Masscan's speed can have undesirable effects if run from within a corporate network.

It is therefore recommended to run Masscan from a VPS directly exposed to the Internet.

Using a high scanning speed results in a degradation in the quality of the results, as some open ports may appear closed.

Here we set it to 10,000 packets per second, which is still much faster than Nmap's “aggressive mode” and should avoid false negatives.

Using the BIGF00t.sh footprinting script

I developed BIGF00T.sh while doing red team recon on an entire ISP, so please consider it a PoC.

I needed a quick and easy way to process and triage the MASSCAN output from scanning hundreds of thousands of IPs, looking for specific exposed services.

While other solutions exist, I wanted something that relied solely on shell scripting, without the complexity of setting up a web server or other infrastructure.
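To give an idea of what shell-only triage can look like, here is a minimal sketch that ranks hosts by their number of open ports. It is not the BIGF00t.sh script itself: the function name rank_hosts_by_open_ports is hypothetical, and it assumes MASSCAN's default text output, where each result line reads "Discovered open port 443/tcp on 192.0.2.10".

```shell
# Hypothetical helper (not the author's script): read MASSCAN's default
# text output on stdin and print "<open port count> <ip>", busiest first.
# Assumption: result lines look like
#   Discovered open port 443/tcp on 192.0.2.10
rank_hosts_by_open_ports() {
  # Keep "ip port/proto" pairs and drop duplicate findings.
  awk '/^Discovered open port/ { print $6, $4 }' |
    LC_ALL=C sort -u |
    # Count the distinct open ports per host.
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' |
    LC_ALL=C sort -k1,1nr -k2,2
}

# Usage:
# rank_hosts_by_open_ports < largescan.dat | head -n 20
```

Deduplicating before counting matters because the same open port can be reported more than once when scan results are appended to a single file.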

When scanning a large organization’s exposed network, you can find interesting targets by looking at the Top Port Rank list from BIGF00t.sh.

Indeed, critical network nodes within an organization often expose a different number of TCP ports compared to regular servers. Plus, IP ranges are frequently organized logically, grouping server types based on their function within the organization.

You can see this pattern in our Top Port Ranking list from BIGF00t.sh:

We've identified six distinct asset groups within AS27277:

1 server with a lot of open ports: 192.31.242.93
1 server with 4 unusual ports open (1935,10023,18100,18255): 192.31.242.94
1 server with only TCP 8443 open: 192.31.242.240
3 servers on the range 66.9.191.0/22 with the same two unusual ports open: 10022, 18255
3 servers on the range 192.31.243.0/24 with only TCP 443 open.
15 servers on the range 192.31.242.0/23 with only web TCP ports open - 443 and 80.

By looking at how the open ports are distributed, we can see the IP addresses are logically organized based on server function. They're also following security best practices by only exposing necessary services and their corresponding TCP ports.

This allows us to develop several hypotheses:

192.31.242.93: this server likely handles core infrastructure, as it has a large number of services exposed.
192.31.242.94: this is also probably a core infrastructure server, but with a limited set of specialized and unusual TCP ports. Two of these ports, 18100 and 18255, expose a web server. From observing the 66.9.191.0/22 hosts, we assume that port 18255 is linked to TCP port 10023.
192.31.242.240: this appears to be a single-purpose server, exposing only one service. The use of TCP port 8443 ("alt-http") suggests it's an HTTP-related node, possibly a load balancer, proxy, or web administration interface for a Tomcat server or an appliance.
192.31.243.0/24: these servers only expose HTTPS (TCP 443), indicating a secured web interface dedicated to a device. The absence of insecure HTTP (TCP 80) indicates that the required communication security is higher than for a standard web application, which would prioritize accessibility for end users.
66.9.191.0/22: the three servers in this range share the same unusual TCP ports, suggesting they form a redundancy cluster.
192.31.242.0/23: the 15 servers in this range only expose standard web ports (TCP 80 and 443), clearly indicating they are web servers.
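Hypotheses like these come from spotting hosts that share the same set of open ports. As a rough sketch of that grouping (again, not the BIGF00t.sh script), the hypothetical helper below builds a "port signature" per host and lists the hosts behind each signature. It assumes MASSCAN's default "Discovered open port <port>/<proto> on <ip>" output lines.

```shell
# Hypothetical helper: read MASSCAN's default text output on stdin and
# print one line per distinct set of open ports:
#   <port,port,...>: <ip> <ip> ...
group_by_port_signature() {
  # 1. Reduce each finding to "ip port", sorted by host then port number.
  awk '/^Discovered open port/ { split($4, p, "/"); print $6, p[1] }' |
    LC_ALL=C sort -u -k1,1 -k2,2n |
    # 2. Join the ports of each host into one signature: "sig ip".
    awk '
      $1 != ip { if (ip != "") print sig, ip; ip = $1; sig = $2; next }
      { sig = sig "," $2 }
      END { if (ip != "") print sig, ip }' |
    # 3. Bring identical signatures together, then list their hosts.
    LC_ALL=C sort -k1,1 -k2,2 |
    awk '
      $1 != sig { if (sig != "") print sig ": " hosts; sig = $1; hosts = $2; next }
      { hosts = hosts " " $2 }
      END { if (sig != "") print sig ": " hosts }'
}

# Usage:
# group_by_port_signature < largescan.dat
```

Hosts that share an unusual signature are good candidates for a cluster, while a host with a unique signature deserves a closer look.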

Let's start by investigating 192.31.242.93, which has a staggering 65,453 open ports.

That's a huge number of open ports, right?

Of course, it could be a cyber deception tool like Portspoof, but in that case, all 65,535 TCP ports would appear open.

For the purpose of this demonstration, let's perform some active fingerprinting and see what's running on port 443, for example.

Tadaaaa: a Palo Alto GlobalProtect authentication page! :)

It seems that this server is indeed an important Starlink organization network node. ;)

If we investigate our unusual ports on 66.9.191.0/22 and 192.31.242.94 further, we find that those servers are associated with certificates for expired domains and expose a Forbidden homepage.

It is a mystery to solve ;).

This footprinting phase has given us a clear picture of how the organization organizes its assets using IP ranges.

Now, we have the information we need to identify the most promising targets for finding attack vectors.

Step 3: Passive fingerprinting

With services like Shodan or Censys, we can do passive fingerprinting, since they conduct the scans, taking responsibility for the activity.

Using Shodan for active scans while still flying under the radar

Created in 2009 by John Matherly, Shodan is, as its creator describes it, a 'search engine for internet-connected devices.'

Shodan continuously scans the public IPv4 address space, performing banner grabbing on the services exposed by internet-facing servers.

This data is then made searchable for users on shodan.io.

Shodan essentially does the TCP scanning and banner grabbing like we do during the fingerprinting stage. However, using Shodan allows us to remain in 'passive' mode while benefiting from the 'active' scanning of a third party.
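As an illustration of this "passive" lookup, the sketch below parses the open ports out of a JSON answer from Shodan's free InternetDB service. The helper name ports_from_internetdb_json is hypothetical, and the response shape (a flat "ports" array of integers on a single line) is an assumption, so check it against a live answer before relying on it.

```shell
# Hypothetical helper: read one InternetDB-style JSON document on stdin
# and print the port numbers from its "ports" array, one per line.
# Assumption: "ports" is a flat array of integers on a single line.
ports_from_internetdb_json() {
  sed -n 's/.*"ports"[[:space:]]*:[[:space:]]*\[\([0-9, ]*\)\].*/\1/p' |
    tr -d ' ' |
    tr ',' '\n' |
    grep -E '^[0-9]+$'
}

# Usage (third-party lookup, needs network access; left commented out):
# curl -s https://internetdb.shodan.io/192.31.242.93 | ports_from_internetdb_json
```

The query goes to the third party, not to the target, so nothing shows up in the target's logs. A real JSON parser such as jq is more robust than sed if it is available on your machine.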

Using Smap to get Nmap-like output from Shodan

Somdev Sangwan (s0mD3v) developed Smap, a tool that replicates Nmap results by using cold data from Shodan.io.

Using the output from our BIGF00t.sh script, we now have a list of live hosts in the 'ports/' directory.

Let's use a FOR loop to feed this list into Smap:

for ip in $(ls ../ports);do ./smap -sV $ip;done

While the results here aren't groundbreaking, this is definitely a tool worth adding to your reconnaissance toolkit.

Step 4: Active fingerprinting

Let's upload the list of live hosts from BIGF00t.sh (in the file “DATA\alive.list”) as a target list in Pentest-Tools.com:

Using the Port Scanner to maintain OPSEC

To maintain OPSEC by reducing our attack noise, we'll only thoroughly service scan the open ports we discovered with MASSCAN.

We'll find these ports in the '/ports' directory created by BIGF00T.sh, and we'll group them by IP/IP range based on our 'Top Port Ranking list' analysis.
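If you prefer to run this targeted service scan yourself, the same idea can be sketched with Nmap instead of the Pentest-Tools.com Port Scanner used in this guide. The hypothetical helper below only prints one nmap command per host, limited to the ports MASSCAN found open. It assumes MASSCAN's default "Discovered open port <port>/<proto> on <ip>" output lines and does not launch anything on its own.

```shell
# Hypothetical helper: read MASSCAN's default text output on stdin and
# print one Nmap service-scan command per host, restricted to the ports
# already found open. The commands are printed, not executed.
targeted_scan_commands() {
  awk '/^Discovered open port/ { split($4, p, "/"); print $6, p[1] }' |
    LC_ALL=C sort -u -k1,1 -k2,2n |
    awk '
      $1 != ip { if (ip != "") print "nmap -sV -Pn -p " ports " " ip; ip = $1; ports = $2; next }
      { ports = ports "," $2 }
      END { if (ip != "") print "nmap -sV -Pn -p " ports " " ip }'
}

# Usage: review the generated commands before running any of them.
# targeted_scan_commands < largescan.dat > service_scans.sh
```

Printing the commands first lets you review them and drop out-of-scope hosts. A host with tens of thousands of open ports would produce a very long port list, so handle that kind of outlier separately.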

Then we’ll go back to the Target section on Pentest-Tools.com and select all those IPs.

Then choose the Port Scanner from the list of tools:

Configure the Port Scanner this way to get the data you need:

Now go grab a coffee while Pentest-tools.com finishes all the scans.

Using the Attack Surface map on Pentest-tools.com

We can use the Attack Surface mapping feature on Pentest-Tools.com to get a good look at our fingerprinted targets.

What's really neat is how you can sort everything – by IP, hostname, protocol, ports, services, you name it.

For instance, let's check out our web servers using the 'Protocol' sorting filter.

We can see that the organization uses 3 different web server types:

Apache HTTP Server
IIS HTTP Server
Nginx HTTP server.

By sorting by 'Technology,' we can see that Apache servers are specifically used on the network infrastructure nodes we previously identified: 66.9.191.244, 66.9.191.245, 66.9.191.246, and 192.31.242.94.

It is interesting to note that 192.31.242.94 shares the same port as the servers in the other range, 66.9.191.0/22.

Looking at the mapped Attack Surface, sorted by IPs, reveals that these Apache web servers are running on the non-standard HTTP port 18255.

What’s more, they all expose either port 10022 or 10023, which are also unknown.

This leads us to believe that these Apache servers are web UIs for a service we haven't identified yet, operating on port 10022 or 10023.

How to extract strategic and tactical insights from limited exposure

Here we faced a cybersecurity-aware, mature organization that’s careful about the technical info it exposes. Still, we can infer many useful things about their exposed infrastructure.

Our footprinting and fingerprinting methodology gave us a strategic and tactical overview of our targeted organization from which we can deduce the organization’s level of security maturity and potential weak points in their infrastructure.

Depending on the type of engagement you are conducting (bug bounty or red team), you can use this intel at a strategic or tactical level for your project and profit.

Here are a few examples.

The strategic view

You get this by:

Analyzing the logical distribution of servers through IP ranges, so you can concentrate your attacks on hosts serving the purpose of the impact you seek to have.
Should you discover a vulnerability on an asset grouped in the same IP range, you can assume the vulnerability will also affect similar, neighbouring assets.
Discovering inconsistencies in how technologies are distributed or a completely heterogeneous use of infrastructure or applications components gives you insight into the security maturity of the targeted organization.

While this isn’t the case in our practical example, let’s imagine this: you find a group of IP range neighbours exposing the same ports and services, but one of the servers exposes an older version than the other three.

The fact that this server stands out may be a sign that admins or developers are neglecting it, making it a valuable target to explore deeper.

Is it really an unmaintained asset waiting for you to exploit it or maybe just a honeypot?

The tactical view

From the list of component technologies and, if we’re lucky, their versions, we can tailor our wordlists, payloads, exploits, and plugin arsenal specifically for each of them.

Our next attacks will be faster and more efficient as they will be focused on the actual technologies the target organization uses and exposes.

So what’s next?

The next step would be to pick some assets to target and then either find or craft exploits for the vulnerable services you discovered - or go further and look for more attack vectors by doing web application reconnaissance on every web server you detected (even on uncommon HTTP ports).

The art of reconnaissance

Reconnaissance is truly an art form that relies on our individual creativity to gather information and draw insights from a multitude of open-source intelligence (OSINT) sources.

Many prominent cybersecurity influencers consistently insist on its importance and offer valuable resources to refine your own recon craft.

For excellent content on this topic, look to people like Nahamsec or Jason Haddix.

However, beyond the essential tips and tricks, a structured, consistent, and methodical approach is key. This simplifies your engagement and makes your attack process more transparent for your clients.

While this guide has focused on footprinting and fingerprinting exposed network infrastructure, pentesting frameworks like PTES offer a comprehensive and detailed methodology for a broader-scope reconnaissance of a targeted organization. It is a must read!

Honestly, reconnaissance is an awesome part of the job.

I love getting to the point where I feel like I know their network infrastructure as well as (or even better than) their own admins. That's where you start to build your attack plan.

And when you're focusing on a specific asset, stepping back to see the bigger picture really helps you understand the landscape and fine-tune your approach.