Nmap Development mailing list archives

Improving address exclusion matching

From: Daniel Miller <bonsaiviking () gmail com>
Date: Thu, 1 Nov 2018 00:35:30 -0500

Back in 2012, David posted an interesting message to this list:
https://seclists.org/nmap-dev/2012/q4/420

He suggested that our existing method for checking a candidate address
against a list of excluded addresses was slower than it should be, and that
it was noticeably affecting some users' scan times. I've been thinking
about this off and on since then (6 years doesn't seem all that long at
this age :)), and I recently had a breakthrough that I was able to code up
in just a couple of days.

The previous system works by turning each exclusion specification provided
via --exclude or --excludefile into an array of bit vectors, using 256 bits
to represent all possible values for one octet of an address. Then each
candidate address is checked against each specification linearly until it
matches (the candidate is excluded) or all specifications have been checked.

The new system uses a radix tree, or "trie," data structure, where each
node represents an address prefix. Comparison is done using bitwise
operations to mask off the less-significant portion of the address and
compare only the relevant portion. When a prefix matches, the next bit of
the candidate address after the prefix determines which branch of the tree
to descend. Some branches are dead-ends: the candidate doesn't match and
can be scanned. Others are always-match nodes, caused when an exclusion
specification had a CIDR-style netmask like 192.168.0.0/24. In all, a
worst-case maximum of 128 "comparisons" is done for any given IPv6 address
(32 for IPv4), no matter how many excluded addresses are provided.

This sounds great, but here are the real numbers on performance
improvement: If you're excluding fewer than 1000 addresses or netmasks, you
probably won't notice. I ran some tests using an exclude list of 1000 /24
networks and a target list of 65536 random IP addresses. The new matching
algorithm was about 40% faster, but the old one didn't take much more than
1 second to read and check those addresses.

$ /usr/bin/time ./nmap-new -iL 65k.ips -n -sL --excludefile exclude.txt |
tail
0.53user 0.07system 0:00.61elapsed 99%CPU (0avgtext+0avgdata
12168maxresident)k
0inputs+0outputs (0major+23145minor)pagefaults 0swaps
<snip>
Nmap done: 65533 IP addresses (0 hosts up) scanned in 0.62 seconds

$ /usr/bin/time ./nmap-old -iL 65k.ips -n -sL --excludefile exclude.txt |
tail
0.91user 0.11system 0:01.03elapsed 99%CPU (0avgtext+0avgdata
12220maxresident)k
0inputs+0outputs (0major+23150minor)pagefaults 0swaps
<snip>
Nmap done: 65533 IP addresses (0 hosts up) scanned in 1.03 seconds

I did another more exhaustive scan, using 10000 individual random IP
addresses as an exclude list and "scanning" a /16 network (again, 65536
targets). This time the difference was far more noticeable: the new trie
matching still only took about 0.6 seconds, even with only 77% CPU usage,
but the old one took nearly 6 seconds!

$ /usr/bin/time ./nmap-new 199.141.0.0/16 -n -sL --excludefile exclude.txt
| tail
0.30user 0.16system 0:00.60elapsed 77%CPU (0avgtext+0avgdata
13372maxresident)k
0inputs+0outputs (0major+23406minor)pagefaults 0swaps
<snip>
Nmap done: 65531 IP addresses (0 hosts up) scanned in 0.61 seconds

$ /usr/bin/time ./nmap-old 199.141.0.0/16 -n -sL --excludefile exclude.txt
| tail
5.60user 0.16system 0:05.84elapsed 98%CPU (0avgtext+0avgdata
13584maxresident)k
0inputs+0outputs (0major+23485minor)pagefaults 0swaps
<snip>
Nmap done: 65531 IP addresses (0 hosts up) scanned in 5.84 seconds

Memory use is pretty close to the same for both, with a slight benefit to
the new method.

One compatibility note: the new trie matching is not used with IPv4 octet
ranges or wildcards. To maintain compatibility, those exclude
specifications are still matched using the old system, but I imagine most
users doing large scans with many exclusions are using CIDR notation
instead.

The changes were made in r37496 - 37498, with additional cleanup of headers
and interfaces following in r37499 - 37501.

Happy scanning!

Dan

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

Current thread:

Improving address exclusion matching Daniel Miller (Oct 31)
- Re: Improving address exclusion matching David Fifield (Nov 03)