How to efficiently classify IP addresses in real time using a self-made IP DB
Barak Amar and Yael Fraiman

At HUMAN, we process a large volume of data collected from website users in order to classify and analyze their behavior and determine whether they are legitimate. A key component of any such analysis is processing the user’s IP address. Knowing more about the source IP of a request has many benefits: it can reveal the geographic location and the service provider, tell us whether the IP appears in the bad-reputation lists we maintain, and more. As the data we needed to categorize grew and the number of requests went up, we started looking for a better solution than simply taking each IP and matching it against a list of networks one by one. We were already familiar with MaxMind as a geolocation solution: we had implemented our own auto-update mechanism and put in the legwork of researching the most suitable library for fast lookups.
Some of the benefits of the MaxMind DB file format are:
- The file format holds the same structure that is used for the binary lookup, so a reader library can usually map the file into memory and search it directly. This makes loading and searching the database very fast.
- Multiple reader libraries are available for the format, covering different programming languages and several implementations.
- The format has a published specification, the MaxMind DB File Format Specification.
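To make the idea of a lookup structure concrete, here is a minimal sketch of a binary trie keyed on address bits, which is the general technique behind a bit-by-bit IP search tree. It is an illustration only: the `node` type and the `insert` and `lookup` helpers are hypothetical names, everything lives in ordinary Go memory rather than in a mapped file, and it does not reproduce the actual MaxMind DB layout.

```go
package main

import (
	"fmt"
	"net/netip"
)

// node is one vertex of the trie: child[0] follows a 0 bit, child[1] a 1 bit.
type node struct {
	child [2]*node
	label string // non-empty when a network ends at this node
}

// bit returns bit i (0 = most significant) of a 16-byte address.
func bit(a [16]byte, i int) byte { return (a[i/8] >> (7 - i%8)) & 1 }

// insert stores label at the node reached by walking the prefix bits.
func (root *node) insert(cidr, label string) error {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return err
	}
	bits := p.Bits()
	if p.Addr().Is4() {
		bits += 96 // As16 returns IPv4 as an IPv4-mapped IPv6 address
	}
	a := p.Masked().Addr().As16()
	n := root
	for i := 0; i < bits; i++ {
		b := bit(a, i)
		if n.child[b] == nil {
			n.child[b] = &node{}
		}
		n = n.child[b]
	}
	n.label = label
	return nil
}

// lookup returns the label of the most specific network containing ip,
// or "" when no network matches or ip cannot be parsed.
func (root *node) lookup(ip string) string {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return ""
	}
	a := addr.As16()
	best := ""
	n := root
	for i := 0; n != nil; i++ {
		if n.label != "" {
			best = n.label
		}
		if i == 128 {
			break
		}
		n = n.child[bit(a, i)]
	}
	return best
}

func main() {
	trie := &node{}
	_ = trie.insert("10.0.0.0/8", "wide")
	_ = trie.insert("10.1.0.0/16", "narrow")
	fmt.Println(trie.lookup("10.1.2.3"), trie.lookup("10.2.0.1"))
}
```

A lookup visits at most 128 nodes regardless of how many networks are stored, which is the property that makes this kind of structure attractive compared with scanning a list. The MaxMind DB format keeps a comparable search tree in a compact binary layout that readers can walk directly from the file.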
This solution allows us to process a large amount of traffic with fast lookups and a small memory footprint. Internally we use multiple databases and a small cache layer that avoids repeated lookups. We use the format to store, distribute and efficiently manage IP databases across our network, built from IP information that we extract, collect, or that our algorithms generate from the traffic we analyze.
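As an illustration of what a small cache layer in front of a database could look like, here is a minimal sketch. It is not our production code: the `lookupFunc` signature, the `cache` type and the clear-when-full eviction are all assumptions chosen to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for a database lookup (hypothetical signature).
type lookupFunc func(ip string) (label string, found bool)

type result struct {
	label string
	found bool
}

// cache memoizes lookups, including misses, and is cleared once it holds max entries.
type cache struct {
	mu      sync.Mutex
	entries map[string]result
	max     int
	lookup  lookupFunc
	calls   int // how many times the underlying lookup ran
}

func newCache(max int, lookup lookupFunc) *cache {
	return &cache{entries: make(map[string]result), max: max, lookup: lookup}
}

// get answers from the cache when possible and falls back to the database otherwise.
func (c *cache) get(ip string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if r, ok := c.entries[ip]; ok {
		return r.label, r.found
	}
	if len(c.entries) >= c.max {
		c.entries = make(map[string]result) // crude eviction: start over
	}
	label, found := c.lookup(ip)
	c.calls++
	c.entries[ip] = result{label, found}
	return label, found
}

func main() {
	// A pretend database that knows a single address.
	db := func(ip string) (string, bool) {
		if ip == "192.0.2.10" {
			return "known", true
		}
		return "", false
	}
	c := newCache(1024, db)
	for _, ip := range []string{"192.0.2.10", "192.0.2.10", "198.51.100.1"} {
		label, found := c.get(ip)
		fmt.Println(ip, label, found)
	}
	fmt.Println("database lookups:", c.calls)
}
```

Caching misses as well as hits matters here, because most client IPs will not appear in any given feed. A real deployment would also need to invalidate the cache whenever a database file is replaced.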
We found this process to be extremely efficient, and since it is fairly simple to demonstrate, we want to share the steps we took by providing a detailed example. In the following example we will take the Amazon AWS IP Address Ranges and:
- Build an application that will collect the data
- Build a database from the data
- Use the database for enrichment purposes
Grabbing the raw IP information
The following code (collect.go) collects the IP range information from Amazon and builds a CSV that associates a service and a region with each range:
<span class="token keyword">package</span> main
<span class="token keyword">import</span> <span class="token punctuation">(</span>
<span class="token string">"encoding/csv"</span>
<span class="token string">"encoding/json"</span>
<span class="token string">"io/ioutil"</span>
<span class="token string">"net/http"</span>
<span class="token string">"os"</span>
<span class="token punctuation">)</span>
<span class="token keyword">func</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">// request amazon ip ranges data</span>
res<span class="token punctuation">,</span> <span class="token boolean">_</span> <span class="token operator">:=</span> http<span class="token punctuation">.</span><span class="token function">Get</span><span class="token punctuation">(</span><span class="token string">"https://ip-ranges.amazonaws.com/ip-ranges.json"</span><span class="token punctuation">)</span>
<span class="token keyword">defer</span> res<span class="token punctuation">.</span>Body<span class="token punctuation">.</span><span class="token function">Close</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
body<span class="token punctuation">,</span> <span class="token boolean">_</span> <span class="token operator">:=</span> ioutil<span class="token punctuation">.</span><span class="token function">ReadAll</span><span class="token punctuation">(</span>res<span class="token punctuation">.</span>Body<span class="token punctuation">)</span>
<span class="token comment">// parse json information</span>
<span class="token keyword">var</span> aws <span class="token keyword">struct</span> <span class="token punctuation">{</span>
SyncToken <span class="token builtin">string</span> <span class="token string">`json:"syncToken"`</span>
CreateDate <span class="token builtin">string</span> <span class="token string">`json:"createDate"`</span>
Prefixes <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token keyword">struct</span> <span class="token punctuation">{</span>
IPPrefix <span class="token builtin">string</span> <span class="token string">`json:"ip_prefix"`</span>
Region <span class="token builtin">string</span> <span class="token string">`json:"region"`</span>
Service <span class="token builtin">string</span> <span class="token string">`json:"service"`</span>
<span class="token punctuation">}</span> <span class="token string">`json:"prefixes"`</span>
<span class="token punctuation">}</span>
json<span class="token punctuation">.</span><span class="token function">Unmarshal</span><span class="token punctuation">(</span>body<span class="token punctuation">,</span> <span class="token operator">&</span>aws<span class="token punctuation">)</span>
<span class="token comment">// output as csv</span>
writer <span class="token operator">:=</span> csv<span class="token punctuation">.</span><span class="token function">NewWriter</span><span class="token punctuation">(</span>os<span class="token punctuation">.</span>Stdout<span class="token punctuation">)</span>
<span class="token keyword">defer</span> writer<span class="token punctuation">.</span><span class="token function">Flush</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
writer<span class="token punctuation">.</span><span class="token function">Write</span><span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">string</span><span class="token punctuation">{</span><span class="token string">"service"</span><span class="token punctuation">,</span> <span class="token string">"region"</span><span class="token punctuation">,</span> <span class="token string">"network"</span><span class="token punctuation">}</span><span class="token punctuation">)</span>
<span class="token keyword">for</span> <span class="token boolean">_</span><span class="token punctuation">,</span> p <span class="token operator">:=</span> <span class="token keyword">range</span> aws<span class="token punctuation">.</span>Prefixes <span class="token punctuation">{</span>
writer<span class="token punctuation">.</span><span class="token function">Write</span><span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">string</span><span class="token punctuation">{</span>p<span class="token punctuation">.</span>Service<span class="token punctuation">,</span> p<span class="token punctuation">.</span>Region<span class="token punctuation">,</span> p<span class="token punctuation">.</span>IPPrefix<span class="token punctuation">}</span><span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
Here is a short Dockerfile to build and run the code above:
<span class="token instruction"><span class="token keyword">FROM</span> golang:1-alpine</span>
<span class="token instruction"><span class="token keyword">RUN</span> mkdir /app</span>
<span class="token instruction"><span class="token keyword">WORKDIR</span> /app</span>
<span class="token instruction"><span class="token keyword">COPY</span> collect.go ./</span>
<span class="token instruction"><span class="token keyword">RUN</span> go build -o app collect.go</span>
<span class="token instruction"><span class="token keyword">ENTRYPOINT</span> [<span class="token string">"/app/app"</span>]</span>
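collect.go reads only the `prefixes` list. The same AWS feed also publishes IPv6 ranges in an `ipv6_prefixes` list whose entries use the key `ipv6_prefix`. The sketch below shows one way both lists could be turned into the same three-column CSV. The helper name `toRows` is hypothetical, and the embedded sample uses made-up documentation prefixes rather than real AWS data so that it runs without network access.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
)

// sample mimics the layout of ip-ranges.json; the values are illustrative only.
const sample = `{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "eu-west-1", "service": "S3"}
  ]
}`

// toRows decodes both prefix lists and returns service, region, network rows
// preceded by a header row.
func toRows(data []byte) ([][]string, error) {
	var aws struct {
		Prefixes []struct {
			IPPrefix string `json:"ip_prefix"`
			Region   string `json:"region"`
			Service  string `json:"service"`
		} `json:"prefixes"`
		IPv6Prefixes []struct {
			IPv6Prefix string `json:"ipv6_prefix"`
			Region     string `json:"region"`
			Service    string `json:"service"`
		} `json:"ipv6_prefixes"`
	}
	if err := json.Unmarshal(data, &aws); err != nil {
		return nil, err
	}
	rows := [][]string{{"service", "region", "network"}}
	for _, p := range aws.Prefixes {
		rows = append(rows, []string{p.Service, p.Region, p.IPPrefix})
	}
	for _, p := range aws.IPv6Prefixes {
		rows = append(rows, []string{p.Service, p.Region, p.IPv6Prefix})
	}
	return rows, nil
}

func main() {
	rows, err := toRows([]byte(sample))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// WriteAll writes every row and flushes the writer.
	if err := csv.NewWriter(os.Stdout).WriteAll(rows); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the CSV columns stay the same, a consumer that accepts both IPv4 and IPv6 networks in the third column should not need to change.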
Creating IP DB for fast lookup
Using the collected information, the following code (build.pl) reads the CSV and builds a MaxMind DB file populated with the IP classification information:
<span class="token comment">#!/usr/bin/env perl</span>
<span class="token keyword">use</span> strict<span class="token punctuation">;</span>
<span class="token keyword">use</span> warnings<span class="token punctuation">;</span>
<span class="token keyword">use</span> Text<span class="token punctuation">:</span><span class="token punctuation">:</span>CSV_XS<span class="token punctuation">;</span>
<span class="token keyword">use</span> Net<span class="token punctuation">:</span><span class="token punctuation">:</span>Works<span class="token punctuation">:</span><span class="token punctuation">:</span>Network<span class="token punctuation">;</span>
<span class="token keyword">use</span> MaxMind<span class="token punctuation">:</span><span class="token punctuation">:</span>DB<span class="token punctuation">:</span><span class="token punctuation">:</span>Writer<span class="token punctuation">:</span><span class="token punctuation">:</span>Tree<span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">%types</span> <span class="token operator">=</span> <span class="token punctuation">(</span>
service <span class="token operator">=></span> <span class="token string">'utf8_string'</span><span class="token punctuation">,</span>
region <span class="token operator">=></span> <span class="token string">'utf8_string'</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">$tree</span> <span class="token operator">=</span> MaxMind<span class="token punctuation">:</span><span class="token punctuation">:</span>DB<span class="token punctuation">:</span><span class="token punctuation">:</span>Writer<span class="token punctuation">:</span><span class="token punctuation">:</span>Tree<span class="token operator">-></span>new<span class="token punctuation">(</span>
database_type <span class="token operator">=></span> <span class="token string">'Feed-IP-Data'</span><span class="token punctuation">,</span>
description <span class="token operator">=></span> <span class="token punctuation">{</span> en <span class="token operator">=></span> <span class="token string">'Amazon IP data'</span> <span class="token punctuation">}</span><span class="token punctuation">,</span>
ip_version <span class="token operator">=></span> <span class="token number">6</span><span class="token punctuation">,</span>
map_key_type_callback <span class="token operator">=></span> <span class="token keyword">sub</span> <span class="token punctuation">{</span> <span class="token variable">$types</span><span class="token punctuation">{</span> <span class="token variable">$_</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span><span class="token punctuation">,</span>
record_size <span class="token operator">=></span> <span class="token number">24</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">$file</span> <span class="token operator">=</span> <span class="token variable">$ARGV</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">or</span> <span class="token keyword">die</span> <span class="token string">"Need to get CSV file on the command line\n"</span><span class="token punctuation">;</span>
<span class="token keyword">print</span> <span class="token string">"==> "</span><span class="token punctuation">,</span> <span class="token variable">$file</span><span class="token punctuation">,</span> <span class="token string">"\n"</span><span class="token punctuation">;</span>
open<span class="token punctuation">(</span><span class="token keyword">my</span> <span class="token variable">$fh</span><span class="token punctuation">,</span> <span class="token string">"<"</span><span class="token punctuation">,</span> <span class="token variable">$file</span><span class="token punctuation">)</span> <span class="token operator">or</span> <span class="token keyword">die</span> <span class="token string">"$file: $!"</span><span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">$csv</span> <span class="token operator">=</span> Text<span class="token punctuation">:</span><span class="token punctuation">:</span>CSV_XS<span class="token operator">-></span>new<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token variable">$csv</span><span class="token operator">-></span>column_names<span class="token punctuation">(</span><span class="token variable">$csv</span><span class="token operator">-></span>getline<span class="token punctuation">(</span><span class="token variable">$fh</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">while</span> <span class="token punctuation">(</span><span class="token keyword">my</span> <span class="token variable">$row</span> <span class="token operator">=</span> <span class="token variable">$csv</span><span class="token operator">-></span>getline<span class="token punctuation">(</span><span class="token variable">$fh</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword">my</span> <span class="token variable">$network</span> <span class="token operator">=</span> Net<span class="token punctuation">:</span><span class="token punctuation">:</span>Works<span class="token punctuation">:</span><span class="token punctuation">:</span>Network<span class="token operator">-></span>new_from_string<span class="token punctuation">(</span> string <span class="token operator">=></span> <span class="token variable">$row</span><span class="token operator">-></span><span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">$metadata</span> <span class="token operator">=</span> <span class="token punctuation">{</span> service <span class="token operator">=></span> <span class="token variable">$row</span><span class="token operator">-></span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> region <span class="token operator">=></span> <span class="token variable">$row</span><span class="token operator">-></span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token variable">$tree</span><span class="token operator">-></span>insert_network<span class="token punctuation">(</span><span class="token variable">$network</span><span class="token punctuation">,</span> <span class="token variable">$metadata</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
close <span class="token variable">$fh</span><span class="token punctuation">;</span>
<span class="token keyword">my</span> <span class="token variable">$filename</span> <span class="token operator">=</span> <span class="token variable">$ARGV</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">or</span> <span class="token keyword">die</span> <span class="token string">"Need to get mmdb file on the command line\n"</span><span class="token punctuation">;</span>
open <span class="token keyword">my</span> <span class="token variable">$ofh</span><span class="token punctuation">,</span> <span class="token string">'>:raw'</span><span class="token punctuation">,</span> <span class="token variable">$filename</span> <span class="token operator">or</span> <span class="token keyword">die</span> <span class="token string">"$filename: $!"</span><span class="token punctuation">;</span>
<span class="token variable">$tree</span><span class="token operator">-></span>write_tree<span class="token punctuation">(</span> <span class="token variable">$ofh</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
close <span class="token variable">$ofh</span><span class="token punctuation">;</span>
<span class="token keyword">print</span> <span class="token string">"$filename created\n"</span><span class="token punctuation">;</span>
And again, a Dockerfile to run the database build script:
<span class="token instruction"><span class="token keyword">FROM</span> perl:5</span>
<span class="token instruction"><span class="token keyword">RUN</span> cpanm --notest --skip-satisfied MaxMind::DB::Writer Text::CSV_XS</span>
<span class="token instruction"><span class="token keyword">RUN</span> mkdir /out /app</span>
<span class="token instruction"><span class="token keyword">WORKDIR</span> /app</span>
<span class="token instruction"><span class="token keyword">COPY</span> build.pl ./</span>
<span class="token instruction"><span class="token keyword">VOLUME</span> <span class="token string">"/out"</span></span>
<span class="token instruction"><span class="token keyword">ENTRYPOINT</span> [<span class="token string">"perl"</span>, <span class="token string">"build.pl"</span>]</span>
IP classification as middleware
In the last step, here is an example of a small HTTP server (use.go) that uses the database to enrich requests by placing the AWS service and region in dedicated HTTP headers:
<span class="token keyword">package</span> main
<span class="token keyword">import</span> <span class="token punctuation">(</span>
<span class="token string">"fmt"</span>
<span class="token string">"log"</span>
<span class="token string">"net"</span>
<span class="token string">"net/http"</span>
maxminddb <span class="token string">"github.com/oschwald/maxminddb-golang"</span>
<span class="token punctuation">)</span>
<span class="token comment">// ClassificationRecord IP classification record</span>
<span class="token keyword">type</span> ClassificationRecord <span class="token keyword">struct</span> <span class="token punctuation">{</span>
Service <span class="token builtin">string</span> <span class="token string">`maxminddb:"service"`</span>
Region <span class="token builtin">string</span> <span class="token string">`maxminddb:"region"`</span>
<span class="token punctuation">}</span>
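<span class="token comment">// getRealIP extracts the real user's IP address, assuming it is on a dedicated HTTP header: X-REAL-IP</span>
<span class="token keyword">func</span> <span class="token function">getRealIP</span><span class="token punctuation">(</span>r <span class="token operator">*</span>http<span class="token punctuation">.</span>Request<span class="token punctuation">)</span> <span class="token builtin">string</span> <span class="token punctuation">{</span>
rip <span class="token operator">:=</span> r<span class="token punctuation">.</span>Header<span class="token punctuation">.</span><span class="token function">Get</span><span class="token punctuation">(</span><span class="token string">"X-REAL-IP"</span><span class="token punctuation">)</span>
<span class="token keyword">if</span> rip <span class="token operator">==</span> <span class="token string">""</span> <span class="token punctuation">{</span>
<span class="token comment">// RemoteAddr has the form "host:port"; keep only the host so net.ParseIP accepts it</span>
<span class="token keyword">if</span> host<span class="token punctuation">,</span> <span class="token boolean">_</span><span class="token punctuation">,</span> err <span class="token operator">:=</span> net<span class="token punctuation">.</span><span class="token function">SplitHostPort</span><span class="token punctuation">(</span>r<span class="token punctuation">.</span>RemoteAddr<span class="token punctuation">)</span><span class="token punctuation">;</span> err <span class="token operator">==</span> <span class="token boolean">nil</span> <span class="token punctuation">{</span>
<span class="token keyword">return</span> host
<span class="token punctuation">}</span>
rip <span class="token operator">=</span> r<span class="token punctuation">.</span>RemoteAddr
<span class="token punctuation">}</span>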
<span class="token keyword">return</span> rip
<span class="token punctuation">}</span>
<span class="token comment">// enrichClassification enrich request with classification information on headers</span>
<span class="token keyword">func</span> <span class="token function">enrichClassification</span><span class="token punctuation">(</span>reader <span class="token operator">*</span>maxminddb<span class="token punctuation">.</span>Reader<span class="token punctuation">,</span> next http<span class="token punctuation">.</span>HandlerFunc<span class="token punctuation">)</span> http<span class="token punctuation">.</span>HandlerFunc <span class="token punctuation">{</span>
<span class="token keyword">return</span> <span class="token keyword">func</span><span class="token punctuation">(</span>w http<span class="token punctuation">.</span>ResponseWriter<span class="token punctuation">,</span> r <span class="token operator">*</span>http<span class="token punctuation">.</span>Request<span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">// log request real ip</span>
rip <span class="token operator">:=</span> <span class="token function">getRealIP</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span>
log<span class="token punctuation">.</span><span class="token function">Printf</span><span class="token punctuation">(</span><span class="token string">"Request from %s"</span><span class="token punctuation">,</span> rip<span class="token punctuation">)</span>
<span class="token comment">// parse ip and enrich classification information</span>
addr <span class="token operator">:=</span> net<span class="token punctuation">.</span><span class="token function">ParseIP</span><span class="token punctuation">(</span>rip<span class="token punctuation">)</span>
<span class="token keyword">if</span> addr <span class="token operator">!=</span> <span class="token boolean">nil</span> <span class="token punctuation">{</span>
<span class="token keyword">var</span> record ClassificationRecord
<span class="token boolean">_</span> <span class="token operator">=</span> reader<span class="token punctuation">.</span><span class="token function">Lookup</span><span class="token punctuation">(</span>addr<span class="token punctuation">,</span> <span class="token operator">&</span>record<span class="token punctuation">)</span>
<span class="token keyword">if</span> record<span class="token punctuation">.</span>Service <span class="token operator">!=</span> <span class="token string">""</span> <span class="token punctuation">{</span>
r<span class="token punctuation">.</span>Header<span class="token punctuation">.</span><span class="token function">Set</span><span class="token punctuation">(</span><span class="token string">"X-IP-SERVICE"</span><span class="token punctuation">,</span> record<span class="token punctuation">.</span>Service<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token keyword">if</span> record<span class="token punctuation">.</span>Region <span class="token operator">!=</span> <span class="token string">""</span> <span class="token punctuation">{</span>
r<span class="token punctuation">.</span>Header<span class="token punctuation">.</span><span class="token function">Set</span><span class="token punctuation">(</span><span class="token string">"X-IP-REGION"</span><span class="token punctuation">,</span> record<span class="token punctuation">.</span>Region<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
next<span class="token punctuation">.</span><span class="token function">ServeHTTP</span><span class="token punctuation">(</span>w<span class="token punctuation">,</span> r<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token keyword">func</span> <span class="token function">index</span><span class="token punctuation">(</span>w http<span class="token punctuation">.</span>ResponseWriter<span class="token punctuation">,</span> r <span class="token operator">*</span>http<span class="token punctuation">.</span>Request<span class="token punctuation">)</span> <span class="token punctuation">{</span>
fmt<span class="token punctuation">.</span><span class="token function">Fprintf</span><span class="token punctuation">(</span>w<span class="token punctuation">,</span> <span class="token string">"What would life be if we had no courage to attempt anything?\n"</span><span class="token punctuation">)</span>
<span class="token comment">// use enriched information if found on request</span>
<span class="token keyword">if</span> service <span class="token operator">:=</span> r<span class="token punctuation">.</span>Header<span class="token punctuation">.</span><span class="token function">Get</span><span class="token punctuation">(</span><span class="token string">"X-IP-SERVICE"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> service <span class="token operator">!=</span> <span class="token string">""</span> <span class="token punctuation">{</span>
fmt<span class="token punctuation">.</span><span class="token function">Fprintf</span><span class="token punctuation">(</span>w<span class="token punctuation">,</span> <span class="token string">"Known service %s\n"</span><span class="token punctuation">,</span> service<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token keyword">if</span> region <span class="token operator">:=</span> r<span class="token punctuation">.</span>Header<span class="token punctuation">.</span><span class="token function">Get</span><span class="token punctuation">(</span><span class="token string">"X-IP-REGION"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> region <span class="token operator">!=</span> <span class="token string">""</span> <span class="token punctuation">{</span>
fmt<span class="token punctuation">.</span><span class="token function">Fprintf</span><span class="token punctuation">(</span>w<span class="token punctuation">,</span> <span class="token string">"Known region %s\n"</span><span class="token punctuation">,</span> region<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token keyword">func</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">// load mmdb with classifications</span>
mmdb<span class="token punctuation">,</span> err <span class="token operator">:=</span> maxminddb<span class="token punctuation">.</span><span class="token function">Open</span><span class="token punctuation">(</span><span class="token string">"feed.mmdb"</span><span class="token punctuation">)</span>
<span class="token keyword">if</span> err <span class="token operator">!=</span> <span class="token boolean">nil</span> <span class="token punctuation">{</span>
log<span class="token punctuation">.</span><span class="token function">Fatalln</span><span class="token punctuation">(</span>err<span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token keyword">defer</span> mmdb<span class="token punctuation">.</span><span class="token function">Close</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment">// register routes and serve requests</span>
http<span class="token punctuation">.</span><span class="token function">Handle</span><span class="token punctuation">(</span><span class="token string">"/"</span><span class="token punctuation">,</span> <span class="token function">enrichClassification</span><span class="token punctuation">(</span>mmdb<span class="token punctuation">,</span> index<span class="token punctuation">)</span><span class="token punctuation">)</span>
log<span class="token punctuation">.</span><span class="token function">Fatalln</span><span class="token punctuation">(</span>http<span class="token punctuation">.</span><span class="token function">ListenAndServe</span><span class="token punctuation">(</span><span class="token string">":8080"</span><span class="token punctuation">,</span> <span class="token boolean">nil</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token punctuation">}</span>
Here is a Dockerfile to run the code above. For simplicity we embed the ‘feed.mmdb’ database file in the image, but it could also be read from a shared volume or downloaded and updated by the server itself.
<span class="token instruction"><span class="token keyword">FROM</span> golang:1-alpine</span>
<span class="token instruction"><span class="token keyword">RUN</span> apk add --update git</span>
<span class="token instruction"><span class="token keyword">RUN</span> mkdir /app</span>
<span class="token instruction"><span class="token keyword">WORKDIR</span> /app</span>
<span class="token instruction"><span class="token keyword">COPY</span> use.go feed.mmdb ./</span>
<span class="token instruction"><span class="token keyword">RUN</span> go mod init app && go mod tidy && go build -o app</span>
<span class="token instruction"><span class="token keyword">EXPOSE</span> 8080</span>
<span class="token instruction"><span class="token keyword">CMD</span> [<span class="token string">"/app/app"</span>]</span>
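The enrichment middleware can also be exercised without a database file or a running server. The sketch below is one possible approach, not part of our code: it swaps the database reader for a hypothetical `lookupFunc`, uses a made-up mapping from a documentation range to a service and region, and drives the handler with the standard `net/http/httptest` package.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// record mirrors the fields stored per network.
type record struct{ Service, Region string }

// lookupFunc abstracts the database lookup (hypothetical, for testing only).
type lookupFunc func(ip net.IP) (record, bool)

// enrich copies the classification of the client IP into request headers.
func enrich(lookup lookupFunc, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		host := r.Header.Get("X-REAL-IP")
		if host == "" {
			host, _, _ = net.SplitHostPort(r.RemoteAddr)
		}
		if ip := net.ParseIP(host); ip != nil {
			if rec, ok := lookup(ip); ok {
				r.Header.Set("X-IP-SERVICE", rec.Service)
				r.Header.Set("X-IP-REGION", rec.Region)
			}
		}
		next(w, r)
	}
}

// fakeLookup pretends that 198.51.100.0/24 belongs to a known service.
func fakeLookup(ip net.IP) (record, bool) {
	_, network, _ := net.ParseCIDR("198.51.100.0/24")
	if network.Contains(ip) {
		return record{Service: "EC2", Region: "us-east-1"}, true
	}
	return record{}, false
}

// probe sends one in-memory request through the middleware and returns
// the headers seen by the inner handler as "service/region".
func probe(lookup lookupFunc, realIP string) string {
	h := enrich(lookup, func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s/%s", r.Header.Get("X-IP-SERVICE"), r.Header.Get("X-IP-REGION"))
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if realIP != "" {
		req.Header.Set("X-REAL-IP", realIP)
	}
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(probe(fakeLookup, "198.51.100.7"))
	fmt.Println(probe(fakeLookup, "203.0.113.9"))
}
```

Hiding the reader behind a small function type keeps the middleware logic testable and makes it easy to put a different lookup source behind the same interface later.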
We now have a fast and simple solution that you can build on to classify your traffic based on the client IP.