Writing an IP Address Information Webservice in Ruby

or Writing an XML-RPC Webservice using Ruby and MySQL that can be used to determine Country Information from IP Addresses and to Impress the Opposite Sex, Along the Way Learning How RIPE Assigns Blocks of Addresses, How to Access Webservices from Javascript and More Three Letter Acronyms Than You Ever Cared to Know

In case the slightly baroque title hasn't given you a clue about the content of this article, I describe the implementation of an XML-RPC webservice programmed in Ruby, which provides information about IP addresses.

You might find this interesting if you're trying to find information about determining the country of origin of a specific IP address, or if you're looking for an example of how to implement XML-RPC webservices in Ruby.

The webservice itself could be useful to determine where visitors to your website are coming from by looking up the IP addresses in your log files. I also explain how you can use this webservice directly from Javascript applications on your site by using the jsRPC library.

The service exists as described and is located on this (www.kuriositaet.de) server at /ip/ip_ws.rb. The XML-RPC method name for the service is getIPInfo and it returns the following struct:

registry => where this IP is registered (e.g. ARIN, RIPE ...)
country => two letter ISO 3166 country code 
status => one of "ASSIGNED" or "ALLOCATED"

In case any piece of information is unknown, a "?" is returned in its place. In case of invalid IP addresses, a fault is generated.

Where to get the information?

Strangely enough, finding out where to obtain the definitive information about IP space allocation was nearly the most difficult part of the whole project.

How IP Addresses are Allocated

IANA (the Internet Assigned Numbers Authority) allocates IP Addresses to Regional Internet Registries (RIRs), who in turn assign addresses to ISPs (Internet Service Providers) acting as LIRs (Local Internet Registries) which assign the addresses to their customers. How IANA divvied up the IPv4 address space is described in this document: ftp://ftp.iana.org/assignments/ipv4-address-space.

Presently, there are five RIRs:

AfriNIC based in Mauritius and responsible for Africa.
APNIC, located in Brisbane, Australia and responsible for the Asia Pacific region
ARIN (American Registry for Internet Numbers) in Virginia, responsible for North America
LACNIC located in Uruguay, responsible for Latin America
RIPE NCC (Réseaux IP Européens) located in the Netherlands and responsible for Europe, the Middle East and Central Asia.

These five regional registries have formed the NRO (Number Resource Organization) to coordinate their efforts.

How Allocation Data is Published

Information about address allocation is published in the RIR Statistics Exchange Format. In a nutshell, each registry publishes a file containing their allocations:

Each file is called delegated-<registry>-yyyymmdd

The <registry> value follows the internal record format and is
one of the specified strings from the set:

{apnic,arin,iana,lacnic,ripencc};

(...)

The most recent file will also be available under a name of
the form delegated-<registry>-latest. This can be a symbolic
or hard link to the delegated-<registry>-yyyymmdd file but must
be pointed at the most recent file when it is updated.

Each RIR will make its files available in a standard ftp
directory, defined as /stats/<registry>/*.

Each RIR also mirrors the data from all the other registries, so it's only necessary to connect to a single server.

Downloading the Files

With the above information, it's easy to download the files using Ruby. FTP downloading is implemented in the net/ftp package of Ruby's standard library.

First, we cobble together a list of files to download:

file_names=[
    "afrinic",
    "apnic",
    "arin",
    "lacnic",
    "ripencc"
]
file_names.map! {|file| "/pub/stats/#{file}/delegated-#{file}-latest"}

All that remains to be done is to pick a server to download the files from and provide a local directory to copy the files to. I'm in Europe, so I'll download from RIPE:

require "net/ftp"

url = "ftp.ripe.net"
localdir = "tmp"
Net::FTP.open(url) { |ftp|
    ftp.login # anonymous login
    file_names.each { |file|
        # store each file under its own name in the local directory
        ftp.get(file, localdir+"/"+file.slice(/[^\/]*$/))
    }
}

For the sake of clarity, the example leaves out all checks to make sure the local directory exists, error handling, and all the other stuff that programming is actually about.
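Some of that omitted housekeeping could be sketched as follows. The helper names local_path_for and download_all are hypothetical ones of mine, and ftp is assumed to be any object that responds to get(remote, local), such as the Net::FTP session opened above.

```ruby
require "fileutils"

# Map a remote path such as "/pub/stats/arin/delegated-arin-latest"
# to a file of the same name inside the local directory.
def local_path_for(remote, localdir)
  File.join(localdir, File.basename(remote))
end

# Download every file in remote_files, creating localdir if needed.
# Failed downloads are collected instead of aborting the whole run.
# (ftp is an assumption: anything with a get(remote, local) method.)
def download_all(ftp, remote_files, localdir)
  FileUtils.mkdir_p(localdir)
  failed = []
  remote_files.each do |remote|
    begin
      ftp.get(remote, local_path_for(remote, localdir))
    rescue StandardError => e
      warn "could not fetch #{remote}: #{e.message}"
      failed << remote
    end
  end
  failed
end
```

Inside the Net::FTP.open block, a call like download_all(ftp, file_names, localdir) would then replace the inner loop and return the list of files that need another try.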

File Format

Now that we've downloaded all necessary files, let's look at their format. Thankfully, the files are CSV formatted using a pipe "|" (ASCII 0x7c) as field separator. The only other special feature is line commenting using a hash (#).

The file starts out with some headers which we're not interested in. The format of the main records is:

registry|cc|type|start|value|date|status[|extensions...]

registry contains the name of the registry this IP is assigned to, one of the fields the webservice will return. cc is the ISO 3166 two letter country code (e.g. US for, well, the US, or DE for Germany). We need this information as well.

type can be one of {asn,ipv4,ipv6} depending on whether this record is an Autonomous System Number, IP version 4, or IP version 6 entry. Since we're not interested in routing we'll ignore all the ASN entries and since no one uses IPv6, we'll ignore all records for ipv6 as well.

start is the IP address this block starts at, and value is the number of hosts comprising this block. date gives information about when this block was first assigned by the RIR. Finally status provides information about whether the block is assigned or allocated. In short, blocks are "assigned" to the final instance using the block. "Allocation" is basically delegation to LIRs who will split up the block to assign or allocate the pieces.
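To make the record layout concrete, here is a small sketch that picks the ipv4 records out of a file's lines. The helper name ipv4_records is hypothetical. I'm also assuming, from my reading of the format, that the version header never has ipv4 in its third field and that summary lines carry a "*" in the country field, so checking those two fields is enough to skip both.

```ruby
# Turn raw lines from a delegated-* file into hashes, keeping only
# real ipv4 records (no comments, headers, asn or ipv6 entries).
def ipv4_records(lines)
  keys = %i[registry cc type start value date status]
  lines.map(&:strip)
       .reject { |line| line.empty? || line.start_with?("#") }
       .map { |line| line.split("|") }
       .select { |f| f.length >= 7 && f[2] == "ipv4" && f[1] != "*" }
       .map { |f| keys.zip(f.first(7)).to_h }
end
```

Each resulting hash then answers to names like record[:start] and record[:value] instead of array positions.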

Let's have a look at an individual record then. This is the first ipv4 record in the current (2006-01-20) delegated-arin-latest as of my writing this:

arin|US|ipv4|3.0.0.0|16777216|19880223|assigned

Using our newly gained knowledge, we can immediately see the block of ipv4 addresses described in this record has been assigned by arin (to GE coincidentally). The first address of this block is 3.0.0.0 and the block holds 16777216 addresses in total, 3.0.0.0 included.

Why provide information about the number of hosts when there are always 16,777,216 addresses in a Class A network? The answer is simply that RIRs allocate CIDR blocks as well, so you can't rely on the class of network to determine the number of hosts.
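The relationship between the host count and CIDR notation can be worked out in a few lines. This is only a sketch: prefix_length is a hypothetical helper, and it treats any count that is not a power of two as "not a single CIDR block".

```ruby
# Prefix length of a block holding `count` addresses, or nil if the
# count is not a power of two and so isn't a single CIDR block.
def prefix_length(count)
  return nil unless count > 0 && (count & (count - 1)).zero?
  32 - (count.bit_length - 1)
end

prefix_length(16_777_216)  # => 8, i.e. 3.0.0.0/8
prefix_length(8_192)       # => 19
prefix_length(768)         # => nil
```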

From CSV into the Database

In order for the webservice to be snappy (and to be hip with the crowd) we'll parse the downloaded files and load them into a MySQL database. Personally, I'd prefer PostgreSQL, but MySQL is what comes with my host. The database table is completely straightforward:

CREATE TABLE ip_ranges (
    registry VARCHAR (10), -- max length is ripencc, afrinic, each 7
    cc CHAR (2),
    ip_type CHAR (4), -- only ipv4 for now
    ip_from INTEGER UNSIGNED,
    ip_to INTEGER UNSIGNED,
    first_date DATE,
    status CHAR (1), -- L = aLlocated, S= aSsigned, 
    import_status CHAR (1)
    -- import status is used for import. Typically set to null, all
    -- existing values in the table are set to "1" before new values
    -- are imported. New values are inserted with import_status=2
    -- If the import is successful, all rows with
    -- import_status==1 are deleted and import_status==2 are set to
    -- null. If the import fails, all rows with import_status=2 are deleted 
    -- and rows with import_status==1 are reset to null.
);

CREATE INDEX idx_ip_from  ON ip_ranges (ip_from);
CREATE INDEX idx_ip_to  ON ip_ranges (ip_to);

With a few exceptions which I'll discuss below, the table is just a one-to-one mapping of the fields from the records in the RIR files. registry, cc and first_date correspond to the RIR file, as does status, though I decided to save a little space by mapping assigned and allocated to S and L.

The import_status field is to ensure that we don't mess up everything in case an import fails. Details are in the code comments, in case you're interested.
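For the curious, the import_status bookkeeping can be sketched like this. safe_import is a hypothetical helper of mine, and db is assumed to be any object with a query(sql) method; the block passed in is expected to load the new rows with import_status 2.

```ruby
# Run an import so that a failure leaves the old data untouched.
# Returns true if the new rows replaced the old ones, false otherwise.
def safe_import(db)
  # Mark everything currently in the table as "old".
  db.query("UPDATE ip_ranges SET import_status = '1'")
  begin
    yield db # load the new rows, tagged with import_status = '2'
  rescue StandardError
    # Failure: throw away the partial import and restore the old rows.
    db.query("DELETE FROM ip_ranges WHERE import_status = '2'")
    db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '1'")
    return false
  end
  # Success: drop the old rows and promote the new ones.
  db.query("DELETE FROM ip_ranges WHERE import_status = '1'")
  db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '2'")
  true
end
```

Only the loading step sits inside the begin block, so an error during the final cleanup can't trigger the rollback branch and delete the freshly imported rows.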

ip_from is the start IP address converted to a number. IP addresses are converted to numbers because that makes them easier to work with, for example when determining which block a given IP falls into. An IP address is just a series of four bytes. Take 192.0.34.166, currently the IP of www.example.com, shown here in decimal, hex and bits:

dotted quad   192       0         34        166
hex           C0        00        22        A6
bits          11000000  00000000  00100010  10100110

If we treat: 11000000 00000000 00100010 10100110 like a number, we get: 3221234342. This is much easier to work with than 192.0.34.166.
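The same arithmetic in Ruby, as a quick sketch. quad_to_i is a hypothetical helper that spells out the byte shifting; IPAddr from the standard library does the same job with to_i.

```ruby
require "ipaddr"

# Treat the four bytes of a dotted quad as one 32-bit number:
# the running total is shifted left by 8 bits to make room for
# each following byte.
def quad_to_i(quad)
  quad.split(".").inject(0) { |sum, byte| (sum << 8) + byte.to_i }
end

quad_to_i("192.0.34.166")        # => 3221234342
IPAddr.new("192.0.34.166").to_i  # => 3221234342
```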

Only ip_to is still missing. It's the value of the last IP address in this block, obtained by adding the value field from the RIR records to the start address and subtracting one. That was the last bit we needed to know in order to parse the RIR files:

# for reference:
# arin|US|ipv4|3.0.0.0|16777216|19880223|assigned

require "ipaddr"

# chomp first, otherwise the status field keeps its trailing newline
arr = lineFromRIRFile.chomp.split("|")
registry=arr[0]
cc=arr[1]
ip_type=arr[2]
start_ip=IPAddr.new(arr[3])
ip_from=start_ip.to_i
number=arr[4].to_i
ip_to=ip_from+number-1 # last address in the block
first_date = arr[5]
status = arr[6] == "allocated" ? "L" : "S"

With all the information extracted and nicely laid out in aptly named variables, all that remains to be done is pack the data into the database. For a quick one-off job, we could do this:

insert= "INSERT INTO ip_ranges (registry, cc, ip_type, ip_from,"+
    " ip_to, first_date, status, import_status)"+
    " VALUES (?,?,?,?,?,?,?,'2')"

db = get_mysql() # magic !
stmt = db.prepare(insert) # prepare returns the statement object
stmt.execute(registry, cc, ip_type, ip_from, ip_to, first_date, status)

get_mysql() is defined elsewhere and retrieves a Ruby MySQL driver. I'm using a prepared statement for the insert because it saves the hassle of quoting and such. The call to prepare compiles the insert statement and execute() executes the statement with all the values previously parsed from the record.

Unfortunately, actually inserting every single record like this takes a while. Any database worth its while has some sort of bulk import tool that's faster than plain inserting. Even MySQL has one! In MySQL the bulk import command is LOAD DATA LOCAL INFILE. All we need is a CSV file with fields separated by tabs, each field corresponding to one column in the table.

tmplt = "%s\t%s\t%s\t%s\t%s\t%s\t%s\t2" 
line = tmplt % [registry, cc, ip_type, ip_from, ip_to, first_date, status]

Once the import file is complete and saved in, say tmp/import.file, it can be imported into the database like this:

load = "LOAD DATA LOCAL INFILE 'tmp/import.file' INTO TABLE ip_ranges"
db.query(load)
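Putting the parsing and the tab-separated template together, writing the import file might look roughly like this. write_import_file is a hypothetical name, records is assumed to yield raw record lines that have already been narrowed down to ipv4 entries, and out can be any IO-like object such as a File opened for writing.

```ruby
require "ipaddr"

# Write one tab-separated line per RIR record, in the column order
# of the ip_ranges table.
def write_import_file(records, out)
  records.each do |line|
    registry, cc, ip_type, start, value, first_date, status =
      line.strip.split("|")
    ip_from = IPAddr.new(start).to_i
    ip_to   = ip_from + value.to_i - 1
    flag    = status == "allocated" ? "L" : "S"
    # the trailing 2 is the import_status of freshly imported rows
    out.puts [registry, cc, ip_type, ip_from, ip_to, first_date, flag, 2].join("\t")
  end
end
```

With the lines at hand, File.open("tmp/import.file", "w") { |f| write_import_file(lines, f) } would produce the file used by the LOAD DATA statement.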

Almost there.

Now that all the data is loaded into the database, we should be able to get information about addresses by converting the IP to a number, and issuing a select statement. Since we previously calculated the number value of 192.0.34.166 (www.example.com) to be 3221234342, we'll use that:

SELECT * FROM ip_ranges 
WHERE 3221234342 BETWEEN ip_from AND ip_to

Unfortunately though, that select doesn't return anything because 192.0.34.166 belongs to IANA and isn't assigned by the RIRs. Therefore it isn't contained in any of the files we imported. Which means we stumbled across a little bug, eh, limitation. Try again with www.google.com. Google has a bunch of addresses, I'll pick one at random: 66.249.93.99. Most likely you can't transform that into a number in your head, but I can! It's: 1123638627, so typing:

SELECT * FROM ip_ranges 
WHERE 1123638627 BETWEEN ip_from AND ip_to

yields:

registry  cc  ip_type  ip_from     ip_to       first_date  status  import_status
arin      US  ipv4     1123631104  1123639295  2004-03-05  S       0
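To double check that row, the numbers can be turned back into dotted quads. A sketch using IPAddr, where i_to_quad is a hypothetical helper and the Socket::AF_INET argument tells IPAddr that the integer is an IPv4 address:

```ruby
require "ipaddr"
require "socket"

# Convert a 32-bit number back into dotted quad notation.
def i_to_quad(number)
  IPAddr.new(number, Socket::AF_INET).to_s
end

i_to_quad(1123631104)  # => "66.249.64.0"
i_to_quad(1123639295)  # => "66.249.95.255"

# The block holds 8192 addresses, and Google's address is one of them.
1123639295 - 1123631104 + 1                  # => 8192
(1123631104..1123639295).cover?(1123638627)  # => true
```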

Creating the Ruby Webservice

Now we can spend our nights checking IP addresses, so long as we have access to the MySQL database. That's not very useful, though, so we'll provide some wrapper methods to access the data. As promised, we'll write an XML-RPC webservice in Ruby. Lucky for us, Ruby provides XML-RPC functionality as part of its standard library.

require "xmlrpc/server"
require "ipaddr"

The first two lines of code include the standard libraries we're using. Both xmlrpc/server and ipaddr should have come installed with your Ruby distribution if it's moderately fresh. First off, I'll define a generic function to return a Ruby hash representation of the XML-RPC struct we defined at the beginning. In case you've forgotten, the webservice is supposed to return the registry, country and assignment status of the provided IP address.

def get_ip_information ip
    # raises an exception if ip is not a valid address
    addr = IPAddr.new(ip)
    # defaults, in case the address isn't in the table
    h = {
        "registry" => "?",
        "country" => "?",
        "status" => "?"
    }
    stmt =  "SELECT registry, cc, status "+
        "FROM ip_ranges "+
        "WHERE #{addr.to_i} BETWEEN ip_from AND ip_to"

    get_mysql { |db|

        db.query(stmt) { |result|
            result.each { |row|
                h["registry"]   =row[0]
                h["country"]    =row[1]
                h["status"] =row[2]=="S" ? "ASSIGNED" : "ALLOCATED"
            }
        }
    }

    h
end

The code first instantiates an IPAddr object that we'll use to check the IP for validity, and to convert it to a number. The return value is prepared in the variable h to contain ? values in case we run into a "limitation" like the www.example.com fiasco. get_mysql prepares the database driver, selects the registration information and fills in our result.
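The validity check relies on IPAddr raising an exception for malformed input. Here is a small sketch of that behaviour, with valid_ipv4? as a hypothetical helper. It rests on my assumption that only IPv4 addresses should pass, since the table holds no ipv6 rows; IPAddr itself happily accepts IPv6 addresses.

```ruby
require "ipaddr"

# true only for strings that IPAddr parses as an IPv4 address
def valid_ipv4?(ip)
  IPAddr.new(ip).ipv4?
rescue IPAddr::Error
  false
end

valid_ipv4?("66.249.93.99")  # => true
valid_ipv4?("999.1.1.1")     # => false
valid_ipv4?("::1")           # => false
```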

Next, we need to instantiate an XML-RPC server object and connect the webservice functionality to it:

server = XMLRPC::CGIServer.new

server.add_handler("getIPInfo") { |ip|
    get_ip_information ip
}

The add_handler function attaches the functionality for a named XML-RPC method to the server. In the example above, the server is instructed to perform the code block behind add_handler whenever it receives an XML-RPC call to getIPInfo. The value of the parameter in the XML-RPC method call (the IP address in our case) is passed through by way of the variable ip.

To keep things simple, the code block doesn't do much. It merely passes the IP address on to the get_ip_information function.

The value returned by the code block is the Ruby hash generated by the get_ip_information function which the XML-RPC server automatically converts to the proper XML-RPC struct type.

Finally, we'll get fancy and define a second handler for the getIPInfo method which doesn't require you to pass any parameter but automatically returns the IP information for the caller's address. We need to define a second handler, because the XMLRPC implementation checks the arity of the code block and would throw a METHOD_MISSING fault if it encounters an XML-RPC request containing the incorrect number of parameters.

# added to the same server object as the first handler
server.add_handler("getIPInfo") {
    get_ip_information ENV["REMOTE_ADDR"]
}

Alternatively, Ruby allows you to make parameters optional, sort of like variable argument lists in C, by prefixing the variable with an asterisk "*". If you do so, all arguments, however many there are, get passed to the code block as an array.

server.add_handler("getIPInfo") { |*ip|
    # no parameter (or an empty one): fall back to the caller's address
    if ip.length == 0 || ip[0].strip == ""
        ip = ENV["REMOTE_ADDR"]
    else
        ip = ip[0]
    end
    get_ip_information ip
}
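The arity rules behind this can be checked directly in Ruby. The accepts? helper below is my own approximation of the kind of check the server performs, not a copy of the library code.

```ruby
one_param = proc { |ip| ip }
no_param  = proc { "caller" }
splat     = proc { |*ip| ip }

one_param.arity  # => 1
no_param.arity   # => 0
splat.arity      # => -1

# A non-negative arity must match the argument count exactly; a
# negative arity of -n-1 means "at least n arguments".
def accepts?(block, n_args)
  arity = block.arity
  arity >= 0 ? n_args == arity : n_args >= -(arity + 1)
end

accepts?(one_param, 0)  # => false
accepts?(splat, 0)      # => true
accepts?(splat, 1)      # => true
```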

Just for fun, I'll add one final method: getIPAddress to determine the IP of the client making the RPC call.

server.add_handler("getIPAddress") {
    ENV["REMOTE_ADDR"]
}

Try it!

Everything is set up and ready to go. In order to try out the webservice, just point your XML-RPC client to http://www.kuriositaet.de/ip/ip_ws.rb and make calls to getIPInfo.

Or you can try the service right from this page. I'm using my jsRPC library in order to access webservices directly from within this page. For example, you can press on this button to get the information about your IP address:

jsRPC makes it very easy to integrate the service in Javascript. First, you need to include the library:

<script src="/js/all_scripts.js" type="text/javascript"></script>

Apart from that, all you need to know about the library is that it contains an object named XmlRpc which can create proxy objects that connect to webservices. For example, in order to create a proxy for our webservice, do this:

var rpc = XmlRpc.getObject("/ip/ip_ws.rb", ["getIPInfo", "getIPAddress"])

The URL of the service and an array of method names are passed to the getObject function of XmlRpc, and the call returns a Javascript object which responds to those functions.

All that's left to do now is plain old Javascript:

// create an "onclick" function for the button
function alertIPInfo1 () {

    // call to the webservice
    var info = rpc.getIPInfo()

    // call to another method of the webservice
    var ip = rpc.getIPAddress()

    //assemble results and alert()
    var str = "Your address is: "+ip+"\n"
    str += info.status + " by '" + info.registry + "' in " + info.country
    alert (str)
}

And finally a tiny bit of HTML for the button:

<!-- connect the button to the function -->
<input type="button" value="look up my ip" onclick="alertIPInfo1()">

In case you'd like to try another IP address than your own, here's a final example:

A quick peek at the code reveals it's similar to the previous example, though we can leave out all the initialization. First we create a function that retrieves the entered IP address to hook up to the onclick event of the button. Since the rpc object is already initialized, it can be reused.

function alertIPInfo2 () {
    var ip = document.getElementById("ip_field").value
    var str = "Please enter a valid IP"
    try {
        var info = rpc.getIPInfo(ip) //reuse the rpc object here
        str = "Information for: "+ip+"\n"
        str += info.status + " by '" + info.registry + "' in " + info.country
    } catch (e) {
        // worry about this some other time :(  
    }
    alert (str)
}

The error handling isn't really pretty. We don't differentiate between faults generated by the webservice indicating invalid addresses and network errors, but it's good enough for a start. Now all we need is a text entry field and a button to hook up to the code.

<input type="text" id="ip_field">
<input type="button" value="look up IP" onclick="alertIPInfo2()">
