Writing an IP Address Information Webservice in Ruby
or Writing an XML-RPC Webservice using Ruby and MySQL that can be used to determine Country Information from IP Addresses and to Impress the Opposite Sex, Along the Way Learning How RIPE Assigns Blocks of Addresses, How to Access Webservices from Javascript and More Three Letter Acronyms Than You Ever Cared to Know
In case the slightly baroque title hasn't given you a clue about the content of this article, I describe the implementation of an XML-RPC webservice programmed in Ruby, which provides information about IP addresses.
You might find this interesting if you're trying to find information about determining the country of origin of a specific IP address, or if you're looking for an example of how to implement XML-RPC webservices in Ruby.
The webservice itself could be useful to determine where visitors to your website are coming from by looking up the IP addresses in your log files. I also explain how you can use this webservice directly from Javascript applications on your site by using the jsRPC library.
The service exists as described and is located on this server (www.kuriositaet.de) at /ip/ip_ws.rb. The XML-RPC method name for the service is getIPInfo and it returns the following struct:
registry => where this IP is registered (e.g. ARIN, RIPE ...)
country => two letter ISO 3166 country code
status => one of "ASSIGNED" or "ALLOCATED"
In case any piece of information is unknown, a "?" is returned in its place. In case of invalid IP addresses, a fault is generated.
Where to get the information?
Strangely enough, finding out where to obtain the definitive information about IP space allocation was nearly the most difficult part of the whole project.
How IP Addresses are Allocated
IANA (the Internet Assigned Numbers Authority) allocates IP addresses to Regional Internet Registries (RIRs), who in turn assign addresses to ISPs (Internet Service Providers) acting as LIRs (Local Internet Registries), which assign the addresses to their customers. How IANA divvied up the IPv4 address space is described in this document: ftp://ftp.iana.org/assignments/ipv4-address-space.
Presently, there are five RIRs:
AfriNIC based in Mauritius and responsible for Africa.
APNIC, located in Brisbane, Australia and responsible for the Asia Pacific region
ARIN (American Registry for Internet Numbers) in Virginia, responsible for North America
LACNIC located in Uruguay, responsible for Latin America
RIPE NCC (Réseaux IP Européens) located in the Netherlands and responsible for Europe, the Middle East and Central Asia.
These five regional registries have formed the NRO (Number Resource Organization) to coordinate their efforts.
How Allocation Data is Published.
Information about address allocation is published in the RIR Statistics Exchange Format. In a nutshell, each registry publishes a file containing their allocations:
Each file is called delegated-<registry>-yyyymmdd
The <registry> value follows the internal record format and is
one of the specified strings from the set:
{apnic,arin,iana,lacnic,ripencc};
(...)
The most recent file will also be available under a name of
the form delegated-<registry>-latest. This can be a symbolic
or hard link to the delegated-<registry>-yyyymmdd file but must
be pointed at the most recent file when it is updated.
Each RIR will make its files available in a standard ftp
directory, defined as /stats/<registry>/*.
Each RIR also mirrors the data from all the other registries, so it's only necessary to connect to a single server.
Downloading the Files
With the above information, it's easy to download the files using Ruby. FTP downloading is implemented in the net/ftp package of Ruby's standard library.
First, we cobble together a list of files to download:
file_names=[
"afrinic",
"apnic",
"arin",
"lacnic",
"ripencc"
]
file_names.map! {|file| "/pub/stats/#{file}/delegated-#{file}-latest"}
All that remains to be done is to pick a server to download the files from and provide a local directory to copy the files to. I'm in Europe, so I'll download from RIPE:
require "net/ftp"

url = "ftp.ripe.net"
localdir = "tmp"
Net::FTP.open(url) { |ftp|
  ftp.login # anonymous login
  file_names.each { |file|
    # keep only the file name for the local copy
    ftp.get(file, File.join(localdir, File.basename(file)))
  }
}
For the sake of clarity, the example leaves out all checks to make sure the local directory exists, error handling, and all the other stuff that programming is actually about.
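A slightly more defensive version might look like the following sketch. The helper names remote_path, local_path and download_stats are made up for this illustration, and it assumes the mirror still offers anonymous FTP with the /pub/stats layout used above.

```ruby
require "fileutils"

REGISTRIES = %w[afrinic apnic arin lacnic ripencc]

# Remote path of the most recent statistics file for one registry.
def remote_path(registry)
  "/pub/stats/#{registry}/delegated-#{registry}-latest"
end

# Where the downloaded copy ends up locally.
def local_path(localdir, registry)
  File.join(localdir, File.basename(remote_path(registry)))
end

# Downloads all files and returns the list of registries that failed.
def download_stats(host, localdir, registries = REGISTRIES)
  require "net/ftp" # loaded here so the path helpers work without it
  FileUtils.mkdir_p(localdir) # make sure the target directory exists
  failed = []
  Net::FTP.open(host) do |ftp|
    ftp.login # anonymous login
    registries.each do |registry|
      begin
        ftp.get(remote_path(registry), local_path(localdir, registry))
      rescue Net::FTPError => e
        warn "could not fetch #{registry}: #{e.message}"
        failed << registry
      end
    end
  end
  failed
end

# Example call (needs network access):
# download_stats("ftp.ripe.net", "tmp")
```

Returning the failed registries instead of aborting lets the caller decide whether a partial download is good enough to import.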
File Format
Now that we've downloaded all necessary files, let's look at their format. Thankfully, the files are CSV formatted using a pipe "|" (ASCII 0x7c) as field separator. The only other special feature is line commenting using a hash (#).
The file starts out with some headers which we're not interested in. The format of the main records is:
registry|cc|type|start|value|date|status[|extensions...]
registry contains the name of the registry this IP is assigned to, one of the fields the webservice will return. cc is the ISO 3166 two letter country code (e.g. US for, well, the US or DE for Germany). We need this information as well.
type can be one of {asn,ipv4,ipv6} depending on whether this record is an Autonomous System Number, IP version 4, or IP version 6 entry. Since we're not interested in routing we'll ignore all the ASN entries and since no one uses IPv6, we'll ignore all records for ipv6 as well.
start is the IP address this block starts at, and value is the number of hosts comprising this block. date gives information about when this block was first assigned by the RIR. Finally, status provides information about whether the block is assigned or allocated. In short, blocks are "assigned" to the final instance using the block. "Allocation" is basically delegation to LIRs who will split up the block to assign or allocate the pieces.
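To make the record format concrete, here is a small parsing sketch. Record and parse_record are names invented for this example. The header, summary and asn lines in the sample are made-up stand-ins for the lines we want to skip; only the ipv4 line matches the record shown in this article.

```ruby
# One parsed record; the field names follow the format line above.
Record = Struct.new(:registry, :cc, :type, :start, :value, :date, :status)

# Returns a Record for ipv4 data lines and nil for everything else
# (comments, blank lines, header and summary lines, asn and ipv6 records).
def parse_record(line)
  line = line.strip
  return nil if line.empty? || line.start_with?("#")
  fields = line.split("|")
  return nil if fields.size < 7          # too short to be a data record
  return nil unless fields[2] == "ipv4"  # also drops the version header
  registry, cc, type, start, value, date, status = fields
  Record.new(registry, cc, type, start, value.to_i, date, status)
end

# Illustrative input; all lines except the last are invented.
sample = <<~EOS
  # comment line
  2|arin|20060120|1000|19700101|20060119|-0500
  arin|*|ipv4|*|500|summary
  arin|US|asn|1|1|20010920|assigned
  arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
EOS

records = sample.each_line.map { |l| parse_record(l) }.compact
records.each { |r| puts "#{r.start} (#{r.value} addresses) #{r.cc}" }
```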
Let's have a look at an individual record then. This is the first ipv4 record in the current (2006-01-20) delegated-arin-latest as of my writing this:
arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
Using our newly gained knowledge, we can immediately see that the block of ipv4 addresses described in this record has been assigned by arin (to GE, coincidentally). The first address of this block is 3.0.0.0 and the block contains 16777216 addresses in total, counting 3.0.0.0 itself.
Why provide information about the number of hosts when there are always 16,777,216 addresses in a Class A network? The answer is simply that RIRs allocate CIDR blocks as well, so you can't rely on the class of network to determine the number of hosts.
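The relation between the host count and CIDR notation can be sketched in a few lines. prefix_length is a hypothetical helper, and the sketch assumes that a count which is not a power of two simply doesn't correspond to a single CIDR block.

```ruby
# Returns the CIDR prefix length for a block of `count` addresses,
# or nil when count is not a power of two (not a single CIDR block).
def prefix_length(count)
  return nil unless count > 0 && (count & (count - 1)) == 0
  32 - (count.bit_length - 1)
end

p prefix_length(16777216) # 2**24 addresses: a /8 such as 3.0.0.0
p prefix_length(8192)     # 2**13 addresses: a /19
p prefix_length(768)      # not a power of two, so no single prefix
```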
From CSV into the Database.
In order for the webservice to be snappy (and to be hip with the crowd) we'll parse the downloaded files and load them into a MySQL database. Personally, I'd prefer a PostgreSQL database, but MySQL is what comes with my host. The database table is completely straightforward:
CREATE TABLE ip_ranges (
registry VARCHAR (10), -- max length is ripencc, afrinic, each 7
cc CHAR (2),
ip_type CHAR (4), -- only ipv4 for now
ip_from INTEGER UNSIGNED,
ip_to INTEGER UNSIGNED,
first_date DATE,
status CHAR (1), -- L = aLlocated, S= aSsigned,
import_status CHAR (1)
-- import status is used for import. Typically set to null, all
-- existing values in the table are set to "1" before new values
-- are imported. New values are inserted with import_status=2
-- If the import is successful, all rows with
-- import_status==1 are deleted and import_status==2 are set to
-- null. If the import fails, all rows with import_status=2 are deleted
-- and rows with import_status==1 are reset to null.
);
CREATE INDEX idx_ip_from ON ip_ranges (ip_from);
CREATE INDEX idx_ip_to ON ip_ranges (ip_to);
With a few exceptions which I'll discuss below, the table is just a one-to-one mapping of the fields from the records in the RIR files. registry, cc, and first_date correspond to the RIR file, as does status, though I decided to save a little space by mapping ASSIGNED and ALLOCATED to S and L.
The import_status field is to ensure that we don't mess up everything in case an import fails. Details are in the code comments, in case you're interested.
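The import_status bookkeeping from the table comments could be wrapped up roughly as follows. This is only a sketch: guarded_import and RecordingDb are invented for illustration, and db is assumed to be any object that responds to query(sql).

```ruby
# Marks existing rows, runs the import, then either commits the new rows
# or removes them again, following the table comments.
def guarded_import(db)
  db.query("UPDATE ip_ranges SET import_status = '1'")
  begin
    yield db # insert or bulk-load the new rows with import_status = '2'
    db.query("DELETE FROM ip_ranges WHERE import_status = '1'")
    db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '2'")
    true
  rescue StandardError
    db.query("DELETE FROM ip_ranges WHERE import_status = '2'")
    db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '1'")
    false
  end
end

# Stand-in for a database handle that only records the statements.
class RecordingDb
  attr_reader :statements

  def initialize
    @statements = []
  end

  def query(sql)
    @statements << sql
  end
end

db = RecordingDb.new
guarded_import(db) { |d| d.query("LOAD DATA LOCAL INFILE 'tmp/import.file' INTO TABLE ip_ranges") }
puts db.statements
```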
ip_from is the start IP address converted to a number. IP addresses are converted to numbers because that makes them easier to work with, for example when determining which block a given IP falls into. An IP address is just a series of four bytes. Take 192.0.34.166, currently the IP of www.example.com, in decimal, hex and bits:
dotted quad | 192 | 0 | 34 | 166 |
hex | C0 | 00 | 22 | A6 |
bits | 11000000 | 00000000 | 00100010 | 10100110 |
If we treat 11000000 00000000 00100010 10100110 like a single 32-bit number, we get 3221234342. This is much easier to work with than 192.0.34.166.
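The conversion is easy to check in Ruby. The two helper functions below are illustrative names, written out by hand to show the byte shifting. IPAddr#to_i from the standard library does the same job.

```ruby
require "ipaddr"

# Shift each octet into place by hand ...
def dotted_quad_to_i(ip)
  ip.split(".").inject(0) { |n, octet| (n << 8) | octet.to_i }
end

# ... and back again, peeling off one byte at a time.
def i_to_dotted_quad(n)
  [24, 16, 8, 0].map { |shift| (n >> shift) & 0xff }.join(".")
end

by_hand = dotted_quad_to_i("192.0.34.166")
by_lib  = IPAddr.new("192.0.34.166").to_i

puts by_hand                   # 3221234342
puts by_hand == by_lib         # true
puts i_to_dotted_quad(by_hand) # 192.0.34.166
```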
Only ip_to is still missing. It's the value of the last IP address in this block, obtained by adding the value field from the RIR records to the start address and subtracting one. That was the last bit we needed to know in order to parse the RIR files:
require "ipaddr"

# for reference:
# arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
arr = lineFromRIRFile.split("|")
registry = arr[0]
cc = arr[1]
ip_type = arr[2]
start_ip = IPAddr.new(arr[3])
ip_from = start_ip.to_i        # first address as a number
number = arr[4].to_i           # number of addresses in the block
ip_to = ip_from + number - 1   # last address as a number
first_date = arr[5]
status = arr[6] == "allocated" ? "L" : "S"
With all the information extracted and nicely laid out in aptly named variables, all that remains to be done is pack the data into the database. For a quick one-off job, we could do this:
insert = "INSERT INTO ip_ranges (registry, cc, ip_type, ip_from,"+
" ip_to, first_date, status, import_status)"+
" VALUES (?,?,?,?,?,?,?,'2')"
db = get_mysql() # magic !
stmt = db.prepare(insert) # keep the prepared statement around
stmt.execute(registry, cc, ip_type, ip_from, ip_to, first_date, status)
get_mysql() is defined elsewhere and retrieves a Ruby MySQL driver. I'm using a prepared statement for the insert because it saves the hassle of quoting and such. The call to prepare compiles the insert statement and execute() executes the statement with all the values previously parsed from the record.
Unfortunately, actually inserting every single record like this takes a while. Any database worth its salt has some sort of bulk import tool that's faster than plain inserting. Even MySQL has one! In MySQL the bulk import command is LOAD DATA LOCAL INFILE. All we need is a CSV file, with fields separated by tabs and each field corresponding to a column in the table.
tmplt = "%s\t%s\t%s\t%s\t%s\t%s\t%s\t2"
line = tmplt % [registry, cc, ip_type, ip_from, ip_to, first_date, status]
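Put together with the parsing step, writing the whole import file might look like this sketch. import_line and write_import_file are hypothetical names, and the input is assumed to contain only ipv4 data records.

```ruby
require "ipaddr"

TEMPLATE = "%s\t%s\t%s\t%s\t%s\t%s\t%s\t2"

# Converts one ipv4 record line into a tab-separated import line.
def import_line(record_line)
  registry, cc, ip_type, start, value, first_date, status =
    record_line.strip.split("|")
  ip_from = IPAddr.new(start).to_i
  ip_to   = ip_from + value.to_i - 1
  TEMPLATE % [registry, cc, ip_type, ip_from, ip_to, first_date,
              status == "allocated" ? "L" : "S"]
end

# Writes all lines to `path` and returns the number of rows written.
def write_import_file(path, record_lines)
  File.open(path, "w") do |out|
    record_lines.each { |l| out.puts(import_line(l)) }
  end
  record_lines.size
end

puts import_line("arin|US|ipv4|3.0.0.0|16777216|19880223|assigned")
```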
Once the import file is complete and saved in, say, tmp/import.file, it can be imported into the database like this:
load = "LOAD DATA LOCAL INFILE 'tmp/import.file' INTO TABLE ip_ranges"
db.query(load)
Almost there.
Now that all the data is loaded into the database, we should be able to get information about addresses by converting the IP to a number, and issuing a select statement. Since we previously calculated the number value of 192.0.34.166 (www.example.com) to be 3221234342, we'll use that:
SELECT * FROM ip_ranges
WHERE 3221234342 BETWEEN ip_from AND ip_to
Unfortunately though, that select doesn't return anything because 192.0.34.166 belongs to IANA and isn't assigned by the RIRs. Therefore it isn't contained in any of the files we imported, which means we stumbled across a little bug, eh, limitation. Try again with www.google.com. Google has a bunch of addresses, I'll pick one at random: 66.249.93.99. Most likely you can't transform that into a number in your head, but I can! It's 1123638627, so typing:
SELECT * FROM ip_ranges
WHERE 1123638627 BETWEEN ip_from AND ip_to
yields:
registry | cc | ip_type | ip_from | ip_to | first_date | status | import_status |
arin | US | ipv4 | 1123631104 | 1123639295 | 2004-03-05 | S | 0 |
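The same BETWEEN logic can be tried out without a database. The sketch below keeps the two blocks mentioned in this article in a sorted array and uses a binary search. Block and find_block are illustrative names, and the approach assumes the blocks don't overlap and are sorted by ip_to.

```ruby
require "ipaddr"

Block = Struct.new(:registry, :cc, :ip_from, :ip_to)

# The two blocks that appear in this article, sorted by ip_to.
BLOCKS = [
  Block.new("arin", "US", 50331648, 67108863),     # 3.0.0.0, 16777216 addresses
  Block.new("arin", "US", 1123631104, 1123639295)  # the row returned above
]

# Returns the block containing ip, or nil if no block matches.
def find_block(ip, blocks = BLOCKS)
  n = IPAddr.new(ip).to_i
  # first block whose end is not below n; it matches if it starts at or before n
  candidate = blocks.bsearch { |b| b.ip_to >= n }
  candidate if candidate && candidate.ip_from <= n
end

p find_block("66.249.93.99")  # the arin/US block
p find_block("192.0.34.166")  # nil, the "limitation" from above
```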
Creating the Ruby Webservice
Now we can spend our nights checking IP addresses, so long as we have access to the MySQL database. That's not very useful, though, so we'll provide some wrapper methods to access the data. As promised, we'll write an XML-RPC webservice in Ruby. Lucky for us, Ruby provides XML-RPC functionality as part of its standard library.
require "xmlrpc/server"
require "ipaddr"
The first two lines of code include the standard libraries we're using. Both xmlrpc/server and ipaddr should have come installed with your Ruby distribution if it's moderately fresh. First off, I'll define a generic function to return a Ruby hash representation of the XML-RPC struct we defined at the beginning. In case you've forgotten, the webservice is supposed to return the registry, country and assignment status of the provided IP address.
def get_ip_information ip
  # raises for invalid addresses
  addr = IPAddr.new(ip)
  # defaults in case the address isn't in any imported block
  h = {
    "registry" => "?",
    "country" => "?",
    "status" => "?"
  }
  stmt = "SELECT registry, cc, status "+
         "FROM ip_ranges "+
         "WHERE #{addr.to_i} BETWEEN ip_from AND ip_to"
  get_mysql { |db|
    db.query(stmt) { |result|
      result.each { |row|
        h["registry"] = row[0]
        h["country"] = row[1]
        h["status"] = row[2]=="S" ? "ASSIGNED" : "ALLOCATED"
      }
    }
  }
  h
end
The code first instantiates an IPAddr object that we'll use to check the IP for validity, and to convert it to a number. The return value is prepared in the variable h to contain ? values in case we run into a "limitation" like the www.example.com fiasco. get_mysql prepares the database driver, and the query selects the registration information and fills in our result.
Next, we need to instantiate an XML-RPC server object and connect the webservice functionality to it:
server = XMLRPC::CGIServer.new
server.add_handler("getIPInfo") { |ip|
get_ip_information ip
}
The add_handler function attaches the functionality for a named XML-RPC method to the server. In the example above, the server is instructed to perform the code block behind add_handler whenever it receives an XML-RPC call to getIPInfo. The value of the parameter in the XML-RPC method call (the IP address in our case) is passed through by way of the variable ip.
To keep things simple, the code block doesn't do much. It merely passes the IP address on to the get_ip_information function.
The value returned by the code block is the Ruby hash generated by the get_ip_information function, which the XML-RPC server automatically converts to the proper XML-RPC struct type.
Finally, we'll get fancy and define a second handler for the getIPInfo method which doesn't require you to pass any parameter but automatically returns the IP information for the caller's address. We need to define a second handler because the XMLRPC implementation checks the arity of the code block and would throw a METHOD_MISSING fault if it encounters an XML-RPC request containing the incorrect number of parameters.
# added to the same server object as the first handler
server.add_handler("getIPInfo") {
  get_ip_information ENV["REMOTE_ADDR"]
}
Alternatively, Ruby allows you to declare optional parameters, sort of like variable argument lists in C, by prefixing the variable with an asterisk "*". If you do so, all arguments (if any) get passed to the code block as an array.
server.add_handler("getIPInfo") { |*ip|
  if ip.length == 0 || ip[0].strip == ""
    ip = ENV["REMOTE_ADDR"]
  else
    ip = ip[0]
  end
  get_ip_information ip
}
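How the arity of a block relates to the number of arguments it accepts can be seen in plain Ruby. The accepts? helper below is not the library's code, just an illustration of the kind of check described above, and the arity values are those reported by current Ruby versions.

```ruby
# Fixed-arity blocks need an exact match; splat blocks accept anything
# beyond their required arguments (a negative arity encodes that count).
def accepts?(handler, n_args)
  arity = handler.arity
  arity >= 0 ? n_args == arity : n_args >= (arity + 1).abs
end

one_param = proc { |ip| ip }   # arity 1
no_param  = proc { "caller" }  # arity 0
splat     = proc { |*ip| ip }  # arity -1

p accepts?(one_param, 0) # false: this handler insists on an IP
p accepts?(no_param, 0)  # true
p accepts?(splat, 0)     # true
p accepts?(splat, 1)     # true
```

This is why the single handler with |*ip| can replace the two separate getIPInfo handlers.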
Just for fun, I'll add one final method, getIPAddress, to determine the IP of the client making the RPC call.
server.add_handler("getIPAddress") {
  ENV["REMOTE_ADDR"]
}
server.serve # process the incoming CGI request
Try it!
Everything is set up and ready to go. In order to try out the webservice, just point your XML-RPC client to http://www.kuriositaet.de/ip/ip_ws.rb and make calls to getIPInfo.
Or you can try the service right from this page. I'm using my jsRPC library in order to access webservices directly from within this page. For example, you can press on this button to get the information about your IP address:
jsRPC makes it very easy to integrate the service in Javascript. First, you need to include the library:
<script src="/js/all_scripts.js" type="text/javascript"></script>
Apart from that, all you need to know about the library is that it contains an object named XmlRpc which can create proxy objects that connect to webservices. For example, in order to create a proxy for our webservice, do this:
var rpc = XmlRpc.getObject("/ip/ip_ws.rb", ["getIPInfo", "getIPAddress"])
The URL of the service and an array of method names are passed to the getObject function of XmlRpc, and the call returns a Javascript object which responds to those functions.
All that's left to do now is plain old Javascript:
// create an "onclick" function for the button
function alertIPInfo1 () {
// call to the webservice
var info = rpc.getIPInfo()
// call to another method of the webservice
var ip = rpc.getIPAddress()
//assemble results and alert()
var str = "Your address is: "+ip+"\n"
str += info.status + " by '" + info.registry + "' in " + info.country
alert (str)
}
And finally a tiny bit of HTML for the button:
<!-- connect the button to the function -->
<input type="button" value="look up my ip" onclick="alertIPInfo1()">
In case you'd like to try another IP address than your own, here's a final example:
A quick peek at the code reveals it's similar to the previous example, though we can leave out all the initialization. First we create a function that retrieves the entered IP address to hook up to the onclick event of the button. Since the rpc object is already initialized, it can be reused.
function alertIPInfo2 () {
var ip = document.getElementById("ip_field").value
var str = "Please enter a valid IP"
try {
var info = rpc.getIPInfo(ip) //reuse the rpc object here
str = "Information for: "+ip+"\n"
str += info.status + " by '" + info.registry + "' in " + info.country
} catch (e) {
// worry about this some other time :(
}
alert (str)
}
The error handling isn't really pretty. We don't differentiate between faults generated by the webservice indicating invalid addresses and network errors, but it's good enough for a start. Now all we need is a text entry field and a button to hook up to the code.
<input type="text" id="ip_field">
<input type="button" value="look up IP" onclick="alertIPInfo2()">