How to Geocode IP Addresses in Displayr
Most communication between users and websites requires IP addresses. However, when profiling web traffic, it is more useful to know where users are physically located. In this context, geocoding (also known as IP geolocation) is the process of translating an IP address to a physical location. In this post I describe how to geocode IPs in Displayr.
To demonstrate geocoding, we’ll use the IP addresses of the universities listed on this website. We are using universities because each has a known location, which lets us check whether the geocoded location matches it. The first task is to convert URLs to IP addresses using DNS. The 25 IP addresses are listed below. If you are analyzing traffic from a website, you’ll already have a list of IP addresses (rather than URLs).
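The URL-to-IP step happens outside Displayr. One way it could be done is sketched below in Python, assuming the standard library's `socket.gethostbyname` is an acceptable resolver. The helper names `hostname_of` and `resolve_urls` are made up for illustration, and this is not necessarily how the list in this post was produced.

```python
import socket
from urllib.parse import urlparse


def hostname_of(url):
    """Return the host part of a URL, accepting bare hostnames too."""
    # urlparse only fills in .hostname when "//" precedes the host,
    # so prepend it to bare hostnames such as "www.example.edu".
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.hostname


def resolve_urls(urls, resolver=socket.gethostbyname):
    """Map each URL to an IPv4 address, or None when the lookup fails."""
    results = {}
    for url in urls:
        host = hostname_of(url)
        if host is None:
            results[url] = None
            continue
        try:
            results[url] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[url] = None
    return results
```

Calling `resolve_urls(["https://www.example.edu"])` performs a real DNS query. The `resolver` argument exists only so that the lookup can be swapped out, for example when testing offline. Note that `socket.gethostbyname` returns a single IPv4 address even when a host has several.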
Note that we are using IPv4 addresses, but this works with IPv6 addresses as well.
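If your list mixes both address families, or might contain malformed entries, a quick check before importing the data can help. The sketch below uses Python's standard `ipaddress` module. The function name `ip_version` is hypothetical, and this check is an optional pre-processing step rather than part of Displayr.

```python
import ipaddress


def ip_version(text):
    """Return 4 or 6 for a valid IP address, or None if it cannot be parsed."""
    try:
        return ipaddress.ip_address(text.strip()).version
    except ValueError:
        return None
```

For example, `ip_version("192.0.2.1")` identifies an IPv4 address and `ip_version("2001:db8::1")` an IPv6 address, while a string that is not an address yields `None`.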
Data input and output
To perform geocoding in Displayr, select the variable containing the IP addresses from the data tree (in the bottom-left of the screen), then navigate to Insert > More > Data > Geocode IPs.
A new categorical variable containing the countries deduced from the IP addresses is added to the data set. By dragging that variable onto the page we can create a table of percentages or counts as shown below.
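For readers who want to reproduce the same kind of summary outside Displayr, the counts and percentages amount to a simple tally. Here is a minimal Python sketch. The function name `country_table` is hypothetical and the input is assumed to be a plain list of country names.

```python
from collections import Counter


def country_table(countries):
    """Return (country, count, percentage) rows, most frequent first."""
    counts = Counter(countries)
    total = sum(counts.values())
    return [
        (country, n, 100.0 * n / total)
        for country, n in counts.most_common()
    ]
```

Each row holds the country, how many IPs were geocoded to it, and that count as a percentage of all IPs in the list.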
Imprecision of geocoding
As a test of accuracy, below I plot the countries, with correct geocodings in blue and wrong geocodings in red. The two wrong results are King Saud University in Saudi Arabia, which was geolocated to the Netherlands, and Utrecht University in the Netherlands, which was geolocated to Belgium.
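The check behind this comparison is simply expected country versus geocoded country. A small Python sketch follows, using the two mismatches named above plus one placeholder row that is invented purely to show a match. The function name `find_mismatches` is hypothetical.

```python
def find_mismatches(rows):
    """Return the (name, expected, geocoded) rows whose countries differ."""
    return [
        row for row in rows
        if row[1].strip().casefold() != row[2].strip().casefold()
    ]


rows = [
    ("King Saud University", "Saudi Arabia", "Netherlands"),
    ("Utrecht University", "Netherlands", "Belgium"),
    # Hypothetical row, not from the real data set, included to show a match.
    ("Placeholder University", "Exampleland", "Exampleland"),
]

mismatches = find_mismatches(rows)
```

With the full list of 25 universities, two mismatches correspond to 23 of 25 correct, or 92%.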
When we say that some results are wrong, there are at least two possible explanations.
- The web servers for King Saud University genuinely are in the Netherlands. If this is the case, the geocoding is correct, but our hypothesis that universities host their websites in their home country is wrong.
- Geocoding is an imprecise science. It works by looking up an IP address in a database. Database providers use a variety of information sources to link IPs to locations, such as traces of web traffic and IP ownership records. However, there is no permanent mapping from an IP address to a location, so this is a “best effort” service. Across many IPs, it’s likely that several locations are inaccurate, so you should draw your conclusions from the overall distribution rather than from specific IPs.
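To make the database lookup concrete, here is a toy Python sketch. It assumes the database is a list of address ranges, each tagged with a country. The ranges are blocks reserved for documentation and the countries assigned to them are invented, so `TOY_DATABASE` and `lookup_country` are purely illustrative and do not describe how Displayr or any real provider stores its data.

```python
import ipaddress

# Made-up mapping: documentation-only address ranges with arbitrary countries.
TOY_DATABASE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Netherlands"),
    (ipaddress.ip_network("198.51.100.0/24"), "Belgium"),
    (ipaddress.ip_network("2001:db8::/32"), "Saudi Arabia"),
]


def lookup_country(ip_text, database=TOY_DATABASE):
    """Return the country of the first range containing the IP, else None."""
    address = ipaddress.ip_address(ip_text)
    for network, country in database:
        # Membership is False when the address and network versions differ.
        if address in network:
            return country
    return None
```

Because the answer depends entirely on the table, an out-of-date or coarse entry gives a wrong country without any error being raised, which is why individual results deserve less trust than the overall distribution.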
About Jake Hoare
After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, sometimes on machine learning.