Well, here we are how many months later, and it’s my first post? My friends have been ribbing me – an online guy active in social media – not updating his blog for almost a year.
Sadly, I haven’t been all that active in social media, either, until the last couple of weeks. Checked out completely. Nowhere to be found in real-time. Gone from the immediate consciousness of those I hold near and dear (or at least, in the case of Twitter, the great founts and filters of information). No longer part of the noise. Like a black hole, I might be there but no one could detect me. Tsk, tsk.
The reality is, however, that for the last year I have been heads down moving deeply into machine learning, audience segmentation, behavioral targeting and recommendation engines for mobile advertising. You try pulling all that down and back into your tool set after 25 years away from a masters in robotics (which involved what was then called adaptive learning) even as you deliver product specs, run deep analytics, and build product. I dare you. I ain’t done yet, but I’ve finally gotten to a point where I can take a breath, look up, and see what’s going on in the world. However, it’s not like it used to be when it comes to my participation in social media. Deep analysis requires intense concentration (at least for me), and all the interruptions from Tweetdeck just kill my train of thought. So it’s beginning or end of day mostly, with an occasional day where I can just relax and range through lots of immediate information.
Sigh. The price I pay for working in an area that is intensely mathematical and has become my passion. The price I pay for reveling in the ability to build incredible products on huge data. A price, but well worth it.
So now that the mea culpa is past, what’s on the agenda for today? Given that I have been working in the mobile space, it seems appropriate to start with the issue of geolocation. Geolocation data represents a relatively new input to data mining, but one that can provide a host of opportunities to identify and segment audiences. Admittedly, there are a number of services like Loopt, Foursquare, and others that make their bones on using geolocation to understand where you are and what might be of interest to you. But believe it or not, for the majority of businesses and even many technology companies, the whole idea of using geolocation data as elements of customer profiles is completely new. Many are still trying to wrap their heads around how to best set up a mobile website and integrate it into their overall marketing programs. The science of geolocation? Not even on the radar yet.
It has certainly been an eye opener for me to learn this field – it is deep, rich and complex. So I thought I would build a primer for those who, like me, had to start from scratch and understand how geolocation works and how you might use it to enhance your customer offerings. There is a lot to cover, so this will be another multipart series.
The Basics of Geolocation
Most people know that anyone with a mobile device can be geolocated. But what many people do not know is that a device that is online can also be geolocated through information transmitted by its browser (especially Google Chrome and Firefox). The combination of these technologies provides a powerful set of tools for tightly locating a device (meaning a radius of under 200 feet) even when GPS, the most fine-grained way of locating a device, is not turned on.
The core methods by which a device can be geolocated and which we will discuss in the next sections are:
- The Global Positioning System or GPS
- IP Address
- Assisted GPS
- Network Base Station Data
- Network (or Cell Tower) Triangulation
The Global Positioning System
For those few who have never been to a James Bond flick, watched Law and Order, or seen a TomTom commercial, GPS stands for Global Positioning System. But even if you know the term “GPS” you may not know how it works. So let’s start there.
GPS is a space-based satellite navigation system that provides location and time information in all weather, anywhere on or near the Earth, where there is an unobstructed line of sight to four or more GPS satellites. It consists of 24 satellites, is maintained by the United States government and is freely accessible to anyone with a GPS receiver.
GPS is accurate to a very tight radius – current technologies can get a horizontal accuracy of ~1 meter (3 feet) and a vertical accuracy of ~1.5 meters. But GPS accuracy for most mobile phones and pads is probably on the order of a 30-50 foot radius. Garmin, a maker of navigation systems, says its devices are accurate to 15 meters, for example.
Most mobile devices have a GPS receiver built in, although it is not turned on by default because it drains batteries very quickly. This default is, in fact, the biggest hurdle to accurately geolocating a device, since GPS is by far the most accurate mechanism available.
GPS satellites transmit two low power radio signals, which travel by line of sight. As a result, they can pass through clouds, glass and plastic but will not go through most solid objects such as buildings and mountains.
A GPS signal contains three different pieces of information – a pseudorandom code, ephemeris data and almanac data.
The pseudorandom code is simply an I.D. code that identifies which satellite is transmitting information.
Ephemeris data is information GPS satellites transmit about their location (current and predicted), timing and ‘health’. This data is used by GPS receivers to enable them to estimate location relative to the satellites and thus position on earth.
Almanac data tells the GPS receiver where each GPS satellite should be at any time throughout the day. Each satellite transmits almanac data showing the orbital information for that satellite and for every other satellite in the system.
Each GPS satellite orbits ~12,000 miles above the Earth and completes two full orbits every day. GPS receivers in mobile devices attempt to locate four or more of these satellites, calculate the distance to each, and then use the information to geolocate a 3D position (latitude, longitude, altitude). Once the user’s position is determined, the GPS receiver can calculate other information, such as speed, bearing, track, trip distance and much more.
The calculation is based on trilateration, which is a mathematical model for determining the absolute or relative position of points using the geometry of circles, spheres, and triangles. Unlike triangulation, which is what most people think GPS uses to fix a location, it does not involve the measurement of angles. To emphasize this, I have chosen a slightly more technical diagram to represent the concept. Note that this calculation does not just involve calculating the intersection of the three radii (point B, which is what we are geolocating) – there are also components that relate to the relative positions of the three foci of the circles.
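To make the idea concrete, here is a minimal sketch of trilateration in two dimensions. It is a simplification, not what a GPS chip actually runs: a real receiver works in 3D with four or more satellites and also has to solve for its own clock offset. The function name `trilaterate_2d` and the anchor coordinates are made up for illustration.

```python
import math

def trilaterate_2d(anchors, distances):
    """Locate a point from its distances to three known points (2D sketch)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances

    # Subtracting the first circle equation from the other two cancels the
    # squared unknowns and leaves two linear equations of the form a*x + b*y = c.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")

    # Cramer's rule for the 2x2 linear system.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Hypothetical anchor positions and a known point to recover.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.dist(true_position, a) for a in anchors]

x, y = trilaterate_2d(anchors, distances)
print(f"estimated position: ({x:.3f}, {y:.3f})")
```

Notice that only distances go in, never angles, and that the positions of the three centers matter as much as the radii: if the centers line up, the calculation has no unique answer.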
Sources of GPS Signal Errors
As we start talking about GPS accuracy and the accuracy of other geolocation technology, we need to understand what types of errors can enter into each system. For GPS, there are six types of signal errors that can occur. Fortunately, even with them GPS is incredibly accurate. The table below summarizes the size of the potential effect of five of these errors (the sixth, too few visible satellites, does not appear in the table); all six are then described in more detail.
| Source of Error | Size of Error |
|---|---|
| Multipath Effect | +/- 1 meter |
| Atmospheric Effects | +/- 5 meters |
| Receiver Clock Errors | +/- 2 meters |
| Geometry Shading | +/- 2.5 meters |
| Ephemeris Errors | +/- 1 meter |
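As a rough sketch of how these figures might combine, the snippet below adds them up two ways: a worst case where every error pushes in the same direction, and a root-sum-square, which is a common way to combine errors if you assume they are independent of each other. That independence is an assumption made here for illustration, not something the table itself claims.

```python
import math

# Error sizes in meters, copied from the table above.
error_budget_m = {
    "Multipath Effect": 1.0,
    "Atmospheric Effects": 5.0,
    "Receiver Clock Errors": 2.0,
    "Geometry Shading": 2.5,
    "Ephemeris Errors": 1.0,
}

# Worst case: every error points the same way at the same time.
worst_case_m = sum(error_budget_m.values())

# Root-sum-square: assumes the error sources are independent (an assumption).
rss_m = math.sqrt(sum(e**2 for e in error_budget_m.values()))

print(f"worst case:      {worst_case_m:.1f} m")
print(f"root-sum-square: {rss_m:.1f} m")
```

Either way, the atmospheric term dominates the total, which is why it gets the most attention in the descriptions that follow.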
Signal Multipath Errors. Signal multipath errors are caused by the GPS signal reflecting off objects such as tall buildings or other large, highly reflective surfaces before it reaches the receiver. This increases the travel time of the signal, thus introducing errors into the calculation. The resulting error typically lies in the range of a few meters.
Atmospheric Delays. Atmospheric delays represent the largest potential source of GPS signal error. Satellite signals slow as they pass through the ionosphere and troposphere. While radio signals travel at the speed of light in outer space, their propagation in the ionosphere and troposphere is slower. In the ionosphere, a large number of electrons and positively charged ions are formed by the ionizing force of the sun. These charged particles refract the electromagnetic waves from the satellites, lengthening the travel time of the signals. In the troposphere, varying concentrations of water vapor further lengthen the travel time. These errors are mostly corrected by calculations in the GPS receivers, since typical variations in signal velocity through the atmosphere are well known for standard conditions.
Receiver Clock Errors. A receiver’s built-in clock is not as accurate as the atomic clocks onboard the GPS satellites. Therefore, it may have very slight timing errors.
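To see why even very slight timing errors matter, remember that the receiver turns signal travel time into distance by multiplying by the speed of light. The short sketch below does that arithmetic; the function names are just illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter

def range_error_m(timing_error_s):
    """Distance error caused by a given timing error in the receiver."""
    return SPEED_OF_LIGHT_M_PER_S * timing_error_s

def timing_error_for_range(range_error_meters):
    """Timing error (seconds) that would produce a given distance error."""
    return range_error_meters / SPEED_OF_LIGHT_M_PER_S

for label, seconds in [("1 microsecond", 1e-6), ("1 nanosecond", 1e-9)]:
    print(f"{label}: {range_error_m(seconds):.2f} m")

# The +/- 2 meter figure from the table corresponds to a few nanoseconds.
print(f"2 m of range error = {timing_error_for_range(2.0) * 1e9:.1f} ns of clock error")
```

A clock that is off by a millionth of a second puts the distance off by roughly 300 meters, so the timing errors behind a 2 meter position error are on the order of nanoseconds.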
Ephemeris Errors. Ephemeris errors occur when a satellite incorrectly reports its position.
Too Few Visible Satellites. The more satellites a GPS receiver can “see,” the better the accuracy. Buildings, terrain, electronic interference, or sometimes even dense foliage can block signal reception.
Geometry Shading. Another factor influencing the accuracy of the reported position is “satellite geometry”. Satellite geometry describes the position of the satellites relative to each other from the view of the receiver. Ideal satellite geometry exists when the satellites are located at wide angles relative to each other. Poor geometry results when the satellites are located in a line or in a tight grouping. For example, if a receiver sees 4 satellites and all are clustered in the northwest, this leads to a “bad” geometry. In the worst case, when all the distance measurements point in the same direction, no position determination is possible at all. Even if a position is determined, the error in the position may be significant, although in practice it is usually no more than 2.5 meters. If, on the other hand, the 4 satellites are well distributed across the whole sky, the determined position will be much more accurate.
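The effect of satellite geometry is usually summarized in a single number called dilution of precision (DOP): the lower the number, the better the geometry. Below is a minimal sketch that computes geometric DOP for two sets of four satellites. The azimuth and elevation values are invented for illustration, and the small matrix inversion is written out by hand only to keep the example free of external libraries.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """East-north-up unit vector pointing from the receiver to a satellite."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az),   # east
            math.cos(el) * math.cos(az),   # north
            math.sin(el))                  # up

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate geometry: no position fix possible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gdop(satellites):
    """Geometric DOP for a list of (azimuth, elevation) pairs in degrees."""
    # Each row: direction to the satellite plus a 1 for the receiver clock term.
    g = [list(unit_vector(az, el)) + [1.0] for az, el in satellites]
    gtg = [[sum(row[i] * row[j] for row in g) for j in range(4)]
           for i in range(4)]
    q = invert(gtg)
    return math.sqrt(sum(q[i][i] for i in range(4)))

# Hypothetical skies: one satellite overhead and three spread around the
# horizon, versus four satellites bunched together in the northwest.
well_spread = [(0, 90), (0, 20), (120, 20), (240, 20)]
clustered = [(300, 40), (310, 45), (320, 50), (315, 35)]

print(f"well spread: GDOP = {gdop(well_spread):.1f}")
print(f"clustered:   GDOP = {gdop(clustered):.1f}")
```

The clustered set produces a far larger value than the well-spread set, which is the numerical version of the “bad” geometry described above. If the satellites were arranged so that the matrix could not be inverted at all, that would be the worst case where no position can be determined.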
Next Installment: Geolocation using IP Address