Tuesday, April 6, 2010

Google. What They Know About You Explained.

With yet another Google privacy story in the news today, I thought it would be interesting to examine exactly what Google knows about the average internet user and how it gathers that information. To simplify the article I have excluded any Google services that require some kind of authentication, such as Gmail or Google Apps. All the information discussed below is harvested from so-called anonymous browsing.

Firstly, irrespective of your browser or even of the protocol in use (HTTP or otherwise), every time you visit a web site you must send your IP address to the destination so that the receiver knows where to send the reply. If you are a home user, your IP address is allocated on a semi-random basis from a large pool managed by your ISP. Even so, publicly available online tools can usually narrow down your location from your IP address to at least the nearest city. Many sites use this information to target advertising at you. This is most obvious if you browse a site in a different country but see advertisements in your own language for businesses or services in your country. Your ISP, of course, can use the IP address to uniquely identify you.

Again, even before you start using Google products or web sites, each time you access a new web page you send the web server certain information contained in “headers”. One of these is the User-Agent header, which tells the web site your browser type and operating system.
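To make the header point concrete, here is a minimal Python sketch of what a web server can read from a single request. The request text, the host name and the simple substring matching are my own illustrative assumptions; real servers and real user agent strings are messier than this.

```python
# Illustrative raw request; the host and the user agent string are examples only.
RAW_REQUEST = (
    "GET /search?q=privacy HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.3) "
    "Gecko/20100401 Firefox/3.6.3\r\n"
    "Accept-Language: en-gb\r\n"
    "\r\n"
)

# Order matters: Chrome's user agent also mentions Safari, so check Chrome first.
BROWSER_TOKENS = [("Firefox", "Firefox"), ("Chrome", "Chrome"),
                  ("Safari", "Safari"), ("MSIE", "Internet Explorer")]
OS_TOKENS = [("Windows", "Windows"), ("Mac OS X", "Mac OS X"), ("Linux", "Linux")]


def parse_headers(raw_request):
    """Return the request headers as a dict keyed by lower-cased header name."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the "GET ... HTTP/1.1" line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def describe_client(headers):
    """Guess (browser, operating system) from the User-Agent header."""
    ua = headers.get("user-agent", "")
    browser = next((label for token, label in BROWSER_TOKENS if token in ua), "unknown")
    os_name = next((label for token, label in OS_TOKENS if token in ua), "unknown")
    return browser, os_name


if __name__ == "__main__":
    print(describe_client(parse_headers(RAW_REQUEST)))
```

Nothing here is specific to Google; any web server receives the same header on every request.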

So, before you even type the first character into Google’s search engine, they already know more or less where you live, your operating system and your browser type. This of course is not limited to Google but applies to any web site. Once you do access www.google.com for the first time you receive a cookie, nominally to record your preferences (e.g. language, number of results per page, etc.), but which also contains a unique ID. The next time you access the Google site the cookie, which is stored on your hard drive, is sent back with your request and the ID from the first visit is retrieved. Each search you perform is recorded by Google together with the ID to build up a profile of your browsing habits. The information is used to target advertising at you, with the targeting becoming more effective as your profile expands.
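The cookie mechanism described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Google’s code: the cookie name `uid`, the in-memory `search_log` and the `handle_search` function are all hypothetical.

```python
import uuid
from collections import defaultdict
from http.cookies import SimpleCookie

COOKIE_NAME = "uid"             # hypothetical name, not Google's actual cookie
search_log = defaultdict(list)  # unique ID -> list of search queries


def handle_search(cookie_header, query):
    """Log a query against the visitor's ID, issuing a new ID on a first visit.

    Returns (uid, set_cookie), where set_cookie is the value for a Set-Cookie
    header, or None if the browser already presented an ID.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    if COOKIE_NAME in incoming:
        uid = incoming[COOKIE_NAME].value  # returning visitor
        set_cookie = None
    else:
        uid = uuid.uuid4().hex             # first visit: mint a fresh ID
        outgoing = SimpleCookie()
        outgoing[COOKIE_NAME] = uid
        set_cookie = outgoing[COOKIE_NAME].OutputString()
    search_log[uid].append(query)
    return uid, set_cookie


if __name__ == "__main__":
    uid, cookie = handle_search(None, "cheap flights")  # first visit
    handle_search(cookie, "hotels in paris")            # browser sends cookie back
    print(uid, search_log[uid])
```

The second call presents the cookie issued by the first, so both searches end up under the same ID. Deleting the cookie between the two calls would give each search its own unrelated ID.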

Things get even more interesting for users of Google’s Chrome browser. As Microsoft highlighted last week, the address bar of Chrome is also the search bar. Every keystroke typed into the address/search bar is sent to Google so that it can suggest the term you may want to search for or the site you wish to browse to. Guess what. Your Google cookie containing the unique ID is also sent, allowing Google to record every search term and address you type there.
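As a rough illustration of why this matters, the sketch below lists what a suggestion service would receive while an address is typed, assuming (as described above) one request per keystroke with the cookie ID attached. The function name and the ID value are made up, and this is not Chrome’s actual protocol.

```python
def autosuggest_requests(typed_text, cookie_id):
    """Return the (partial_text, cookie_id) pairs sent as the user types,
    assuming one suggestion request per keystroke."""
    return [(typed_text[:end], cookie_id)
            for end in range(1, len(typed_text) + 1)]


if __name__ == "__main__":
    for partial, uid in autosuggest_requests("bbc.co.uk", "abc123"):
        print(uid, partial)
```

Even if you never press Enter, the last partial string is usually enough to tell where you were heading.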

Many commentators also flag Google Analytics as a way Google can record your internet activity even if your web usage habits rule out the previously mentioned techniques. Google Analytics allows webmasters to record usage statistics for their site. It works by including a small amount of JavaScript on each page that sends information to Google about the user’s activity. From what I can see, it doesn’t send the Google cookie, and so identifying a user is limited to the IP address.

What does this mean in practice? In a worst-case scenario, a law enforcement agency with the cooperation of Google and your ISP can build a complete record of your internet browsing, including time and location information. Eric Schmidt, Google’s CEO, has previously stated that if you’ve nothing to hide then what’s the problem? This may be true if you live in a Western democracy, although many people would argue otherwise. It’s certainly not the case if you live in a country where the human rights record is less than ideal.

In European countries at least, abuse of this data is prohibited by the European Data Protection Directive and the national laws that implement it. That is just as well: imagine your employer, or anyone else for that matter, being able to get hold of all your internet activity, including that from your home PC.

It’s also worth pointing out that none of these data-gathering techniques is unique to Google, but as the biggest player in the internet space it can gather the most data and consequently attracts the most criticism when people complain about privacy issues.

In theory there are a few things you can do to limit the information that you leak to Google. The simplest is to delete your cookies on a regular basis, which makes it much harder to track your activity. There are also services like GoogleSharing that anonymise your Google traffic, but in reality you can never completely hide your browsing patterns. If you want to use the internet to the full, you have to accept that some of your privacy is lost.

1 comment:

  1. The original version of this post had a paragraph about how the anti-phishing feature of Firefox could be used by Google to determine the web sites visited by a user. That was of course incorrect, although the cookie and unique ID are sent to Google each time the anti-phishing database is updated.
