Top of the heap, bottom of the class?
In this week’s Guardian Technology I’ve written Top of the heap, which looks at the (impersonal) data that came out of AOL’s mistaken public release of Google searches - half a million of them, covering several days. The data has been fallen upon with glee by ‘search engine optimisers’ (who look for key words, then spam your blog with them).
Here’s the headline result: in those searches, when people clicked through on a result (as they did in just over 50% of cases), 42% of clicks were on the first result, 11% on the second. That’s more than half of clicks on the first two results.
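For anyone who wants to check the sums, here is a minimal sketch of that arithmetic. The function names are mine, purely for illustration, and the 50% click-through figure is a round-number stand-in for "just over 50%", so the second result is a floor rather than an exact value.

```python
def top_two_share(first_pct, second_pct):
    """Share of all clicks that landed on the first two results, in percent."""
    return first_pct + second_pct


def share_of_all_searches(click_pct, clickthrough_pct):
    """Convert a share of clicks into a share of all searches, in percent."""
    return click_pct * clickthrough_pct / 100


if __name__ == "__main__":
    # Figures quoted above: 42% of clicks on the first result, 11% on the second.
    print(top_two_share(42, 11))
    # Assumption: click-through rounded down to 50% ("just over 50%" in the text).
    print(share_of_all_searches(42, 50))
```

So on those figures, roughly one search in five ends with a click on the very top result.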
And it also happens that Wikipedia turns up as the top or #2 result in loads of searches - as Nick Carr has noted (and the IPcentral weblog picked up). The open question: is that a good thing?
The AOL data is fascinating in its own right. I’m trying (and failing, so far*) to load it into a MySQL database on my laptop to have a play (the GUI program keeps falling over - there are 10 files, each 50,000 lines long), but if you want to have a hack there are sites out there - like this and this - which will let you hunt and peck around.
Two caveats: the data is not filtered in any way. And also, you don’t know whether the person who’s providing the site is watching what *you’re* searching for. Bear that in mind and knock yourself out.
* And here’s why I’m failing. 10 files, yes? I load one into a text editor, because the MySQL GUI program I’ve got keeps crashing when I try to load the file. After a long time, the file loads.
Ask AppleScript: set thecount to {count lines, count words}.
A long pause. And then the answer:
{3640128, 37297191}.
In other words, one-tenth of these searches comprises 3.64 million records, with more than 37 million words. Multiply by 10… Perhaps I should let the sites out there handle it instead.
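The same count can be had without opening the whole file in a text editor. What follows is only a sketch, in Python rather than AppleScript: it reads the file one line at a time, so memory use stays small however long the file is. The function names are my own, the encoding is an assumption, and "word" here means anything separated by whitespace, which may not match AppleScript's idea of a word exactly.

```python
def count_lines_and_words(path, encoding="utf-8"):
    """Count lines and whitespace-separated words, reading one line at a time."""
    lines = 0
    words = 0
    # errors="replace" is an assumption: it keeps odd bytes from stopping the count.
    with open(path, encoding=encoding, errors="replace") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
    return lines, words


def extrapolate(count_in_one_file, number_of_files=10):
    """Rough total, assuming every file is about the same size as the one counted."""
    return count_in_one_file * number_of_files


if __name__ == "__main__":
    # The figures AppleScript reported for one file, multiplied by 10.
    print(extrapolate(3640128))
    print(extrapolate(37297191))
```

Calling count_lines_and_words on one of the ten files (whatever it happens to be called on your disk) returns the pair of counts; multiplying by 10 gives something like 36 million records and 373 million words, if the files really are similar in size.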
Update: fixed links. Also, you should go and read Andrew Brown’s piece on the AOL data in the Guardian. I can’t mention it often enough, really.