
Charles on… anything that comes along

Thursday 31 August 2006

Filed under: — Charles @ 2:03 pm

Top of the heap, bottom of the class?

In this week’s Guardian Technology I’ve written Top of the heap, which looks at the (impersonal) data that came out of AOL’s mistaken public release of its users’ searches (AOL’s search results were supplied by Google): searches by more than half a million users, covering three months, which have been fallen upon with glee by ‘search engine optimisers’ (who look for keywords, then spam your blog with them).

Here’s the headline result: in those searches, when people clicked through on a result (as they did in just over 50% of cases), 42% of clicks were on the first result and 11% on the second. Together that’s 53%: more than half of all clicks went to the first two results.
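A minimal sketch of how click shares like these could be tallied, in Python. It assumes the released log layout of tab-separated AnonID, Query, QueryTime, ItemRank and ClickURL columns, with the last two blank or missing when nobody clicked. The function name is made up for illustration, and this is not necessarily how the Guardian figures were produced. Because searches with no click are skipped, the percentages are shares of clicks, not of all searches, which matches how the figures above are framed.

```python
from collections import Counter
from typing import Dict, Iterable


def click_share_by_rank(lines: Iterable[str]) -> Dict[int, float]:
    """Percentage of clicks that landed on each result rank.

    Assumes tab-separated rows: AnonID, Query, QueryTime, ItemRank, ClickURL,
    where ItemRank and ClickURL are empty (or absent) if there was no click.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skips the header row ("ItemRank") and searches with no click-through.
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        counts[int(fields[3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Percentage of all recorded clicks, keyed by the rank that was clicked.
    return {rank: 100 * n / total for rank, n in sorted(counts.items())}
```

With the post’s figures, ranks 1 and 2 would come out at 42 and 11, and 42 + 11 = 53 is where “more than half” comes from.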

And it also happens that Wikipedia turns up as the top or #2 result in loads of searches - as Nick Carr has noted (and the IPcentral weblog picked up). The open question: is that a good thing?

The AOL data is fascinating in its own right. I’m trying (and failing, so far*) to load it into the MySQL database on my laptop so I can have a play (the GUI program keeps falling over; there are 10 files, each 50,000 lines long), but if you want to have a hack, there are sites out there that will let you hunt and peck around.
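For files this size, a script that streams rows into a database in batches tends to hold up better than a GUI importer. Here is a rough sketch using Python’s built-in sqlite3 module in place of MySQL (a substitution, not what was used here). It assumes the same five tab-separated columns, and the `searches` table and the function names are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Tuple


def _rows(lines: Iterable[str]) -> Iterator[Tuple]:
    """Turn raw tab-separated log lines into 5-tuples ready for INSERT."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0].isdigit():  # header line, blank line or garbled row
            continue
        fields += [""] * (5 - len(fields))  # pad rows that record no click
        anon_id, query, query_time, rank, url = fields[:5]
        yield (int(anon_id), query, query_time,
               int(rank) if rank.isdigit() else None, url or None)


def load_log(conn: sqlite3.Connection, lines: Iterable[str], batch: int = 10_000) -> int:
    """Stream log lines into a hypothetical `searches` table, committing per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches ("
        "anon_id INTEGER, query TEXT, query_time TEXT, "
        "item_rank INTEGER, click_url TEXT)"
    )
    rows = _rows(lines)
    total = 0
    while True:
        chunk = list(islice(rows, batch))  # at most `batch` rows in memory
        if not chunk:
            break
        conn.executemany("INSERT INTO searches VALUES (?, ?, ?, ?, ?)", chunk)
        conn.commit()
        total += len(chunk)
    return total
```

Passing an open file object as `lines` means only one batch of rows is held in memory at a time; the database file name and log path are up to you.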

Two caveats: it’s not filtered in any way. And also, you can’t be sure the person providing the site isn’t watching what *you’re* searching for. Bear that in mind and knock yourself out.

* And here’s why I’m failing. 10 files, yes? Because the MySQL GUI program I’ve got keeps crashing when I try to load a file, I load one into a text editor instead. After a long time, the file loads.
Ask AppleScript: set thecount to {count lines, count words}.
A long pause. And then the answer:
{3640128, 37297191}.
In other words, one-tenth of these searches comprises 3.64 million records, with more than 37 million words. Multiply by 10… Perhaps I should let the sites out there handle it instead.

Update: fixed links. Also, you should go and read Andrew Brown’s piece on the AOL data in the Guardian. I can’t mention it often enough, really.

5 Responses to “Top of the heap, bottom of the class?”

  1. pauldwaite Says:

    Nah, keep trying on your laptop, should be fun :) I’ll download it later on and see how my G3 iMac fares.

  2. joanna Says:

    I found lots of interesting insights in your Top of the Heap article. I particularly like the fact that people click on the 10th result rather than move to the next page of results, and the number of people who unwittingly run searches on ‘Search terms’ or whatever text pre-populates the search box - I’ve noticed this on websites I’ve managed in the past as well.

    I’d expect there to be a fairly strong correlation between how ‘web savvy’ a user is and the likelihood of them seeing past the first few search results. Over time, more savvy users might start ignoring the Wikipedia result in the top position, if they know what they’re going to get from the Wikipedia page and that it’s not what they’re after on that occasion.

  3. Rage on Omnipotent » Blog Archive » AOL search data Says:

    […] Interesting to see how the AOL search data breaks down. Thanks to Charles Arthur for the link. Interesting points - last page results are more important than you would think (because it is one click away) and that search engines appear to link to each other more often than you might think useful. […]

  4. Barbara Cookson Says:

    Your link to the IP central weblog seems to be wrong - but it was an interesting diversion

  5. » 10 comments Joanna Tidball: Web consultancy and copywriting Says:

    […] Top of the heap, bottom of the class? at Charles on…anything that comes along […]
