Month: August 2006

Top of the heap, bottom of the class?

In this week’s Guardian Technology I’ve written Top of the heap, which looks at the (impersonal) data that came out of the mistaken public release by AOL of the Google searches – half a million of them, covering several days, which have been fallen upon with glee by ‘search engine optimisers’ (who look for key words, then spam your blog with them..).

Here’s the headline result: in those searches, when people clicked through on a result (as they did in just over 50% of cases), 42% of clicks were on the first result, 11% on the second. That’s more than half of clicks on the first two results.

And it also happens that Wikipedia turns up as the top or #2 result in loads of searches – as Nick Carr has noted (and the IPcentral weblog picked up). The open question: is that a good thing?

The AOL data is fascinating in its own right. I’m trying (and failing, so far*) to load it into my MySQL database on my laptop (the GUI program keeps falling over – there are 10 files, each 50,000 lines long) to have a play, but if you want to have a hack there are sites out there – like this and this – which will let you hunt and peck around.

Two caveats: it’s not filtered in any way. And also, you don’t know whether the person who’s providing the site is watching what *you’re* searching for. Bear that in mind and knock yourself out.

* And here’s why I’m failing. 10 files, yes? I load one into a text editor, because the MySQL GUI program I’ve got keeps crashing when I try to load the file. After a long time, the file loads.
Ask Applescript: set thecount to {count lines, count words}.
A long pause. And then the answer:
{3640128, 37297191}.
In other words, one-tenth of these searches comprises 3.64 million records, with more than 37 million words. Multiply by 10…. Perhaps I should let the sites out there handle it instead..
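For anyone else whose editor chokes on a file that size, here is a minimal sketch of doing the same count without loading the whole thing into memory. It assumes a plain text file with one record per line; the function name and the two sample rows are made up for illustration and are not from the AOL data.

```python
import io

def count_lines_and_words(stream):
    """Count lines and whitespace-separated words, reading one line at a time."""
    lines = 0
    words = 0
    for line in stream:
        lines += 1
        words += len(line.split())
    return lines, words

# Tiny in-memory stand-in for one file (invented rows, not real search data).
sample = io.StringIO("1\tfirst query\n2\tsecond query here\n")
print(count_lines_and_words(sample))  # (2, 7)

# Rough scale-up from the one file counted above, assuming all ten are similar.
print(3_640_128 * 10)  # 36401280
```

For a real file you would pass an open file object instead of the `StringIO` stand-in, for example `with open(path, encoding="utf-8", errors="replace") as f:`, where `path` is wherever you saved the file.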

Update: fixed links. Also, you should go and read Andrew Brown’s piece on the AOL data in the Guardian. I can’t mention it often enough, really.

Your search for “information farmer” yielded no results but your own, you egotist

I’ve realised what my job is now. I was looking at an interesting web page, thinking “now who can I get to write about this….”, when it occurred to me that what I do so much of the time is to roam around, trying to find things and then seeding them by putting the idea of writing about them in people’s heads and offering them money to write them up. Then the text gets harvested and put on show in the paper or online, and people pay us.

It’s information farming. Have to say it’s a lot less physically strenuous than the kind one would have been doing 100 years ago (see Bobbie’s interesting musing on the “what if you were born 100 years ago?” meme), though sometimes a bit stressful.

Interestingly, the search for “information farmer” does – once you get past the results saying “information, farmers…” – yield this page from Clickz, dating back to 2003.

It seems that the “information farmer” meme has not thrived, and instead has fallen on stony ground. Well, maybe this could be some water for it. As the Clickz writer, Rudy Grahn (and hasn’t his career taken off – why, it’s as if he’d released a version of Spirit In The Sky or something), notes,

What’s missing is someone, or something, to function as an information farmer. We need an agent who will transform the verbs of the Internet from “search” and “find” to “have” and “consume,” an agent who fills the pantry with plenty of what we need. I want the information farmer to save us from mounting a search expedition every time we need sustenance.

Like Ward, I believe we’re destined to have a world of information by manufacture. The rub is I also believe we’re destined to pay for it.

On that basis, is Google the Tesco of the information world, relentlessly squeezing the amount the producers can charge for their goods while gaining more and more market share?

New battery, same old lifestyle

I just want you to know that I’m taking my life into my hands writing this. Yes, my laptop battery is one of the 1.8 million covered by the Apple “not our fault, honest, blame Sony” recall. (Odd how it’s Dell’s fault when they recall, but Sony’s fault when Apple recalls them.) At any moment the whole thing could burst into flames, like something out of Bleak House. (Isn’t that the one?)

I think actually it works out to a good deal for folk like me. The battery is about 20 months old and has been doing sterling work as I go up and down on trains, so its life is down to two hours from all the discharging and recharging. Now I’ll get a new one, which will be in the prime of life and – one hopes – won’t burst into flames at an inopportune moment. There may be an opportune moment for a laptop battery to burst into flames (when it’s being used by a spy?), but I’d rather not be around for it.

In the meantime I’m reading Applepeels, the blog of an ex-Apple sales employee who was high up on the government side. Verrrry interesting.

The deaf, the Deaf and the hearing

An update: baby3 is doing well with his cochlear implant. He turns to his name; when outside if he hears the sound of an aircraft going by, he stops before he can see it and scans the skies, then makes the ‘airplane’ sign. He likes toys that only make noise (rather than those which also have lights to indicate when you’ve pressed something). He says “Mo” for ‘more’, and “Ma” for his mum, sometimes even “ma-ma”.

It’s a lot of progress for a child to make having been pretty much without hearing for his first 17 months; even now he’s only had “access to sound” (as it’s called) for 6 to 8 weeks.

Which is why it’s sort of depressing, though not surprising, to get comments like John’s. (And here’s John’s blog.) We didn’t take the decision to put baby3 through major surgery lightly. Nor did the support team; they really did evaluate very carefully whether he would be better off with hearing aids. In the end, it’s truly about whether we think his life will be better with access to sound – we can even call it ‘hearing’ – or without. Profoundly deaf people are more prone to depression; but John is promulgating a point of view that, while valid, still strikes me as Luddite. baby3 has sign language; he uses it, we use it, but it’s not sufficient communication for us or him, because we don’t have the years it would take to learn it sufficiently to say what we need to say to him. We have two other children, and jobs, and lives to lead.

That’s not to say that the CI is the path of least resistance, to fit our domestic needs. It’s our choice of best future – for him. And every parent makes those choices, consciously (where should we live?) or unconsciously (what do you do when your child misbehaves?).

It’s equally uplifting, though, to see comments like those from Ivan (and here’s Ivan’s blog), who has just received a CI. He gives an interesting insight; we only wish that baby3 could express more of what he’s going through, so we could help him more.

(BTW, one other point, people: if you post a comment and it doesn’t appear, it has been spam-trapped, almost certainly because the post you commented on is getting cobwebby. You should be getting a captcha to fill in, as a last chance to prove you’re a human, not a spambot. If you don’t, then try turning on Javascript or turning off popup blockers (no ads on my site) – then you’ll get the frame with the captcha to fill in. Entering the same comment won’t have any effect – it’ll get spam-trapped too.)

Computers that write news: imagine the paper of the future.. written by Google

At the Johnson King blog, Andrew Chatterton writes:

The recent news of a US business information outfit replacing some of the tasks done by its journalists with computers will undoubtedly send shock waves across newsrooms throughout the country. Financial journalists are first in the ‘firing line’ as new software can turn around an earnings story within 0.3 seconds of a company making its results public!

This will surely be a threat to all journalists, not just financial hacks. As software intelligence increases, it’s feasible that any type of press release could be turned into an article before you can say ‘copy and paste’.

Indeed could we see chief executives briefing some sophisticated software and a laptop over lunch at Claridges? Newsrooms filled only with PCs and one techy to see to the needs of these next generation journalists? Or maybe even software that has the ability to scan blogs and automatically turn them into news stories? One for Charles Arthur to ponder on perhaps…

Well, having been passed the baton.. I’m sure I’ve mentioned this before, and I know I did at the talk to Fullrun recently. It’s already trivial to imagine a computer-generated newspaper. Get Google News, capture the most common headlines, create a summary story from the stories that appear most often, categorised by politics/sport/science/technology/medicine/celebrities/wouldyoubelieveit!/heartwarming/reviews.

You could even generate different versions by twiddling the knobs – more celebs, less politics gives you the downmarket version, and so on. Then print it.
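To show how little machinery the idea above needs, here is a rough sketch of the “most common headlines plus knobs” step. Everything in it is hypothetical: the function name, the knob values and the headlines are invented for illustration, it does not talk to Google News at all, and it skips the hard part (writing the summary story).

```python
from collections import Counter

def build_edition(stories, knobs, total_slots=6):
    """Pick the most-repeated headlines, sharing slots out by category weight.

    stories: list of (category, headline) pairs, one per outlet running the story.
    knobs:   dict of category -> relative weight (hypothetical tuning values).
    """
    counts = Counter(stories)
    weight_sum = sum(knobs.values())
    edition = []
    for category, weight in knobs.items():
        # Each category's share of the page follows its knob setting.
        slots = round(total_slots * weight / weight_sum)
        ranked = [headline for (cat, headline), _ in counts.most_common()
                  if cat == category]
        edition.extend((category, headline) for headline in ranked[:slots])
    return edition

# Invented headlines; repeats stand for several outlets carrying the same story.
stories = (
    [("politics", "Budget vote delayed")] * 3
    + [("politics", "Minister resigns")]
    + [("celebrities", "Star weds")] * 2
    + [("celebrities", "Singer announces tour")]
)

downmarket = build_edition(stories, {"politics": 1, "celebrities": 2}, total_slots=3)
upmarket = build_edition(stories, {"politics": 2, "celebrities": 1}, total_slots=3)
print(downmarket)
print(upmarket)
```

Same input, different knob settings, different paper: the downmarket run keeps one politics story and two celebrity ones, and the upmarket run does the reverse.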

To be honest, the first time I saw Metro (the daily morning freesheet in London, though also in other UK cities, having come here like the Vikings from Scandinavia), I thought it was computer-generated, or at least computer-chosen. That was a couple of years ago; since then it’s got a little more personality, but not so much you’d spot it. And I think it would be tough to say what its political stance was, though its parentage (Associated Newspapers, ergo a sibling of the Daily Mail) does show up from time to time – such as today’s splash (front-page lead), headlined “Shoplifters to be spared jail”, which is factually incorrect as it’s reporting a recommendation that shoplifters not be sent to jail.

Two things, though. Obviously, you don’t want the Google Newspaper to be on the web. Else it would index itself, which would lead to recursion. Second, if it was successful, it would have no journalists, yet rely on journalism. A paradox, of sorts, though the existence of the Press Association means that local papers carrying national reporting have had that one solved for decades.

The Google Newspaper: is it evitable?

Bad Pitch blog explains how not to make bad pitches to journalists

Been a while since I mentioned the BPB. It’s still going strong, though (obviously; it’s not like it’s going to run out of source material in a hurry..)

A very good post is up, though, called 10 Reporter Hacks, which backs up – completely independently – many of the things that I said to the Fullrun audience a few weeks back. That’s “hacks” as in “ways to break in” (like computer hacking, yes?).

Headings include “All Hail Google [News]”, “Social Study”, “LinkedIn”, “RSS-s-s”, “Step Away From the Computer”, “Analyze This”, “Get Interpersonal”, “Source File” and a few others. Read it – you’ve got the time. It’s a short-cut guide for anyone just starting out in PR, or anyone who’s forgotten because of client demands what those strange “journalist” things on the ends of phones are really like outside the zoo.

The %! curse of overusing percentages

If you saw a race report and it said that X ran 100% faster than Y, would that instantly say to you that X ran the race in (say) 5 seconds, and that Y took 10 seconds?

How about if something about building B said it was 200% taller than building A? Would you instantly realise that building B is three times higher than A?

I don’t think so. Percentages are an odd beast: a piece of precision mathematics that gets routinely misused. Often, people use them because they think that they sound scientific, and precise, but the reality – as with the buildings and running examples – is that using percentages (if you use them properly) can downplay the impact of the raw numbers.

Things get even more confusing when you get figures like “A did 250% more business than last year”. What on earth does that mean? It should mean that A this year did 3.5 times as much as last year. But often it doesn’t – people see “2.5x greater” and stick that into a percentage.

My own thoughts on percentages: unless they are less than 100 (or precisely equal to it), leave them alone and quote the actual factor of improvement instead.
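The arithmetic behind the examples so far fits in two lines, which makes it all the odder that it goes wrong so often. A small sketch (the function names are my own, purely for illustration): “X% more” means multiplying by 1 + X/100, so going the other way you have to subtract 1 before turning a factor into a percentage.

```python
def factor_from_percent_more(percent):
    """'X% more than' means the new value is (1 + X/100) times the old one."""
    return 1 + percent / 100

def percent_more_from_factor(factor):
    """Inverse: a value that is `factor` times the old one is this many % more."""
    return (factor - 1) * 100

print(factor_from_percent_more(100))  # 2.0  (the runner: twice the speed)
print(factor_from_percent_more(200))  # 3.0  (the building: three times the height)
print(factor_from_percent_more(250))  # 3.5  (the business figures)
print(percent_more_from_factor(2.5))  # 150.0 (what "2.5x" really is as "% more")
```

So someone who sees 2.5 times last year’s business and writes “250% more” has overstated it: by this reading it is 150% more.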

Things get worse of course when you want to talk about decrement. If you had 100 and now you’ve got 1, how much have things got worse? 99%, you murmur, and you’re right. That’s OK when you’re talking about the price of a share. But if you’re talking about people, it works better just to say “Last year it had 1,000 employees; this year it has 10.” It’s stark. It’s accurate. And it says it so much better than the glib “Staff were reduced by 99% in the fiscal year.”

What am I saying? That using percentages greater than 100 leads – sometimes intentionally, sometimes accidentally – to obfuscation. It’s not clear to the reader.

And to return to the example from the top, what would you understand by “a 20% faster bootup time”? It means that rather than waiting (say) 10 seconds, you wait 8 seconds. (Of course if you wanted to spin it to inflate the number, you’d say that the bootup time used to be 25% slower. Wow, that sounds huge!) It’s still better to give the reader the numbers, I think. 10 seconds, 8 seconds. OK, quicker. The question becomes then, what do you do with those two seconds?

Of course, it might be 100 seconds vs 80 seconds. Even so, you know that the extra 20 seconds “gained” are going to be spent emptying spam from your mail folder..
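The 20%-versus-25% trick comes purely from which number you divide by. Here is a quick sketch, assuming (as the bootup example does) that the percentage is applied to the waiting time rather than to the speed; the function names are invented for illustration.

```python
def percent_faster(old_seconds, new_seconds):
    """Time saved, as a share of the OLD time."""
    return 100 * (old_seconds - new_seconds) / old_seconds

def percent_slower(old_seconds, new_seconds):
    """Extra time the old version took, as a share of the NEW time."""
    return 100 * (old_seconds - new_seconds) / new_seconds

print(percent_faster(10, 8))    # 20.0
print(percent_slower(10, 8))    # 25.0
print(percent_faster(100, 80))  # 20.0
print(percent_slower(100, 80))  # 25.0
```

Same two seconds (or twenty), two different percentages, and neither tells the reader whether the wait was 10 seconds or 100. That is the case for printing the raw numbers.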

Time for Gina Ford to meet a computer keyboard, I think

OK, up front: we’ve got three children and the second and third we brought up using Gina Ford’s methods, which essentially mean working to a schedule built around the baby’s natural sleep and waking rhythms (once summed up by Jenny Colgan as “7.00am: you wake baby. 7.02am: baby poos. 7.12am: you poo.” Jenny doesn’t follow la Ford’s methods..). For us, it made life eversomuch easier.

Which is why the Gina Ford-versus-Mumsnet thing is so wrong. You can read the Mumsnet discussion of it, and find yourself shaking your head. It would be so easy to fix, especially because Ford has – as the Mumsnet people say – supporters among the posters there.

All it takes is for Ford to do one or both of the following: (1) ignore all the negative postings: they’re done by people who aren’t going to be her supporters or customers at any time; (2) wait for it all to subside a bit and then post.

I know, that would all be a bit late-20th century – even early 21st – as a marketing strategy.

But I bet it’s cheaper than the lawyers. Good speech tends to drive out bad speech. That’s the lesson I keep drawing from things like this.

As the Mumsnet folks write,

Quite apart from the fact that Ms Ford’s legal moves now threaten our very existence, we think this case raises broader issues which anyone who cares about freedom of speech should worry about. Some of these relate to British libel law and how it applies to bulletin boards.

Yes; basically it’s the reality that people in forums talk like they’re in a pub but lawyers treat it as though it’s printed in a newspaper.

Oh, and this bit piles Pelion upon Ossa:

Mumsnet has sought to meet and resolve the concerns expressed by Ms Ford. Mumsnet has offered to mediate any dispute she has – both offers have been declined. We sincerely wish to avoid a personal battle with Ms Ford, a figure who, as we have made clear, is widely respected both by us and many of our members. (Quite apart from the fact that we can think of much better things to spend our very limited resources on, Ms Ford’s first legal complaint arrived just days after both Mumsnet founders gave birth and the dispute has already devoured any hope of maternity leave.)

Yeees, I would think that might be troubling. (Bright thought) Hey! Get Gina to look after the kids while you work on the legal response.

You can read the Gina Ford response on Mumsnet:

As I have repeatedly made clear to Mumsnet, I have no objection whatsoever to people discussing or disagreeing with my advice and methods concerning childcare. What has caused me so much upset has been the defamatory campaign waged against me as a person in which I have been described in the most vile and disgusting terms.

Mm, got to say, Gina – it comes with the territory. People get stuff wrong all over the place, and then don’t bother to update it (well, look at the comments – he updated it). Anyhow, people do get stuff wrong online then don’t bother to update it.

And the original poster even put up an apology, which contains what must be one of the most ironic (but must not be read as ironic) passages:

She said: “I apologise profusely to any childcare guru that I may have offended by suggesting that they are involved in military action in Lebanon and her followers for suggesting that she/they strap their babies to weapons of mass destruction. I have read her book many times and I can confirm that this IS NOT suggested as part of any childcare guru’s recommended routine.”

(Well, Ford’s done more than one book, but anyway.)

And you can read another version by the Mumsnet people (wow, those are some effective mothers) at the Guardian commentisfree site:

Ms Ford has used a pneumatic drill to crack a sesame seed. Instead of requesting the removal of offending posts, she has demanded the deletion of whole threads containing hundreds of voices; her lawyers have been consistently bullying and patronising; and even when we acceded to their every request (including introducing a new monitoring regime specially for posts relating to Ms Ford), they insisted they would go to court to seek damages and costs against us.

I guess the really interesting thing is that Gina Ford’s methods generate such strong feelings. Unsurprising in one way, because babies create such strong feelings. But why do people think that Ford’s method is about bending the baby to your will? It’s rather the opposite – you get bent around the baby, because one thing that’s sure is that babies all work pretty much the same..

Gmail is broken. Or its POP forwarding is.

Beats me, but since yesterday (around 5pm) Gmail has given up forwarding email by POP to me. I can see it if I go to the browser: tons of new mail. But it’s not being collected by Mail. It spins and comes back – nope, nothing there.

Annoying. And there’s no easy way to find out where the error lies either.

Anyone else troubled by this?

When Russell Brand met Keith Richards.. in full

Here’s the full text of the interview as given in the Observer Music Monthly of the meeting – trailed with pictures rather extensively beforehand – between comedian-columnist-host-of-the-moment Russell Brand and gerontocracy guitarist Keith Richards.

RB: Alright mate, I’m Russell.
KR: Hey man, I’m Keith.
RB: You look ever so well, particularly after what happened.
KR: That was nothing. (Pause.) You’re a DJ?
RB: I’m a comedian, Keith.
KR: Hey, me too.
Photographer: Can I photograph you [two] with the guitars?
KR: Be easier with a camera.
(Photographer takes pictures.)
(PR person arrives to take KR away to something else pre-gig.)
KR: See you later, man. Gotta go press some flesh. Enjoy the show.
RB: Bye, Keith. Good luck with the gig. And the flesh-pressing.

Yup, it’s time for journalists to look to their laurels, I guess. Although many of us would quail at the idea that we have to generate 2,000 words from such a huge interview. How on earth do you pare it down?