Date: Monday 19 September 2005

Band rails against its own CD copy protection, find the stocks consensus, and The Onion – tomorrow’s news is last year’s satire!

  • Switchfoot band member apologises for copy protection on band’s CD
    They’re signed to Sony, which hasn’t – yet – blapped this post. But who are Switchfoot, anyway? Are they like Driveshaft, the non-existent band in Lost (who deserved to be nonexistent, judging by the quality of their “songs”)?
  • Consensus View
    The site where you can join coin-flippers all over the world! (Some are actually doing better than 50%. So far.) “When the lift boy says buy, it’s time to sell”, as one of the stock market veterans commented ahead of the 1929 crash. Still, probably a good place to find out which stocks to short. Or go long on. (Do people actually buy stocks any more?)
  • OK, compare and contrast. Spot the satire. No, go on.
    Fuck Everything, We’re Doing Five Blades
    The Gillette Mach3 was the razor to own. Then the other guy came out with a three-blade razor. Were we scared? Hell, no. Because we hit back with a little thing called the Mach3Turbo. That’s three blades and an aloe strip. For moisture. But you know what happened next? Shut up, I’m telling you what happened—the bastards went to four blades. Now we’re standing around with our cocks in our hands, selling three blades and a strip. Moisture or no, suddenly we’re the chumps. Well, fuck it. We’re going to five blades.

    From The Onion, “America’s finest news source”, in early 2004. Satire, right? But now: Gillette unveils 5-bladed razor with two lubricating strips – Sep. 14, 2005
    Five blades to chop your face off.. sorry, shave you closer… (Stole this link pairing from Holy Moly.) How frightening. Girls, you must be so jealous – still using one blade. Huh. Sorry, what is this “wax” stuff? And “depilatory cream”? It does what?? Next week: The Onion on New Orleans…?

The pain of music subscriptions, and is the Economist wrong about VoIP?

  • The Sounds of Silence
    Right now, looks to me like Microsoft and its music partners are floundering a bit. Apple gets so many basics right, while the Microsoft camp gets so many basics wrong. Is it strange then that iPod is so popular or that iTunes Music Store selling so much music? Basics, like simplicity, synchronization or selling a music lifestyle are among the many things Apple does right. Example: One of Windows Media Player’s default columns is “file path.” Exactly why would anyone want something like “c:\Documents and Settings\Joe User\My Music\My Yahoo! Music\Snow Patrol\07-0-Snow Patrol-Run.wma” using up 60 percent of the column space?

    Jupiter Research on the angst of Windows DRM vs iTunes. Anyone reading got a Napster or other subscription? Ever had problems with them? I’d love to know. (Seen at Microsoft Monitor)

  • Why Even The Economist Can Be Wrong
    So I’ll stick to the basics and re-iterate what everyone in the business knows, but reporters systematically forget about:

    • The PC is a big, hulking thing that you can’t carry around with you. In your pocket. Constantly on, for days at a time.
    • All PDAs are crap where it regards audio processing (the fact that mobile phones have dedicated, streamlined hardware for voice processing alone – which has been repeatedly optimized for over a decade – seems to be lost on most reporters).
    • A traditional telco’s biggest asset (besides its customer base and its technical know-how) is its infrastructure. Networks talk by interconnecting infrastructure, and as technology evolved, protocols have become the new infrastructure.

    Skype has no infrastructure of its own, and exists as a closed, parasitic entity atop other networks.
    And where am I going with this? Simple. You won’t be able to run Skype (or any of its competitors) in a cost-efficient way on your mobile phone anytime soon. Period.

    Interesting take on how Skype isn’t going to change the world; he reckons in five years we’ll have the same situation as with IM now. Unless – I’d counter – the network effect takes over. But that’s a big challenge. (Seen at The Tao of Mac)

  • The small iPod inspiration from the small room?
    Quoted

    “So … as I was sitting on the toilet this morning and I noticed the shiny white porcelain of the bathtub and the reflective chrome of the faucet on the wash basin … and then it hit me! Everybody perceives the iPod as ‘clean’ because it references bathroom materials!”

    A frog design consultant concludes that Apple design guru Jonathan Ive does his best thinking in the same room as everyone else.

    Hmm, they might be on to something here: the iPod’s white porcelain-like looks, the chrome.. it’s a bathroom appliance! And don’t forget, Jonathan Ive started out designing sinks.. (Seen at Good Morning Silicon Valley)

Using Applescript and NetNewsWire to find out how often to check feeds: a tale of struggle, woe, victory and a missing feature

(What follows is VERY long, and not much like comment, and is really more like programming. Actually, it is programming. But if you’re at all interested in getting your computer to work for you, then scripting is a good way to do it. Applescript in particular is powerful, because it can be used across many different applications. But obviously, it’s only on Apple machines. So you may want to just move on to the next post. Won’t blame you at all. Then again, if anyone does like this, let me know through the comments. There’s more where this came from.)

Unlike you and me, who might visit a website once or twice a day, or five or six times if we’re really obsessive, newsreaders tend to hammer servers. They’ll often visit them every 30 minutes, asking for any data that’s changed, even with sites that might not see a posting for days at a time.

Well, I’m here to tell you that you don’t have to. You can find out just how often sites update, and set the preferences accordingly – at least if you use NetNewsWire, which lets you hand-set how often a feed is checked. This relies on some fairly simple Applescripting (well, the concept is simple), lots of loops, and a bit of statistics.

Along the way we’ll also meet one infuriating bug and a missing feature that – we can only pray – will get put into a later version of NNW. But let me just say up here, NetNewsWire has an absolutely fantastic Applescript dictionary. If only Apple’s own applications were so useful.

First, how would we decide how often to check a site? Well, it would help to know how often the site updates. Who knows that? Well, you do, or your newsreader does. Every headline in every feed has either a “date published” (on the blog; ideal) or a “date arrived” (at your computer; less than ideal, but better than nothing), which can be accessed through Applescript.

What’s more, every feed also tells us the name of the blog it comes from. And depending how quickly you let posts expire from your feed, you should have enough data to get a fair idea of how often a blog updates. (I have mine set to keep stuff for three years. Most people don’t. Even so you can get useful info.)

Let’s sketch out how we’ll do this.

  • loop through the list of subscriptions. (We can count the number of subscriptions, and count the number of headlines within those subscriptions.) We want to ignore “groups” of feeds, and look at the feeds themselves; that’s OK, because NNW’s excellent scripting dictionary has a true/false test – is group – that you can use on a subscription to see if it’s a feed, or a group. We also want to ignore “smart” subscriptions, because those aren’t feeds in the normal sense. Happily, NNW can do that too: it has a true/false test called synthetic you can use on subscriptions.
  • for each subscription, loop through its headlines and find out what the interval is between them.

We can get the date published or date arrived of any headline in a feed: it’s
date published of headline i of subscription x
In Applescript, you can subtract dates from each other: the result is in seconds. (You can copy all the code below into Script Editor. It will need to be “wrapped in a tell block”, as scripters say: you have to preface the loose stuff with tell application "NetNewsWire" and suffix it with end tell. Also, it has to be the full NNW – there’s no Applescript in NNW Lite, you cheapskate. Buy the whole thing!)
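Here is the date arithmetic on its own, as a minimal sketch. It uses only built-in Applescript, so unlike the rest of the code it needs no tell block; the variable names are made up for illustration.

```applescript
-- Date arithmetic in plain AppleScript: no NetNewsWire needed for this bit.
set rightNow to current date
set fourHoursAgo to rightNow - (4 * hours) -- "hours" is a built-in constant: 3600 seconds
set gapInSeconds to rightNow - fourHoursAgo -- date minus date gives seconds
set gapInHours to gapInSeconds / hours -- and back to hours again
{gapInSeconds, gapInHours}
```

The gap comes out as 14400 seconds, which is the 4 hours we started with.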

One gotcha to avoid here. Let’s say that the most recent post from the site we’re examining came in 4 hours ago. OK, that’s easy:
(current date) - (date published of headline 1 of subscription 20) (for example).
And the post before that? It was published 6 hours before that latest post. OK, so that’s 10 hours ago. So what’s the average time between posts? Let’s see, 4 hours and 10 hours, that’s 14 hours, divide by two (we’ve got two items), mean is 7 hours, right?

Wrong, of course. One post 4 hours ago, other post 6 hours before. The posts have a mean interval of 5 hours. So in order to make our average work, after each post we should reset the time we’re measuring from (time2 in the loop below) to the date of the post we’ve just looked at (time1). That works.
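To check the arithmetic, here is the same idea as a stand-alone sketch that runs without NetNewsWire. The handler name and its parameters are hypothetical, and it assumes the dates arrive newest first. (If you call a handler like this from inside a tell block, you need to write my meanGapInSeconds(...) so Applescript looks for it in your script rather than in the application.)

```applescript
-- Hypothetical helper: postDates is a list of dates, newest first;
-- rightNow is the moment we're measuring from.
on meanGapInSeconds(postDates, rightNow)
	if (count of postDates) is 0 then return missing value -- nothing to average
	set totalGap to 0
	set laterTime to rightNow
	repeat with aDate in postDates
		set earlierTime to contents of aDate
		set totalGap to totalGap + (laterTime - earlierTime)
		set laterTime to earlierTime -- measure the next gap from this post, not from now
	end repeat
	return totalGap / (count of postDates)
end meanGapInSeconds

-- The example from the text: one post 4 hours ago, the one before it 6 hours earlier.
set rightNow to current date
set examplePosts to {rightNow - (4 * hours), rightNow - (10 * hours)}
set meanHours to (meanGapInSeconds(examplePosts, rightNow)) / hours
```

meanHours comes out as 5, not 7, because each gap is measured from the post before it rather than from now.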
So the rough loop we want is
set subcount to count subscriptions -- how many subscriptions there are
repeat with asub from 1 to subcount -- step through them one at a time

tell subscription asub
-- do some check here to make sure it's not a group and not "synthetic"
set avtime to 0 -- running total of the intervals between posts, in seconds
set time2 to current date -- this will be the one we use to subtract from the time of the post
set thecount to count headlines
repeat with i from 1 to thecount

set time1 to date published of headline i
set difftime to (time2 - time1)
set avtime to avtime + difftime
set time2 to time1 -- set the "current date" for averaging to the time of the most recent post we've looked at

end repeat -- of headlines
-- now work out the average: the total of the intervals divided by how many there were
set averagetime to avtime / thecount
end tell
end repeat -- of subscriptions

Another measure that’s useful is the standard deviation in times of posts. Some people post in bursts, others in a regular drip feed. The SD is the square root of the variance, defined as ((number of items * sum of the squares of the data) - (square of the sum of the data)) / (number of items squared).
That’s the version for a whole “population”. If you’re treating the posts as a sample, the divisor should really be (number of items - 1) * (number of items): the mean is itself worked out from the same data, which makes the plain version come out slightly too small (statisticians call this Bessel’s correction).
A large SD (or variance) means the time of posting varies a lot. (This calculation produces some remarkable results, by the way.)
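To see what the formula does with real numbers, here is a stand-alone sketch that runs without NetNewsWire. The handler name is hypothetical, and it uses the “population” divisor (number of items squared).

```applescript
-- Hypothetical helper: gaps is a list of intervals (seconds, minutes, whatever).
-- Returns {mean, standard deviation}, using the population formula from the text.
on meanAndSD(gaps)
	set n to count of gaps
	if n is 0 then return missing value -- nothing to measure
	set sumOfGaps to 0
	set sumOfSquares to 0
	repeat with aGap in gaps
		set g to contents of aGap
		set sumOfGaps to sumOfGaps + g
		set sumOfSquares to sumOfSquares + (g * g)
	end repeat
	set theVariance to ((n * sumOfSquares) - (sumOfGaps * sumOfGaps)) / (n * n)
	if theVariance < 0 then set theVariance to 0 -- rounding can nudge it just below zero
	return {sumOfGaps / n, theVariance ^ 0.5}
end meanAndSD

-- A drip feed and a burst poster with the same mean gap of 5 hours:
set steady to meanAndSD({5, 5, 5, 5})
set bursty to meanAndSD({1, 1, 1, 17})
```

Both lists have a mean of 5, but the steady one has an SD of 0 while the bursty one has an SD of about 6.9, which is exactly the difference we want the script to pick up.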

So to work that out we’re going to need an extra variable: diffsquared, which is a running total of difftime * difftime.
Let’s also tidy up the beginning of the loop, where we find out if a subscription is a group or synthetic; remember that asub is a number we’re using to loop through the subscriptions.
if (synthetic of subscription asub is false) and (is group of subscription asub) is false then
.. then we’ll continue with our loop.

To capture the SD, we’ll have to square the time interval as we go along, and add that to a running total.
That’s easy: we’ll have a new variable diffsquared.
set diffsquared to 0
set time2 to current date -- this will be the one we use to subtract from the time of the post
set thecount to count headlines
repeat with i from 1 to thecount

set time1 to date published of headline i
set difftime to (time2 - time1)
set diffsquared to diffsquared + (difftime * difftime) -- add this interval's square to the running total
set avtime to avtime + difftime -- add this interval to the running total
set time2 to time1 -- set the "current date" for averaging to the time of the most recent post we've looked at

end repeat -- of headlines
-- now work out the average
set averagetime to avtime / thecount
-- now work out the variance and standard deviation, using the two running totals
set vartime to ((thecount * diffsquared) - (avtime * avtime)) / (thecount * thecount)
set sdtime to round ((vartime ^ 0.5) / 60) -- standard deviation, converted from seconds to minutes

OK, it’s going well, isn’t it? But this is where we run into a huge gotcha. And it really is massive, and befuddling. Remember where we want to find out when the headline was published? That’s the date published property of the headline. Ah, but some headlines don’t have that. Try the feed for Metafilter, for example. (If you’re not subscribed, its RSS feed is here).

In my aggregator, Metafilter happens to be subscription 36. What happened was that I would ask for date published of headline i and Script Editor (which runs Applescripts you’re testing) would puke, saying "The variable time1 is not defined.". Well, uh? Yes, I’d asked it to be the date published… oh, right. This must be a wacky feed without a date published. No problem – we’ll just ask for the date arrived. Let’s try that: date arrived of headline 1 of subscription 36: gave a date. Fine. And to get round the problem of date published not being there in some feeds, I’d use what’s called a “try block”: if something fails, you give Applescript an alternative which means the script can keep running. So my new bit of “catching code” became
try
set time1 to date published of headline i
on error
set time1 to date arrived of headline i
end try

That should do it, right? I often try such additions as micro-scripts in their own window. So I set up
tell application "NetNewsWire"
try
set time1 to date published of headline 1 of subscription 36
on error
set time1 to date arrived of headline 1 of subscription 36
end try
end tell
time1
— which will show you what has been recorded as the value of time1
(Unfortunately, if you’re playing along at home, there isn’t an easy way to let you find out what number subscription Metafilter is in your list. If you drag Metafilter to be the top subscription, that’ll be easy enough – it’ll be subscription 1.)
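A short loop can do the hunting instead. This is only a sketch: the handler name is hypothetical, and the list of names here is stand-in data. The full script further down reads display name from each subscription, so inside the tell block something like display name of every subscription is a reasonable guess at how to build the list, but it is a guess and I haven’t checked that it comes back in subscription order.

```applescript
-- Hypothetical helper: returns the position of wantedName in nameList, or 0 if absent.
on indexOfName(nameList, wantedName)
	repeat with i from 1 to count of nameList
		if item i of nameList is wantedName then return i
	end repeat
	return 0
end indexOfName

-- Stand-in data; in real use the list would come from NetNewsWire.
set feedNames to {"ongoing", "Waxy.org Links", "MetaFilter"}
set whereItIs to indexOfName(feedNames, "MetaFilter")
```

With the stand-in list, whereItIs is 3; a result of 0 means the name wasn’t found.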

If you run this script, you get that same error message – “The variable time1 is not defined.” Huh? I thought we had a “try block” to catch that. But no, time1 is not set. And here’s where anyone without quite a lot of experience with Applescript would be banging their head against a wall, and have to give up. The clue is in those words “not defined”. NetNewsWire hasn’t thrown an error, so the try block has nothing to catch; it simply hasn’t handed anything back, so time1 never gets a value at all. It’s a close cousin of that most elusive of Applescript values, missing value (though a genuine missing value could at least be tested for with is missing value). This is about as elusive as the Higgs boson, top quark and Lord Lucan combined. It’s bad news, because nothing throws an error, yet there’s no value to extract.

Hmm. So what to do now? Perhaps test to see what class time1 has when it’s missing. Is it an integer, a date, something else? Anything that might be different from a not-missing value. It’s a desperate, last throw of the dice, but, well, you know, desperate times…
tell application "NetNewsWire"
set time1 to date published of headline 1 of subscription 36
class of time1
end tell
–> “The variable time1 is not defined.”

Sod it. Hmm, OK then: what if we collect the date published and date arrived, and test the first and use the second if the first doesn’t work?
tell application "NetNewsWire"
set time1 to {date published, date arrived} of headline 1 of subscription 36
class of item 1 of time1
end tell
–> “application”

ALL RIGHT! Even though the test has completely fallen through the floorboards – it’s saying that time1 is just a property of NetNewsWire – that’s OK! We can finally test for this sucker. Wait, let’s check: what’s the class of date arrived?
class of item 2 of time1 –> “date”
WOO HOO! OK, now we’re there. We’ll collect both the date published and date arrived, and test the first, and if the answer is application, use the second, because that works even with screwy feeds.

So, what’s left to do? Oh, yeah, not much point doing these calculations if we don’t find out which feeds they’re for. So, insert something to collect the name of the subscription (the script below reads its display name; there’s also a givenName). Then when the calculation loop for each feed is done, we can add the details – the feed name, the average posting delay, the standard deviation – to a list, which will be available to us at the end.
Here’s the script. I cut off the number of posts that get examined to calculate the mean to 50, but you could let it go higher. The result is output as a big long (potentially enormously long) list to the “Results” pane of your Script Editor. Don’t go away yet, because there’s a kicker – oh, what a kicker – in the postscript.
set subdetails to {}
tell application "NetNewsWire"
set subcount to count subscriptions
repeat with asub from 1 to subcount

if (synthetic of subscription asub is false) and (is group of subscription asub) is false then

tell subscription asub
set avtime to 0 -- running total of the intervals, in seconds
set diffsquared to 0 -- running total of the squared intervals
set time2 to current date
set thecount to count headlines
if thecount > 50 then set thecount to 50
-- latest fifty entries should give a good enough idea; could go to the top if really wanted
set thename to display name
repeat with i from 1 to thecount

set time1 to {date published, date arrived} of headline i
if class of item 1 of time1 is application then -- no date published came back; note no quote marks around 'application'
set time1 to date arrived of headline i
else
set time1 to date published of headline i
end if
set difftime to (time2 - time1)
set diffsquared to diffsquared + (difftime * difftime) -- add to the running total rather than overwriting it, or the SD reflects only the last interval
set avtime to avtime + difftime
set time2 to time1

end repeat
end tell
if thecount > 0 then -- skip feeds with no headlines, which would otherwise divide by zero
set averagetime to round (avtime / (thecount * 60))
set avhrs to round (averagetime / 60) rounding down
set avmins to (averagetime - (avhrs * 60))
set vartime to ((thecount * diffsquared) - (avtime * avtime)) / (thecount * thecount)
if vartime < 0 then set vartime to 0 -- rounding error can nudge it just below zero
set sdtime to round ((vartime ^ 0.5) / 60)
--set sdhrs to round (sdtime / 60)
set avgmsg to (averagetime as string) & " mins (" & avhrs & " hrs " & avmins & " mins); SD: " & sdtime & " mins"
set end of subdetails to {thename, avgmsg}
end if
end if
end repeat
end tell
subdetails

That last line will show you the details of who posts how often. Mine, for example, gave things like
{{“ongoing”, “608 mins (10 hrs 8 mins); SD: 78 mins”}, {“Daring Fireball Linked List”, “339 mins (5 hrs 39 mins); SD: 16 mins”}, {“Mini-Microsoft”, “7600 mins (126 hrs 40 mins); SD: 4275 mins”}, {“Waxy.org Links”, “269 mins (4 hrs 29 mins); SD: 1 mins”}, {“MetaFilter”, “83 mins (1 hrs 23 mins); SD: 0 mins”}, {“kottke.org remaindered links”, “175 mins (2 hrs 55 mins); SD: 6 mins”}, {“Good Morning Silicon Valley”, “299 mins (4 hrs 59 mins); SD: 14 mins”}, {“Paul Thurrott’s Internet Nexus”, “608 mins (10 hrs 8 mins); SD: 8 mins”}, {“Technology Pundits”, “2974 mins (49 hrs 34 mins); SD: 4293 mins”}}

Yes, that’s right – waxy.org posts almost regular as clockwork every four and a half hours or so; Metafilter is almost dead on the 83-minute mark; and Mini-Microsoft is all over the place. Ah, but then again, perhaps MetaFilter just appears to be regular, because of the “date arrived/date published” thing.

And the last flourish I wanted to do was to get the script to tinker with how often I’d check those feeds, in line with their posting regularity. Obviously, you’ll want to check them twice as often as the posting frequency (because you’re likely to be halfway through the delay); you might increase it a bit to allow for big standard deviations. So I looked through NetNewsWire’s scripting dictionary for the way to script that…

But it’s not there. There’s no way to change the feed’s updating except by hand. Waaaaaahhhhh! Brent, this is my feature request. Please. You know it makes sense.
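Until that turns up, the script can at least suggest a number to type in by hand. What follows is a sketch of my rule of thumb, not anything NetNewsWire provides: the handler name is hypothetical, the extra halving for erratic posters is an assumed way of “increasing it a bit”, and the 30-minute floor is an assumption based on the usual newsreader default.

```applescript
-- Hypothetical rule of thumb, not anything NetNewsWire provides.
-- meanMins and sdMins are the figures the main script reports for a feed.
on suggestedCheckMins(meanMins, sdMins)
	set checkMins to meanMins / 2 -- check twice as often as the feed posts
	if sdMins > meanMins then set checkMins to checkMins / 2 -- erratic poster: assumed extra halving
	if checkMins < 30 then set checkMins to 30 -- assumed floor: the usual 30-minute default
	return round checkMins rounding up
end suggestedCheckMins

-- Figures taken from the sample output above.
set forOngoing to suggestedCheckMins(608, 78)
set forPundits to suggestedCheckMins(2974, 4293)
set forMetaFilter to suggestedCheckMins(83, 0)
```

On those figures it suggests checking ongoing every 304 minutes, Technology Pundits every 744 and MetaFilter every 42. Tune the halving and the floor to taste.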