Monday, March 29, 2010

There's a Lesson Here, but I Can't Remember What

Perhaps you are graced with a mind like a steel trap. I have always had a mind like a steel colander.

I frequently read stuff I wrote, and think, "That's clever. Too bad I can't remember ever knowing that, much less writing about it."

Just now, I was reading an RHCE-prep guide that was explaining pr. I thought, "pr? Geez. There's some ancient history. Next they'll be explaining FORTRAN line-printer-carriage-control codes."

This starts me reminiscing.

"There was an old Software Tools filter, written in RATFOR, that would interpret those codes: asa (named for the American Standards Association, a progenitor of ANSI). At some point, it was ported to C/Unix. I should see if it's on my Ubuntu desktop."
(As an aside, and before I forget to say it, Kernighan and Plauger's Software Tools is the best book ever written about software engineering.)
It's not there. I think, "Well, okay, I'll install it."

I try apt-cache search and don't find it. Rats.

I google for an Ubuntu version. Nothing. A Linux version? Nothing. Humph.

Well, surely it was in UNIX Version 7. I remember some work Tom Christiansen put in, collecting Perl implementations of old, V7 commands. Maybe he found an implementation of asa(1) that I can just port.

Except I can't remember what he called his collection. I go back to googling, this time for Tom's collection. After a bunch of failed tries, I finally get a hit. You guessed it: a column by Jeff Copeland and, um, me -- Software Ptools -- which I have no recollection of ever having written. How embarrassing.

I should have given up right there, while I was behind, but Noooo .... ("What would you pay? But wait! There's less!")

Had we provided the name of Tom's project? Sho 'nuff: "Perl Power Tools." Maddeningly, the link in our column has gone dead. The universe hates me and there's no beer in the fridge.

I scroll down, hoping for another link. Ooh! Look! There's code! We implement a V7 command, right there in the column, to contribute to PPT ourselves.

We implement asa(1).

Oh, ow.

(I've now learned that the entire Perl Power Tools project has been moved to the CPAN by Casey West.)
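(And for anyone who, like me, can't find asa(1) on a modern box: carriage control puts a control character in column one of every line -- blank for single space, '0' for double space, '1' for a new page, '+' for overstrike. Here's a from-memory sketch in awk -- my reconstruction, not the Software Tools original or the PPT version:

```shell
# asa-ish filter: column one of each line is FORTRAN carriage control.
#   ' ' single space, '0' double space, '1' new page, '+' overstrike.
# (My from-memory reconstruction, not the original.)
asa() {
  awk '{
      c = substr($0, 1, 1)
      if (c == "+")      printf "\r"     # overstrike: return to line start
      else if (NR > 1)   printf "\n"     # otherwise finish the previous line
      if (c == "0")      printf "\n"     # double space: one extra blank line
      if (c == "1")      printf "\f"     # new page: form feed
      printf "%s", substr($0, 2)
  } END { if (NR > 0) printf "\n" }'
}
printf ' one\n0two\n' | asa    # "one", a blank line, then "two"
```

Fifteen lines. Kernighan and Plauger would approve of the size, if not the nostalgia.)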

Sunday, March 14, 2010

Estimating: The Envelope, Please.

How much does it cost Amazon to ship me a Kindle book? About a nickel.

How much did it cost us to get letters saying we're going to get census forms? About $50 million.

How do I get these? Back-of-the-envelope calculations.

Back-of-the-envelope calculations are the quick calculations we do, from simple assumptions, to give us a sense of rough sizes. They may not let us tell whether the answer is 5 or 9, but they can let us see the answer isn't 5 billion -- a 5 followed by 9 zeros.

My sister, Jo, the Tattooed Lady, wondered out loud, this week, "... just how many millions of dollars it cost The US Commerce Dept (read 'us, the taxpayers'), to send everybody in the US a letter this week that says that they will be sending us a census report to fill out. 'Ooooooo. Look out!!!! Here it comes!!!' "

Let's do a back-of-the-envelope calculation. (No pun intended.) It's not hard.

How much does it cost to send a letter? A first-class stamp costs $0.44. The USPS loses money, which is why they want to cut back to 5-day-a-week delivery. So the real cost of processing and delivering a letter is something like $0.50. Could it be $0.30? Or $0.72? Maybe. But it's less than five bucks and more than a farthing.

What's the cost of producing each letter -- printing, stuffing, and so on? At Kinko's, they'd charge you somewhere between a nickel and a dime. Ditto for the public library. Real money, but we're still talking a total cost of around half a buck per letter.

They sent one to each household, and America has over 100 million of those.

We paid at least fifty million bucks for those letters. $50,000,000. As Jo says, "Here it comes."
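The whole estimate fits in a few lines of shell arithmetic. Every input is a round-number guess, not a real figure:

```shell
# Back-of-the-envelope: cost of the census letters, in millions of dollars.
postage=50      # cents: guessed real cost to process and deliver one letter
production=5    # cents: guessed printing-and-stuffing cost per letter
households=100  # millions of US households, roughly
echo "$(( (postage + production) * households / 100 )) million dollars"
```

Integer arithmetic is plenty; the inputs are only good to a digit anyway.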

But what did we pay to draft the letter, translate it into a bunch of other languages, and get all that approved and processed through our Federal bureaucracy? Probably not even an extra fifty million.

Here's a second example: What sort of profit is Amazon making on Kindle books? I wondered this last year when I bought my Kindle.

Let's see .... Once they've paid the publisher for the book, they probably get a machine-readable version for next-to-nothing -- maybe free. Converting to the Kindle data format is probably done by a piece of software that they wrote once, and amortize across all their books, which means that probably doesn't contribute much either. Amazon's big cost is probably delivery -- what they pay Sprint to get it to us.

So how much is that? Hand me that envelope.

They'll sell me a subscription to a blog for about $2/month. The content is free if I have a browser, and I can't imagine they're trying to make a lot of money from these, either. The $2 is probably Amazon's delivery cost.

The kind of person who reads a blog on his Kindle is a junkie who, I'll guess, might read it three times a day. That's 30*3 = 90, or about a hundred deliveries a month: two cents a delivery. Books are bigger, but they come over so fast that I bet connection-set-up and -tear-down costs dominate the price.
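Same envelope, same shell. (The $2/month and three-reads-a-day are my guesses from above, not Amazon's numbers.)

```shell
# Guessing Amazon's per-delivery cost from the blog-subscription price:
cents_per_month=200          # a $2/month blog subscription, in cents
deliveries=$(( 30 * 3 ))     # ~3 reads a day for a month: about a hundred
echo "$(( cents_per_month / deliveries )) cents per delivery"
```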

Amazon sells new releases at $9.99. This calculation says almost all of that is profit. Their delivery cost, I guessed, was under a nickel a copy.

How close did I come? In a January press release, Amazon revealed it was "less than six cents."

When the government can send us useless letters by Kindle (or email), they'll cost us far less.

"But what could the government do with its vast inventory of surplus envelopes?" the politicians will ask.

Two suggestions come to mind.

Thursday, March 11, 2010

Collatz Conjecture

I like this shell script, by Kyle Anderson.
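(In case that link doesn't survive, the idea is easy to re-sketch. Here's my own minimal version -- not Kyle's script -- printing the "3n+1" sequence down to 1:

```shell
# A minimal Collatz sketch of my own -- not Kyle's script.
# Print the 3n+1 sequence starting from n, ending at 1.
collatz() {
    n=$1
    out=""
    while [ "$n" -ne 1 ]; do
        out="$out$n "
        if [ $(( n % 2 )) -eq 0 ]; then
            n=$(( n / 2 ))      # even: halve it
        else
            n=$(( 3 * n + 1 ))  # odd: triple it and add one
        fi
    done
    echo "${out}1"
}
collatz 6    # 6 3 10 5 16 8 4 2 1
```

Nobody has proved it always reaches 1, which is what makes it a conjecture.)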

I found out about it because Paul Hummer's created a Northern Colorado Linux Blog aggregator.

Thanks, Paul! And Kyle.

Tuesday, March 9, 2010

Generating Arbitrary Numbers

Sometimes, "arbitrary" and "random" aren't synonyms. Here's an example of how to generate the former without their being the latter.

One nice thing about knowing people who make me think is that it gives me things to post about. For example, Hal Pomeranz, Ed Skoudis, Tim Medin, and Paul Asadoorian have a weekly blog, called Command Line Kung Fu, that compares and contrasts command-line tricks for different operating systems.

I only ever use Linux, so I read Hal's stuff and skip the Windows and DOS stuff. Even with this, every week or two Hal's post makes me think, "Wait! Here's something he didn't mention!" (typically because it's slightly off-topic).

In this week's column they generate random time intervals.

Here's Hal's punchline:
[...] in larger enterprises you might have hundreds or thousands of machines that all need to do the same task at a regular interval. Often this task involves accessing some central server-- grabbing a config file or downloading virus updates for example. If a thousand machines all hit the server at exactly the same moment, you've got a big problem. So staggering the start times of these jobs across your enterprise by introducing a random delay is helpful. You could create a little shell script that just sleeps for a random time and then introduce it at the front of all your cron jobs like so:

0 * * * * /usr/local/bin/randsleeper; /path/to/regular/cronjob
(The column sketches how to implement 'randsleeper'.)

Yep. This works fine.

But as it stands, the cronjob could kick off one job at 9:59, and the next one at 10:00. What if I want to spread my machines across the hour, but want each machine to use a fixed timeslot, so the elapsed time between runs is a full hour for any given machine?

Here's one way:
  1. Pick an arbitrary machine-specific number, like the IPv6 address or the MAC address of the ethernet card.
  2. Convert it to an integer.
  3. Take it mod the time interval.
  4. Use that number for the time to start the job.
Here's code to do that, which, as always, I grow, bit-by-bit, on the command line, by getting a little piece right, recalling that piece, and adding another step.

  • Step 1:
Get a unique, but arbitrary, machine-specific identifier (the MAC address of the first NIC).
$ ifconfig | awk '/HWaddr/ {print $NF; exit 0}'
00:1e:c9:3d:c0:0c
  • Step 2:
Strip the colons
$ ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g'
001ec93dc00c
And interpret the result as a hex number. (The shell requires hex numbers begin with "0x", so I'll just tack that on.)
$ echo $(( 0x$(ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g') ))
132225286156
  • Step 3:
Mod it by the number of seconds in an hour, to get an arbitrary second.
$ echo $(( 0x$(ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g') % (60*60) ))
556
  • Step 4:
Always sleep until that many seconds after the hour, then kick off the job.
$ crontab -l > Cronjobs
$ echo "0 * * * * sleep \$(( 0x\$(ifconfig | awk '/HWaddr/ {print \$NF; exit 0}' | sed 's/://g') % (60*60) )); /path/to/regular/cronjob" >> Cronjobs
$ crontab Cronjobs
Ta-da.

(For step four, I'd probably actually kick off crontab -e and paste the line in; otherwise there are just too many ugly backslashes to get wrong.)

Warning: This will not work if your machines' MAC addresses cluster around the same value, mod 556. :-)

Tuesday, March 2, 2010

Better Safe Than Sorry: Writing Code that Writes Safer Code

I write code that writes code. A lot. On the command line. It's safer.

Hal Pomeranz and co-conspirators have another fine post up about command-line programming. In it, they write a clever loop to rename a list of numbered attachments.

Here's Hal's code:

$ cat id-to-filename.txt | while read id file; do mv attachment.$id "$file"; done
(His input file is a two-column list, like this:
$ cat id-to-filename.txt
...
43567 sekrit plans.doc
44211 pizza-costs.xls
...
And, actually, Hal takes the list from stdin, with a less-than sign. Blogger whines and eats my posts when I use those -- it thinks I'm opening an unclosed HTML tag. What a pain.)

The quotes are there because without them the code tries to do this:

mv attachment.43567 sekrit plans.doc
which gets the mysterious message back
mv: target `plans.doc' is not a directory
$
Uh-oh.

When this happens, I usually don't know what the message means. Figuring it out eats time. Plus, with my luck, some files have been moved but others haven't. Recovering from that eats even more time.

Here's what I type instead:
  • First step: I write code that says what I'd like to do.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id $file"; done

...
mv attachment.43567 sekrit plans.doc
mv attachment.44211 pizza-costs.xls
...

Often, when I do this, I scan the output, notice something's going to go wrong, and fix it.

"Oh. Oops. I need quotes. I'm an idiot."

Note that no files were moved; my code's only echoing commands.
  • Next step: I recall my command-line, with an up-arrow, and add fixes. I keep doing that until the commands I see are the ones I actually want.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done

...
mv attachment.43567 'sekrit plans.doc'
mv attachment.44211 'pizza-costs.xls'
...
Look okay? Yep.
  • Last step: I recall the previous command, one final time, and pipe it to a subshell, which executes the commands my code writes.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | bash

$
When I'm nervous about what I'm doing, I even try out the first line by itself, like this:
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | head -1 | bash

$
I check the result, and if I've done the right thing I go ahead and run the rest.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | sed 1d | bash

$
"Never write code on the command line when you can write code that writes code on the command line," I always say.

Monday, March 1, 2010

NFS Made Easier

Automounting disks is magic. Autodetecting what to automount is magic-er.

Last weekend I set up my pogoplug as an NFS server, and installed and configured autofs to look for specific directories on the pogoplug. This weekend, I revisited that configuration and learned I was working too hard.

In /etc/auto.master, the entry "/net -hosts" says, "When I type 'ls /net/foo', do these steps:"
  1. look for a host named 'foo',
  2. ask foo what filesystems it's exporting,
  3. mount them under /net/foo,
  4. now do the 'ls'.
No modifying /etc/auto.misc every time you want to automount a new machine: the machine just appears. There is an /etc/auto.net, but it's a script that autofs uses to ask a remote host what it's exporting.
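For reference, here's what the working setup looks like on my box. (The 'pogoplug' hostname and the init-script path are from my Ubuntu setup; yours may differ.)

```shell
# /etc/auto.master already ships with the magic line; just make sure
# it's present and uncommented:
#
#     /net  -hosts
#
# Then restart autofs, and any NFS server appears on demand:
sudo /etc/init.d/autofs restart
ls /net/pogoplug    # triggers the mount of everything pogoplug exports
```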

Using remote filesystems could be even easier and more transparent (I could, for example, imagine having upstart manage the whole process, and having autofs be installed by default) but not much.