computer science miscellanea

I should probably mention….

…that I again number myself among the ranks of those receiving regular remuneration in exchange for the sweat of their brows.  You’d think that as a unemployed layabout I’d have been writing blog posts like mad (and perhaps a novel or two), but it just didn’t seem to work out that way.

The most striking thing about the experience was how mentally exhausting it was.   I kept up my spirits though application of the hot irons of optimism, but by the end of January the whole thing was starting to wear a bit thin.

Now, two months into the new position, I’m finally beginning to emerge from the cloud of blue devils into something resembling my usual snarky, obsessive persona.  I’m back at the same company that laid me off (with a satisfyingly ironic bump in grade), in the same cubicle, with the same accretion of technological paraphernalia that had built up around me over the previous decade.  The work, at least, is somewhat different, and should keep me interested for the next few years.

My Ph.D. studies are starting to pick up in pace, as well.  I opted for a machine learning class this quarter, as it’s highly relevant to my intended areas of research.  This will mean that I finally have to get around to developing some better statistical chops, but to my surprise I also have to dust off my embarrassingly rusty calculus (“You’re computer scientists,” said the professor, “of course you hate calculus.”).

academia computer science reflections

Magister Scientiae

Ramon Lull’s Arbor Scientiae Somewhat to my surprise, I find that I have finished my M.S. and the degree will be awarded in six days. At last, I can be addressed as Wohlgelehrter Herr Magister! My calculations had led me to believe that I had two more quarters before completion, but I certainly wasn’t going to argue when the department sent me a graduation notice.  The bureaucrats have been propitiated, so it’s all over but the ceremony.

Next stop: doctorate. I’ve officially transferred into the Ph.D. program. Until Friday I could still say that I was still on the fence, but iacta alea est. I’d assumed that switching tracks wouldn’t be a particularly big deal, but I find that it’s a significant mental and emotional shift.

There have certainly been research elements to the master’s program, but at its heart it is structured around classes. Until now, it’s been picking the most interesting items off of a menu and running with them. From here on out, I’m building my own curriculum, and research is central.

It really is like starting out fresh, but from a higher level of sophistication. The professor who has been acting as my advisor is leaving, so I’ll need to establish a new relationship, one that will be the best match for my research interests.

Research interests! That impending choice has certainly been haunting my sleep for the past few months. To my generalist instincts, it feels a bit like selecting the color of my straightjacket. There are too many interesting pathways to select just one! Still, this is the way that the game is played, and I have some ideas. Once they’ve coalesced, expect some discussion in this forum.

I’ll have a brief, much-needed respite before I pick things up again for the summer term. Of course, work continues—it will be a bit of a dance balancing office and academia, but I’ve survived this long. What’s three or four more years?

books computer science miscellanea

Calliope and the Spambot

Calliope, Muse of Epic Poetry Randomly-generated spam email can have a certain “found art” quality to it. I’ve seen plenty of articles over the past few years gleefully musing over some chance juxtaposition in the inbox. See, for example, this article from The Register. A sample:

If you get it overnight, you can lose it just as quick
When Mumma dead family done.
Take heed of reconciled enemies and of meat twice boiled

The algorithms that generate these messages are quite simple, for the most part. The most common is the Markov chain.  A program of this type first takes a corpus of text and analyzes it to generate a table of probabilities that a given word follows another. To create a first-order Markov chain based on words in the corpus, the program repeatedly asks and answers the following question: given a certain word, what are the most likely words to follow it in the source text? It then randomly picks one of those following words, weighting its choice by the calculated probabilities.  After that, it picks the next word using the word it just generated as the base. A second-order chain bases its probabilities on the previous two words, and so on. Increasing the order of the chain can produce more authentic-seeming phrases.

One of the most common methods of content-filtering spam is Bayesian analysis, which uses a related algorithm to analyze the probability that a particular message is spam, based on the frequency of words in other messages already received and identified. If you are a spammer, the care and feeding of your spambot, your bulk mailer, is matter of great concern. You need to produce messages that have enough randomness to slip through recipients’ spam filters, but that look like they could be a valid messages. Project Gutenberg was an early source of texts for these Markov text generators, resulting in bathetic, surprisingly pseudo-literary nonsense.

I received the following message this morning, the text of which I reproduce in its entirety:

Summer bees were saying
That desire has ever built, have approached
How can they get the point of how a world
Pallid waste where no radiant fathomers,
From there. Toward . . .
demonstrating their talent for comedy?stroke
Glimmering of light:
Rise, to the muffled chime of churchbell choir.
Reshaping magnified, each risen flake
XVII. Greenland
Silent patch of ultimate paint. You are
marked with a dark stroke from the left, encroached
A matter of getting all that right . . .
What I have in my hands, these flowers, these shadows,
Come, swallows, it’s good-bye.
Place of absorbing snow, itself to be
With a hand freed from weight,
Is the moon to grow
Suddenly, in a savage, dreadful bend,

With minor editing (particularly the punctuation), this could almost be passed of as something from a modern poetry review . . . and here’s why: rather than generate its text word-by-word, the bulk mailer worked line-by-line from actual poems. (The line “XVII. Greenland” is a good clue.) A bit of Googling revealed that most of these lines can be found on a particular page of poems about winter on the website of the University of Chicago Press. The unfortunate question-mark in line 6 is an em-dash on the source page.

PIPO: Poetry In; Poetry Out.

This recalls to mind one of my favorite pieces of randomly-generated text.  In 2004, a group of SFWA members set out to show that a company called “PublishAmerica” is not a “traditional” publisher (that is, that they do not engage in any sort of editorial quality-control over their books).  To this end, this group produced a very good candidate for the worst novel ever written: Atlanta Nights.  Each chapter was written by a different person to be as terrible as possible.  Chapter 34 was actually machine-generated using the rest of the book as the source material.  The pseudonymous author-of-record, “Travis Tea”, now has his own web site.

[Image of Calliope, Muse of epic poetry, courtesy of Wikipedia.]

blogging computer science zenoli

This post is about meta-blogging

Having made the decision to pursue a doctorate, I’m boning up in preparation for the qualifying examinations. For the past several weeks I’ve been marinating my brain in computer science, tenderizing it by bashing it repeatedly with a large stack of textbooks. Three more weeks, pass or fail, and I’ll finally have a chance to emerge from this haze of NP-completeness and routing algorithms.

I strongly disapprove of excessive meta-blogging and the cringing, pathetic whinging that often accompanies a shift in a blogger’s posting schedule. A good blog should be about something besides itself. Nonetheless, I don’t think I’ve set expectations for my own schedule, which would seem to be a reasonable courtesy. (As a further courtesy, any further inclinations toward meta-blogging will be sternly quashed and allow to emerge only once a quarter.)

My general goal for Zenoli has been to make on average two to three posts per week, with a minimum of one and a maximum of four. I expect to continue with that schedule for the rest of 2007, though until late April the frequency will be at the low end of that range. I may toss up a few shorter bits that are more timely, and probably shouldn’t moulder in my queue.

Thus far, writing this blog has been quite a curious experiment. I’ve enjoy developing these small essays, and watching my own approach toward my writing change. Most significantly, thanks to this site I have corresponded a bit with some very interesting people; a sincere thank you to everyone who has written and commented. I’d also like to thank those whose exceptional examples of thoughtful blogging serve as both a moveable feast for the intellect and a spur toward the heights.

computer science mathematics mental exercises

Quickly Convert Binary to Decimal in Your Head

[Update, April 13, 2007: Thanks to Herr Ziffer for catching a confusing typographical error.]

I can’t believe I’d never seen (or figured out) this quick method for converting a binary number to a decimal number in your head. All you need to be able to do is double numbers and occasionally add one.

  1. Start at the first ‘1’ on the left, and start with the mental number one
  2. Move one digit right. If that digit is a zero, multiply your mental number by two. If it is a one, multiply your mental number by two and add one.
  3. Repeat step 2 for every digit of the binary number

Here’s an example. We’ll use the binary number 1101010 1011010:

  • 1011010 – We start at the first one. Our mental total: 1
  • 1011010 – Next digit is a zero; we double our mental number: 1 x 2 = 2.
  • 1011010 – Next digit is a one; we double our mental number and add one: 2 x 2 + 1 = 5
  • 1011010 – Another one; double and add one: 5 x 2 + 1 = 11
  • 1011010 – Zero; double: 11 x 2 = 22
  • 1011010 – One; double and add one: 22 x 2 + 1 = 45
  • 1011010 – And finally a zero; double: 45 x 2 = 90

The rest of this post is a little more technical, so if you glazed over when reading the above, it now may be time to soothe your tired mind.

Discrete finitite automaton to identify binary numbers divisible by threeI happened across this trick while contemplating a three-state discrete finite automaton that identifies binary numbers divisible by three. The automaton starts in state 0, and like the above procedure starts at the left side of the number. The number of the state can be thought of as the remainder of the number as read so far, mod 3. Every time a zero or a one is read, the automaton follows the arrow with that label from its current state. If it ends in state 0, the number is evenly divisible by three. Once I understood why the DFA actually works, the mental calculation became glaringly obvious.

For even more fun, the regular expression (0*(1(01*0)*1)*)* will also match binary numbers divisible by three.

Exciting! Now you have something to talk about the next time you go to a cocktail party.

books computer science reflections

Italo Calvino on Computer Science

Railroad Bridge in Coatesville, PA(Never fear, part 2 of “Towards a New Salon” is in preparation, and should be posted soon.)

Wishing to refresh my memories of its contents, I have been trying to locate my copy of Six Memos for the Next Millennium—without, alas, success. The vagrant book yet wanders.

With a bit of wandering on my own, I managed to scrape up a copy for immediate reference. The public library system in my county has eighteen libraries. While you can request a volume from another library, I typically prefer to visit them in person. Often, the smallest locations will have the only copy of some surprisingly obscure volume. Over the years, I’ve found an excuse to visit thirteen of them. That number has now increased by one, as the library computer system indicated that the Coatesville library had the only copy of Memos.

Coatesville is disheveled, depressed city, wounded by the withering of the steel industry; the downtown looks beaten and nearly abandoned. The atmosphere of the library is close and depressing, almost dank. I was happy to locate the Calvino and depart.

I had never driven Route 82 before, and as I headed out of the city to the north I was struck by the massive stone arches of the railroad bridge. I’ll have to head back with my camera one of these days. (I did find a lovely photo of the bridge’s reflection on flickr, which illustrates this post.)

I’d forgotten how much I enjoyed Memos the first time I read it. This time, I was quite surprised to come across a passage about computer science in the essay on “Lightness.”

I look to science to nourish my visions in which all heaviness disappears. Today every branch of science seems intent on demonstrating that the world is supported by the most minute entities, such as the messages of DNA, the impulses of neurons, and quarks, and neutrinos wandering through space since the beginning of time . . . . .

Then we have computer science. It is true that software cannot exercise its powers of lightness except through the weight of hardware. But it is software that gives the orders, acting on the outside world and on machines that exist only as functions of software and evolve so that they can work out ever more complex programs. The second industrial revolution, unlike the first, does not present us with such crushing images as rolling mills and molten steel, but with “bits” in a flow of information traveling along circuits in the form of electronic impulses. The iron machines still exist, but they obey the orders of weightless bits.

And so the library’s computer system led me to a city vitiated by the departure of the steel industry.

(Photo of the Coatesville railroad bridge courtesy Cyber Insekt.)

computer science

Two Stanzas on Euclid's GCD

Here’s an double tanka on Euclid’s algorithm (for finding the greatest common divisor of two numbers), executable as a Scheme program. My notes indicate that I wrote it during a tedious teleconference.

(define gcd
 (lambda (m n) (if (zero?
  n) m (gcd
   n (remainder m n))))) ; Look!
    ; Falling leaves unveil the tree.

      ; Two integers stand.
       ; They divide, m's ghost remains.
        ; As autumn passes,
         ; n succeeds m, floating down.
          ; Zero reveals the great truth.
computer science

What is Computer Science, anyway?

There’s a predictable kerfluffle of an exchange on Slashdot over the state of computer science education, prompted by yet another article declaring the death of computer science. Per normal, the Slashdot discussion is fairly equal parts tedious, inane, obvious, and shrill.

Much of it boiled down to rehashing the differences between computer science (too much math, according to the Real-World Programmers, not enough according to the electrical engineers), programming (vocational training for those who can’t cut it in computer science, according to the computer scientists), electrical engineering (for those who can’t understand or appreciate algorithmic analysis, again according to the computer scientists), and computer engineers (nobody seems to be quite sure where they fit).

Instead of leaping into the fray, I’ll just mention my favorite quote about what comp sci is not. Edsgar Dijkstra is said to have said,

Computer science is no more about computers than astronomy is about telescopes.

There are fascinating classes of ideas out there that we couldn’t start to discuss until computers had been invented . . . computer science is the exploration of those new frontiers.

(Scan from How it Works: The Computer is courtesy of gillicious.)

computer science sleep

. . . While Visions of Merkel Hash Trees Danced in Their Heads

One of the regular requirements of grad school is total immersion in a particular topic. During the most intense periods, the subject matter often finds its way into my dreams. While I won’t attempt to describe them in detail, I will just say that having dreams where dynamic Bayesian networks play a critical role is at least a bit disconcerting.

I’ve just finished powering through twenty papers on multicast security, and rather than blog further about it, it’s probably better simply to go to sleep and try not to dream about hash functions.