Randomly-generated spam email can have a certain “found art” quality to it. I’ve seen plenty of articles over the past few years gleefully musing over some chance juxtaposition in the inbox. See, for example, this article from The Register. A sample:
If you get it overnight, you can lose it just as quick
When Mumma dead family done.
Take heed of reconciled enemies and of meat twice boiled
The algorithms that generate these messages are quite simple, for the most part. The most common is the Markov chain. A program of this type first takes a corpus of text and analyzes it to generate a table of probabilities that a given word follows another. To create a first-order Markov chain based on words in the corpus, the program repeatedly asks and answers the following question: given a certain word, what are the most likely words to follow it in the source text? It then randomly picks one of those following words, weighting its choice by the calculated probabilities. After that, it picks the next word using the word it just generated as the base. A second-order chain bases its probabilities on the previous two words, and so on. Increasing the order of the chain can produce more authentic-seeming phrases.
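The procedure described above fits in a few lines of Python. This is only a minimal sketch, not the code any actual spambot uses: the names `build_chain` and `generate` are my own, and it assumes the corpus can simply be split on whitespace. Storing every observed follower in a list, duplicates included, means that a uniform random pick from that list is automatically weighted by how often each word followed the current state.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Duplicates are kept on purpose: they encode the probabilities.
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    order = len(state)
    output = list(state)
    for _ in range(length - order):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state only occurred at the end of the corpus
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output)


if __name__ == "__main__":
    corpus = (
        "If you get it overnight, you can lose it just as quick "
        "Take heed of reconciled enemies and of meat twice boiled"
    )
    print(generate(build_chain(corpus, order=1), length=10))
```

Passing `order=2` to `build_chain` gives the second-order chain: each state is then a pair of words, so the output sticks closer to the phrasing of the source.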
One of the most common methods of content-filtering spam is Bayesian analysis, which uses a related algorithm to analyze the probability that a particular message is spam, based on the frequency of words in other messages already received and identified. If you are a spammer, the care and feeding of your spambot, your bulk mailer, is a matter of great concern. You need to produce messages that have enough randomness to slip through recipients’ spam filters, but that look like they could be valid messages. Project Gutenberg was an early source of texts for these Markov text generators, resulting in bathetic, surprisingly pseudo-literary nonsense.
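To make the filtering side concrete, here is a rough sketch of a naive Bayes classifier of the kind such filters are built on. It is an illustration under simplifying assumptions, not the implementation of any real filter: the function names `train` and `spam_probability` are hypothetical, words are split on whitespace, add-one smoothing is used, words never seen in training are ignored, and the training set is assumed to contain at least one spam and one non-spam message.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class from an iterable of (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | text) from word frequencies, with add-one smoothing."""
    vocab = set(counts[True]) | set(counts[False])
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    # Start from the prior odds: how much of the training mail was spam.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        if word not in vocab:
            continue  # assumption: unseen words carry no evidence either way
        p_spam = (counts[True][word] + 1) / (n_spam + len(vocab))
        p_ham = (counts[False][word] + 1) / (n_ham + len(vocab))
        log_odds += math.log(p_spam / p_ham)
    # Convert log-odds to a probability without overflowing math.exp.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


if __name__ == "__main__":
    model = train([
        ("cheap pills now", True),
        ("buy cheap pills", True),
        ("lunch meeting tomorrow", False),
        ("meeting notes attached", False),
    ])
    print(spam_probability("cheap pills", *model))
    print(spam_probability("meeting tomorrow", *model))
```

The sketch also shows why random literary text helps the spammer: words that the filter has mostly seen in legitimate mail pull the score toward "not spam", diluting the handful of words that give the message away.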
I received the following message this morning, the text of which I reproduce in its entirety:
Summer bees were saying
That desire has ever built, have approached
How can they get the point of how a world
Pallid waste where no radiant fathomers,
From there. Toward . . .
demonstrating their talent for comedy?stroke
Glimmering of light:
Rise, to the muffled chime of churchbell choir.
Reshaping magnified, each risen flake
XVII. Greenland
Silent patch of ultimate paint. You are
marked with a dark stroke from the left, encroached
A matter of getting all that right . . .
What I have in my hands, these flowers, these shadows,
Come, swallows, it’s good-bye.
Place of absorbing snow, itself to be
With a hand freed from weight,
Is the moon to grow
Suddenly, in a savage, dreadful bend,
With minor editing (particularly the punctuation), this could almost be passed off as something from a modern poetry review . . . and here’s why: rather than generate its text word-by-word, the bulk mailer worked line-by-line from actual poems. (The line “XVII. Greenland” is a good clue.) A bit of Googling revealed that most of these lines can be found on a particular page of poems about winter on the website of the University of Chicago Press. The unfortunate question-mark in line 6 is an em-dash on the source page.
PIPO: Poetry In; Poetry Out.
This calls to mind one of my favorite pieces of randomly-generated text. In 2004, a group of SFWA members set out to show that a company called “PublishAmerica” is not a “traditional” publisher (that is, that they do not engage in any sort of editorial quality-control over their books). To this end, this group produced a very good candidate for the worst novel ever written: Atlanta Nights. Each chapter was written by a different person to be as terrible as possible. Chapter 34 was actually machine-generated using the rest of the book as the source material. The pseudonymous author-of-record, “Travis Tea”, now has his own web site.
[Image of Calliope, Muse of epic poetry, courtesy of Wikipedia.]