Word detectives

How linguistic experts figure out who wrote what.

Apr 19, 2024

WHEN I DO a Ross Rules workshop, I like to ask the group, “Who are the best writers in this office?”

People look at each other and start offering answers. “Mark always gets straight to the point.” “Janet is concise without being rude.” “He’s easy to understand—you don’t have to read anything twice.” “She has a good vocabulary and doesn’t just say the same things everybody else does.”

I then ask, “If you didn’t know who sent the email, or wrote the brief, could you figure it out from the writing itself?”

“One or two people for sure,” is a typical answer. “The others I’d be guessing.”

Enter forensic linguistics, the science of language analysis: the study of grammar, syntax, vocabulary, and linguistic patterns. It’s commonly used to gain information relevant to a criminal investigation. Authorship analysis can identify, or rule out, the writer of a disputed will, an anonymous threat, or a ransom note. Quirks like spelling errors, capitalizations, word frequency, and phrasing all provide guidance.

(Note that I’m not referring here to forensic graphology, or handwriting analysis, which focuses on the characteristics of letter formation. Forensic graphologists examine the physical attributes of written script—the size, shape, and structure of letters, the spacing, the pressure applied to the paper—to help identify or exclude a writer and even provide insight into that person’s personality and emotional state.)

THE SCIENCE of forensic linguistics, in rudimentary form, dates back at least to 1927. A kidnapping in New York state involved a ransom note. Duncan McLure, the uncle of the young teacher who was kidnapped, was the only member of the family to spell his name “McLure” rather than “McClure.” The ransom letter he received, purportedly from the kidnappers, was addressed to him with his preferred spelling, indicating that the writer was familiar with the two versions. McLure confessed to having written the note himself.

If that conviction was a no-brainer, the Ted Kaczynski case involved more nuance. The so-called Unabomber was wanted by the FBI for a series of bombings that killed three people and injured 23 between 1978 and 1995. The then-unidentified terrorist sent a lengthy manifesto outlining his beliefs to the New York Times and the Washington Post. FBI officials were reluctant to see it published (policy was not to disclose evidence) until someone pointed out that readers might find clues to the writer’s identity.

Good call. One reader who recognized the prose in “Industrial Society and Its Future” was the Unabomber’s brother. Phrases such as “industrial-technological system” and “cold-hearted bureaucrats” sounded familiar to him. An FBI profiler studied the manifesto, comparing the words and prose style with other known writings by Ted Kaczynski. His findings led a federal court to issue a search warrant for the recluse’s Montana cabin, where evidence of Kaczynski’s crimes was abundant. Thereafter the Unabomber could only dream of mailing letter bombs from his cell at ADX Florence, the supermax prison in Colorado, before he committed suicide.

It’s unlikely the Unabomber would have been caught if he hadn’t provided a sample of his prose.

IF YOU WANT to make it really easy for forensic linguists, write a bunch of anonymous letters, supposedly from different sources, and mail them to yourself. When a British actress involved in a criminal investigation started getting such letters, police suspected that she had written them herself to draw attention away from the case.

A forensic linguist was hired to scrutinize the letters. He quickly found cohesion among them. The phrase “your every move is being monitored” turned up in several of them. The word trace was used where most people would use track (as in “we’re tracking your vehicle”). This minor confusion of track and trace was also found in personal letters written by the actress. It didn’t take long to determine that the supposed victim of these threatening anonymous letters was the perp.

PERHAPS THE MOST widely known case in which forensic linguistics played a role is that of Derek Bentley. A 19-year-old British kid with developmental disorders, Bentley was convicted of the murder of a policeman during a robbery in 1952 and hanged in 1953. Forty years later, after strenuous work by family members who believed he was innocent, the case was re-opened.

The main evidence used to convict Bentley was a statement he supposedly made right after being arrested. Under oath, the police testified that the statement was the verbatim account of a monologue Bentley had delivered without interference.

In analyzing the statement, a forensic linguist found words and phrasings that were far more commonly used by a policeman than a lay person—especially a teenager with poor verbal skills. The statement also contained several denials Bentley had supposedly made. A suspect making an uncoerced statement doesn’t generally tell what didn’t happen while recounting what did. This suggested that Bentley’s confession represented edited responses to questions rather than an uninterrupted monologue.

Use of the word then was a particular tipoff. Someone with Bentley’s upbringing and developmental issues was far more likely to continue a narrative by saying, “Then I. . .” rather than “I then. . .” The latter phrasing was typically used by policemen delivering testimony under oath. Courtroom cop talk.
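For the curious, a check like the “then” test is easy to approximate in a few lines of Python. This is only a rough sketch: the function name `then_order_counts` and the sample sentences are made up for illustration (they are not taken from Bentley’s statement), and a real analysis would compare the counts against reference samples of ordinary speech and police testimony.

```python
import re


def then_order_counts(text: str) -> dict[str, int]:
    """Count whole-word occurrences of 'I then' and 'then I', ignoring case."""
    lowered = text.lower()
    return {
        "i then": len(re.findall(r"\bi then\b", lowered)),
        "then i": len(re.findall(r"\bthen i\b", lowered)),
    }


# Invented sample text, for illustration only.
sample = "I then ran after them. Then I saw a policeman. I then heard a shot."
counts = then_order_counts(sample)
print(counts)
```

A lopsided ratio in favor of “I then” proves nothing on its own, but it is the kind of pattern that invites a closer look.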

The forensic linguist became convinced that the authorities had been “co-authors” of the document. His work was persuasive enough to destroy the validity of the statement that led to Bentley’s execution, and in 1998 the Court of Appeal quashed Bentley’s conviction.

WHEN BILL CLINTON was running for a second term as president, a novel of the day told the inside story of a presidential contender from the viewpoint of a campaign insider. Primary Colors, by “Anonymous,” became a bestseller, and people sought to discover the author’s identity. A Vassar College professor used the skills of a forensic linguist to narrow down the contenders. By tallying word frequencies in the novel, and comparing them with word frequencies in the work of several possible authors, he identified the writer as Joe Klein, a magazine journalist. Klein at first denied it, but later admitted authorship.
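Tallying word frequencies and comparing them across candidates can be sketched roughly as follows. This is not the professor’s actual method, just a minimal illustration of the idea: the short list of function words, the simple sum-of-differences distance, and the names `profile`, `distance`, and `closest_author` are all assumptions, and the texts are toy examples far too short for real work.

```python
import re
from collections import Counter

# A tiny, hypothetical list of common function words to compare.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "but", "i", "it"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(disputed: str, candidates: dict[str, str]) -> str:
    """Name of the candidate whose profile is nearest the disputed text's."""
    target = profile(disputed)
    return min(candidates, key=lambda name: distance(target, profile(candidates[name])))


# Toy texts, invented for illustration.
disputed = "the cat and the dog and the bird"
candidates = {
    "author_a": "the sun and the moon and the stars",
    "author_b": "it is a truth that i know but it is a lie",
}
best = closest_author(disputed, candidates)
print(best)
```

The nearest profile is only a lead, not a verdict; with samples this small, any match is noise.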

Similarly, when the novel The Cuckoo’s Calling was published in 2013, someone at the Sunday Times of London suspected that the stated author, Robert Galbraith, might actually be a pen name of J. K. Rowling, she of the Harry Potter phenomenon. The Times hired a forensic linguist to investigate. The literary detective examined adjacent words, the most commonly used words, the relative use of long and short words, and various other markers. After doing the same search with several Harry Potter books, he concluded that Rowling was almost certainly the author of the novel. Rowling eventually fessed up.
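The markers mentioned here (adjacent words, most common words, long versus short words) are simple to compute. The sketch below is an assumption-laden illustration, not the investigator’s actual toolkit: the function name `style_markers`, the six-letter cutoff for a “long” word, and the sample sentence are all invented.

```python
import re
from collections import Counter


def style_markers(text: str, top_n: int = 3, long_word_len: int = 6) -> dict:
    """Compute a few crude style markers for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Adjacent word pairs, e.g. ("the", "owl").
    pairs = Counter(zip(words, words[1:]))
    # Share of words at or above the (assumed) length cutoff.
    long_share = sum(len(w) >= long_word_len for w in words) / max(len(words), 1)
    return {
        "top_words": [w for w, _ in Counter(words).most_common(top_n)],
        "top_pairs": [p for p, _ in pairs.most_common(top_n)],
        "long_word_share": long_share,
    }


# Invented sample sentence, for illustration only.
sample = "the owl flew and the owl landed and the owl slept the end"
markers = style_markers(sample)
print(markers)
```

Run the same function over a disputed novel and over books by each suspected author, and the comparison becomes a matter of lining up the numbers.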

After the writing in this novel was compared with some Harry Potter titles, it didn’t take a curious editor long to determine that the real author was J. K. Rowling.

CAN YOU FOIL the word detectives? Is it possible to confound a forensic linguist? In poker, when a player consciously tries to randomize his betting so as not to convey information about his hand, he actually creates new betting patterns that quickly become detectable through data analysis. A writer who tries to disguise her prose style will also find that it’s almost impossible to do. Your writing is like your fingerprint or your iris. It’s unique. Given a large enough sample size, forensic linguists can determine that nobody else writes exactly the way you do.

Artificial intelligence seems certain to make forensic linguistics ever more precise and dependable at determining who wrote what—until, that is, AI also allows the writer to fully erase her individuality behind algorithmically generated randomness. AI is already adept at predicting what you’re going to type next. It knows how you write, and it expects you to write the way you usually do. Start writing differently and it will quickly incorporate the new mode into the old and continue to help you along.

In forensic linguistics, then, as in many other areas of life, technological advances make the world ever better, while also making it worse.

5 Comments

Sheilagh McEvenue

Apr 19

It's funny. I detect a good writer about this time every Friday.

Carole Chaski PhD

Apr 22

Gary, was this generated by AI? It is the typical overview of war stories that non-scientist "forensic linguists" provide as "evidence" that their guessing and cherry picking is actually science. So this article is exactly what you would expect a ChatGPT response to be. I think and hope you're better than that, aren't you?


The Ross Rules
