Tuesday, May 18, 2010

Computers detecting sarcasm? Please!

A colleague sent me a link to a paper on detecting sarcasm in Amazon product reviews. (See also popsci or slashdot.)

I would have thought sarcasm was too subtle for anything but a human, but their algorithm got 77% precision and 81% recall on new sentences in an evaluation set. Without getting technical, that's about 80% correct.
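For anyone who does want to get a little technical, one common way to fold precision and recall into a single number is the F1 score, their harmonic mean. The sketch below is my own arithmetic on the two figures quoted above, not something taken from the paper, and the function name is just for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The two figures quoted in the post: 77% precision, 81% recall.
f1 = f1_score(0.77, 0.81)
print(round(f1, 3))  # 0.789
```

So "about 80%" is a fair one-number summary, as long as you read it as a blend of the two measures rather than plain accuracy.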

As with many natural language processing (NLP) algorithms, there are many wrinkles involved. They take a sentence like

Garmin apparently does not care much about product quality or customer support

and extract patterns like

[company] CW does not CW much
does not CW much about CW CW or
not CW much
about CW CW or CW CW

where "CW" is a "content word" (see another of their papers!). A sufficiently common word is an HFW (high-frequency word), of course. Then they prune away patterns that don't help much or are rare.
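Here is a rough sketch of how such patterns could be generated for the Garmin sentence. It is a simplification under my own assumptions, not the authors' code: the `HFWS` and `COMPANIES` sets are hand-picked for this one example (the paper derives word classes from corpus frequencies), the window lengths are arbitrary, and the pruning step is left out entirely.

```python
# Hypothetical word lists, hand-picked so the example sentence works.
HFWS = {"does", "not", "much", "about", "or"}
COMPANIES = {"garmin"}

def abstract(sentence):
    """Replace each word with [company], itself (if an HFW), or the CW slot."""
    tokens = []
    for word in sentence.lower().split():
        if word in COMPANIES:
            tokens.append("[company]")
        elif word in HFWS:
            tokens.append(word)
        else:
            tokens.append("CW")
    return tokens

def candidate_patterns(tokens, min_len=3, max_len=8):
    """Collect every contiguous window of min_len..max_len tokens."""
    patterns = set()
    for start in range(len(tokens)):
        for length in range(min_len, max_len + 1):
            window = tokens[start:start + length]
            if len(window) == length:  # skip windows that run off the end
                patterns.add(" ".join(window))
    return patterns

tokens = abstract("Garmin apparently does not care much about "
                  "product quality or customer support")
patterns = candidate_patterns(tokens)
print("not CW much" in patterns)  # True
```

All four patterns listed above show up in `patterns`, along with plenty of less useful ones, which is why the pruning step matters.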

They also used punctuation counts (number of !, ?, and quotes), capitalized words, and sentence length as features, but these didn't help as much as the patterns.

More hilarious to me is the dry academic tone in which the sarcastic examples are presented. To wit:

A number of sentences that were classified as sarcastic
present excessive use of capital letters, i.e.:
"Well you know what happened. ALMOST NOTHING HAPPENED!!!"
(on a book), and "THIS ISN'T BAD CUSTOMER
SERVICE IT'S ZERO CUSTOMER SERVICE".
These examples fit with the theoretical framework of sarcasm
and irony ..

Oh, well, I'm glad they fit with the theoretical framework! :)

By comparison, always predicting "not sarcastic" would get 0% recall but good accuracy (can't tell how good from their paper), due to the "sparseness of sarcastic utterances" as they academically say. That is, most sentences are not sarcastic. Tell that to a parent of teenagers.
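To make the lazy baseline concrete, here is a small sketch with made-up numbers: 50 sarcastic sentences out of 1,000. That ratio is purely my assumption for illustration, since the paper doesn't give me the real one. Note that precision for the "sarcastic" class isn't even defined here, because the baseline never predicts it.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for the positive (sarcastic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    accuracy = (tp + tn) / len(pairs)
    # None marks a ratio that is undefined (0/0).
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall

# Hypothetical data: 50 sarcastic sentences among 1,000.
y_true = [True] * 50 + [False] * 950
y_pred = [False] * 1000  # the baseline: never predict sarcasm

accuracy, precision, recall = evaluate(y_true, y_pred)
print(accuracy, precision, recall)  # 0.95 None 0.0
```

The baseline looks great on accuracy and finds no sarcasm at all, which is why the paper's precision and recall numbers are the ones worth looking at.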
