A colleague sent me a link to a paper on detecting sarcasm in Amazon product reviews. (See also popsci or slashdot.)
I would think sarcasm is humanly subtle, but their computer algorithm got 77% precision and 81% recall on new sentences in an evaluation set. Without getting technical, about 80% correct.
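For the curious, the usual way to fold precision and recall into one number is the F1 score, their harmonic mean. A minimal sketch follows; the function name `f1_score` is mine, and only the 77% and 81% figures come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted above: 77% precision, 81% recall.
f1 = f1_score(0.77, 0.81)
print(round(f1, 3))  # 0.789
```

So "about 80%" is a fair one-number summary.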
As with many natural language processing (NLP) algorithms, there are many wrinkles involved. They take a sentence like
Garmin apparently does not care much about product quality or customer support
and extract patterns like
[company] CW does not CW much
does not CW much about CW CW or
not CW much
about CW CW or CW CW
where "CW" stands in for a "content word" (see another of their papers!). A sufficiently common word is an HFW, of course (high frequency word), and stays in the pattern as-is. Then they prune away patterns that don't help much or are rare.
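Here is a rough sketch of how such patterns could be pulled out of the Garmin sentence. It is not the paper's procedure: the high-frequency word list and the company list are hand-picked assumptions (the authors derive theirs from corpus frequencies), the window lengths are arbitrary, and there is no pruning step.

```python
# Assumed, hand-picked word lists for illustration only.
HFW = {"does", "not", "much", "about", "or"}
COMPANIES = {"garmin"}


def abstract(sentence):
    """Replace company names with [company] and non-HFW words with CW."""
    out = []
    for token in sentence.split():
        low = token.lower()
        if low in COMPANIES:
            out.append("[company]")
        elif low in HFW:
            out.append(low)
        else:
            out.append("CW")
    return out


def candidate_patterns(tokens, min_len=3, max_len=8):
    """Collect every contiguous window of min_len..max_len tokens."""
    patterns = set()
    for i in range(len(tokens)):
        for j in range(i + min_len, min(i + max_len, len(tokens)) + 1):
            patterns.add(" ".join(tokens[i:j]))
    return patterns


sentence = ("Garmin apparently does not care much about "
            "product quality or customer support")
patterns = candidate_patterns(abstract(sentence))
print("not CW much" in patterns)  # True
```

This generates far more candidates than the handful shown above, which is why the pruning step matters.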
They also used punctuation (number of !, ?, quotes), capitalized words, and sentence length, but these features didn't help as much as the patterns.
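Those surface features are easy to compute. A minimal sketch, with feature names of my own choosing and two assumptions: "capitalized" means the word starts with an uppercase letter, and length is counted in whitespace-separated words. The paper may define both differently.

```python
def surface_features(sentence):
    """Count simple punctuation and capitalization cues in one sentence."""
    words = sentence.split()
    return {
        "exclamations": sentence.count("!"),
        "questions": sentence.count("?"),
        "quotes": sentence.count('"'),
        # Assumption: a word is "capitalized" if its first character is uppercase.
        "capitalized_words": sum(1 for w in words if w[:1].isupper()),
        # Assumption: sentence length is measured in words.
        "length_in_words": len(words),
    }


features = surface_features("Well you know what happened. ALMOST NOTHING HAPPENED!!!")
print(features["exclamations"], features["capitalized_words"])  # 3 4
```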
More hilarious to me is the dry academic tone presented with sarcastic examples. To wit:
A number of sentences that were classified as sarcastic present excessive use of capital letters, i.e.: "Well you know what happened. ALMOST NOTHING HAPPENED!!!" (on a book), and "THIS ISN'T BAD CUSTOMER SERVICE IT'S ZERO CUSTOMER SERVICE". These examples fit with the theoretical framework of sarcasm and irony ..
Oh, well, I'm glad they fit with the theoretical framework! :)
By comparison, always predicting "not sarcastic" would get good accuracy (can't tell how good from their paper) but 0% recall, due to the "sparseness of sarcastic utterances" as they academically say. That is, most sentences are not sarcastic. Tell that to a parent of teenagers.
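To make the baseline concrete, here is a sketch with made-up counts (1000 sentences, 50 of them sarcastic; the paper does not give this breakdown). With zero positive predictions, precision on the sarcastic class works out to 0/0, so the number that looks good is accuracy.

```python
# Made-up evaluation set, for illustration only: 1 = sarcastic, 0 = not.
labels = [1] * 50 + [0] * 950
predictions = [0] * len(labels)  # always predict "not sarcastic"

pairs = list(zip(labels, predictions))
tp = sum(1 for y, p in pairs if y == 1 and p == 1)
fp = sum(1 for y, p in pairs if y == 0 and p == 1)
fn = sum(1 for y, p in pairs if y == 1 and p == 0)
tn = sum(1 for y, p in pairs if y == 0 and p == 0)

accuracy = (tp + tn) / len(labels)
recall = tp / (tp + fn)
# No positive predictions at all, so precision is undefined (0/0).
precision = tp / (tp + fp) if (tp + fp) else None

print(accuracy, recall, precision)  # 0.95 0.0 None
```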