Natural language processing of nearly 4,000 U.S. diplomatic cables reveals fraying relations with traditional allies, and a few other surprises
Christopher Mims 04/11/2011
Software capable of determining the positive or negative sentiment of sentences written by humans has been unleashed on 3,891 U.S. diplomatic cables released by WikiLeaks, and the result is a systematic, if preliminary, analysis of which countries are our besties and which are in the doghouse.
The analysis was part of a class project (pdf) by a pair of computer science undergraduates at Stanford, Xuwen Cao and Beyang Li. By looking at how often a country was mentioned, as well as whether it was cast in a positive or negative light, Cao and Li identified four clusters to which countries could belong: countries we don’t like that never come up (red), countries we don’t like that we talk about on occasion (teal), and countries often cast in a negative light that diplomats just c…