He says he spent most of 2011 in Mumbai, setting up a foreign office, sleeping in a Marriott and working in an office block. His employer was a networking company headquartered in San Jose, California, with over a hundred billion dollars in market capitalization; a company so big, he said, that any digital bit sent over a wire passes through one of its products. His job title was forgettable, something involving quality and management and assurance. What he really did, as he put it, was train employees how to read. The employees, Indian college-educated twenty-somethings, already knew how to read, of course. What he meant was that he trained these employees how to read in a directed way, using, as he put it, heuristics. The arrangement was simple: his employees sat in a room and read the hundreds of thousands of words published every day about the company and its products. They read product reviews in technical journals, news reports, Wall Street Journal op-eds, Amazon reviews, bulletin board discussions, Twitter and Facebook posts, even the occasional short story or novel. Most of the material came through automated searches, either from Google alerts or a number of in-house bots that parsed RSS feeds, monitored Twitter timelines, and scraped news groups for keywords and proper names associated with the company. The employees, guided by heuristics he designed along with company management, rated each text with a number. The numbers ranged from negative one to positive one—negative one indicated that the article was of a negative opinion on the subject of the piece; zero indicated neutral, purely factual information; and one indicated a positive opinion. More than one person sometimes reviewed a document and the readers’ ratings were then averaged. Most documents were short and received only a single number, but longer book-length reports and extensive technical reviews were rated on a chapter-by-chapter basis. The numbers were then entered into a database, and another offshore office went about analyzing the analyses, producing an internal ticker tracking worldwide opinion of the company and its products. The index could be broken down into sub-indices: one for each of the thousands of products the company sold globally. For a short time, the company had a series of tickers measuring world opinion of several of the company’s top executives, a practice that has been discontinued, since, as was to be expected, the executives spent more time monitoring their own ratings than the company’s.
He remembers his year in Mumbai with some excitement, though not because of the company or the city. The company paid well, it’s true, but he rarely crossed beyond the glass walls of the office or hotel. He wasn’t so stupid or complacent as to miss the colonial dynamic that governed his day-to-day. For his employees, he reasoned, maybe rationalized, it was temporary work, less alienating than usual—after all, they read all day for a living wage, a job he wouldn’t have minded at twenty-two. None of this matters to him in hindsight. The real reason he remembers that year fondly, he is embarrassed to say, is that it reminds him of his favorite film, Three Days of the Condor, the conspiracy thriller starring Robert Redford and Faye Dunaway. Not that the film’s CIA plot had anything to do with him or his Mumbai office. It was the film’s opening scenes, the ten or fifteen minutes set in an Upper East Side townhouse, where bookish analysts read and summarized spy novels for higher-ups at Langley. The CIA, guided by its own version of the paranoid critical method, believed that the stories might contain coded messages for KGB agents. Even if this weren’t true, the agency believed its analysts might learn some creative spycraft from a novelist. The analysts in the film look more like A-plus students than whisky-soaked spies: concentrated, spectacled, tweedy, their reading interrupted by debates over plot points and plausibility. So it was with the Mumbai office, he said, with the language of public opinion swapped for that of fictional espionage. Like spies, the Mumbai readers’ attentions were tuned to a narrow frequency, filtering product reviews and news reports into actionable intelligence.
Most likely, a CIA office like the one depicted in Three Days of the Condor did not exist. However, according to Washington Post reporter Pete Earley, the Russians, after seeing the film, set up a department called the Scientific Research Institute of Intelligence Problems of the First Chief Directorate (NIIRP). Earley wrote: Some two thousand employees there sifted through hundreds of newspapers and magazines each day from all over the world, looking for information that might prove helpful to the intelligence service.
Four decades later, in 2015, the New York Times Magazine reported on a similar office in St. Petersburg, Russia. This time, instead of readers, the Russian office is stocked with writers. The writers specialize in online disinformation: hoaxes, smear campaigns, trolling, pro-Kremlin propaganda. Some of the campaigns involve little more than crude Russian jingoism, while others are more elaborate, including fake YouTube videos and cloned news sites. Unlike the NIIRP, with its elaborate designation, the trolling agency goes by a simple and now well-known name: the Internet Research Agency. Hiding behind fictional handles and Internet proxies, paid by funds routed through shell companies, the Agency employees have a task inverse to that of the office in Mumbai. Rather than detect sentiment, they are paid to produce it. But like the Mumbai office, the Internet Research Agency’s para-literary work is an expression of capital: in Mumbai’s case that of a multinational US corporation, and in Russia’s that of an oligarchy. Both practices—industrialized reading and writing—are grounded in emotions, emotions dictated and shaped by markets in goods and opinions. In Mumbai, readers are quantifying—objectifying—the subjective. And in Russia, writers are fictionalizing the subjective, inventing sentiment where none exists, producing opinions without subjects.
The technical term for much of what the Internet Research Agency produces is opinion spam. Opinion spam isn’t limited to trolling; it also includes positive reviews written by bots and by paid employees. Within the field, it is difficult to know how to define opinion spam. Most would agree that paying an operative in a troll farm to spread propaganda for an oligarch would qualify as opinion spam. But if an author asks her friends to positively review her book on Amazon, is that opinion spam? What is a false opinion? And if a definition can be agreed upon, can a human, let alone a computer, ever reliably detect its falsity?
***
Sentiment analysis is a fairly recent branch of computer science—the first papers are from around 2002—involving the computational identification and quantification of opinions. For sentiment analysis, opinion is a problem to be decomposed into sub-problems, which involve objective identification, classification, targeting, and authorship. In this way, sentiment analysis is a kind of structuralism, one at odds with language’s inherent ambiguity. As computer scientist Bing Liu wrote in his book on sentiment analysis, If we cannot structure a problem, we probably do not understand the problem. What, then, is the structure of the problem of opinion?
For a field with such a passion for categorization, it is surprising to see the words opinion and sentiment used in the literature as interchangeably as they sometimes are. (One even finds this slippage in the labels given to the discipline itself: often the field is called sentiment analysis, but sometimes it is also called opinion mining.) If any distinction is made between the two words, it is that opinions contain sentiments; though sometimes one will read of sentiments containing opinions. Let’s assume, for the moment, that opinions contain sentiments, and move on to what sentiment analysis excels at: classification. As described in Liu’s book, in its simplest form, an opinion can be represented as a pair:
(g, s)
The pair contains s as the sentiment and g as the target. Given the example, The song is moving, the target is said to be the song, and is moving is said to be the sentiment about that target. With a few more pages of exposition, the computer scientist arrives at the most complex formula for an opinion, a quintuple:
(e(i), a(ij), s(ijkl), h(k), t(l))
In the elaborated version, e is the name of the entity, similar to the target g in the previous example. Each entity has one or more aspects, a. Each aspect has sentiments s, which belong to a holder h at a particular time t. So sentiment s(ijkl) was held by person h(k) about aspect a(ij) of entity e(i) at time t(l). According to Liu, all five parts of the quintuple are necessary in order to understand an opinion. We must know the target or entity of the opinion, but we must also know about which aspect of the entity the holder is speaking. (A reviewer might praise a computer’s screen, but not its keyboard, for example.) We need to know the sentiment expressed about this attribute, but we also need to know who is expressing this opinion and at what time. (It is significant if a single holder of an opinion changes his or her opinion over time or gives a single opinion at a particular time.) The reason for breaking opinions into clearly defined parts is that all of these opinions must be stored in a database for later querying and mining. And it must be stressed, as Bing Liu does in his book, that the field of sentiment analysis is not about understanding a single person’s subjectivity. Sentiment analysis is only useful when done at scale, when aggregating thousands or millions of opinions. Unlike a novelist who might scrutinize a review’s every turn of phrase, a corporation employing sentiment analysis wants to employ a kind of distant reading, an averaging out of thousands of opinions into a coherent worldview. It is as if scale allows the computer scientist to sidestep the semantic problems of subjectivity, or at least transform them into problems that are no longer semantic, but sociological.
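To make the quintuple concrete, here is a minimal sketch in Python. The field names, the toy records, and the averaging query are illustrative assumptions, not anything drawn from Liu’s book; the point is only that, once structured this way, opinions become rows that can be stored and aggregated at scale.

```python
# A minimal sketch of the opinion quintuple as a record type, with the
# kind of aggregate query the essay describes. The field names and the
# toy data are illustrative assumptions, not part of any published system.
from dataclasses import dataclass
from datetime import date
from collections import defaultdict

@dataclass
class Opinion:
    entity: str       # e(i): the thing being discussed, e.g. a laptop model
    aspect: str       # a(ij): the feature under discussion, e.g. "screen"
    sentiment: float  # s(ijkl): a score in [-1.0, 1.0]
    holder: str       # h(k): who expressed the opinion
    time: date        # t(l): when it was expressed

opinions = [
    Opinion("laptop-x", "screen", 0.8, "reviewer_17", date(2015, 3, 2)),
    Opinion("laptop-x", "keyboard", -0.6, "reviewer_17", date(2015, 3, 2)),
    Opinion("laptop-x", "screen", 0.4, "reviewer_52", date(2015, 4, 9)),
]

# Sentiment analysis at scale: average the scores per (entity, aspect)
# rather than scrutinize any single holder's subjectivity.
totals = defaultdict(list)
for o in opinions:
    totals[(o.entity, o.aspect)].append(o.sentiment)

for (entity, aspect), scores in sorted(totals.items()):
    print(entity, aspect, round(sum(scores) / len(scores), 2))
```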
Even if a computer can break down an opinion into quintuples, extracting a target entity, its aspects, and the sentiments about those aspects from a sentence, one still must understand those sentiments. In sentiment analysis, a sentiment is often rated on a range between negative one and one. Other systems might rate sentiments with a number from one to ten, but no matter which system is used, sentiment analysis is based on polarity: good and bad, with a neutral rating placed equidistantly between the two.
Good and bad—the one and the zero of computational opinion—are the ur-sentiment words, the alpha and omega. Given these two polar starting points one could, using a thesaurus, plot every other word in a constellation between them. We start with good and then look up its synonyms, and then, in turn, look up the synonyms of those synonyms, and then the synonyms of those synonyms, creating a vast graph network of sentiment. For our dictionary, we could use WordNet, a reference tool developed by the psychologist and cognitive scientist George A. Miller, which boasts, among other features, synsets—clusters of synonymous words. Distance in the network graph from the origin seed words determines any given word’s polarity: those closest to good are rated closest to the number one, those farthest are rated closest to negative one, and those equidistant from both are rated zero. Computer scientists soon discovered that when using this method one does not have to speak the language in question in order to determine the polarity of a word. Using a random walker, the average distance from a given word to each polarity can be calculated, and as long as basic positive and negative words are known in a language, one can approximate any word’s sentiment.
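A rough sketch of this idea, assuming NLTK and its WordNet corpus are installed. Breadth-first hop counts over shared synsets stand in for the random-walk distances described in the literature, and the scoring rule is an invented simplification rather than any published algorithm.

```python
# Seed-based polarity over a WordNet synonym graph: a minimal sketch.
# Assumes NLTK and its WordNet data are available; the scoring rule
# (difference of BFS distances from "good" and "bad") is illustrative only.
from collections import deque
from nltk.corpus import wordnet as wn

def synonyms(word):
    """All lemma names that share a synset with `word`."""
    return {lemma.name().lower()
            for synset in wn.synsets(word)
            for lemma in synset.lemmas()}

def distances_from(seed, max_depth=4):
    """Breadth-first search over the synonym graph; hop counts from the seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        if dist[word] >= max_depth:
            continue
        for neighbor in synonyms(word):
            if neighbor not in dist:
                dist[neighbor] = dist[word] + 1
                queue.append(neighbor)
    return dist

def polarity(word, pos_dist, neg_dist):
    """Score in [-1, 1]: closer to 'good' is positive, closer to 'bad' negative."""
    p, n = pos_dist.get(word), neg_dist.get(word)
    if p is None and n is None:
        return 0.0   # unreachable from both seeds: treat as neutral
    if p is None:
        return -1.0
    if n is None:
        return 1.0
    return (n - p) / (n + p) if (n + p) else 0.0

pos = distances_from("good")
neg = distances_from("bad")
for w in ("excellent", "awful", "average"):
    print(w, round(polarity(w, pos, neg), 2))
```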
Many variations on this method have been invented, but none takes into account that a word may carry a different sentiment in different domains. It is usually a good thing if battery life is considered long, but if a camera takes a long time to focus, it would be considered negative. More subtly, apparently neutral statements can carry positive and negative weight, such as when a customer says, A crack formed in the lens during the first month of use. This is a statement of fact, but the common understanding of this sentence, at least within the domain of lens manufacturing, is that it expresses a negative fact. Although sentiment analysis limits itself to the quantification of subjectivity, a factual statement such as this becomes of interest when it is part of a review. And, finally, there is the most difficult problem of all: sarcasm. If sarcasm is defined as a statement that means its exact opposite, usually for humorous effect, then computer scientists are faced with a kind of meaning that cannot be gleaned from grammatical, domain, or dictionary contexts.
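A toy illustration of that domain dependence, with hand-written lexicons invented for the example: the same word, long, flips polarity depending on the aspect it modifies.

```python
# Domain-dependent polarity, illustrated with invented hand-written lexicons.
domain_lexicons = {
    "battery": {"long": 1.0, "short": -1.0},
    "autofocus": {"long": -1.0, "fast": 1.0},
}

def score(aspect, word):
    """Look up a word's polarity within the lexicon for a given aspect."""
    return domain_lexicons.get(aspect, {}).get(word, 0.0)

print(score("battery", "long"))    #  1.0: long battery life is good
print(score("autofocus", "long"))  # -1.0: a long focus time is bad
```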
One way to solve these problems is to have humans read and tag the documents. This is what was happening in the Mumbai office; if wages and the number of documents are low enough, human readers can still be more cost effective and accurate than computers. For many corporations, however, this is not an option. Billions of records stored in a social media company’s database would prove too difficult for human parsing. So while human readers would more likely detect sarcasm, in most scenarios they would be able to provide little more than spot checks on terabytes of data.
***
What if you unleashed the sentiment bots on the literary canon, where they could parse and chart every sentence somewhere between the good-bad polarities? Recently, a literary academic, Matthew L. Jockers, did exactly this, line-graphing A Portrait of the Artist as a Young Man and other literary works as if they were Amazon reviews. His software churned through the books sentence by sentence, and a second algorithm smoothed the thousands of data points into a single curve. Looking at Jockers’s charts, you can see the jagged, mountainous horizons of sentiment, starting off high, as is the case with Portrait, and then dropping low, climbing and falling again and again, until finally coming to rest somewhere slightly above the middle. That’s usually the case: the stories end above the zero point—a happy ending—with some stories happier than others.
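A rough sketch of that pipeline in Python: split a text into sentences, score each one against a tiny lexicon, then smooth the raw scores into a curve. The lexicon and the moving-average smoother here are stand-ins, not the tools Jockers actually used.

```python
# Sentence-by-sentence sentiment scoring plus smoothing: an illustrative
# stand-in for the kind of pipeline the essay describes, not Jockers's code.
import re

LEXICON = {"joy": 1.0, "love": 1.0, "happy": 0.8,
           "grief": -1.0, "death": -0.8, "fear": -0.6}

def sentence_scores(text):
    """One raw sentiment score per sentence, summed from lexicon hits."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [sum(LEXICON.get(w, 0.0) for w in re.findall(r"[a-z']+", s.lower()))
            for s in sentences]

def moving_average(values, window=3):
    """Smooth the raw scores into a single curve with a sliding window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

text = "Joy and love at the start. Then grief. Then death and fear. Happy again."
print(moving_average(sentence_scores(text)))
```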
Jockers’s inspiration for the project was Kurt Vonnegut’s rejected master’s thesis, in which the novelist proposed graphing the narrative arcs of various literary works. Jockers quotes Vonnegut: The fundamental idea is that stories have shapes which can be drawn on graph paper, and that the shape of a given society’s stories is at least as interesting as the shape of its pots or spearheads. Shapes like spearheads, or, in this case, like mountain ranges or skylines, a rising and falling of expectations, disappointment following expectation, joy following grief, the give and take of narrative arcs.
After processing 40,000 novels, Jockers claimed to have found six archetypal plots. As Jockers wrote in his blog, he wasn’t the first literary theorist to have done this kind of analysis—William Foster-Harris claimed to have reduced literature to three plots; Ronald B. Tobias claimed to have found twenty; Georges Polti came up with thirty-six. But none of these researchers had Jockers’s tools; none could process 40,000 novels. They worked by reading novels and drawing subjective conclusions. So Jockers ran the numbers on 40,000 novels and produced six pairs of charts (maybe seven), each pair diagramming the archetypal plot with a specific example that most closely fit the archetype. Those example novels are: The Return of the Native by Thomas Hardy, Intensity by Dean Koontz, Chasing Fire by Nora Roberts, Simple Genius by David Baldacci, A Creed in Stone Creek by Linda Lael Miller, and My Ántonia by Willa Cather.
It seemed so neat and obvious, with billions of literary words reduced to a few hills and valleys, hard data used to verify soft sentiments. As it turned out, little of the research was accurate. As Annie Swafford, a scholar at SUNY New Paltz, wrote on her blog, Jockers’s techniques were deeply flawed. Even if one were to ignore the previously mentioned problems posed by computational interpretations of contextual sentiment, the method Jockers used to smooth out his graphs could easily produce artifacting. Simply put, since every sentence in a novel is being graded, Jockers needs to smooth these thousands of data points into one curve. He could do so using a number of filtering techniques; the one he chose is called a low-pass filter. Low-pass filters would not only smooth the curve, however; they might also introduce waves into the curve that have no origin in the data, a.k.a. ringing artifacts. Thus, what might appear to be a narrative that descends into a valley and then ascends to a summit could be transformed into the opposite by changing the parameters of the filter. What Jockers had produced, in short, had little to do with the sentiment arcs of the story and more to do with the filters used to create those arcs. What appeared archetypal was, more likely, an error, a bug, maybe even an aberration.
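A small demonstration of Swafford’s objection, under assumed parameters: low-pass filtering a blunt, two-valued sentiment series by keeping only its lowest Fourier components produces waves inside stretches of the data that are perfectly flat.

```python
# Ringing artifacts from a Fourier-based low-pass filter: a sketch of the
# objection, with a synthetic signal and assumed parameters.
import numpy as np

# A blunt synthetic "plot": flat negative first half, flat positive second half.
raw = np.array([-1.0] * 50 + [1.0] * 50)

def low_pass(signal, keep=3):
    """Zero out all but the `keep` lowest-frequency Fourier components."""
    spectrum = np.fft.rfft(signal)
    spectrum[keep:] = 0
    return np.fft.irfft(spectrum, n=len(signal))

smoothed = low_pass(raw, keep=3)
# The filtered curve now dips and rises within each flat half: waves with
# no origin in the data. Changing `keep` changes where the waves fall.
print(smoothed[:10].round(2))
print(smoothed[40:60].round(2))
```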
***
Within computer science, certain techniques in sentiment analysis are sometimes called naïve. For example, there is Naïve Bayes Classification, a statistical technique that naïvely assumes that the likelihood of each word occurring in a phrase is statistically independent of all the other words in that phrase. (Usually word occurrence probability is not statistically independent—Oscar for Best Picture is much more likely than Oscar for Best Sandwich.) Occasionally, entire categories of solutions, such as WordNet graphs, are described as naïve. But in computer science the word naïve has a meaning more specific and less derogatory than in common usage. It means: We know the problem is complicated, but let’s see how far we can go with a simplistic solution. For those readers coming from the humanities—a collection of disciplines that often embraces complexity—a willingness to be naïve might seem odd, perhaps naïve in itself. But in computer science there is a general curiosity, maybe even a pleasure, in the surprising effectiveness of naiveté. Quite often, it works just fine.
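As a sketch of how far that simplicity can go, here is a minimal Naïve Bayes sentiment classifier built with scikit-learn; the training sentences and labels are invented placeholders.

```python
# A minimal Naive Bayes sentiment classifier using scikit-learn.
# The training data is an invented placeholder, not a real corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "the screen is gorgeous and the battery lasts forever",
    "fast, quiet, and a pleasure to use",
    "the keyboard broke within a week",
    "slow, loud, and a constant disappointment",
]
train_labels = ["positive", "positive", "negative", "negative"]

# The "naive" step: MultinomialNB treats each word count as independent
# of every other word, given the label.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["the battery is a disappointment",
                     "quiet and gorgeous"]))
```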
But how can sentiment analysis be said to work? How does one test the assumptions of a researcher like Jockers or the field of sentiment analysis in general? Aren’t we speaking about measuring opinions? Furthermore, what is behind this drive towards quantification? Is it naïve to assume that literature can be quantified, that reviews and criticism can be quantified, that subjectivity itself can be quantified? Is sentiment analysis simply part of rationalism’s manifest destiny, yet another member of quantification’s vanguard? Even more simply, what good is sentiment analysis to anyone?
Sentiment analysis is a tool used in the production and marketing of consumer goods. It quantifies aspects of the consumer; it is a form of consumer intelligence. More broadly, sentiment analysis can be applied to any subject matter for which public opinion is important: a politician’s popularity, a film’s box office, or a political campaign’s effectiveness. It has even been used as a predictive technique; researchers have discovered that when sentiment analysis is applied to Twitter discussions about a soon-to-be-released film, one can reliably predict that film’s future box office. So sentiment analysis may seem naïve, and it might even verge on junk science, but more often than not its results are accurate enough.
Beyond questioning the effectiveness of its techniques, one can also direct criticism at sentiment analysis’s ends. Quantification, when applied to the consumer or the citizen, is commonly understood as belonging to the humanist project. After all, humanism has been, at least since Vitruvian Man, centered on the measurement of humankind. And what better way to serve consumers and citizens than to measure their opinions? A naïve humanism would champion sentiment analysis as the latest progressive step toward understanding the body politic. In this way, sentiment analysis is a sibling to the census, television ratings, public opinion polling, and consumer feedback surveys. To answer truthfully, to reveal ourselves to the state and to capital, is, for the naïve humanist, in our best interest.
But we do not all share the same interests. Seen from a more radical view, sentiment analysis—like humanism in general—is part of a continuum of post-Enlightenment techniques for the measurement and control of the body politic. The most similar example is to be found in the history of opinion polling, a collection of techniques that ostensibly assists in understanding voters, but which campaign strategists often use to manipulate those same voters. (See the United States, where the GOP uses polling to discover which moral and religious issues will drive single-issue voters to the polls.) However, unlike polling or the census, sentiment analysis is done without our consent. It takes place invisibly on Facebook and Twitter and Amazon every day. We contribute to the corpus of corporate public sentiment almost reflexively.
One could withdraw from participating in these networks, but instead of retreating from technology, one might propose a more active form of resistance. Perhaps the undermining of quantification could be based on the conscious deployment of those qualities most at odds with sentiment analysis: ambiguity, spam, sarcasm, double entendre, and deception. In a way, the Russian Internet Research Agency, of all places, was on to something. Rather than preserve sentiment’s supposed sanctity—as a romantic would—why not instead engage in the short-circuiting of quantification itself? Not through pro-Putin propaganda, of course, but rather through a spamming of the very systems by which our opinions are measured. Just as cryptography is in fact an arms race between those who devise ciphers and those who break them, one could imagine a similar race between those who wish to quantify our sentiments and those who wish to obscure them.