We live in a digital world where our access to information is unprecedented and shaping or moulding opinion is as easy as clicking a ‘like’ button. Imagine the power inherent in that kind of online presence. Now imagine being able to harness that power…
Consider this: of the 6 billion people on the planet, 4.8 billion have a mobile phone. That means 4.8 billion customers can potentially express their opinion, literally at the click of a button. Now consider a more shocking statistic: only 4.2 billion of the 6 billion own a toothbrush.* But back to the priorities of an entire generation of consumers. Can any company or brand really afford to ignore the opinion of that many people with any kind of spending power?
Determining the sentiment of any opinion is a complex, multi-step process. Meaning is subjective. What I say is not necessarily what I mean. What you hear, or even read, is not necessarily what I said. Imagine then how difficult it is to extract any kind of worthwhile sentiment or opinion from an unstructured digital source… Except, once the processes are in place, extracting meaning from an unstructured source is as easy as… well, as easy as brushing one’s teeth.
The first step in building any kind of model is to determine the actual nuts and bolts that we are interested in measuring. Is it a product, a service, a brand? A monitoring solution needs to be established that recognizes names, terms and concepts and then applies natural language processing (NLP) to associate sentiment and other related attributes. It must be trained using machine learning techniques. The most accurate solutions train the model using the largest possible data sets and utilise human input, to recognize and classify brand or company-specific sentiment.
We don’t only want to look for polarity of sentiment (positive / negative / neutral) but also the emotional footprint (happy / angry / sad). These relatively simple basics must be fully optimised before more complex figures of speech like irony and sarcasm can even be attempted. Even humans have a less than stellar accuracy rate at detecting something so complex. And all of this only matters in relation to the question being asked or the brand being monitored.
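To make the two outputs concrete, here is a minimal sketch in Python. It is not the trained model described above: it swaps machine learning for a tiny hand-written word list, and the lexicons and the function names (`polarity`, `emotion`) are hypothetical, invented purely for illustration. It only shows that polarity and emotional footprint are two separate labels attached to the same comment.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for this illustration only.
# A real solution would learn these associations from labelled data.
POLARITY = {"love": 1, "great": 1, "fast": 1, "hate": -1, "slow": -1, "broken": -1}
EMOTION = {"love": "happy", "great": "happy", "hate": "angry", "broken": "angry", "slow": "sad"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' from summed word scores."""
    score = sum(POLARITY.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def emotion(text):
    """Return the most frequent emotion label, or None if no word matches."""
    counts = Counter(EMOTION[token] for token in tokenize(text) if token in EMOTION)
    return counts.most_common(1)[0][0] if counts else None


comment = "I love this phone, it is great"
print(polarity(comment), emotion(comment))
```

A word list like this has no notion of negation, irony or sarcasm, which is exactly why the basics need to be solid before those harder cases are attempted.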
These hybrid machine learning / human input models become more accurate the larger the training sets, the better the human input and the longer they run. They are taught to identify relevant comments and then classify them according to why the author feels positive or negative towards something. What are customers unhappy about? Why do they prefer the competitor? The model is ‘taught’ to identify slang, sarcasm, double negatives, irony, shorthand and punctuation.
There is a risk that the context can be lost, and then a positive comment can be misclassified as negative. It is best to have a model that determines the overall sentiment of a document / blogpost / tweet, as well as the sentence level sentiment. In this approach, context is retained and the chances of misclassification are limited because if the classifier knows the overall sentiment of a document, then disambiguating becomes easier. During training, valuable information is passed in both directions, which means the model will, over time, become incredibly robust.
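A rough sketch of how document-level context can help with an ambiguous sentence, again in Python. This is a deliberate simplification and an assumption on our part: instead of a jointly trained model in which information flows in both directions, it uses a hypothetical toy lexicon and a single fallback rule, where a sentence with no clear signal of its own inherits the overall label of the document. The names `LEXICON`, `score`, `label` and `classify` are made up for this example.

```python
import re

# Hypothetical toy lexicon, invented for this illustration only.
LEXICON = {"love": 1, "great": 1, "best": 1, "hate": -1, "awful": -1, "worst": -1}


def score(text):
    """Sum the lexicon scores of all word tokens in the text."""
    return sum(LEXICON.get(token, 0) for token in re.findall(r"[a-z']+", text.lower()))


def label(value):
    """Map a numeric score to a polarity label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def classify(document):
    """Return the document label and a list of (sentence, label) pairs."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    doc_label = label(score(document))
    results = []
    for sentence in sentences:
        sentence_label = label(score(sentence))
        if sentence_label == "neutral":
            # Simplified rule: no signal in the sentence itself,
            # so fall back on the overall sentiment of the document.
            sentence_label = doc_label
        results.append((sentence, sentence_label))
    return doc_label, results


review = "I love this phone. The battery is sick! Best purchase this year."
doc_label, sentence_labels = classify(review)
print(doc_label)
for sentence, sentence_label in sentence_labels:
    print(sentence_label, "|", sentence)
```

Taken alone, "The battery is sick!" gives the toy lexicon nothing to work with, and a naive reader might even take it as a complaint. Seen inside an otherwise positive review, it is resolved as positive. A trained model would weigh this context rather than copy it wholesale, but the principle is the same.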
The near-instantaneous sentiment gleaned in this way from unstructured, unsolicited sources means that companies now potentially have their fingers on the pulse of the collective customer. They can react immediately to how a marketing campaign or new offer is received, or to a complaint about unsatisfactory service.
Consider the words of John Wanamaker, the man considered by some to be the father of modern advertising: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.” With a well-trained, robust sentiment model you will know exactly which half.
* mobile phone vs toothbrush statistic from www.digitalbuzzblog.com/social-media-statistics-stats-2012-infographic/