A harsh day, indeed! It all began on February 14, 1929, when the Saint Valentine’s Day Massacre took place in Chicago.
Seven mob associates were murdered as part of a Prohibition-era conflict between two powerful criminal gangs in Chicago: the South Side Italian gang led by Al Capone and the North Side Irish gang led by Bugs Moran. Former members of the Egan’s Rats gang were also suspected of having played a significant role in the incident, assisting Capone (source: Wikipedia).
Perhaps we should update the definition: the famous South African runner Oscar Pistorius allegedly shot his own girlfriend dead last night. What really happened is still unclear.
And, yes, today is Valentine’s Day, before I forget. It is supposed to be a day devoted to love and wine, not to death. But everybody knows that love and death get along quite well. Not to mention wine…
So, what are people thinking about today? We actually have a means to estimate the global “sentiment”: analyzing the tweets that are posted every minute. This is called “sentiment analysis”, a technique that has become very popular recently. To cut a long story short, we want to tag a tweet as “positive” or “negative” by interpreting its content, literally!
We are lucky: we don’t have to reinvent the wheel. We can set up a fairly basic but still effective analyzer with a couple of tools:
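The core idea can be illustrated with a toy example. The sketch below is not the Basic_sentiment_analysis code: the two word sets are hypothetical, hand-picked lexicons, and a tweet is labelled by simply comparing how many positive and negative words it contains, with “neutral” as a fallback when neither side wins.

```python
import re

# Hypothetical toy lexicons; a real analyzer loads much larger word lists.
POSITIVE = {"love", "happy", "great", "wonderful"}
NEGATIVE = {"shot", "dead", "sad", "tragic"}


def tag_tweet(text):
    """Return "positive", "negative" or "neutral" by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tag_tweet("Happy Valentine's Day, I love you!"))  # positive
print(tag_tweet("So sad and tragic news this morning"))  # negative
```

Counting words this way ignores context entirely (“not happy” counts as positive), which is exactly what the part-of-speech tagging and modifier dictionaries of a real analyzer try to improve on.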
- Python-twitter: http://code.google.com/p/python-twitter/
- Basic_sentiment_analysis: http://fjavieralba.com/basic-sentiment-analysis-with-python.html
The idea is quite simple: every 30 seconds we fetch the first 100 tweets containing a certain hashtag. Then we analyze them, assigning a sentiment score to each one, and plot the score as a function of time. The numerical value is simply:
- Negative score = -1
- Positive score = +1
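A minimal sketch of the polling loop described above, assuming each tweet contributes +1 or -1 according to the sign of its sentiment score (and 0 if the analyzer finds nothing). `fetch_tweets` and `score_tweet` are hypothetical placeholders for the Twitter search and the analyzer, and `sleep` can be swapped out so the loop can be tried without actually waiting. The resulting (time, score) pairs are what gets plotted.

```python
import time


def tweet_value(score):
    """Map a tweet's raw sentiment score to -1 (negative), +1 (positive) or 0."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


def poll_sentiment(fetch_tweets, score_tweet, rounds, interval=30, sleep=time.sleep):
    """Fetch a batch every `interval` seconds; return (seconds, total score) pairs.

    fetch_tweets() -> list of tweet texts and score_tweet(text) -> float are
    hypothetical callables standing in for the Twitter search and the analyzer.
    Time is nominal (round index * interval), assuming fetching is fast.
    """
    series = []
    for i in range(rounds):
        total = sum(tweet_value(score_tweet(t)) for t in fetch_tweets())
        series.append((i * interval, total))
        if i < rounds - 1:  # no need to wait after the last snapshot
            sleep(interval)
    return series


# Usage with canned data instead of live tweets (hypothetical values):
batches = iter([["good", "bad", "good"], ["bad", "bad"]])
fake_scores = {"good": 2.0, "bad": -1.5}
print(poll_sentiment(lambda: next(batches), fake_scores.get, rounds=2,
                     sleep=lambda s: None))  # [(0, 1), (30, -2)]
```

Note that the script further down adds up the raw per-tweet scores instead; replacing `tweet_value` with the identity function reproduces that behaviour.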
I added some words to the dictionaries shipped with Basic_sentiment_analysis but did not modify the code; further improvements are possible but beyond the scope of this post.
I analyzed the #pistorius and #valentine hashtags at the same time, fetching 100 tweets every 30 seconds. Now take a look at the results (over just a few minutes, though): which is which? Name it!
This is the code I used:
```python
#!/usr/bin/env python
import sys
import time

import twitter
from sentiment_analyzer import Splitter, POSTagger, DictionaryTagger, sentiment_score

# Twitter application credentials (replace "xx" with your own keys)
consumer_key = "xx"
consumer_secret = "xx"
access_token_key = "xx"
access_token_secret = "xx"

api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,
                  access_token_key=access_token_key,
                  access_token_secret=access_token_secret)
# print(api.VerifyCredentials())
# statuses = api.GetUserTimeline()
# print([s.user.name for s in statuses])

# Hashtag to track and number of tweets per request, overridable from the command line
pattern = "#lhc"
res_max = 1000
if len(sys.argv) > 1:
    pattern = "#" + sys.argv[1]
if len(sys.argv) > 2:
    res_max = int(sys.argv[2])

# Text-processing pipeline: sentence splitting, POS tagging, dictionary tagging
splitter = Splitter()
postagger = POSTagger()
dicttagger = DictionaryTagger(['dicts/positive.yml', 'dicts/negative.yml',
                               'dicts/inc.yml', 'dicts/dec.yml', 'dicts/inv.yml'])

if __name__ == "__main__":
    while True:
        results = api.GetSearch(pattern, per_page=res_max)
        # print("Found", len(results), "tweets about", pattern)
        alltexts = [res.AsDict()['text'] for res in results]
        tot_score = 0
        for tweet in alltexts:
            # Skip retweets
            if tweet.startswith("RT @"):
                continue
            splitted_sentences = splitter.split(tweet)
            pos_tagged_sentences = postagger.pos_tag(splitted_sentences)
            dict_tagged_sentences = dicttagger.tag(pos_tagged_sentences)
            score = sentiment_score(dict_tagged_sentences)
            tot_score += score
        # One line per snapshot: "#hashtag:total_score"
        print(pattern + ":" + str(tot_score))
        # Wait 30 seconds before the next snapshot
        time.sleep(30)
```
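Besides the positive and negative dictionaries, the script loads inc.yml, dec.yml and inv.yml, which, judging by their names, hold words that strengthen, weaken or invert the sentiment of the word that follows. The standalone sketch below shows one way such modifiers can be combined with the positive/negative tags; the (word, tags) token format and the factors used (×2, ×0.5, sign flip) are assumptions here, not a copy of what sentiment_analyzer does internally.

```python
def value_of(tag):
    """Base polarity of a dictionary tag; modifier and unknown tags are worth 0."""
    return {"positive": 1.0, "negative": -1.0}.get(tag, 0.0)


def sentence_score(tokens):
    """Score one sentence given as (word, tags) pairs.

    Assumed modifier semantics: a word tagged "inc" doubles the next word's
    value, "dec" halves it and "inv" flips its sign.
    """
    total = 0.0
    previous_tags = []
    for _word, tags in tokens:
        value = sum(value_of(t) for t in tags)
        if "inc" in previous_tags:
            value *= 2.0
        elif "dec" in previous_tags:
            value /= 2.0
        elif "inv" in previous_tags:
            value *= -1.0
        total += value
        previous_tags = tags
    return total


def tweet_score(sentences):
    """A tweet's score is the sum of its sentence scores."""
    return sum(sentence_score(s) for s in sentences)


# Hypothetical, hand-tagged tweet with two sentences:
tweet = [
    [("i", []), ("really", ["inc"]), ("love", ["positive"]), ("valentine", [])],
    [("not", ["inv"]), ("happy", ["positive"]), ("today", [])],
]
print(tweet_score(tweet))  # 1.0
```

Here “really love” counts +2 and “not happy” counts -1, so the tweet ends up mildly positive overall.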