What if the stock market could be predicted by Twitter? Sounds pretty cool. When I started investing I looked into this idea, and it still makes sense to me. Take McDonald's: it's logical that its success is tied to how the public perceives it. If the public says negative things about it, then perhaps bad times are on the horizon. However, the research I'm writing about aims to predict the market as a whole, not a single stock, which I'd say is much more difficult. A stock's future is unknown, but the factors that affect it are more or less known (earnings, growth, efficiency, debt, etc.) and finite. The factors that affect the entire market are, for me, mostly unknown and infinite (not literally infinite, just a figure of speech). In a large market swing, sentiment analysis might apply well, since the market is susceptible to emotional reactions. In smaller swings, how much weight can be attributed to sentiment? Too hard to say; there are too many variables.
Twitter is, among other things, an aggregator of sentiment, and the stock market is prone to mass emotional crises; so if one drives the other, the market can be predicted by predicting people's mood. The premise that emotion influences market valuation is disputed by proponents of the efficient market hypothesis, who argue that when information is public and flows freely, prices already reflect it: stocks trade at whatever buyers and sellers agree on, with no systematic exaggeration. That's a whole other debate that would be cumbersome to develop in a post about Twitter data mining. For the sake of argument, this research assumes there's an inefficiency in the market where emotional responses override rational arguments.
The research I'm discussing is presented in Bollen, Mao and Zeng (2011) and in Chen and Lazer (2013), and the results are interesting. Bollen et al. explore the idea through two sentiment analysis tools, OpinionFinder and the Google-Profile of Mood States (GPOMS). Both mine tweets for a given set of words, or lexicon, and differ only in how granular the information they provide is. OpinionFinder only indicates a positive vs. negative mood, while GPOMS goes deeper, tracking 6 mood dimensions (calm, alert, sure, vital, kind and happy). Before scoring, the tweets are filtered: only those containing phrases that express a state of mind, like 'I feel', are kept, while blank tweets and spam, such as tweets linking to a URL, are discarded. The remaining tweets are then grouped by day and analysed in bulk.
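To make the filter-then-count idea concrete, here is a minimal sketch of lexicon-based daily mood scoring. It is not OpinionFinder or GPOMS: the mood phrases, the tiny POSITIVE_WORDS and NEGATIVE_WORDS lexicons, and the (pos - neg) / (pos + neg) score are all hypothetical choices made for illustration. It only shows the general shape of the pipeline: drop blanks and URL spam, keep explicit mood statements, then count lexicon hits for one day's tweets.

```python
import re

# Hypothetical mood phrases: Bollen et al. keep tweets with expressions such as
# "I feel", but this exact list is an illustrative assumption.
MOOD_PHRASES = ("i feel", "i am feeling", "i'm feeling", "makes me")
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)

# Toy lexicons for illustration only; NOT the OpinionFinder or GPOMS word lists.
POSITIVE_WORDS = {"happy", "calm", "good", "great", "relaxed"}
NEGATIVE_WORDS = {"sad", "angry", "anxious", "worried", "bad"}


def keep_tweet(text: str) -> bool:
    """Drop blank tweets and tweets with URLs; keep only explicit mood statements."""
    lowered = text.strip().lower()
    if not lowered or URL_PATTERN.search(lowered):
        return False
    return any(phrase in lowered for phrase in MOOD_PHRASES)


def daily_mood_score(tweets: list[str]) -> float:
    """Score one day's tweets as (pos - neg) / (pos + neg), in [-1, 1].

    The scoring formula is an assumption for this sketch. Returns 0.0 when no
    lexicon word appears in any kept tweet.
    """
    positive = negative = 0
    for text in tweets:
        if not keep_tweet(text):
            continue
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in POSITIVE_WORDS:
                positive += 1
            elif word in NEGATIVE_WORDS:
                negative += 1
    total = positive + negative
    return (positive - negative) / total if total else 0.0


day = [
    "I feel happy and calm today",
    "I feel sad",
    "I feel great, see http://spam.example",
    "",
    "Lunch was good",
]
print(daily_mood_score(day))  # 0.3333333333333333 (2 positive hits, 1 negative)
```

Note that "Lunch was good" is ignored even though it contains a positive word, because it lacks a mood phrase; this is the kind of filtering decision that shapes which users and tweets end up in the sample.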
The period tested runs from 28/02/2008 to 19/12/2008, and the tweets mined come from approximately 2.7M users. My question is: how were these 2.7M users chosen? Were there any criteria, or was it completely random? If it was completely random, then a lot of the tweets are noise, like 12-year-olds being happy about going to a concert; if, on the other hand, there were criteria, then those criteria influence the results. The article makes no mention of handpicking tweets beyond ignoring spam and blanks.
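OpinionFinder and GPOMS produce exciting results in Fig. 1. As the authors noticed, there's a buildup in Vital before the US presidential election and a surge in Calm afterwards. Happiness spikes on Thanksgiving. Next, the article compares these mood series to the Dow Jones Industrial Average, looking for a relationship. It concludes that only GPOMS's Calm dimension Granger-causes the Dow Jones, meaning that past Calm values improve predictions of the index beyond what the index's own past provides. This is a statistical notion of predictive precedence, not proof of true causation. None of the other dimensions seems to influence the market.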
What's interesting is that people's sentiment leads the market by about 3 days. Another cool thing is that the calmer people are, the more the Dow Jones moves up, which raises the question: are people calm because of external conditions or because of the market? Even with the Calm series lagged by 3 days, that question isn't easily answered with this data.
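One simple way to see what "sentiment leads the market by 3 days" means is to shift one series against the other and check which lag gives the strongest correlation. The sketch below does exactly that on synthetic data. It is a plain lagged Pearson correlation, a simpler stand-in for the Granger causality analysis Bollen et al. use, and the function names and the synthetic series are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def lagged_correlation(mood: list[float], market: list[float], lag: int) -> float:
    """Correlate mood on day t with the market on day t + lag."""
    if lag == 0:
        return pearson(mood, market)
    return pearson(mood[:-lag], market[lag:])


def best_lag(mood: list[float], market: list[float], max_lag: int) -> int:
    """Return the lag in 0..max_lag with the highest mood-to-market correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(mood, market, k))


# Synthetic data: the "market" series simply repeats the mood series 3 days later.
mood = [0.1, 0.5, -0.2, 0.3, 0.9, -0.4, 0.2, 0.7, -0.1, 0.4, 0.0, 0.6]
market = [0.0, 0.0, 0.0] + mood[:-3]
print(best_lag(mood, market, max_lag=5))  # 3
```

A strong correlation at lag 3 says mood moves first; it says nothing about why, so it cannot settle whether people are calm because of external conditions or because of the market.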
Chen and Lazer (2013) reach the same conclusion, that it is best to lag sentiment by 3 days, which isn't surprising since they cite Bollen et al. Although the article acknowledges that GPOMS works well, it also points out that GPOMS is proprietary to Google, so the authors build their own algorithm on top of the mined data to achieve better returns in the market.
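To illustrate how a 3-day lag might be turned into a trading signal, here is a hypothetical rule: predict the market goes up on day t if mood rose on day t - 3, and measure how often the predicted direction is right. This is not Chen and Lazer's algorithm, whose details I'm not reproducing here; the rule, the function name and the data are assumptions made for the sake of the example.

```python
def directional_accuracy(mood: list[float], returns: list[float], lag: int = 3) -> float:
    """Fraction of days where the sign of the mood change `lag` days earlier
    matches the sign of that day's market return.

    Days with no mood change or a zero return are skipped (no directional call).
    Returns NaN if no day qualifies.
    """
    hits = calls = 0
    for t in range(lag + 1, len(returns)):
        mood_change = mood[t - lag] - mood[t - lag - 1]
        if mood_change == 0 or returns[t] == 0:
            continue  # no directional call for this day
        calls += 1
        hits += (mood_change > 0) == (returns[t] > 0)
    return hits / calls if calls else float("nan")


# Synthetic series built so every return matches the mood change 3 days earlier.
mood = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 7.0, 5.0, 8.0]
returns = [0.0, 0.0, 0.0, 0.0, 0.5, 0.2, -0.3, 0.1, -0.4, 0.6]
print(directional_accuracy(mood, returns))  # 1.0
```

A hit rate on its own means little: comparing it against a naive "always predict up" baseline over the same dates helps guard against a period in which the market simply rose.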
This article is weaker than Bollen et al. (2011). For instance, it doesn't state the period over which the tests were run. Without dates, the results could be heavily biased: in a period when the market simply rose, even an indiscriminate buying spree would look like a good strategy. For all I know, these results could be for 1941. All in all, it's worth reading if you don't have much else to do.
In the end, if this worked, its authors would be rich, and I'm not sure that has happened. Who knows?
Bollen, Johan, Huina Mao, and Xiaojun Zeng. “Twitter mood predicts the stock market.” Journal of Computational Science 2.1 (2011): 1-8.
Chen, Ray, and Marius Lazer. "Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement." (2013).