[WSS16] Using Wikipedia Edits History to Predict the Future Stock Price

Posted 8 years ago

Project Description

Behavioral economics has shown that emotions affect individual decision making. This project tests whether the mood of a company's Wikipedia edit history correlates with changes in its stock price.

The project is divided into four parts: sentiment analysis of the Wikipedia edit history, retrieving stock price data with FinancialData, visualizing the correlation between the sentiment values and the stock price, and predicting the future stock price change from a given sentiment value. Ten sample companies are analyzed, taken from the top 10 controversial companies lists in the 2015 CRN report and in Entrepreneur: British Petroleum, Oracle Corporation, VMware, Hewlett-Packard, HSBC, Sony, JetBlue, General Motors, Microsoft and Target. The Wikipedia edit history and the financial data cover January 1st, 2014 to July 1st, 2016.

The sentiment analysis has three levels, each assigned a numerical value: 1 for positive, -1 for negative and 0 for neutral. The analysis therefore focuses on the correlation between the daily average sentiment value of a company's Wikipedia edit comments and the change in that company's stock price.
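The scoring scheme can be sketched in a few lines. This is only a minimal illustration, not the project's pipeline: the list `labels` is hypothetical sample data standing in for the output of Classify["Sentiment", ...] on three edit comments made on the same day.

```wolfram
(* Hypothetical sentiment labels for three edit comments from one day *)
labels = {"Positive", "Negative", "Positive"};

(* Map each label to its numerical value: 1, 0 or -1 *)
scores = labels /. {"Positive" -> 1, "Neutral" -> 0, "Negative" -> -1};

(* The daily sentiment value is the mean of the scores for that day *)
dailyAverage = Mean[scores]
```

A day with mixed comments therefore gets a fractional value between -1 and 1 rather than a single label.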

If there is a strong correlation between the mood of the Wikipedia edit history and the change in stock price, then companies should be able to predict future stock price changes and benefit from them.


Example (Microsoft)

1. Sentiment Analysis

(* Download the revision history once and split each entry into its ". . " separated fields *)
MicrosoftHistory = 
  StringSplit[
   Rest@StringSplit[
     Import["https://en.wikipedia.org/w/index.php?title=Microsoft&offset=&limit=455&action=history"], 
     "( cur | prev )"], ". . "];
(* The last field of each entry is the edit comment *)
MicrosoftComment = Map[Last, MicrosoftHistory];
(* Strip link residue and put the comments in chronological order *)
MicrosoftCommentNew = 
  Reverse[Most[
    StringTrim /@
     (Map[
       StringReplace[#, 
         "( undo )" | "(" | ")" | "\[RightArrow]" | "\[LeftArrow]" -> 
          ""] &, MicrosoftComment])]];
(* The first field holds the timestamp; its words 2 to 4 are day, month and year *)
MicrosoftDate = Map[Take[#, 1] &, MicrosoftHistory];
MicrosoftDateList = 
  Reverse[Most[
    Interpreter[
      "Date"][(StringRiffle /@ (((StringSplit /@ MicrosoftDate)[[All, 
           All, {2, 3, 4}]])[[All, 1]]))]]];
(* Drop edits with empty comments, group the comments by date, classify each one
   and convert the labels to -1, 0 or 1 *)
MicrosoftSentiment = 
  Map[Classify["Sentiment", #, IndeterminateThreshold -> 0] &, #] & /@
     Merge[Association /@ 
      DeleteCases[
       MapAt[StringTrim, #, 2] & /@ 
        Rule @@@ 
         Transpose[{MicrosoftDateList, MicrosoftCommentNew}], _ -> 
        ""], Identity] /. {"Negative" -> -1, "Neutral" -> 0, 
    "Positive" -> 1};
(* Average sentiment value per day *)
MicrosoftSentimentNew = Map[Mean, MicrosoftSentiment]

2. Financial Data

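(* For each edit date, take the price change from that day to the next day.
   Microsoft is listed on NASDAQ, so the NASDAQ ticker is used here.
   Dates for which two prices are not available (for example when the next day is
   not a trading day) leave Subtract unevaluated and are filtered out.
   Note: this assumes FinancialData returns a list of {date, price} pairs; if your
   version returns a TimeSeries instead, the [[All, 2]] extraction needs adjusting. *)
MicrosoftFinancialData = 
 DeleteDuplicates[
  Select[Quiet[
    First[#] -> 
       Subtract @@ 
        Reverse[FinancialData["NASDAQ:MSFT", #][[All, 2]]] & /@ ({#, # +
           Quantity[1, "Days"]} & /@ MicrosoftDateList)], 
   Head[#[[2]]] =!= Subtract &]]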

3. Prediction

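(* Keep only the dates that have both a sentiment value and a price change *)
MicrosoftCommonDates = 
  Intersection[Keys[<|MicrosoftFinancialData|>], 
   Keys[MicrosoftSentimentNew]];
MicrosoftPredSent = KeyTake[MicrosoftSentimentNew, MicrosoftCommonDates];
MicrosoftPredFin = 
  KeyTake[<|MicrosoftFinancialData|>, MicrosoftCommonDates];
(* Train a predictor from the daily sentiment value to the price change *)
MicrosoftPredict = 
  Predict[Values[MicrosoftPredSent] -> Values[MicrosoftPredFin]];
(* Predicted price change for a day with an average sentiment value of 1 *)
MicrosoftPredict[1]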

4. Plot

DateListPlot[{MicrosoftPredSent, MicrosoftPredFin}, 
 PlotLabel -> "Microsoft Plot", 
 PlotLegends -> {"Wikipedia Sentiment Value", "Stock Price Change"}]

[Plot: Wikipedia sentiment value and stock price change for Microsoft]


Summary of results and conclusions

In the plots, the Wikipedia sentiment value correlates positively with the stock price change for most of the companies, most of the time. For example, the plots for Hewlett-Packard, HSBC and Sony suggest a strong correlation between the two. One interesting trend is that the more edits a company's history contains, the stronger the correlation appears to be. Given the existing Wikipedia sentiment values and financial data as the training set, the Predict function of the Wolfram Language uses machine learning to predict the future stock price change from the average sentiment value of the Wikipedia edit history. In the example above, the predicted stock price change is 0.11925, meaning that the price of Microsoft stock is predicted to increase by 0.11925 per share on a future day whose average sentiment value is 1.


Open Problems

The first open problem is that the strength of the correlation between the Wikipedia sentiment value and the stock price change has not been measured. The second is how accurate the predicted stock price change is for a given average sentiment value. The last is whether the method works for every company or only for controversial ones.
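One possible way to approach the first two problems is sketched below. It is an illustration under stated assumptions, not a result of this project: `sentiment` and `priceChange` are hypothetical sample values standing in for Values[MicrosoftPredSent] and Values[MicrosoftPredFin], and the 75/25 chronological split is an arbitrary choice.

```wolfram
(* Hypothetical aligned samples in chronological order; replace them with
   N[Values[MicrosoftPredSent]] and QuantityMagnitude[Values[MicrosoftPredFin]] *)
sentiment = {1., 0., -1., 0.5, -0.5, 1., 0., -1., 0.5, 0., -0.5, 1.};
priceChange = {0.3, 0.1, -0.4, 0.2, -0.1, 0.5, -0.2, -0.3, 0.1, 0.0, -0.2, 0.4};

(* Problem 1: measure how strong the correlation is *)
pearson = Correlation[sentiment, priceChange];
spearman = SpearmanRho[sentiment, priceChange];

(* Problem 2: train on the earlier part of the data, test on the later part *)
split = Floor[0.75 Length[sentiment]];
predictor = Predict[sentiment[[;; split]] -> priceChange[[;; split]]];
predictions = predictor /@ sentiment[[split + 1 ;;]];

(* Root mean squared error of the predictions on the held-out days *)
rmse = Sqrt[Mean[(predictions - priceChange[[split + 1 ;;]])^2]];

{pearson, spearman, rmse}
```

A Pearson or Spearman coefficient close to 0, or a held-out error no smaller than the typical daily price change, would suggest that the sentiment value carries little predictive information.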

POSTED BY: Cheng Shi