Need help with Market Sentiment Analysis

Posted 7 years ago
POSTED BY: Jonathan Kinlay
Posted 7 years ago
POSTED BY: Rohit Namjoshi
Posted 7 years ago

Hi Roman,

Sorry for the late response.

Yes, that is the code I used. I copied it from your post into a notebook, evaluated it, and tested it with

WSJSentimentIndicator[DateObject[{2012, 1, 3}]]

The first time I evaluated, it failed because Classify returned an empty association. Subsequent evaluations worked fine. Not sure why - perhaps it has to retrieve some data from Wolfram servers and that timed out? If it happens frequently, the code can easily be modified to retry the Classify.
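A minimal sketch of such a retry, assuming a hypothetical helper `retryUntil` that is not part of the original code. It calls a zero-argument function up to `maxTries` times and stops at the first result that passes a validity test.

```mathematica
(* Hypothetical helper, not from the original posts:
   call f[] up to maxTries times and return the first result that passes validQ,
   or $Failed if none of the attempts does *)
ClearAll[retryUntil]
retryUntil[f_, validQ_, maxTries_Integer?Positive] :=
 Module[{result = $Failed, tries = 0, done = False},
  While[! done && tries < maxTries,
   result = f[];
   tries++;
   done = TrueQ[validQ[result]]];
  If[done, result, $Failed]]
```

For example, `retryUntil[WSJSentimentIndicator[DateObject[{2012, 1, 3}]] &, NumericQ[First[#]] &, 5]` would re-run the whole indicator until its first element is a number.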

POSTED BY: Rohit Namjoshi
Posted 7 years ago

Hi Roman,

My bad. The second DateString format should be {"MonthNameShort", " ", "DayShort", ", ", "Year"}.

For eight days:

datelist = Table[DateObject[{2012, 1, n}], {n, 3, 10}];
WSJSI = Flatten[First@WSJSentimentIndicator[#] & /@ datelist]
tsWSJSI = TimeSeries[Transpose[{datelist, WSJSI}]]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator", Bold]]

[Figure: Histogram of WSJ Sentiment indicator]
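For reference, this is what the two date formats in the code produce for a sample date. The archive URL uses the compact `{"Year", "Month", "Day"}` form, while the header searched for in the page uses the "Jan 3, 2012" form, so the `", "` separator has to match exactly.

```mathematica
d = DateObject[{2012, 1, 3}];

(* URL fragment for the archive page: zero-padded month and day *)
urlPart = DateString[d, {"Year", "Month", "Day"}]
(* "20120103" *)

(* header text searched for in the downloaded page *)
header = DateString[d, {"MonthNameShort", " ", "DayShort", ", ", "Year"}]
(* "Jan 3, 2012" *)
```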

POSTED BY: Rohit Namjoshi

Hi Roman,

Glad you got it working. Rohit is right, the format changed (which, unfortunately, happens quite frequently with news sites).

Let me know if you run into other issues.

Jonathan

POSTED BY: Jonathan Kinlay
Posted 7 years ago

The quantiles are found as follows:

percentiles = Quantile[tsSIchange, {1/3, 2/3}];
bottompercentile = 
  Flatten[Position[tsSIchange["Values"], 
    x_ /; x < percentiles[[1]]]];
toppercentile = 
  Flatten[Position[tsSIchange["Values"], x_ /; x > percentiles[[2]]]]

So we set the strategy returns equal to the S&P 500 Index returns, except in the bottom 1/3 quantile of the sentiment indicator, where we reduce leverage by multiplying returns by 1/leveragefactor, and in the top 1/3 quantile, where we increase leverage by multiplying returns by leveragefactor.

What we are saying is: we will only adjust our strategy away from the market portfolio when the sentiment indicator is in the bottom 1/3 or top 1/3 quantile. In those periods we halve or double our exposure, respectively.
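A toy run of the tercile logic, with made-up numbers and the same variable names as the code in this thread (so evaluate it in a separate session). With six observations, Quantile's default definition picks the 2nd and 4th smallest values (-0.3 and 0.5) as cut-offs; because the comparisons are strict, only one day lands in the bottom group and two in the top group, so the groups are not exactly a third each.

```mathematica
(* Made-up data for illustration: sentiment-indicator changes and index returns on six days *)
siChange = {0.5, -1.2, 0.1, 2.0, -0.3, 0.8};
spxReturns = {0.01, -0.02, 0.005, 0.015, -0.01, 0.02};
leveragefactor = 2;

(* tercile cut-offs; with Quantile's default definition these are data points *)
percentiles = Quantile[siChange, {1/3, 2/3}];
(* positions strictly below the lower cut-off and strictly above the upper one *)
bottompercentile = Flatten[Position[siChange, x_ /; x < percentiles[[1]]]];
toppercentile = Flatten[Position[siChange, x_ /; x > percentiles[[2]]]];

(* market returns everywhere, halved in the bottom group and doubled in the top group *)
strategyreturns = spxReturns;
strategyreturns[[bottompercentile]] = strategyreturns[[bottompercentile]]/leveragefactor;
strategyreturns[[toppercentile]] = leveragefactor*strategyreturns[[toppercentile]];
strategyreturns
(* {0.01, -0.01, 0.005, 0.03, -0.01, 0.04} *)
```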

POSTED BY: Jonathan Kinlay

More generally:

leveragefactor = 2.0;
strategyreturns = tsSPXreturns["Values"];
strategyreturns[[bottompercentile]] = (1/leveragefactor)*
   strategyreturns[[bottompercentile]];
strategyreturns[[toppercentile]] = 
  leveragefactor*strategyreturns[[toppercentile]];
tsVTDSPX = 
  TimeSeries[
   Transpose[{datelist, 
     1000*FoldList[Times, 1, 1 + tsSPXreturns["Values"]]}]];
tsVTDstrategy = 
  TimeSeries[
   Transpose[{datelist, 
     1000*FoldList[Times, 1, 1 + strategyreturns]}]];
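The value series are built by compounding: `FoldList[Times, 1, 1 + r]` returns the running products 1, (1 + r1), (1 + r1)(1 + r2), and so on. The toy example below uses made-up returns. It also shows a detail relevant to the Transpose error reported further down: FoldList returns one more element than its input, so `Transpose[{datelist, ...}]` only works if datelist has exactly one more date than there are returns. Presumably datelist holds the price dates in the original code, but that is an inference, since its definition is not shown here. Symbol names are also case-sensitive, so `tsSPXReturns` and `tsSPXreturns` are different symbols.

```mathematica
(* Made-up daily returns for illustration *)
returns = {0.10, -0.05, 0.02};

(* running products of (1 + r), scaled to a $1,000 starting value:
   approximately {1000, 1100., 1045., 1065.9} *)
values = 1000*FoldList[Times, 1, 1 + returns];

(* one more value than returns, so four dates are needed for three returns *)
dates = DateRange[DateObject[{2012, 1, 3}], DateObject[{2012, 1, 6}]];
ts = TimeSeries[Transpose[{dates, values}]];
```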
POSTED BY: Jonathan Kinlay

Hi Rohit, so, based on Dr. Kinlay's answer, I assume that the code for tsVTDSPX is the same as for tsVTDStrategy, just without the quantile adjustments:

strategyreturns = tsSPXReturns["Values"];
strategyreturns[[bottompercentile]] = (1/2)*
   strategyreturns[[bottompercentile]];
strategyreturns[[toppercentile]] = 2*strategyreturns[[toppercentile]];
tsVTDstrategy = 
 TimeSeries[
  Transpose[{datelist, 1000*FoldList[Times, 1, 1 + strategyreturns]}]]
marketreturns = tsSPXReturns["Values"];
tsVTDSPX = 
 TimeSeries[
  Transpose[{datelist, 1000*FoldList[Times, 1, 1 + marketreturns]}]]

This is with a leverage factor of 2. However, it is not working for me: I get a Transpose error.

Hello Rohit! The code you proposed worked fine! I tried it on the WSJ articles for the last 3 years, and out of 779 values I got output for 710, with 69 Null values, which is fine. I calculated the rest manually afterwards, because the computations took too much time. I am now going through the last part of the paper, the construction of the trading algorithm. Do you have any idea what tsVTDSPX and tsStrategy stand for?

period = QuantityMagnitude@
   DateDifference[First@datelist, Last@datelist, "Year"];
AnnStd = Sqrt[252]*
   StandardDeviation[
    Transpose@{tsSPXreturns["Values"], strategyreturns}];
cf = {tsVTDSPX[Last@datelist]/1000 - 1, 
   tsVTDstrategy[Last@datelist]/1000 - 1};
CAGR = -1 + (1 + cf)^(1/period);
IR = CAGR/AnnStd;

Print[Style["News Sentiment Strategy", "Subsection"]];
P1 = Style[
   NumberForm[
    TableForm[{CAGR, AnnStd, IR}, 
     TableHeadings -> {{"CAGR", "Ann. StDev.", 
        "IR"}, {Style["SP500 Index", Bold], 
        Style["Strategy", Bold]}}], {6, 2}], FontSize -> 14];
P2 = DateListPlot[{tsVTDSPX, tsVTDstrategy}, Filling -> Axis, 
   PlotLegends -> {"S&P500 Index", "Strategy"}, 
   PlotLabel -> Style["Value of $1,000", Bold], ImageSize -> Medium];
Print[P1];
Print[P2];
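The statistics computed above amount to the following, where V_T is the final value of the 1,000 starting investment, T is the sample length in years (`period`), and sigma_d is the standard deviation of daily returns. `StandardDeviation` applied to the two-column matrix returns one value per column, so both series are handled at once. Note that "IR" here is simply CAGR divided by annualized volatility, with no benchmark or risk-free rate subtracted.

```latex
\text{cf} = \frac{V_T}{1000} - 1, \qquad
\text{CAGR} = (1 + \text{cf})^{1/T} - 1, \qquad
\sigma_{\text{ann}} = \sqrt{252}\,\sigma_d, \qquad
\text{IR} = \frac{\text{CAGR}}{\sigma_{\text{ann}}}
```

For example, a cumulative return of 21% over two years gives CAGR = 1.21^(1/2) - 1 = 10%; with sigma_d = 1%, the annualized volatility is about 15.9% and IR is about 0.63.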

In the next computation, Dr. Kinlay defines tsStrategy as tsSPXReturns; however, I have no idea what tsVTDSPX is.

Dear Dr. Kinlay, I am now replicating the last part of your study. If you have time, could you please explain what tsVTDSPX and tsVTDStrategy are?

Thank you very much, Rohit! I will try running the workaround tonight and will check out WebExecute!

Posted 7 years ago

Hi Roman,

I took a closer look at why it fails. It is the Import that occasionally returns partial results. The return value has the preamble text and the archive header text, e.g. "News Archive for Jun 25, 2019", followed by blank lines and then "Most Popular Articles". The entire archive section is missing. It is not a bug in Import; I can reproduce it with a simple Ruby program:

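require 'nokogiri'
require 'open-uri'

# URI.open replaces Kernel#open for URLs (Kernel#open no longer opens URLs as of Ruby 3.0)
doc = Nokogiri::HTML(URI.open("https://www.wsj.com/news/archive/20120105"))
# site-specific XPath to the archive list; it breaks if the page layout changes
txt = doc.xpath("//*[@id='root']/div/div/div/div[2]/div/div/div[2]/div[1]/div/div").text

puts txt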

My guess is that the ads and other asynchronous JavaScript have to run before the archive section is rendered.

To work around this, use a helper function and retry the Import operation until it succeeds. A better way would be to use WebExecute if you are on version 12. You could experiment with that.

As for performance, the Import is quite slow, taking ~15 s to complete. With the workaround, you can just increase maxRetries and leave it running overnight.

Workaround:

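ClearAll[downloadArchiveWords, WSJSentimentIndicator]

(* Download one day's archive page and return {lowercase words, archive text}.
   If the page is incomplete (date header or footer missing), return {{}, ""}
   so that WSJSentimentIndicator retries instead of failing on a Part error. *)
downloadArchiveWords[date_] := 
  Module[{end = "Most Popular Articles", url, archive, header, footer},
   url = StringJoin["https://www.wsj.com/news/archive/", DateString[date, {"Year", "Month", "Day"}]];
   archive = Import[url];
   If[! StringQ[archive], Return[{{}, ""}]];
   header = StringPosition[archive, 
     DateString[date, {"MonthNameShort", " ", "DayShort", ", ", "Year"}]];
   If[header === {}, Return[{{}, ""}]];
   (* keep the text after the date header ... *)
   archive = StringDrop[archive, header[[1, 2]]];
   footer = StringPosition[archive, end];
   If[footer === {}, Return[{{}, ""}]];
   (* ... and before the "Most Popular Articles" section *)
   archive = StringTake[archive, -1 + footer[[1, 1]]];
   {ToLowerCase[DeleteStopwords[TextWords[archive]]], archive}];

(* Retry the download up to maxRetries times until it yields at least one word *)
WSJSentimentIndicator[date_, maxRetries:_?Positive:5] := 
 Module[{retries, archivewords, archive, WSJSI},
  retries = 0;
  archivewords = "";
  While[Length@archivewords == 0 && retries < maxRetries,
   {archivewords, archive} = downloadArchiveWords[date]; retries++; Pause[1]];
  WSJSI = #Positive/(#Negative + #Positive) &@Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}];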
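The ratio in the last step only yields a number when the counts contain both "Positive" and "Negative" keys; when the download or the classification comes back empty, the result is not numeric, which is presumably what produces the partial outputs reported later in this thread. A more defensive variant, using a hypothetical helper `sentimentRatio` that is not part of the original code, looks the counts up with a default of 0 and returns a `Missing` value when there are no polar labels.

```mathematica
(* Hypothetical helper, not from the original posts: fraction of "Positive" labels
   among "Positive" and "Negative" ones, ignoring other labels such as "Neutral".
   Returns Missing["NoPolarWords"] when neither label occurs. *)
ClearAll[sentimentRatio]
sentimentRatio[labels_List] :=
 Module[{counts = Counts[labels], pos, neg},
  pos = Lookup[counts, "Positive", 0];
  neg = Lookup[counts, "Negative", 0];
  If[pos + neg == 0, Missing["NoPolarWords"], N[pos/(pos + neg)]]]
```

Inside WSJSentimentIndicator it would replace the pure-function line, e.g. `WSJSI = sentimentRatio[Classify["Sentiment", archivewords]]`.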
POSTED BY: Rohit Namjoshi

Roman, Yes sure, no problem.

POSTED BY: Jonathan Kinlay

Thank you very much, Rohit! I now understand why I have problems with the computation: Mathematica gives me incomplete output every time. For example:

{0.6781, #Positive/(#Negative + #Positive), #Positive/(#Negative + #Positive), 0.7890, 0.6905, #Positive/(#Negative + #Positive), #Positive/(#Negative + #Positive)}

I have to repeat the calculation 5-6 times to get the full output. I just wanted to run the analysis on a couple of years of data, but I can barely run it for 8 days (the calculation takes around 5 minutes and does not always give the full output). I think the problem is either my PC, even though it has an i7 and 16 GB of RAM, or that the code is too heavy. Do you know if it is possible to add some code to

WSJSI = Flatten[First@WSJSentimentIndicator[#] & /@ datelist]
tsWSJSI = TimeSeries[Transpose[{datelist, WSJSI}]]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator", Bold]]

to make it repeat the calculation over and over until it succeeds?

Thank you very much for your time.
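Alternatively, or in addition to retrying, the failed days can be dropped before the TimeSeries is built, at the cost of leaving gaps at those dates. A minimal sketch with made-up sample values standing in for `datelist` and `WSJSI` (the names `sampleDates`, `sampleSI`, `pairs`, and `tsSample` are illustrative only):

```mathematica
(* Made-up sample: the second day failed and produced no number *)
sampleDates = {DateObject[{2012, 1, 3}], DateObject[{2012, 1, 4}], DateObject[{2012, 1, 5}]};
sampleSI = {0.6781, Missing["NoPolarWords"], 0.7890};

(* keep only the {date, value} pairs whose value is numeric *)
pairs = Select[Transpose[{sampleDates, sampleSI}], NumericQ[Last[#]] &];
tsSample = TimeSeries[pairs];
```

An unevaluated result such as `#Positive/(#Negative + #Positive)` also fails `NumericQ`, so it is filtered out the same way.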

Dear Dr. Kinlay, I will be writing a course paper in international finance soon, and I would like to replicate your market sentiment analysis, but over a different timeframe and also using Twitter/Reddit data. May I use your code and reference your article in my paper? Thank you very much for your time. Roman

Thank you very much Rohit!

And you didn't change the module code?

Does it look like this for you?

WSJSentimentIndicator[date_] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
  archive = 
   Import[StringJoin["https://www.wsj.com/news/archive/", 
     DateString[d, {"Year", "Month", "Day"}]]];
  archive = 
   StringDrop[archive, 
    StringPosition[archive, 
      DateString[
       d, {"MonthNameShort", " ", "DayShort", ", ", 
        "Year"}]][[1, 2]]];
  archive = 
   StringTake[
    archive, -1 + 
     StringPosition[archive, "Most Popular Articles"][[1, 1]]];
  archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
  WSJSI = #Positive/(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

Thank you very much for your reply Jonathan,

I find your article very interesting and inspiring. I am not a proficient Wolfram user, unfortunately. However, I study finance now, and I would really like to learn how you did this sentiment analysis.

With the help of Rohit, I went through the first part of the code, and now I am having trouble with WSJSentimentIndicator:

WSJSentimentIndicator[date_] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
  archive = 
   Import[StringJoin["http://www.wsj.com/public/page/archive-", 
     DateString[d, {"Year", "-", "MonthShort", "-", "DayShort"}], 
     ".html"]];
  archive = 
   StringDrop[archive, 
    StringPosition[archive, 
      DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]][[1,
      2]]];
  archive = 
   StringTake[
    archive, -1 + StringPosition[archive, "ARCHIVE FILTER"][[1, 1]]];
  archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
  WSJSI = #Positive/(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

So, if we update the code, it should look like this:

WSJSentimentIndicator[date_] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
  archive = 
   Import[StringJoin["https://www.wsj.com/news/archive/", 
     DateString[d, {"Year", "Month", "Day"}]]];
  archive = 
   StringDrop[archive, 
    StringPosition[archive, 
      DateString[
       d, {"MonthNameShort", " ", "DayShort", " ", 
        "Year"}]][[1, 2]]];
  archive = 
   StringTake[
    archive, -1 + 
     StringPosition[archive, "Most Popular Articles"][[1, 1]]];
  archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
  WSJSI = #Positive/(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

However, the code which returns us the histogram does not work for me:

WSJSI = Flatten[First@WSJSentimentIndicator[#]&/@datelist]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator",Bold]]

If you have time, could you please help me solve this one? I have a feeling that WSJSI does not handle the 'datelist' correctly.

Thank you very much for your help, your code works perfectly!!!

If you have time, could you please explain how you figured out which format the date should be written in?

And also, how exactly does this line work?

archive = 
 StringTake[
  archive, -1 + 
   StringPosition[archive, "Most Popular Articles"][[1, 1]]]

Does it somehow take us into the "Most Popular Articles" section and get the titles from there?
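The line does not navigate anywhere on the page; it only trims the text that Import already returned. `StringPosition` gives the {start, end} character positions of every match, `[[1, 1]]` picks the start of the first match, and `StringTake[archive, n]` keeps the first n characters, so everything before "Most Popular Articles" is kept and that section itself is cut off. The preceding `StringDrop` step works the same way with `[[1, 2]]`, discarding everything up to the end of the date header. On a made-up page string (`page`, `afterHeader`, and `body` are illustrative names):

```mathematica
(* Made-up page text standing in for the string returned by Import *)
page = "Preamble Jan 3, 2012 Stocks rise. Oil falls. Most Popular Articles footer";

(* drop everything up to and including the date header *)
afterHeader = StringDrop[page, StringPosition[page, "Jan 3, 2012"][[1, 2]]];

(* keep everything before the first "Most Popular Articles" *)
body = StringTake[afterHeader,
   -1 + StringPosition[afterHeader, "Most Popular Articles"][[1, 1]]];
(* body is " Stocks rise. Oil falls. " *)
```

If either marker is missing from the page, StringPosition returns {} and the Part extraction fails with an error, which is one way an incomplete download shows up.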
