WOLFRAM COMMUNITY

Need help with Market Sentiment Analysis

Roman Ubaydullaev

Posted 7 years ago

POSTED BY: Roman Ubaydullaev

26 Replies

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

POSTED BY: Jonathan Kinlay

Rohit Namjoshi

Posted 7 years ago

POSTED BY: Rohit Namjoshi

Rohit Namjoshi

Posted 7 years ago

Hi Roman,

Sorry for the late response. Yes, that is the code I used. I copied it from your post into a notebook, evaluated it, and tested it with

WSJSentimentIndicator[DateObject[{2012, 1, 3}]]

The first time I evaluated it, it failed because `Classify` returned an empty association. Subsequent evaluations worked fine. I am not sure why; perhaps it has to retrieve some data from Wolfram servers and that timed out? If it happens frequently, the code can easily be modified to retry the `Classify` call.
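A minimal sketch of such a retry, assuming a generic helper: retryUntil and flakyCounts are hypothetical names that are not part of the original code, and flakyCounts only imitates a call that returns an empty association a couple of times before succeeding.

```wolfram
(* Hypothetical helper, not part of the original code: call f[] up to
   maxTries times and return the first result that passes the test ok *)
retryUntil[f_, ok_, maxTries_Integer?Positive] :=
  Module[{result, tries = 0},
   While[tries < maxTries,
    result = f[];
    tries++;
    If[TrueQ[ok[result]], Break[]]];
   result];

(* Stand-in for a flaky call: empty association twice, then real counts *)
calls = 0;
flakyCounts[] := (calls++;
   If[calls < 3, <||>, <|"Positive" -> 7, "Negative" -> 3|>]);

(* Keep trying until the association is non-empty, at most 5 attempts *)
counts = retryUntil[flakyCounts, Length[#] > 0 &, 5];
```

In the indicator code the same helper could presumably wrap the classification step, for example retryUntil[Counts[Classify["Sentiment", archivewords]] &, Length[#] > 0 &, 5], although I have not tested whether a short Pause between attempts is needed.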

POSTED BY: Rohit Namjoshi

Rohit Namjoshi

Posted 7 years ago

Hi Roman,

My bad. The second DateString format should be {"MonthNameShort", " ", "DayShort", ", ", "Year"}.

For eight days:

datelist = Table[DateObject[{2012, 1, n}], {n, 3, 10}];
WSJSI = Flatten[First@WSJSentimentIndicator[#] & /@ datelist]
tsWSJSI = TimeSeries[Transpose[{datelist, WSJSI}]]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator", Bold]]

POSTED BY: Rohit Namjoshi

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

Hi Roman, Glad you got it working. Rohit is right, the format changed (which happens quite frequently, with news sites, unfortunately). Let me know if you run into other issues. Jonathan

POSTED BY: Jonathan Kinlay

Rohit Namjoshi

Posted 7 years ago

POSTED BY: Rohit Namjoshi

Rohit Namjoshi

Posted 7 years ago

POSTED BY: Rohit Namjoshi

Rohit Namjoshi

Posted 7 years ago

POSTED BY: Rohit Namjoshi

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

POSTED BY: Jonathan Kinlay

Roman Ubaydullaev

Posted 7 years ago

POSTED BY: Roman Ubaydullaev

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

The quantiles are found as follows:

percentiles = Quantile[tsSIchange, {1/3, 2/3}];
bottompercentile = Flatten[Position[tsSIchange["Values"], x_ /; x < percentiles[[1]]]];
toppercentile = Flatten[Position[tsSIchange["Values"], x_ /; x > percentiles[[2]]]];

So we set the strategy returns equal to the S&P 500 Index returns, except when the sentiment indicator is in the bottom third of its distribution, where we reduce leverage by multiplying returns by 1/leveragefactor, and when it is in the top third, where we increase leverage by multiplying returns by leveragefactor.

In other words, we only move away from the market portfolio when the sentiment indicator is in its bottom or top third. In those periods we halve or double our exposure, respectively (with a leverage factor of 2).
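To make the selection concrete, here is a toy sketch on plain lists rather than on tsSIchange. The numbers and the names changes, returns, bottom and top are made up purely for illustration.

```wolfram
(* Hypothetical sentiment changes and same-day index returns *)
changes = {-0.05, 0.02, 0.10, -0.01, 0.04, -0.08};
returns = {0.01, -0.02, 0.03, 0.00, -0.01, 0.02};

(* Cut points at the 1/3 and 2/3 quantiles of the sentiment changes *)
percentiles = Quantile[changes, {1/3, 2/3}];

(* Positions of the days that fall strictly below / above the cut points *)
bottom = Flatten[Position[changes, x_ /; x < percentiles[[1]]]];
top = Flatten[Position[changes, x_ /; x > percentiles[[2]]]];

(* Scale exposure down on low-sentiment days and up on high-sentiment days *)
leveragefactor = 2.0;
returns[[bottom]] = (1/leveragefactor)*returns[[bottom]];
returns[[top]] = leveragefactor*returns[[top]];
```

Days whose sentiment change lies between the two cut points keep the plain index return, which is why the strategy only departs from the market portfolio in the tails.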

POSTED BY: Jonathan Kinlay

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

More generally:

leveragefactor = 2.0;
strategyreturns = tsSPXreturns["Values"];
strategyreturns[[bottompercentile]] = (1/leveragefactor)*
   strategyreturns[[bottompercentile]];
strategyreturns[[toppercentile]] = 
  leveragefactor*strategyreturns[[toppercentile]];
tsVTDSPX = 
  TimeSeries[
   Transpose[{datelist, 
     1000*FoldList[Times, 1, 1 + tsSPXreturns["Values"]]}]];
tsVTDstrategy = 
  TimeSeries[
   Transpose[{datelist, 
     1000*FoldList[Times, 1, 1 + strategyreturns]}]];
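A small sketch of what the FoldList step does, using made-up returns and dates (the names dailyreturns, values and dates are only for illustration). It compounds the returns into the running value of $1,000, and the result has one more element than the list of returns because it starts from the initial value.

```wolfram
(* Hypothetical daily returns *)
dailyreturns = {0.01, -0.02, 0.03};

(* Running value of $1,000: starts at 1000, then compounds each return *)
values = 1000*FoldList[Times, 1, 1 + dailyreturns];

(* One date per value, so four dates are needed for three returns *)
dates = Table[DateObject[{2012, 1, n}], {n, 3, 6}];

(* Pair each date with the portfolio value on that date *)
pairs = Transpose[{dates, values}];
```

Because of that extra starting element, the date list has to be exactly one entry longer than the list of returns; if the two lengths differ, Transpose cannot pair them up.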

POSTED BY: Jonathan Kinlay

Roman Ubaydullaev

Posted 7 years ago

Hi Rohit, so, according to Dr. Kinlay's answer, I assume that the code for tsVTDSPX is the same as for tsVTDstrategy, just without the quantile adjustments:

strategyreturns = tsSPXReturns["Values"];
strategyreturns[[bottompercentile]] = (1/2)*
   strategyreturns[[bottompercentile]];
strategyreturns[[toppercentile]] = 2*strategyreturns[[toppercentile]];
tsVTDstrategy = 
 TimeSeries[
  Transpose[{datelist, 1000*FoldList[Times, 1, 1 + strategyreturns]}]]
marketreturns = tsSPXReturns["Values"];
tsVTDSPX = 
 TimeSeries[
  Transpose[{datelist, 1000*FoldList[Times, 1, 1 + marketreturns]}]]

This uses a leverage factor of 2. However, it is not working for me: I get a Transpose error.

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Hello Rohit! The code you proposed worked fine! I tried it on the WSJ articles for the last 3 years and, out of 779 values, I got output for 710, with 69 Null values, which is fine. I calculated the rest manually afterwards, because the computations take too much time. I am now going through the last part of the paper, the construction of the trading algorithm. Do you have any idea what tsVTDSPX and tsStrategy stand for?

period = QuantityMagnitude@
   DateDifference[First@datelist, Last@datelist, "Year"];
AnnStd = Sqrt[252]*
   StandardDeviation[
    Transpose@{tsSPXreturns["Values"], strategyreturns}];
cf = {tsVTDSPX[Last@datelist]/1000 - 1, 
   tsVTDstrategy[Last@datelist]/1000 - 1};
CAGR = -1 + (1 + cf)^(1/period);
IR = CAGR/AnnStd;

Print[Style["News Sentiment Strategy", "Subsection"]];
P1 = Style[
   NumberForm[
    TableForm[{CAGR, AnnStd, IR}, 
     TableHeadings -> {{"CAGR", "Ann. StDev.", 
        "IR"}, {Style["SP500 Index", Bold], 
        Style["Strategy", Bold]}}], {6, 2}], FontSize -> 14];
P2 = DateListPlot[{tsVTDSPX, tsVTDstrategy}, Filling -> Axis, 
   PlotLegends -> {"S&P500 Index", "Strategy"}, 
   PlotLabel -> Style["Value of $1,000", Bold], ImageSize -> Medium];
Print[P1];
Print[P2];

In the next computation, Dr. Kinlay defines tsStrategy as tsSPXReturns; as for tsVTDSPX, however, I have no idea.

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Dear Dr. Kinlay, I am now replicating the last part of your study. If you have time, could you please explain what tsVTDSPX and tsVTDStrategy are?

period = QuantityMagnitude@
   DateDifference[First@datelist, Last@datelist, "Year"];
AnnStd = Sqrt[252]*
   StandardDeviation[
    Transpose@{tsSPXreturns["Values"], strategyreturns}];
cf = {tsVTDSPX[Last@datelist]/1000 - 1, 
   tsVTDstrategy[Last@datelist]/1000 - 1};
CAGR = -1 + (1 + cf)^(1/period);
IR = CAGR/AnnStd;

Print[Style["News Sentiment Strategy", "Subsection"]];
P1 = Style[
   NumberForm[
    TableForm[{CAGR, AnnStd, IR}, 
     TableHeadings -> {{"CAGR", "Ann. StDev.", 
        "IR"}, {Style["SP500 Index", Bold], 
        Style["Strategy", Bold]}}], {6, 2}], FontSize -> 14];
P2 = DateListPlot[{tsVTDSPX, tsVTDstrategy}, Filling -> Axis, 
   PlotLegends -> {"S&P500 Index", "Strategy"}, 
   PlotLabel -> Style["Value of $1,000", Bold], ImageSize -> Medium];
Print[P1];
Print[P2];
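For orientation, a small sketch of what the performance formulas above compute, using made-up inputs. None of these numbers come from the study, and finalValues, indexReturns and strategyReturns are hypothetical names.

```wolfram
(* Hypothetical end values of $1,000 for the index and the strategy *)
finalValues = {1210., 1440.};
period = 2;  (* assumed length of the test period in years *)

cf = finalValues/1000 - 1;        (* cumulative return over the whole period *)
CAGR = -1 + (1 + cf)^(1/period);  (* compound annual growth rate *)

(* Hypothetical daily returns; StandardDeviation works column by column,
   so Transpose puts one series in each column *)
indexReturns = {0.01, -0.01, 0.02, -0.02};
strategyReturns = {0.02, -0.02, 0.04, -0.04};
AnnStd = Sqrt[252]*
   StandardDeviation[Transpose@{indexReturns, strategyReturns}];

IR = CAGR/AnnStd;  (* annual growth per unit of annualized volatility *)
```

With these inputs a final value of 1210 after two years corresponds to a CAGR of about 10% and 1440 to about 20%, and each of CAGR, AnnStd and IR holds one number for the index and one for the strategy.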

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Thank you very much Rohit! I will try to run the workaround tonight, and I will check out WebExecute!

POSTED BY: Roman Ubaydullaev

Rohit Namjoshi

Posted 7 years ago

Hi Roman,

I took a closer look at why it fails. It is the Import that occasionally returns partial results. The return value has the preamble text and the header text for the archive, e.g. "News Archive for Jun 25, 2019", followed by blank lines, followed by "Most Popular Articles". The entire archive section is missing. It is not a bug in Import; I can reproduce it with a simple Ruby program.

require 'nokogiri'
require 'open-uri'

# open-uri provides URI.open for fetching a URL
doc = Nokogiri::HTML(URI.open("https://www.wsj.com/news/archive/20120105"))
txt = doc.xpath("//*[@id='root']/div/div/div/div[2]/div/div/div[2]/div[1]/div/div").text

puts txt

My guess is that the ads and other asynchronous JavaScript have to run before the archive section is rendered.

To work around this, use a helper function and retry the Import operation until it succeeds. A better way would be to use WebExecute if you are on version 12. You could experiment with that.

As for performance, the Import is quite slow, taking about 15 s to complete. With the workaround you can just increase maxRetries and leave it running overnight.

Workaround:

ClearAll[downloadArchiveWords, WSJSentimentIndicator]

downloadArchiveWords[date_] := 
  Module[{end = "Most Popular Articles", url, archive},
   url = StringJoin["https://www.wsj.com/news/archive/", DateString[date, {"Year", "Month", "Day"}]];
   archive = Import[url];
   archive = 
    StringDrop[archive, 
     StringPosition[archive, 
       DateString[date, {"MonthNameShort", " ", "DayShort", ", ", "Year"}]][[1, 2]]];
   archive = StringTake[archive, -1 + StringPosition[archive, end][[1, 1]]];
   {ToLowerCase[DeleteStopwords[TextWords[archive]]], archive}];

WSJSentimentIndicator[date_, maxRetries:_?Positive:5] := 
 Module[{retries, archivewords, archive, WSJSI},
  retries = 0;
  archivewords = "";
  While[Length@archivewords == 0 && retries < maxRetries,
   {archivewords, archive} =  downloadArchiveWords[date]; retries++; Pause[1]];
  WSJSI = #Positive/(#Negative + #Positive) &@Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}];

POSTED BY: Rohit Namjoshi

Jonathan Kinlay

Jonathan Kinlay, Systematic Strategies

Posted 7 years ago

Roman, Yes sure, no problem.

POSTED BY: Jonathan Kinlay

Roman Ubaydullaev

Posted 7 years ago

Thank you very much Rohit!

I now understand why I have problems with the computation: Mathematica gives me incomplete output every time. For example:

{0.6781, #Positive/(#Negative + #Positive), #Positive/(#Negative + #Positive), 0.7890, 0.6905, #Positive/(#Negative + #Positive), #Positive/(#Negative + #Positive)}

I have to repeat the calculation 5-6 times to get the full output. I wanted to run the analysis on a couple of years of data, but I can barely run it for 8 days (the calculation takes around 5 minutes and does not always give the full output). I think the problem is either my PC, even though I have an i7 and 16 GB of RAM, or maybe the code is too heavy.

Do you know if it is possible to add a line of code to

WSJSI = Flatten[First@WSJSentimentIndicator[#] & /@ datelist]
tsWSJSI = TimeSeries[Transpose[{datelist, WSJSI}]]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator", Bold]]

to make it repeat the calculation over and over until it succeeds?

Thank you very much for your time.

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Dear Dr. Kinlay, I will be writing a course paper in international finance soon and I would like to replicate your market sentiment analysis, but over a different time frame and also using Twitter/Reddit data. May I ask whether I may use your code and reference your article in my paper? Thank you very much for your time. Roman

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Thank you very much Rohit!

And you didn't change the module code?

Does it look like this for you?

WSJSentimentIndicator[date_ ] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
archive = 
 Import[StringJoin["https://www.wsj.com/news/archive/", 
   DateString[d, {"Year", "Month", "Day"}]]];
archive = 
 StringDrop[archive, 
  StringPosition[archive, 
    DateString[
     d, {"MonthNameShort", " ", "DayShort", ", ", 
      "Year"}]][[1, 2]]];
archive = 
 StringTake[
  archive, -1 + 
   StringPosition[archive, "Most Popular Articles"][[1, 1]]];
archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
 WSJSI = #Positive /(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Thank you very much for your reply Jonathan,

I find your article very interesting and inspiring. I am not a proficient Wolfram user, unfortunately. However, I study finance now, and I would really like to learn how you did this sentiment analysis.

With Rohit's help, I went through the first part of the code, and now I am having some trouble with WSJSentimentIndicator:

WSJSentimentIndicator[date_] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
  archive = 
   Import[StringJoin["http://www.wsj.com/public/page/archive-", 
     DateString[d, {"Year", "-", "MonthShort", "-", "DayShort"}], 
     ".html"]];
  archive = 
   StringDrop[archive, 
    StringPosition[archive, 
      DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]][[1,
      2]]];
  archive = 
   StringTake[
    archive, -1 + StringPosition[archive, "ARCHIVE FILTER"][[1, 1]]];
  archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
  WSJSI = #Positive/(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

So, if we update the code, it should look like this:

WSJSentimentIndicator[date_ ] := 
 Module[{d = date, archive, archivewords, WSJSI}, 
archive = 
 Import[StringJoin["https://www.wsj.com/news/archive/", 
   DateString[d, {"Year", "Month", "Day"}]]];
archive = 
 StringDrop[archive, 
  StringPosition[archive, 
    DateString[
     d, {"MonthNameShort", " ", "DayShort", " ", 
      "Year"}]][[1, 2]]];
archive = 
 StringTake[
  archive, -1 + 
   StringPosition[archive, "Most Popular Articles"][[1, 1]]];
archivewords = ToLowerCase[DeleteStopwords[TextWords[archive]]];
 WSJSI = #Positive /(#Negative + #Positive) &@
     Counts[Classify["Sentiment", archivewords]] // N;
  {WSJSI, archivewords, archive}]

However, the code that produces the histogram does not work for me:

WSJSI = Flatten[First@WSJSentimentIndicator[#]&/@datelist]
Histogram[tsWSJSI, PlotLabel -> Style["Histogram of WSJ Sentiment indicator",Bold]]

If you have time, could you please help me solve this one? I have a feeling that WSJSI does not account for datelist correctly.

POSTED BY: Roman Ubaydullaev

Roman Ubaydullaev

Posted 7 years ago

Thank you very much for your help, your code works perfectly!!! If you have time, could you please explain how you worked out which format the date should be written in? And also, how exactly does this line work?

archive = StringTake[archive, -1 + StringPosition[archive, "Most Popular Articles"][[1, 1]]]

Does it somehow bring us into the "Most Popular Articles" section and get titles from there?

POSTED BY: Roman Ubaydullaev
