Community RSS Feed
https://community.wolfram.com
RSS Feed for Wolfram Community showing any discussions in tag Data Science sorted by active

[WSG21] Daily Study Group: Multiparadigm Data Science
https://community.wolfram.com/groups//m/t/2244736
A new study group for Multiparadigm Data Science with the Wolfram Language begins Monday, Apr 19, 2021!
Making progress in an online course can be daunting when you have to study all alone. Join a cohort of fellow Wolfram Language users for a two-week study group that works through the Wolfram U course "[Multiparadigm Data Science][1]". A certified instructor will guide each session by reviewing the lesson notebooks from the course, working through the code and answering questions.
Get support for starting on the path to earning Level 1 and Level 2 certifications in multiparadigm data science.
**Sign up here:** https://wolfr.am/UNdaIas0
[1]: https://www.wolfram.com/wolframu/multiparadigmdatascience/
Abrita Chakravarty
2021-04-15T22:05:55Z

Hawaii weather stations list
https://community.wolfram.com/groups//m/t/2244206
Dear all,
I need a list of weather stations in Hawaii. Is it possible to extract that with Mathematica?
Thank you in advance for your kind support.
Alex Teymouri
2021-04-14T20:20:42Z

Computational genealogy with the Wolfram Language
https://community.wolfram.com/groups//m/t/2241480
![enter image description here][2]
&[Wolfram Notebook][1]
[1]: https://www.wolframcloud.com/obj/427d481005a8402fb0deab997db2eac3
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20210410at5.48.08PM.png&userId=228444
[Original]: https://www.wolframcloud.com/obj/rnachbar/Published/Genealogy%20With%20Wolfram%20Language.nb
Robert Nachbar
2021-04-10T22:07:42Z

Looking for a similar function to Fold[ ]?
https://community.wolfram.com/groups//m/t/2241046
My current project has me creating yet another data mart using Microsoft Dynamics (a CRM solution) as the source. I have nearly 200 tables each having up to several hundred columns.
Unfortunately, the table and column names are a concatenation of words with no capitalization.
So, in order to help me build some documentation, and eventually the SQL for the data mart, I need to turn the gibberish into something the team (and our users) can easily read.
For example, I want to have a function to work like this:
strSeparateWords["fourscoreandsevenyearsago", {"four", "score",
"seven", "and", "seven", "years", "ago"}, "Capitalize" -> True]
to return:
"Four Score And Seven Years Ago"
While I was able to write such a function, I cannot help but think that there is a better (more Wolfram-like) solution.
Here is my solution:
Options[strSeparateWords] = {
"Capitalize" -> False
};
strSeparateWords[string_String, word_String, opt : OptionsPattern[]] :=
Module[
{
replacement = If[OptionValue["Capitalize"], Capitalize[word], word]
},
StringTrim@StringReplace[string, word -> " " <> replacement]
];
strSeparateWords[string_String, words_?matchListOfStringsQ,
opt : OptionsPattern[]] := Module[
{
retVal = string
},
Do[
retVal = strSeparateWords[retVal, word, opt]
,
{word, words}
];
retVal
];
I was hoping to find a function similar to Fold[] that works with single argument functions. In other words, I want a function that worked like this:
anotherFold[
f[initialValue, #] &,
{ a, b, c}
]
would return:
f[f[f[initialValue, a], b], c]
As you can see, I cannot even figure out a good name for such a function. But I think it would be useful to have a generic function like this.
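One possible match, assuming the goal is exactly the nesting shown above: the built-in three-argument form of Fold threads an accumulator through a list in this way. The sketch below also folds StringReplace over a word list. The name separateWords and its positional capitalize argument are hypothetical, chosen only for illustration, and duplicates in the word list are dropped so that no word is spaced twice.

```wolfram
(* Three-argument Fold: the second argument is the starting value *)
Fold[f, initialValue, {a, b, c}]
(* f[f[f[initialValue, a], b], c] *)

(* Hypothetical helper: put a space before each known word, optionally capitalizing it *)
separateWords[string_String, words : {__String}, capitalize_ : False] :=
  StringTrim@Fold[
    StringReplace[#1, #2 -> " " <> If[capitalize, Capitalize[#2], #2]] &,
    string,
    DeleteDuplicates[words]]

separateWords["fourscoreandsevenyearsago",
  {"four", "score", "and", "seven", "years", "ago"}, True]
(* "Four Score And Seven Years Ago" *)
```

As in the Do-loop version, the result depends on the order of the word list, and a word that is a substring of another word can still be split incorrectly.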
I've searched the documentation and web for over an hour now. But, before giving up, I wanted to ask the Community.
Thanks, and have a great weekend.
Mike Besso
2021-04-09T21:14:37Z

Pairwise Correlation of Financial Data
https://community.wolfram.com/groups//m/t/2242326
One of the regular tasks in statistical arbitrage is to compute correlations between a large universe of stocks, such as the S&P500 index members.
Mathematica/WL has some very nice features for obtaining financial data and manipulating time series. And of course it offers all the commonly required statistical functions, including correlation. But the WL Correlation function is missing one vital feature: the ability to handle data series of unequal length. This arises, of course, because stock data series do not all share a common start date and (very occasionally) omit data for dates in the middle of the series. This creates an issue for the Correlation function, which can only handle series of equal length.
The usual way of handling this is to apply pairwise correlation, in which each pair of data vectors is truncated to include only the dates common to both series. Of course this can easily be done in WL; but it is very inefficient.
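As a minimal sketch of what pairwise truncation means, the following uses two made-up series of {date, return} pairs. The dates, values and the helper name restrict are assumptions for illustration only.

```wolfram
(* Toy {date, return} pairs; the numbers are invented for illustration *)
seriesA = {{"2021-01-04", 0.010}, {"2021-01-05", -0.020},
   {"2021-01-06", 0.030}, {"2021-01-07", 0.000}};
seriesB = {{"2021-01-05", -0.010}, {"2021-01-06", 0.020},
   {"2021-01-07", 0.015}, {"2021-01-08", 0.040}};

(* Dates present in both series *)
commonDates = Intersection[seriesA[[All, 1]], seriesB[[All, 1]]];

(* Hypothetical helper: keep only the rows whose date is in the common set *)
restrict[series_] := Select[series, MemberQ[commonDates, First[#]] &];

(* Correlation of the two equal-length, date-aligned return vectors *)
Correlation[restrict[seriesA][[All, 2]], restrict[seriesB][[All, 2]]]
```

Here only the three dates shared by both series contribute to the coefficient.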
Let's take an example. We start with the last 10 symbols in the S&P 500 index membership:
In[1]:= tickers = Take[FinancialData["^GSPC", "Members"], -10]
Out[1]= {"NASDAQ:WYNN", "NASDAQ:XEL", "NYSE:XRX", "NASDAQ:XLNX", \
"NYSE:XYL", "NYSE:YUM", "NASDAQ:ZBRA", "NYSE:ZBH", "NASDAQ:ZION", \
"NYSE:ZTS"}
Next we obtain the returns series for these stocks, over the last several years. By default, FinancialData retrieves the data as TimeSeries Objects. This is very elegant, but slows the processing of the data, as we shall see.
tsStocks =
FinancialData[tickers, "Return",
DatePlus[Today, {-2753, "BusinessDay"}]];
Not all the series contain the same number of date-return pairs. So using Correlation is out of the question:
In[282]:= Table[Length@tsStocks[[i]]["Values"], {i, 10}]
Out[282]= {2762, 2762, 2762, 2762, 2388, 2762, 2762, 2762, 2762, 2060}
Since Correlation doesn't offer a pairwise option, we have to create the required functionality in WL. Let's start with:
PairsCorrelation[ts_] := Module[{td, correl},
If[ts[[1]]["PathLength"] == ts[[2]]["PathLength"],
correl = Correlation @@ ts,
td = TimeSeriesResample[ts, "Intersection"];
correl = Correlation @@ td[[All, All, 2]]]];
We first check to see if the two arguments are of equal length, in which case we can Apply the Correlation function directly. If not, we use the "Intersection" option of the TimeSeriesResample function to reduce the series to a set of common observation dates. The function is designed to be deployed using parallelization, as follows:
PairsListCorrelation[tslist_] := Module[{pairs, i, td, c, correl = {}},
pairs = Subsets[Range[Length@tslist], {2}];
correl =
ParallelTable[
PairsCorrelation[tslist[[pairs[[i]]]]], {i, 1, Length@pairs}];
{correl, pairs}]
The Subsets function is used to generate a non-duplicative list of index pairs, and then a correlation table is built in parallel using the PairsCorrelation function on each pair of series.
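For a small illustration of the index pairs that Subsets produces, here it is applied to four indices, followed by the pair count for ten series:

```wolfram
(* All unordered pairs of the indices 1..4 *)
Subsets[Range[4], {2}]
(* {{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}} *)

(* Number of distinct pairs among 10 series *)
Length[Subsets[Range[10], {2}]]
(* 45 *)
```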
When we apply the function to the ten stock time series, we get the following results:
In[263]:= AbsoluteTiming[{correl, pairs} =
PairsListCorrelation[tsStocks];]
Out[263]= {13.4791, Null}
In[270]:= Length@correl
Out[270]= 45
In[284]:= Through[{Mean, Median, Min, Max}[correl]]
Out[284]= {0.381958, 0.396429, 0.200828, 0.536383}
So far, so good. But look again at the timing of the PairsListCorrelation function. It takes 13.5 seconds to calculate the 45 correlation coefficients for 10 series. To carry out an equivalent exercise for the entire S&P 500 universe would entail computing 124,750 coefficients, taking approximately 10.5 hours! This is far too slow to be practically useful in the given context.
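The extrapolation can be checked with a short calculation, assuming the cost per pair stays constant at the rate measured for the 10-stock sample:

```wolfram
(* Number of distinct pairs among 500 stocks *)
nPairs = Binomial[500, 2]
(* 124750 *)

(* Measured rate: 45 coefficients in 13.4791 seconds *)
secondsPerPair = 13.4791/45;

(* Estimated total run time in hours, about 10.4 *)
nPairs*secondsPerPair/3600
```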
Some speed improvement is achievable by retrieving the stock returns data in legacy (i.e. list rather than time series) format, but it still takes around 10 seconds to calculate the coefficients for our 10 stocks. Perhaps further speed improvements are possible through other means (e.g. compilation), but what is really required is a core language function to handle series of unequal length (or a Pairwise method for the Correlation function).
For comparison, I can produce the correlation coefficients for all 500 S&P member stocks in under 3 seconds using the 'Rows', 'pairwise' options of the equivalent correlation function in another scientific computing language.

# UPDATE
Another Mathematica user suggested a way to speed up the pairwise correlation algorithm using associations.
We begin by downloading returns data for the S&P500 membership in legacy (i.e. list) format:
tickers = FinancialData["^GSPC", "Members"];
stockdata =
FinancialData[tickers, "Return",
DatePlus[Today, {-753, "BusinessDay"}], Method -> "Legacy"];
Then define:
PairwiseCorrelation[stockdata_] :=
Module[{assocStocks, pairs, correl},
assocStocks = Apply[Rule, stockdata, {2}] // Map[Association];
pairs = Subsets[Range@Length@assocStocks, {2}];
correl =
Map[Correlation @@ Values@KeyIntersection[assocStocks[[#]]] &,
pairs];
{correl, pairs}]
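The two building blocks of this definition, converting {date, value} pairs to associations and aligning them by key, can be tried on toy data. The series below are made up, and plain date strings are used as keys purely for illustration.

```wolfram
(* Two toy series of {date, return} pairs; values are invented *)
toyData = {
   {{"2021-01-04", 0.010}, {"2021-01-05", -0.020},
    {"2021-01-06", 0.030}, {"2021-01-07", 0.000}},
   {{"2021-01-05", -0.010}, {"2021-01-06", 0.020},
    {"2021-01-07", 0.015}, {"2021-01-08", 0.040}}};

(* Turn each {date, value} pair into date -> value, then each series into an Association *)
assoc = Association /@ Apply[Rule, toyData, {2}];

(* Keep only the keys (dates) that occur in every association *)
aligned = KeyIntersection[assoc];

(* Both associations now hold the same three dates *)
Keys /@ aligned

(* Correlation of the aligned value vectors *)
Correlation @@ Values[aligned]
```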
Here we are using the KeyIntersection function to identify common dates between two series, which is much faster than other methods. Accordingly:
In[317]:= AbsoluteTiming[{correl, pairs} =
PairwiseCorrelation[stockdata];]
Out[317]= {112.836, Null}
In[318]:= Length@correl
Out[318]= 127260
In[319]:= Through[{Mean, Median, Min, Max}[correl]]
Out[319]= {0.428747, 0.43533, 0.167036, 0.996379}
This is many times faster than the original algorithm and, although much slower (40x to 50x) than equivalent algorithms in other languages, gets the job done in reasonable time.
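A rough per-pair comparison of the two timings reported above is sketched below. It is only indicative, since the two runs use different date ranges and the first one runs in parallel.

```wolfram
(* 127260 pairs is consistent with a universe of 505 tickers *)
Binomial[505, 2]
(* 127260 *)

(* Seconds per pair: TimeSeries version vs. association version *)
originalPerPair = 13.4791/45;
updatedPerPair = 112.836/127260;

(* Ratio of the two rates, roughly 340 *)
originalPerPair/updatedPerPair
```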
So I still think we need a Method -> "Pairwise" option for the Correlation function.
Jonathan Kinlay
2021-04-12T07:05:32Z

Hacking a complex function with Mathematica
https://community.wolfram.com/groups//m/t/2241822
&[Wolfram Notebook][1]
[1]: https://www.wolframcloud.com/obj/wolframcommunity/Published/mmaHack.nb
Robert Rimmer
2021-04-11T01:21:31Z

[WSS20] Curve OCR for "AP Calculus"-like "sketched" curves
https://community.wolfram.com/groups//m/t/2029803
![enter image description here][1]
&[Wolfram Notebook][2]
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Sinti%CC%81tulo100MODIFIED1.png&userId=1878279
[2]: https://www.wolframcloud.com/obj/ahutahaii/Published/projectNotebook_updated.nb
José Antonio Fernández
2020-07-14T16:52:27Z

Approval voting election: analysis and visualization
https://community.wolfram.com/groups//m/t/2240112
![enter image description here][1]
&[Wolfram Notebook][2]
[Original]: https://www.wolframcloud.com/obj/bobs/Published/STLMayorWardEssay.nb
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=elections_hero3.png&userId=20103
[2]: https://www.wolframcloud.com/obj/9c761b5a7710461a9231a538b42f9b8e
Bob Sandheinrich
2021-04-08T16:18:16Z