[CALL] Making COVID-19 Data Computable

Posted 3 months ago

MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus


It's inspiring to see so many people in the community devoting time and expertise to tracking and understanding this pandemic. Here at Wolfram, we've been trying to keep on top of a few critical datasets related to COVID-19 (see the "Curated Computable Data" section in the main hub post linked above) and present them through the Wolfram Data Repository so they're ready for immediate analysis in the Wolfram Language. But there are obviously tons of other data resources available, either directly relating to the pandemic or providing helpful historical, medical, or other context. We've had a lot of different suggestions inside the company for further additions to WDR or Wolfram|Alpha, but we can't possibly tackle everything — so we'd like to invite people in the community to contribute to making more data surrounding this topic computable.

First: there are a few suggestions below, but we'd welcome others; I'd hope that part of this hub could just be a growing list of links to useful and relevant external data sources. Even if you don't have the time or ability to scrape and transform a particular dataset yourself, someone else might be able to tackle it. A few initial ideas:

Second: if you have the time and inclination to scrape data from one of these sources and try to put it in a more computable form, please jump in and do so! We've already seen what people can do with the "curated computable data" we're providing, and having more neatly-packaged data resources available to the community would benefit everyone.

If you're not completely confident in your Wolfram Language data curation skills, we do have a lot of resources to help you improve, like a helpful screencast series at https://github.com/WolframResearch/Data-Curation-Training.

And if you've got a dataset or collection of data around this topic that you'd like to make available to the community — I strongly recommend preparing it using the Data Resource Definition Notebook (available from the File menu in Mathematica, as below):

[Image: the Data Resource Definition Notebook item in Mathematica's File menu]

Particularly for work being done around the topic of COVID-19, where time is of the essence and many data resources are changing rapidly, I'd also strongly encourage you to deploy finished resources directly to the cloud, instead of submitting to the Wolfram Data Repository for formal review and publication:

[Image: deploying a finished resource directly to the cloud]

Deploying resources in this way will make them easily and immediately accessible to other users, and also automatically create a "shingle" for your data resource where anyone can view your data and analysis, while Wolfram Language users can directly access the underlying data as easily as any object published in the WDR:

DateListPlot[
 ResourceData[
   "https://www.wolframcloud.com/obj/alanj/DeployedResources/Data/Hospital-Beds-by-Country"][All, #TotalBedsPer1000Inhabitants &]]

[Image: plot produced by the code above]

In the interest of making data available to people as quickly as possible, this is probably the best route to follow right now — but I would definitely hope to see lots of user creations become formal Wolfram Data Repository submissions eventually, after "peer review" and revision by the community.

9 Replies

For United States data, two nice sources are https://covidtracking.com/ and the NY Times GitHub repo here: https://github.com/nytimes/covid-19-data. With some help, I created resource functions for importing the latest data from each of them. Both were published in the Wolfram Function Repository today.

https://resources.wolframcloud.com/FunctionRepository/resources/COVIDTrackingData
https://resources.wolframcloud.com/FunctionRepository/resources/NYTimesCOVID19Data

See the documentation pages above for examples of how to use them.

Unlike Data Repository entries, where the data is updated only periodically but is always in a working state, these Function Repository entries depend on evolving third-party services, so they could break. In exchange, they always grab the latest data from those services.

Posted 3 months ago

I have tracked accumulated Case and Death data for Oklahoma from Worldometers.info.

Because these data are somewhat noisy, I applied an exponential low-pass filter to the input, with a time constant of about 3 days, and fit the series to a sigmoid (logistic) function of the form cases = k / (1 + exp(A - b*days)):

[Image: OK Sum of Reported CV-19 Cases]
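
A minimal Wolfram Language sketch of this smooth-then-fit step, run on synthetic data rather than the Oklahoma series. The mapping from a 3-day time constant to the smoothing constant (alpha = 1 - Exp[-1/3] at one sample per day), the starting values, and all variable names are assumptions made for illustration, not the settings used above.

```wolfram
(* Synthetic cumulative counts: a logistic curve plus noise (illustrative only) *)
SeedRandom[1];
days = Range[0, 60];
truth[t_] := 1000/(1 + Exp[6 - 0.2 t]);
raw = truth /@ days + RandomReal[{-15, 15}, Length[days]];

(* Exponential low-pass filter; assumed mapping: time constant 3 days, 1 sample per day *)
alpha = 1 - Exp[-1/3.];
smoothed = ExponentialMovingAverage[raw, alpha];

(* Fit cases = k/(1 + Exp[a - b t]); starting values keep the search in a sensible region *)
fit = NonlinearModelFit[
   Transpose[{days, smoothed}],
   k/(1 + Exp[a - b t]),
   {{k, Max[smoothed]}, {a, 5}, {b, 0.15}},
   t];

fit["BestFitParameters"]
fit["RSquared"]
```

Note that the filter delays the curve by roughly (1 - alpha)/alpha, about 2.5 days here, so the fitted a should come out somewhat larger than the value used to generate the data.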

I then give the logistic function (with best-fit parameters) to Wolfram Alpha to plot its derivative, which is cases per day:

[Image: Oklahoma CV-19 Cases per Day]
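
For reference, the cases-per-day curve can also be written in closed form. This assumes the same logistic as above, with fitted parameters k, A and b; the symbols c (cumulative cases) and t (days) are introduced here for illustration.

```latex
c(t) = \frac{k}{1 + e^{A - b t}}, \qquad
\frac{dc}{dt} = \frac{k\, b\, e^{A - b t}}{\left(1 + e^{A - b t}\right)^{2}}
             = \frac{b}{k}\, c(t)\,\bigl(k - c(t)\bigr)
```

The daily rate peaks where c = k/2, that is at t = A/b, with a peak value of k b / 4.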

In the same way, I fit deaths to a logistic:

[Image: OK UNSMOOTHED CV-19 Deaths]

The resulting Alpha plot of the death rate:

[Image: OK CV-19 Death Rate]

Here is a plot of residual errors in the case-fitting function:

[Image: residual errors of the case fit]

I noticed a growing 7-day oscillation in reported cases. This may be a clerical artifact or some internal dynamic.

DISCUSSION.
The logistic function is just two model steps away from the initial exponential function and, for later days, the intermediate straight-line fit. It provides high R^2 values for the largely pre-peak data. The logistic function has a shortcoming to balance its simplicity: it uses the same exponential parameters before AND after the peak. This may become less realistic if the post-peak data decays with a LONGER time constant, as it may well do. I hope that this approach may dissuade modelers from starting with cases-per-day data, which is inevitably noisier than the sigmoid cumulative case progression. It is perfectly possible to have the nonlinear regression application refit the exponential rate after an established peak-rate day, but I do not yet have sufficient data to make this practical.
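
One possible way to write down such a post-peak refit, offered as a sketch rather than a tested model. The cases-per-day curve of the logistic cases = k / (1 + exp(A - b*days)) is a symmetric pulse around the peak day, and the post-peak side can be given its own rate. The names t_p (peak day), r_p (peak daily rate), b_1 and b_2 (pre-peak and post-peak rates) are introduced here for illustration.

```latex
\frac{dc}{dt} = \frac{k b}{4}\,\operatorname{sech}^{2}\!\left(\frac{b\,(t - t_p)}{2}\right),
\qquad t_p = \frac{A}{b}

r(t) =
\begin{cases}
  r_p \,\operatorname{sech}^{2}\!\left(\dfrac{b_1\,(t - t_p)}{2}\right), & t \le t_p \\[2ex]
  r_p \,\operatorname{sech}^{2}\!\left(\dfrac{b_2\,(t - t_p)}{2}\right), & t > t_p
\end{cases}
\qquad
\int_{-\infty}^{\infty} r(t)\,dt = 2\, r_p \left(\frac{1}{b_1} + \frac{1}{b_2}\right)
```

With b_1 = b_2 = b and r_p = k b / 4 this reduces to the ordinary logistic with final total k; a smaller b_2 gives the slower post-peak decay and a correspondingly larger final total.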

Posted 3 months ago
Posted 2 months ago

This note gave me a warm fuzzy for sure! I don't have an immediate need, but it speaks to practical purposes for useful ends. Thank you! Brian W

I've wrangled the US county data on cases and deaths from the JHU dataset.

CloudGet[CloudObject[
  "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]]

The dataset has this structure:

[Image: structure of the dataset]

Example:

CloudGet[CloudObject[
    "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]][
  SelectFirst[#USCounty == 
     Entity["AdministrativeDivision", {"LosAngelesCounty", 
       "California", "UnitedStates"}] &]][
 DateListPlot[
   Through[{Differences, 
      MovingAverage[Differences[#], Quantity[1, "Weeks"]] &}[#Cases]],
    PlotRange -> All, 
   PlotLegends -> {"Daily new cases", "Filtered over 1 week"}] &]

[Image: daily new cases and 1-week moving average for Los Angeles County]
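
The core of that example is the Through construct, which applies two functions to the same cumulative series: Differences for daily new cases, and a moving average of those differences for the filtered curve. A toy version on a plain list, using made-up numbers and a window of 7 values in place of Quantity[1, "Weeks"] (which applies to time series):

```wolfram
(* Made-up cumulative case counts, one value per day *)
cases = {1, 3, 6, 10, 15, 21, 28, 36, 45};

(* Daily new cases are the successive differences *)
daily = Differences[cases]        (* {2, 3, 4, 5, 6, 7, 8, 9} *)

(* 7-day moving average of the daily values *)
weekly = MovingAverage[daily, 7]  (* {5, 6} *)

(* Through applies both functions to the same input, as in the plot above *)
both = Through[{Differences, MovingAverage[Differences[#], 7] &}[cases]]
```

Here both is simply the pair {daily, weekly}, which is the form DateListPlot receives in the county example.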

Please check out my post, where I processed Facebook population density maps and made them into computable datasets in the Wolfram Language. They can be very useful for epidemic models that use population densities.
