Group Abstract

Message Boards

WOLFRAM COMMUNITY

22.1K Views

9 Replies

16 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Staff Picks Biological Sciences Data Science Medical Sciences Curated Data Wolfram Language Challenges Events & Media

[CALL] Making COVID-19 Data Computable

Alan Joyce

Alan Joyce, Wolfram|Alpha

Posted 5 years ago

MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus It's inspiring to see so many people in the community devoting time and expertise to tracking and understanding this pandemic. Here at Wolfram, we've been trying to keep on top of a few critical datasets related to COVID-19 (see the "Curated Computable Data" section in the main hub post linked above) and present them through the Wolfram Data Repository so they're ready for immediate analysis in the Wolfram Language. But there are obviously tons of other data resources available, either directly relating to the pandemic or providing helpful historical, medical, or other context. We've had a lot of different suggestions inside the company for further additions to WDR or Wolfram\|Alpha, but we can't possibly tackle everything — so we'd like to invite people in the community to contribute to making more data surrounding this topic computable. First: there are a few suggestions below, but we'd welcome others; I'd hope that part of this hub could just be a growing list of links to useful and relevant external data sources. Even if you don't have the time or ability to scrape and transform a particular dataset yourself, someone else might be able to tackle it. A few initial ideas: Spanish Flu data (R code applied to data from mortality.org) Spanish Flu in NYC Past Pandemics Italian COVID-19 data National Notifiable Diseases Surveillance System FAO Food Price Monitoring and Analysis Tool US Health Weather Map State actions to address coronavirus Second: if you have the time and inclination to scrape data from one of these sources and try to put it in a more computable form, please jump in and do so! We've already seen what people can do with the "curated computable data" we're providing, and having more neatly-packaged data resources available to the community would benefit everyone. If you're not completely confident in your Wolfram Language data curation skills, we do have a lot of resources to help you improve, like a helpful screencast series at https://github.com/WolframResearch/Data-Curation-Training. And if you've got a dataset or collection of data around this topic that you'd like to make available to the community — I strongly recommend preparing it using the Data Resource Definition Notebook (available from the File menu in Mathematica, as below): Particularly for work being done around the topic of COVID-19, where time is of the essence and many data resources are changing rapidly, I'd also strongly encourage you to deploy finished resources directly to the cloud, instead of submitting to the Wolfram Data Repository for formal review and publication: Deploying resources in this way will make them easily and immediately accessible to other users, and also automatically create a "shingle" for your data resource where anyone can view your data and analysis, while Wolfram Language users can directly access the underlying data as easily as any object published in the WDR: DateListPlot[ ResourceData[ "https://www.wolframcloud.com/obj/alanj/DeployedResources/Data/Hospital-Beds-by-Country"][All, #TotalBedsPer1000Inhabitants &]] In the interest of making data available to people as quickly as possible, this is probably the best route to follow right now — but I would definitely hope to see lots of user creations become formal Wolfram Data Repository submissions eventually, after "peer review" and revision by the community.

POSTED BY: Alan Joyce

9 Replies

Sort By:

Bob Sandheinrich

Bob Sandheinrich, Wolfram Research

Posted 5 years ago

For United States data two nice sources are https://covidtracking.com/ and the NY Times git hub repo here: https://github.com/nytimes/covid-19-data. With some help I created resource functions for importing the latest data from each of them. They were both published in the Wolfram Function Repository today. https://resources.wolframcloud.com/FunctionRepository/resources/COVIDTrackingData https://resources.wolframcloud.com/FunctionRepository/resources/NYTimesCOVID19Data See the documentation pages above for examples on how to use them. Unlike data repository entries where data is periodically updated but always good, these function repository entries are dependent on evolving third party services, so they could break. But they always grab the latest data from those services.

POSTED BY: Bob Sandheinrich

Mohammad Bahrami

Mohammad Bahrami, Wolfram Research Inc

Posted 5 years ago

Please check out my post where I processed Facebook population density map and make them computable datasets in Wolfram Language. It can be very useful for epidemic models using population densities.

POSTED BY: Mohammad Bahrami

Alan Joyce

Alan Joyce, Wolfram|Alpha

Posted 5 years ago

We've had a few other WDR additions in the past week or so — not all directly related to COVID-19, but possibly of use for historical or other kinds of context: https://datarepository.wolframcloud.com/resources/OECD-Data-Hospital-Beds-Per-Country https://datarepository.wolframcloud.com/resources/Hospital-Beds-Per-US-State https://datarepository.wolframcloud.com/resources/COVID-19-Hospital-Resource-Use-Projections

POSTED BY: Alan Joyce

Mohammad Bahrami

Mohammad Bahrami, Wolfram Research Inc

Posted 5 years ago

I've wrangled the Google mobility data of USA and Canada

POSTED BY: Mohammad Bahrami

Mohammad Bahrami

Mohammad Bahrami, Wolfram Research Inc

Posted 5 years ago

I've wrangled the US county data of Cases and Deaths from JHU dataset. CloudGet[CloudObject[ "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]] The dataset has this structure: Example: CloudGet[CloudObject[ "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]][ SelectFirst[#USCounty == Entity["AdministrativeDivision", {"LosAngelesCounty", "California", "UnitedStates"}] &]][ DateListPlot[ Through[{Differences, MovingAverage[Differences[#], Quantity[1, "Weeks"]] &}[#Cases]], PlotRange -> All, PlotLegends -> {"Daily new cases", "Filtered over 1 week"}] &]

I've wrangled the US county data of Cases and Deaths from JHU dataset.

CloudGet[CloudObject[
  "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]]

The dataset has this structure: enter image description here

Example:

CloudGet[CloudObject[
    "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]][
  SelectFirst[#USCounty == 
     Entity["AdministrativeDivision", {"LosAngelesCounty", 
       "California", "UnitedStates"}] &]][
 DateListPlot[
   Through[{Differences, 
      MovingAverage[Differences[#], Quantity[1, "Weeks"]] &}[#Cases]],
    PlotRange -> All, 
   PlotLegends -> {"Daily new cases", "Filtered over 1 week"}] &]

enter image description here

POSTED BY: Mohammad Bahrami

Brian Whatcott

Posted 5 years ago

This note gave me a warm fuzzy for sure! I don't have an immediate need, but it speaks to practical purposes for useful ends. Thank you! Brian W

POSTED BY: Brian Whatcott

Robert Rimmer

Posted 5 years ago

You might want to look at these posts: Logistic Model for Quarantine Controlled Epidemics Predicting Coronavirus Epidemic in United States

POSTED BY: Robert Rimmer

Brian Whatcott

Posted 5 years ago

I have tracked accumulated Case and Death data for Oklahoma from Worldometers.info. Because these data are somewhat noisy, I used an exponential low pass on the input with time constant ~ 3 days and fit the series to a sigmoid function of this kind: cases = k / (1 + exp( A - b*days)) like this: I continue by presenting the logistic function (with best-fit parameters) to Wolfram Alpha to plot the differential : which is cases per day like this: In the same way I fit deaths to a logistic like this: and the resulting Alpha plot of death rate like this: Here is a plot of residual errors in the case fitting function: I noticed a growing 7 day oscillation in reported cases. This may be a clerical artifact - or some intyernal dynamic. DISCUSSION. The logistic function is just two model steps away from the initial exponential function, and for later days the intermediate Straight Line fit. It provides high R^2 values for the largely pre-peak data. The logistic function has a short-coming to balance its simplicity: it uses the same exponential parameters before AND after the peak. This may become less realistic if the post-peak data decays on a LONGER time constant as it may well do. I hope that this approach may dissuade modelers starting from case/day data which is inevitably more noisy than the sigmoid case progression. It is perfectly possible to require the non-linear regression application to refit the exponential rate after an established peak rate day, but I do not yet have sufficient data to make this practical.

POSTED BY: Brian Whatcott

Robert Rimmer

Posted 5 years ago

POSTED BY: Robert Rimmer

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback