# [CALL] Making COVID-19 Data Computable

Posted 10 months ago | 4490 Views | 9 Replies | 16 Total Likes
MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus

It's inspiring to see so many people in the community devoting time and expertise to tracking and understanding this pandemic. Here at Wolfram, we've been trying to keep on top of a few critical datasets related to COVID-19 (see the "Curated Computable Data" section in the main hub post linked above) and present them through the Wolfram Data Repository so they're ready for immediate analysis in the Wolfram Language. But there are obviously many other data resources available, either directly related to the pandemic or providing helpful historical, medical, or other context. We've had a lot of suggestions inside the company for further additions to WDR or Wolfram|Alpha, but we can't possibly tackle everything, so we'd like to invite the community to contribute to making more data surrounding this topic computable.

First: there are a few suggestions below, but we'd welcome others; I'd hope that part of this hub could become a growing list of links to useful and relevant external data sources. Even if you don't have the time or ability to scrape and transform a particular dataset yourself, someone else might be able to tackle it. A few initial ideas:

Second: if you have the time and inclination to scrape data from one of these sources and put it in a more computable form, please jump in and do so! We've already seen what people can do with the "curated computable data" we're providing, and having more neatly packaged data resources available to the community would benefit everyone.
If you're not completely confident in your Wolfram Language data curation skills, we have a lot of resources to help you improve, including a helpful screencast series.

And if you've got a dataset or collection of data around this topic that you'd like to make available to the community, I strongly recommend preparing it using the Data Resource Definition Notebook (available from the File menu in Mathematica). Particularly for work around COVID-19, where time is of the essence and many data resources are changing rapidly, I'd also strongly encourage you to deploy finished resources directly to the cloud instead of submitting them to the Wolfram Data Repository for formal review and publication.

Deploying resources this way makes them easily and immediately accessible to other users, and automatically creates a "shingle" for your data resource where anyone can view your data and analysis, while Wolfram Language users can directly access the underlying data as easily as any object published in the WDR:

```
DateListPlot[
 ResourceData[
   "https://www.wolframcloud.com/obj/alanj/DeployedResources/Data/Hospital-Beds-by-Country"][
  All, #TotalBedsPer1000Inhabitants &]]
```

In the interest of making data available to people as quickly as possible, this is probably the best route to follow right now, but I would definitely hope to see lots of user creations become formal Wolfram Data Repository submissions eventually, after "peer review" and revision by the community.
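If you just want to share raw data quickly without going through the resource system at all, a plain public cloud object also works. Here's a minimal sketch (the dataset contents and cloud path below are made up for illustration):

```
(* Sketch: sharing a small Dataset as a public cloud object.
   The data and path are illustrative, not a real published resource. *)
beds = Dataset[{
    <|"Country" -> "Japan", "TotalBedsPer1000Inhabitants" -> 13.0|>,
    <|"Country" -> "Germany", "TotalBedsPer1000Inhabitants" -> 8.0|>}];
obj = CloudPut[beds, "DeployedResources/Data/beds-example",
   Permissions -> "Public"];

(* Anyone can then pull the data straight into their own session: *)
CloudGet[obj]
```

This skips the resource "shingle" and documentation, so it's best for quick sharing while a proper Data Resource Definition Notebook is still in progress.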
Posted 10 months ago
For United States data, two nice sources are https://covidtracking.com/ and the NY Times GitHub repo here: https://github.com/nytimes/covid-19-data. With some help I created resource functions for importing the latest data from each of them; both were published in the Wolfram Function Repository today. See the documentation pages above for examples of how to use them. Unlike Data Repository entries, where data is periodically updated but always vetted, these Function Repository entries depend on evolving third-party services, so they could break. But they always grab the latest data from those services.
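Usage follows the standard Function Repository pattern; a sketch (the function name "NYTimesCOVID19Data" and the "State" column are assumed here for illustration, so check the published documentation pages for the actual names):

```
(* Sketch: fetching the latest data via a Function Repository entry.
   "NYTimesCOVID19Data" is an assumed name; substitute the actual
   function name from the documentation pages above. *)
latest = ResourceFunction["NYTimesCOVID19Data"][];

(* The result is typically a Dataset, so standard queries apply,
   e.g. restricting to one state's rows (column name assumed): *)
latest[Select[#State == "New York" &]]
```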
Posted 10 months ago
 We've had a few other WDR additions in the past week or so — not all directly related to COVID-19, but possibly of use for historical or other kinds of context:
Posted 10 months ago
I have tracked accumulated case and death data for Oklahoma from Worldometers.info. Because these data are somewhat noisy, I applied an exponential low-pass filter to the input (time constant ~3 days) and fit the series to a sigmoid function of this kind:

cases = k / (1 + exp(A - b*days))

I then presented the logistic function (with best-fit parameters) to Wolfram|Alpha to plot its derivative, which is cases per day. In the same way I fit deaths to a logistic and plotted the resulting death rate. Examining a plot of the residual errors in the case-fitting function, I noticed a growing 7-day oscillation in reported cases. This may be a clerical artifact, or some internal dynamic.

DISCUSSION. The logistic function is just two model steps away from the initial exponential function, and for later days the intermediate straight-line fit. It provides high R^2 values for the largely pre-peak data. The logistic function has a shortcoming to balance its simplicity: it uses the same exponential parameters before AND after the peak. This may become less realistic if the post-peak data decays with a LONGER time constant, as it may well do. I hope that this approach may dissuade modelers from starting with case/day data, which is inevitably noisier than the sigmoid case progression. It is perfectly possible to have the non-linear regression application refit the exponential rate after an established peak-rate day, but I do not yet have sufficient data to make this practical.
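The pipeline described above (smooth, fit a logistic, differentiate for the daily rate) can be sketched directly in the Wolfram Language. This is a minimal version on synthetic data standing in for the Oklahoma counts; the smoothing constant and initial parameter guesses are illustrative:

```
(* Sketch: exponential smoothing + logistic fit + daily-rate curve,
   on synthetic data standing in for the Oklahoma case counts. *)
days = Range[60];
raw = Table[1000./(1 + Exp[5 - 0.15 t]) + RandomReal[{-20, 20}], {t, days}];

(* Exponential low-pass; smoothing constant 1/3 ~ a 3-day time constant *)
smoothed = ExponentialMovingAverage[raw, 1/3.];

(* Fit the smoothed series to cases = k/(1 + exp(a - b t)) *)
fit = NonlinearModelFit[Transpose[{days, smoothed}],
   k/(1 + Exp[a - b t]), {{k, 1000}, {a, 5}, {b, 0.1}}, t];

(* The derivative of the fitted curve is the cases-per-day rate *)
dailyRate[t_] = D[fit[t], t];
Plot[dailyRate[t], {t, 0, 60}]
```

The derivative of the fitted logistic peaks at the inflection day t = a/b, which is one quick way to read off the estimated peak-rate day from the fit.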
Posted 10 months ago
You might want to look at these posts:

- Logistic Model for Quarantine Controlled Epidemics
- Predicting Coronavirus Epidemic in United States
Posted 9 months ago
I've wrangled the Google mobility data for the USA and Canada.
Posted 9 months ago
 This note gave me a warm fuzzy for sure! I don't have an immediate need, but it speaks to practical purposes for useful ends. Thank you! Brian W
Posted 9 months ago
I've wrangled the US county data of cases and deaths from the JHU dataset.

```
CloudGet[CloudObject[
  "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]]
```

The dataset has this structure:

Example:

```
CloudGet[CloudObject[
   "https://www.wolframcloud.com/obj/47230f9f-9bc0-4682-9aff-bdde93e98544"]][
  SelectFirst[#USCounty ==
     Entity["AdministrativeDivision", {"LosAngelesCounty", "California",
       "UnitedStates"}] &]][
 DateListPlot[
   Through[{Differences,
      MovingAverage[Differences[#], Quantity[1, "Weeks"]] &}[#Cases]],
   PlotRange -> All,
   PlotLegends -> {"Daily new cases", "Filtered over 1 week"}] &]
```
Posted 9 months ago
Please check out my post, where I processed the Facebook population density maps and made them into computable datasets in the Wolfram Language. They can be very useful for epidemic models that use population densities.