
[✓] Import data from a URL (COVID-19)


JOSÉ GUILLERMO SÁNCHEZ LEÓN, Universidad de Salamanca

Posted 5 years ago


How can the file below be imported?

Import["https://analisis.datosabiertos.jcyl.es/explore/dataset//situacion-de-hospitalizados-por-coronavirus-en-castilla-y-leon//download/?format=xls&timezone=Europe/Madrid&lang=es&use_labels_for_\header=true"]

If you copy and paste the URL into a web browser, the file downloads.

Thank you

Guillermo

POSTED BY: JOSÉ GUILLERMO SÁNCHEZ LEÓN

2 Replies


Gustavo Delfino

Posted 5 years ago


Hello Guillermo,

The data is in XML format (it represents a spreadsheet), but Import does not detect that. If you give "XML" as the second argument to Import, it works fine. However, dealing with spreadsheet XML is not fun. Since HTML is closely related to XML, I tried Import["the url", "HTML"], and it works! So you just need to do some string processing afterwards to get the data into a useful form.
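The "XML" route can also be worked through directly. Below is a rough sketch on a tiny hypothetical string modelled on an XML spreadsheet (SpreadsheetML style). The Row/Cell/Data element names are an assumption, not something checked against the real download, so inspect Import[url, "XML"] first and use Import instead of ImportString with the actual file. Because element names may carry a namespace, the hypothetical helper tagQ accepts both plain and {namespace, name} tags.

```mathematica
(* Hypothetical miniature of an XML spreadsheet; the element names
   Row/Cell/Data are assumed, not checked against the real file *)
xmlText = "<?xml version=\"1.0\"?>
<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">
 <Worksheet><Table>
  <Row><Cell><Data>fecha</Data></Cell><Cell><Data>provincia</Data></Cell></Row>
  <Row><Cell><Data>2020-04-04</Data></Cell><Cell><Data>Valladolid</Data></Cell></Row>
 </Table></Worksheet>
</Workbook>";

(* True if a tag equals name, either plain or as {namespaceURI, name} *)
tagQ[name_] := Function[t, t === name || MatchQ[t, {_, name}]];

xml = ImportString[xmlText, "XML"];

(* One list of cell strings per Row element *)
rows = Cases[xml,
   XMLElement[r_ /; tagQ["Row"][r], _, cells_] :>
     Cases[cells,
       XMLElement[d_ /; tagQ["Data"][d], _, {v___String}] :> StringJoin[v],
       Infinity],
   Infinity];

(* First row is the header; assumes every row has the same number of cells *)
Dataset[AssociationThread[First[rows], #] & /@ Rest[rows]]
```

This assumes every row has the same number of Data cells; spreadsheet XML can omit empty cells, which would shift the columns.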

The first step is importing the data as a string:

s = Import[
        "https://analisis.datosabiertos.jcyl.es/explore/dataset//situacion-de-hospitalizados-por-coronavirus-en-castilla-y-leon//download/?format=xls&timezone=Europe/Madrid&lang=es&use_labels_for_\\header=true",
        "HTML"
    ]

(* fecha hospital provincia hospitalizados_planta hospitalizados_uci altas fallecimientos codigo_ine 2020-03-18 Complejo Asistencial de Ávila Ávila 24 1 1 1 5019 2020-03-18 Hospital de El Bierzo León 5 1 0 0 24089 (...etc...) 2020-04-04 Hospital Universitario Río Hortega Valladolid 157 51 244 58 47186 *)

Then we split the string at each date (converting the dates to DateObject), take the first piece as the header, and group the remaining pieces into {date, row text} pairs:

{header, data} = StringSplit[s, d:DatePattern[{"Year","Month","Day"}] :> DateObject[d]] //
        {First /* StringSplit, Rest /* (Partition[#, 2] &) } //
    Through
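To see what this step produces, here it is applied to the first two rows copied from the output shown above (the name sample is just for illustration). With a rule as the delimiter, StringSplit keeps the converted dates in its result, so Rest gives alternating dates and row strings, which Partition then pairs up.

```mathematica
(* Header plus the first two rows, copied from the imported text shown above *)
sample = "fecha hospital provincia hospitalizados_planta hospitalizados_uci altas fallecimientos codigo_ine 2020-03-18 Complejo Asistencial de Ávila Ávila 24 1 1 1 5019 2020-03-18 Hospital de El Bierzo León 5 1 0 0 24089";

(* Split at each date; the rule puts a DateObject where each date was *)
pieces = StringSplit[sample, d : DatePattern[{"Year", "Month", "Day"}] :> DateObject[d]];

(* Equivalent to the // Through form: first piece -> column names, rest -> {date, row} pairs *)
{header, data} = Through[{First /* StringSplit, Rest /* (Partition[#, 2] &)}[pieces]];

header
data[[All, 2]]
```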

Finally, a carefully crafted string pattern separates each row into fields, so that we can convert the result into a Dataset:

ds = 
    Map[
        (* #[[1]] is the date and #[[2]] the rest of the row text:
           f1 = hospital name, f2 = one-word province, f3..f7 = the five numeric columns *)
        StringCases[
            #[[2]],
            f1:Except[DigitCharacter]..~~" "~~
            f2:WordCharacter..~~" " ~~
            f3:NumberString~~" " ~~
            f4:NumberString~~" " ~~
            f5:NumberString~~" " ~~
            f6:NumberString~~" " ~~
            f7:NumberString :> AssociationThread[header->{#[[1]],f1,f2,f3,f4,f5,f6,f7}]
        ] &,
        data] //
      Flatten //
    Dataset

[Image: resulting dataset]
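The field pattern can be checked on a single row. This sketch uses the last row from the output shown earlier and the column names from its first line. It additionally trims the leading space that the date split leaves in front of the hospital name, and converts the four count columns to numbers with ToExpression while keeping codigo_ine as a string, since it is an identifier. These two extra steps are additions for illustration, not part of the code above.

```mathematica
(* Column names and one {date, row text} pair, as produced by the split step *)
header = {"fecha", "hospital", "provincia", "hospitalizados_planta",
   "hospitalizados_uci", "altas", "fallecimientos", "codigo_ine"};
row = {DateObject[{2020, 4, 4}],
   " Hospital Universitario Río Hortega Valladolid 157 51 244 58 47186"};

(* Same field pattern as above; StringTrim removes the leading space from f1 *)
assoc = First @ StringCases[row[[2]],
    f1 : Except[DigitCharacter] .. ~~ " " ~~ f2 : WordCharacter .. ~~ " " ~~
    f3 : NumberString ~~ " " ~~ f4 : NumberString ~~ " " ~~
    f5 : NumberString ~~ " " ~~ f6 : NumberString ~~ " " ~~
    f7 : NumberString :>
      AssociationThread[header -> {row[[1]], StringTrim[f1], f2, f3, f4, f5, f6, f7}]];

(* All fields are strings so far; turn the four count columns into numbers *)
countKeys = header[[4 ;; 7]];
converted = Join[assoc, ToExpression /@ KeyTake[assoc, countKeys]]
```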

POSTED BY: Gustavo Delfino


Jason Biggs, Wolfram Research

Posted 5 years ago

If I download the file first, I still get errors when importing it, so the problem is not related to the URL. I was able to open it in LibreOffice and save it again as an XLS file, which I could then import into Mathematica.

POSTED BY: Jason Biggs
