Group Abstract Group Abstract

Message Boards Message Boards

How to import an XML file otherwise intended for Microsoft Excel?

Posted 11 years ago
POSTED BY: Stephen Wilkus
2 Replies
POSTED BY: Marcelo De Cicco
Posted 11 years ago

Starting with a sample excel xml string taken from http://technet.microsoft.com/en-us/magazine/2006.01.blogtales.aspx :

xml = ImportString["<?xml version=\"1.0\"?>
<ss:Workbook xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">
    <ss:Worksheet ss:Name=\"Sheet1\">
        <ss:Table>
            <ss:Column ss:Width=\"80\"/>
            <ss:Column ss:Width=\"80\"/>
            <ss:Column ss:Width=\"80\"/>
            <ss:Row>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">First Name</ss:Data>
                </ss:Cell>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">Last Name</ss:Data>
                </ss:Cell>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">Phone Number</ss:Data>
                </ss:Cell>
            </ss:Row>
            <ss:Row>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">Nancy</ss:Data>
                </ss:Cell>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">Davolio</ss:Data>
                </ss:Cell>
                <ss:Cell>
                   <ss:Data ss:Type=\"String\">(206)555 9857</ss:Data>
                </ss:Cell>
            </ss:Row>
        </ss:Table>
    </ss:Worksheet>
</ss:Workbook>", "xml" ]

The result is this visually hard to read expression:

XMLObject[Document][{XMLObject[Declaration][Version->1.0]},XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Workbook},{{http://www.w3.org/2000/xmlns/,ss}->urn:schemas-microsoft-
com:office:spreadsheet},{XMLElement[{urn:schemas-microsoft-com:office:spreadsheet,Worksheet},{{urn
:schemas-microsoft-com:office:spreadsheet,Name}->Sheet1},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Table},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Column},{{urn:schemas-microsoft-
com:office:spreadsheet,Width}->80},{}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Column},{{urn:schemas-microsoft-
com:office:spreadsheet,Width}->80},{}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Column},{{urn:schemas-microsoft-
com:office:spreadsheet,Width}->80},{}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Row},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Data},{{urn:schemas-microsoft-com:office:spreadsheet,Type}->String},{First
Name}]}],XMLElement[{urn:schemas-microsoft-com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-
microsoft-com:office:spreadsheet,Data},{{urn:schemas-microsoft-
com:office:spreadsheet,Type}->String},{Last Name}]}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Data},{{urn:schemas-microsoft-com:office:spreadsheet,Type}->String},{Phone
Number}]}]}],XMLElement[{urn:schemas-microsoft-com:office:spreadsheet,Row},{},{XMLElement[{urn
:schemas-microsoft-com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Data},{{urn:schemas-microsoft-
com:office:spreadsheet,Type}->String},{Nancy}]}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Data},{{urn:schemas-microsoft-
com:office:spreadsheet,Type}->String},{Davolio}]}],XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Cell},{},{XMLElement[{urn:schemas-microsoft-
com:office:spreadsheet,Data},{{urn:schemas-microsoft-com:office:spreadsheet,Type}->String},{(206)555
9857}]}]}]}]}]}],{}]

In that expression, click over a some useful part of information. For example, click over "Nancy" and then use the "Extend Selection" menu command (CTRL + period) until you have selected the complete XML element. Then copy&paste this into the second argument of a Case command:

Case[xml, <paste here>, Infinity]

Then you can replace the specific pattern with a more general one and extract the data like this

Cases[xml, 
 XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet", 
    "Data"}, {{"urn:schemas-microsoft-com:office:spreadsheet", 
      "Type"} -> "String"}, {name_String}] :> name, \[Infinity]]

Out[] = {"First Name", "Last Name", "Phone Number", "Nancy", "Davolio", "(206)555 9857"}

By using "Extend Selection" a little more, you can get more structure out of the data:

    Cases[xml,
          {XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet","Cell"},
          {},
          {XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet","Data"},
                      {{"urn:schemas-microsoft-com:office:spreadsheet","Type"} -> "String"},
                      {s1_}]}],
           XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet","Cell"},
                      {},
                      {XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet","Data"},
                                  {{"urn:schemas-microsoft-com:office:spreadsheet","Type"} -> "String"},
                                  {s2_}]}],
           XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet", "Cell"},
                      {},
                      {XMLElement[{"urn:schemas-microsoft-com:office:spreadsheet","Data"},
                      {{"urn:schemas-microsoft-com:office:spreadsheet", "Type"} -> "String"},
                      {s3_}]}]} :> {s1, s2, s3},
          Infinity]

Out[]= {{"First Name", "Last Name", "Phone Number"}, {"Nancy", "Davolio",  "(206)555 9857"}}

As a final step, you can experiment simplifying the pattern by replacing some strings with _ (or even __)

    Cases[xml,
          {XMLElement[{_,"Cell"}, {}, {XMLElement[{_,"Data"}, {{_, "Type"} -> "String"}, {s1_}]}],
           XMLElement[{_,"Cell"}, {}, {XMLElement[{_,"Data"}, {{_, "Type"} -> "String"}, {s2_}]}],
           XMLElement[{_,"Cell"}, {}, {XMLElement[{_,"Data"}, {{_, "Type"} -> "String"}, {s3_}]}]} :> {s1, s2, s3},
         Infinity]


Out[]= {{"First Name", "Last Name", "Phone Number"}, {"Nancy", "Davolio",  "(206)555 9857"}}

This procedure is very useful for parsing HTML too.

POSTED BY: Gustavo Delfino
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard