Extracting Information from XMLObject

Posted 10 years ago

Hi, I have imported an XMLObject which contains XMLElements of the following form:

  XMLElement["tr", {"class" -> "imp"}, {XMLElement[
       "td", {}, {XMLElement[
         "a", {"href" -> "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}],
       XMLElement["td", {"align" -> "center"}, {"5"}], 
      XMLElement["td", {"align" -> "center"}, {"5"}], 
      XMLElement["td", {"align" -> "center"}, {}], 
      XMLElement["td", {"align" -> "right"}, {"0"}], 
      XMLElement["td", {"align" -> "right"}, {"0"}], 
      XMLElement["td", {"align" -> "right"}, {"0"}], 
      XMLElement["td", {"align" -> "right"}, {"1"}], 
      XMLElement["td", {"align" -> "right"}, {"0"}], 
      XMLElement["td", {"align" -> "right"}, {"0"}], 
      XMLElement["td", {"align" -> "right"}, {"1"}], 
      XMLElement["td", {"align" -> "right"}, {"100"}], 
      XMLElement["td", {"align" -> "center"}, {"0"}], 
      XMLElement["td", {"align" -> "center"}, {"0"}], 
      XMLElement["td", {"align" -> "center"}, {"0"}], 
      XMLElement["td", {"align" -> "center"}, {"0"}], 
      XMLElement["td", {"align" -> "left"}, {"Ibrahim Alqattan"}], 
      XMLElement["td", {"align" -> "left"}, {"Abdulrazzaq Albaghli"}]}]

I am interested in the country name and the numbers that follow it, e.g. {Kuwait, 5,5,,...,100,0,0,0,0}.

How can I extract this information in the most efficient way?

POSTED BY: Sandu Ursu
5 Replies
Posted 10 years ago

If you're really pressed for time, here is a solution With Complete Loss of Generality (WCLOG):

xml = xml /. XMLElement[a_, b_, {}] -> XMLElement[a, b, {"Null"}];
list = xml[[3, 1, 3, 1, 3]]~Join~xml[[3, 2 ;; 16, 3, 1]]

{"Kuwait", "5", "5", "Null", "0", "0", "0", "1", "0", "0", "1", "100", "0", "0", "0", "0"}
POSTED BY: Douglas Kubler

A variant possibility:

(xml[[3]] //. XMLElement[_, _, z_] :>  z)[[;; -3]] /. {
    x_String?(StringMatchQ[#, NumberString] &) :> ToExpression[x]}

This gives

{{{"Kuwait"}}, {5}, {5}, {}, {0}, {0}, {0}, {1}, {0}, {0}, {1}, {100}, {0}, {0}, {0}, {0}} 

You can then decide what to do with the empty list and then Flatten the result.
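As a minimal sketch of that last step: `Flatten` on its own drops the empty cell, so the remaining entries shift left by one position. Replacing the empty list with a placeholder first keeps the columns aligned. The name `cells` and the choice of `Missing["NotAvailable"]` as the placeholder are assumptions made for illustration; `Null` would work equally well.

```mathematica
(* The result shown above, copied verbatim *)
cells = {{{"Kuwait"}}, {5}, {5}, {}, {0}, {0}, {0}, {1}, {0}, {0}, {1}, {100}, {0}, {0}, {0}, {0}};

(* Flatten alone removes the empty cell: only 15 of the 16 positions survive *)
Length[Flatten[cells]]
(* 15 *)

(* Mark empty cells at level 1 first, then flatten: all 16 positions are kept *)
Flatten[Replace[cells, {} -> Missing["NotAvailable"], {1}]]
(* {"Kuwait", 5, 5, Missing["NotAvailable"], 0, 0, 0, 1, 0, 0, 1, 100, 0, 0, 0, 0} *)
```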

POSTED BY: David Reiss
Posted 10 years ago

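Something like this works. Whether to drop the last n (= 2) items by a fixed count or by some variable criterion is left to you. I converted the number strings to numbers, which is optional, and how you flag an empty cell is a matter of style. Note that ToExpression also parses the two trailing names as products of symbols, which is why the first one comes out reordered in the output below.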

    xml = XMLElement[
      "tr", {"class" -> "imp"}, {
       XMLElement["td", {}, {
         XMLElement["a", {"href" -> "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}],
       XMLElement["td", {"align" -> "center"}, {"5"}], 
       XMLElement["td", {"align" -> "center"}, {"5"}], 
       XMLElement["td", {"align" -> "center"}, {}],
       XMLElement["td", {"align" -> "right"}, {"0"}],
       XMLElement["td", {"align" -> "right"}, {"0"}],
       XMLElement["td", {"align" -> "right"}, {"0"}],
       XMLElement["td", {"align" -> "right"}, {"1"}],
       XMLElement["td", {"align" -> "right"}, {"0"}],
       XMLElement["td", {"align" -> "right"}, {"0"}],
       XMLElement["td", {"align" -> "right"}, {"1"}],
       XMLElement["td", {"align" -> "right"}, {"100"}],
       XMLElement["td", {"align" -> "center"}, {"0"}],
       XMLElement["td", {"align" -> "center"}, {"0"}],
       XMLElement["td", {"align" -> "center"}, {"0"}],
       XMLElement["td", {"align" -> "center"}, {"0"}],
       XMLElement["td", {"align" -> "left"}, {"Ibrahim Alqattan"}], 
       XMLElement["td", {"align" -> "left"}, {"Abdulrazzaq Albaghli"}]}]

    xml = xml /. XMLElement[a_, b_, {}] -> XMLElement[a, b, {"Null"}];
    datalist = 
     Cases[xml, XMLElement["td", {}, {XMLElement["a", {b_}, {c_String}]}] :> c, 2]~Join~
      Cases[xml, XMLElement["td", _, {c_String}] :> ToExpression[c], \[Infinity]]


{"Kuwait", 5, 5, Null, 0, 0, 0, 1, 0, 0, 1, 100, 0, 0, 0, 0,  Alqattan Ibrahim, Abdulrazzaq Albaghli}
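One possible variable criterion, sketched here under the assumption that the numeric cells are exactly the ones aligned "center" or "right" (true for the row in the question, not verified for the rest of the table): select cells by their `align` attribute instead of by position. The names `row` and `numbers` are made up for this example, and `row` is a shortened stand-in for the full element above.

```mathematica
(* Shortened stand-in for the row above: same structure, fewer cells *)
row = XMLElement["tr", {"class" -> "imp"}, {
    XMLElement["td", {}, {
      XMLElement["a", {"href" -> "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}],
    XMLElement["td", {"align" -> "center"}, {"5"}],
    XMLElement["td", {"align" -> "center"}, {}],
    XMLElement["td", {"align" -> "right"}, {"100"}],
    XMLElement["td", {"align" -> "left"}, {"Ibrahim Alqattan"}]}];

(* Keep only center- or right-aligned cells (level 2 holds the td elements);
   an empty cell becomes Null, anything else is converted to a number *)
numbers = Cases[row,
   XMLElement["td", {"align" -> ("center" | "right")}, c_List] :>
    If[c === {}, Null, ToExpression[First[c]]],
   {2}]
(* {5, Null, 100} *)
```

The left-aligned name cell never reaches ToExpression, so nothing has to be dropped afterwards.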
POSTED BY: Douglas Kubler
Posted 10 years ago

Thank you, Jesse

I was looking for the "magic pattern" which does the whole job using Cases[].

Something like this:

Cases[data, 
 XMLElement["tr", ___, 
   {XMLElement["td", {}, {XMLElement["a", {___}, {country_}]}], i__}, ___] :> 
  {country, List[i][[;; 15, 3]] /. {} -> {Null} // Flatten // ToExpression}, 
 Infinity]
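
A variation on the same idea, offered as a sketch rather than a tested solution for the full page: the pattern above returns the country and the numbers as a nested pair, while the version below returns one flat list per row. The wrapping "table" element and the shortened row are hypothetical stand-ins for the imported object, and the last two cells are dropped with a negative part specification instead of a fixed count of 15.

```mathematica
(* Hypothetical stand-in for the imported object: one shortened row inside a table *)
data = XMLElement["table", {}, {
    XMLElement["tr", {"class" -> "imp"}, {
      XMLElement["td", {}, {
        XMLElement["a", {"href" -> "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}],
      XMLElement["td", {"align" -> "center"}, {"5"}],
      XMLElement["td", {"align" -> "center"}, {}],
      XMLElement["td", {"align" -> "right"}, {"100"}],
      XMLElement["td", {"align" -> "left"}, {"Ibrahim Alqattan"}],
      XMLElement["td", {"align" -> "left"}, {"Abdulrazzaq Albaghli"}]}]}];

(* One flat list per row: the country, then the converted cells.
   {cells}[[;; -3, 3]] takes the contents of all cells except the last two. *)
Cases[data,
 XMLElement["tr", _, {XMLElement["td", {}, {XMLElement["a", _, {country_String}]}], cells__}] :>
  Prepend[
   Replace[{cells}[[;; -3, 3]], {{} -> Null, {s_String} :> ToExpression[s]}, {1}],
   country],
 Infinity]
(* {{"Kuwait", 5, Null, 100}} *)
```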
POSTED BY: Sandu Ursu

I have never been quite content with Mathematica's XML handling features. Assuming all of your XMLElements follow the same pattern and have the same number of subelements, try this:

xml = XMLElement[
   "tr", {"class" -> "imp"}, {XMLElement[
     "td", {}, {XMLElement[
       "a", {"href" -> 
         "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}], 
    XMLElement["td", {"align" -> "center"}, {"5"}], 
    XMLElement["td", {"align" -> "center"}, {"5"}], 
    XMLElement["td", {"align" -> "center"}, {}], 
    XMLElement["td", {"align" -> "right"}, {"0"}], 
    XMLElement["td", {"align" -> "right"}, {"0"}], 
    XMLElement["td", {"align" -> "right"}, {"0"}], 
    XMLElement["td", {"align" -> "right"}, {"1"}], 
    XMLElement["td", {"align" -> "right"}, {"0"}], 
    XMLElement["td", {"align" -> "right"}, {"0"}], 
    XMLElement["td", {"align" -> "right"}, {"1"}], 
    XMLElement["td", {"align" -> "right"}, {"100"}], 
    XMLElement["td", {"align" -> "center"}, {"0"}], 
    XMLElement["td", {"align" -> "center"}, {"0"}], 
    XMLElement["td", {"align" -> "center"}, {"0"}], 
    XMLElement["td", {"align" -> "center"}, {"0"}], 
    XMLElement["td", {"align" -> "left"}, {"Ibrahim Alqattan"}], 
    XMLElement["td", {"align" -> "left"}, {"Abdulrazzaq Albaghli"}]}];
Join[xml[[3, 1, 3, 1, 3]], 
 Flatten[xml[[3, 2 ;; 16, 3]] // Replace[#, {} -> Null, Infinity] &]]

I'm replacing empty lists with Null because Flatten would otherwise drop them. This produces the following output:

{"Kuwait", "5", "5", Null, "0", "0", "0", "1", "0", "0", "1", "100", "0", "0", "0", "0"}
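
The part specifications used above rely on the fact that an XMLElement always has three parts: the tag name, the list of attribute rules, and the list of children. The following sketch walks through them on two cells copied from the question; the names `cell` and `countryCell` are made up for illustration.

```mathematica
(* A plain cell in the same shape as the cells in the question *)
cell = XMLElement["td", {"align" -> "center"}, {"5"}];

cell[[1]]     (* "td": the tag name *)
cell[[2]]     (* {"align" -> "center"}: the attribute rules *)
cell[[3]]     (* {"5"}: the list of children *)
cell[[3, 1]]  (* "5": the first child *)

(* The country cell nests an "a" element, hence the longer part path *)
countryCell = XMLElement["td", {}, {
    XMLElement["a", {"href" -> "team_r.aspx?code=KWT&year=2011"}, {"Kuwait"}]}];

countryCell[[3, 1, 3]]  (* {"Kuwait"}: children of the nested "a" element *)
```

Read this way, `xml[[3, 1, 3, 1, 3]]` is "children of the row, first cell, its children, first child, its children", and `xml[[3, 2 ;; 16, 3]]` collects the children lists of cells 2 through 16.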

Hope this helps!

POSTED BY: Jesse Friedman