CSS Selectors 3 for Symbolic XML

Posted 7 years ago

Introduction

Implementation of CSS Selectors 3 for Wolfram Language SymbolicXML expressions. This work is motivated by the Stack Exchange post css-selectors-for-symbolic-xml. The full package, with the examples shown below, can be downloaded from the GitHub repo here. The standalone package is attached to this community post but may not stay as up to date as the repo.

The CSS Selectors 3 specification is followed as far as possible, but I make no claims of full conformance. For example, because WL SymbolicXML is a static expression, dynamic pseudo-classes (e.g. active/hover/focus) and pseudo-elements (e.g. before/after) never match any element.

Load Package

In[] := Needs["Selectors3`"]

Testing on HTML source

Being a little meta, let's test this against the W3C page for Selectors Level 3.

In[] := document = Import["https://www.w3.org/TR/selectors-3/", "XMLObject"];

Look for elements whose class attribute contains the letter 'h'.

In[] := Position[document, Selector["[class*=h]"]]
Out[] = {{2, 3, 2, 3, 2}, {2, 3, 2, 3, 484}, {2, 3, 2, 3, 488}, {2, 3, 2, 3, 2, 3, 11}}

In[] := Extract[document, %][[All, 1 ;; 2]] // Column
Out[] = {
 {XMLElement["div", {"class" -> "head"}]},
 {XMLElement["dl", {"class" -> "bibliography"}]},
 {XMLElement["dl", {"class" -> "bibliography"}]},
 {XMLElement["p", {"class" -> "copyright"}]}
}
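
For comparison, the substring attribute selector `[class*=h]` can be approximated with built-in pattern matching alone. This is only a minimal sketch and not how the package is implemented; the `sample` expression and the name `classContainsH` are made up for illustration.

```wolfram
(* Hypothetical sample: hand-written SymbolicXML, not taken from the W3C page *)
sample = XMLElement["div", {"class" -> "head"}, {
    XMLElement["p", {"class" -> "copyright"}, {"text"}],
    XMLElement["p", {"class" -> "note"}, {"more text"}]}];

(* Match any element with a "class" attribute whose value contains "h" *)
classContainsH = 
  XMLElement[_, {___, "class" -> _String?(StringContainsQ["h"]), ___}, _];

positions = Position[sample, classContainsH];

(* Show only the tag and attributes of each match, as in the example above *)
Extract[sample, positions][[All, 1 ;; 2]]
```

On this sample the div and the first p match; the second p does not, because "note" contains no 'h'.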

Look for elements with the class 'no-num'.

In[] := Extract[document, Position[document, Selector[".no-num"]]] // Column
Out[] = {
 {XMLElement["h2", {"class" -> "no-num no-toc", "id" -> "abstract"}, {"Abstract"}]},
 {XMLElement["h2", {"class" -> "no-num no-toc", "id" -> "status"}, {"Status of this document"}]},
 {XMLElement["h2", {"class" -> "no-num no-toc", "id" -> "contents"}, {"Table of contents"}]},
 {XMLElement["h2", {"class" -> "no-num no-toc"}, {"W3C Recommendation 06 November 2018"}]}
}

Check specificity of the selector

In[] := Selector[document, "[class~=a] b > *:link"]["Specificity"]
Out[] = {0, 2, 1}

In[] := Selector[document, "[class~=a] b > :not(p)"]["Specificity"]
Out[] = {0, 1, 2}

In[] := Selector[document, "#welcome"]["Specificity"]
Out[] = {1, 0, 0}
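
Specificity triples are compared component by component from left to right, so a single ID selector outranks any number of attribute or type selectors. As a small illustration using only built-in functions and the three results above, sorting the triples puts the most specific selector last. The names `specs` and `ranked` are just for this sketch.

```wolfram
(* Specificity values copied from the outputs above *)
specs = <|
   "[class~=a] b > *:link" -> {0, 2, 1},
   "[class~=a] b > :not(p)" -> {0, 1, 2},
   "#welcome" -> {1, 0, 0}|>;

(* Sort orders an association by its values; equal-length integer lists
   are compared element by element, which mirrors how specificity is ranked *)
ranked = Keys[Sort[specs]];

(* The most specific selector comes last *)
Last[ranked]
```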

Testing on XML source

In[] := str = "<html xml:lang='zh'><head><title>Test</title></head><body \
        xmlns='http://www.w3.org/1999/xhtml'><p lang='en' class='red' \
        myid='unique'>Here is some math.</p><p><m:math \
        xmlns:m='http://www.w3.org/1998/Math/MathML'><m:mi \
        m:title='cat'>x</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:math></p></body>\
        \n</html>";
     
In[] := obj = ImportString[str, "XML"];

Namespace

If the selector does not specify a namespace, then the namespace is ignored:

In[] := Selector[str, "mo"]
Out[] = <|"Specificity" -> {0, 0, 1}, "Elements" -> {{2, 3, 2, 3, 2, 3, 1, 3, 2}}|>

If a namespace is given in the selector, then you need to provide the prefix's expansion rule. Otherwise the selector won't match any element.

In[] := Selector[str, "m|mo"]
Out[] = <|"Specificity" -> {0, 0, 1}, "Elements" -> {}|>

In[] := Selector[str, "m|mo", "Namespaces" -> {"m" -> "http://www.w3.org/1998/Math/MathML"}]
Out[] = <|"Specificity" -> {0, 0, 1}, "Elements" -> {{2, 3, 2, 3, 2, 3, 1, 3, 2}}|>
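
To see what the two behaviours amount to, here is a rough sketch with plain Cases. It assumes the usual SymbolicXML convention that a namespaced tag is stored as a {namespace, localName} pair; the `sample` expression is hand-written for illustration and is not the imported `obj`.

```wolfram
mathml = "http://www.w3.org/1998/Math/MathML";

(* Hypothetical sample: two namespaced elements and one plain "mo" element *)
sample = XMLElement[{mathml, "math"}, {}, {
    XMLElement[{mathml, "mi"}, {}, {"x"}],
    XMLElement[{mathml, "mo"}, {}, {"+"}],
    XMLElement["mo", {}, {"-"}]}];

(* Namespace ignored: match the local name with or without a namespace *)
anyNamespace = Cases[sample, XMLElement["mo" | {_, "mo"}, __], Infinity];

(* Namespace required: only elements in the MathML namespace match *)
inMathML = Cases[sample, XMLElement[{mathml, "mo"}, __], Infinity];

{Length[anyNamespace], Length[inMathML]}
```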

ID

XML documents can define their own unique ID attribute. Use the "ID" option to indicate which attribute name is in use. This is equivalent to using the attribute selector but with higher specificity.

In[] := Selector[str, "#unique", "ID" -> "myid"]
Out[] = <|"Specificity" -> {1, 0, 0}, "Elements" -> {{2, 3, 2, 3, 1}}|>

In[] := Selector[str, "[myid=unique]"]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {{2, 3, 2, 3, 1}}|>

Case sensitivity

XML is case-sensitive, but the Selectors3 package is not by default. Set the "CaseInsensitive" option to False to enforce case sensitivity.

In[] := Selector[str, "[myID=unique]", "CaseInsensitive" -> True]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {{2, 3, 2, 3, 1}}|>

In[] := Selector[str, "[myID=unique]", "CaseInsensitive" -> False]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {}|>

You can specify the case-sensitivity separately for attribute name and value.

In[] := Selector[str, "[myID=Unique]", "CaseInsensitive" -> {"AttributeName" -> True, "AttributeValue" -> False}]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {}|>

In[] := Selector[str, "[myID=Unique]", "CaseInsensitive" -> {"AttributeName" -> False, "AttributeValue" -> True}]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {}|>

In[] := Selector[str, "[myID=Unique]", "CaseInsensitive" -> {"AttributeName" -> True, "AttributeValue" -> True}]
Out[] = <|"Specificity" -> {0, 1, 0}, "Elements" -> {{2, 3, 2, 3, 1}}|>
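
The three results above follow from treating the name and value comparisons independently. The sketch below reproduces that logic with built-in string functions; `sameStringQ` and `attrMatchQ` are hypothetical helper names for this illustration, not part of the package.

```wolfram
(* Compare two strings, optionally ignoring case *)
sameStringQ[a_String, b_String, ignoreCase_] := 
  If[TrueQ[ignoreCase], ToLowerCase[a] === ToLowerCase[b], a === b];

(* An attribute matches only if both its name and its value match *)
attrMatchQ[{name_String, value_String}, {selName_String, selValue_String}, 
   nameInsensitive_, valueInsensitive_] := 
  sameStringQ[name, selName, nameInsensitive] && 
   sameStringQ[value, selValue, valueInsensitive];

(* Document attribute myid='unique' against the selector [myID=Unique] *)
attr = {"myid", "unique"};
sel = {"myID", "Unique"};

results = {
   attrMatchQ[attr, sel, True, False],  (* value differs in case *)
   attrMatchQ[attr, sel, False, True],  (* name differs in case *)
   attrMatchQ[attr, sel, True, True]    (* both compared case-insensitively *)
   }
```

Only the last combination matches, in line with the empty and non-empty "Elements" lists shown above.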

You can also specify the case-sensitivity separately for the element type.

In[] := Selector[str, "P", "CaseInsensitive" -> {"Type" -> True}]
Out[] = <|"Specificity" -> {0, 0, 1}, "Elements" -> {{2, 3, 2, 3, 1}, {2, 3, 2, 3, 2}}|>

In[] := Selector[str, "P", "CaseInsensitive" -> {"Type" -> False}]
Out[] = <|"Specificity" -> {0, 0, 1}, "Elements" -> {}|>
POSTED BY: Kevin Daily
10 Replies

Congrats on publishing! I got the bonus workflow #1 in your slides to work but am having an issue with #2, I think it should just be CSSTargets[doc, "body"][[1]] (not [[1,1]]), if I change these lines in your presentation's last slide:

body = styleDataCell["Notebook", Notebook,  CSSTargets[doc, "body"][[1]]]
h3 = styleDataCell["question", Cell, CSSTargets[doc, "h3"][[1]]]
h1 = styleDataCell["h1", Cell, CSSTargets[doc, "h1"][[1]]]
li = styleDataCell["li", Cell, CSSTargets[doc, "li"][[1]]]
p = styleDataCell["p", Cell, CSSTargets[doc, "p"][[1]]]
h2 = styleDataCell["h2", Cell, CSSTargets[doc, "h2"][[1]]]

I get the output on the left (which doesn't look exactly right):

enter image description here

POSTED BY: Michael Sollami
POSTED BY: Kevin Daily
POSTED BY: Michael Sollami
Posted 6 years ago

Hi @Kevin Daily, please keep us posted when it is released.

POSTED BY: Diego Zviovich
POSTED BY: Kevin Daily

Hi Kevin, Nice post! Did you ever publish your CSSTools` package?

POSTED BY: Michael Sollami
POSTED BY: Kevin Daily

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming, and consider contributing your work to The Notebook Archive!

POSTED BY: EDITORIAL BOARD
Posted 7 years ago

I like the package. Clean implementation too.

On a related note here are two other CSS selector methods, one using a real XML processing library in Java rather than any WL hacks: https://mathematica.stackexchange.com/a/183970/38205

This will be as robust as JLink is (i.e. very robust). It is object-oriented and so much, much nicer to work with than the standard Mathematica XML headaches.

And another one I wrote up that uses pure Graph methods to implement selectors in terms of a DFS: https://mathematica.stackexchange.com/a/184417/38205

This one still performs quite well, though, and for very complex queries is conceivably cleaner than a pure Cases/Position method. It also references a proper Graph, so it can be object-oriented as well, and attribute and property lookup from nodes is nearly instantaneous.

POSTED BY: b3m2a1
POSTED BY: Kevin Daily