Why is SyntaxQ so opaque?

POSTED BY: David Vasholz
10 Replies

POSTED BY: David Vasholz
Posted 2 years ago

"Sander and I both made the point that to replicate the parsing of all the tokens and patterns recognized by the front end would be a massive undertaking."

Indeed, take a look at the CodeParser implemented by Wolfram Research.
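
As a rough sketch of what that looks like in practice, the calls below load the package and parse a short string three ways. This assumes the CodeParser paclet is available in your installation; the input string is just an illustration borrowed from elsewhere in this thread.

```mathematica
(* Assumption: the CodeParser paclet is installed and can be loaded. *)
Needs["CodeParser`"]

(* Abstract syntax tree for a string of code. *)
CodeParse["s[]s"]

(* Concrete syntax tree, which keeps the tokens as they were written. *)
CodeConcreteParse["s[]s"]

(* Flat list of the tokens the parser recognizes. *)
CodeTokenize["s[]s"]
```

Comparing the concrete and abstract trees is one way to see how much notation the parser resolves before an expression ever exists.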

POSTED BY: Rohit Namjoshi
Posted 2 years ago

You found a bug in my code. The replacement rule for the SequencePattern should be

SequencePattern -> "[]"

As for the rest, you can't compare IsExpression directly with SyntaxQ, because IsExpression is intended to work only on full form expressions. That was per your guidance:

"I am not interested in syntactic sugar. I want an emulator that is as close to symbols, square brackets, and commas as possible"

SyntaxQ is vastly more comprehensive. You'll notice that things like SyntaxQ["s[]s"] return True, because there is a front end "helper" in the parsing that assumes that there was intended to be a space, s[] s, which gets parsed to (i.e. is syntactic sugar for) Times[s,s[]]. Sander and I both made the point that to replicate the parsing of all the tokens and patterns recognized by the front end would be a massive undertaking.
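
A minimal sketch for checking this yourself. Using ToExpression with a HoldComplete wrapper is an addition here, not something from the original posts; it parses the string without evaluating it, so the parsed structure can be inspected.

```mathematica
(* Both strings are accepted as syntactically valid input. *)
SyntaxQ["s[]s"]
SyntaxQ["s[] s"]

(* Parse without evaluating, then display the underlying expression. *)
FullForm[ToExpression["s[]s", InputForm, HoldComplete]]

(* An unbalanced bracket is rejected. *)
SyntaxQ["s[]s["]
```

The held result should show a Times expression with the factors in the order they were written; evaluating it would then reorder the arguments, since Times is Orderless.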

POSTED BY: Eric Rimbey

POSTED BY: David Vasholz
Posted 2 years ago

I should have remarked that there are some obvious things this doesn't capture. For example, it assumes symbol names consist only of alphabetic characters. My qualifier around not being complete or correct was leaving the door open for overlooked patterns, not nitty-gritty details. Having said that, I'm not overlooking Null; instead, I'm assuming that any Null is provided explicitly.

POSTED BY: Eric Rimbey

Thank you! This looks really interesting.

POSTED BY: David Vasholz
Posted 2 years ago

Here's something that "emulates" SyntaxQ. I've named it IsExpression. It works by recursively identifying legal sub-expressions and sequences. Since you're not interested in syntactic sugar, it assumes full form expressions. I am not confident that this is complete or correct for all cases, but maybe it's a sufficiently decent approximation.

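(* One reduction pass: strip whitespace, then collapse every atom or
   already-reduced sub-expression to the placeholder token.
   The patterns hard-code "$", so token is expected to be "$". *)
ReduceExpressionStep[token_String][exp_String] :=
  With[
   {NumberPattern = 
     RegularExpression["(\\d+\\.\\d+)|(\\d+\\.)|(\\.\\d+)|(\\d+)"],
    (* a quoted string of any length; escaped quotes are not handled *)
    StringPattern = RegularExpression["\"[^\"]*\""],
    SymbolPattern = RegularExpression["[[:alpha:]]+"],
    (* a head applied to an empty argument list *)
    HeadedPattern = "$[]",
    (* a bracketed, comma-separated run of tokens, e.g. [$,$,$] *)
    SequencePattern = RegularExpression["\\[(\\$,)*\\$\\]"]},
   StringReplace[
    StringDelete[exp, Whitespace],
    {NumberPattern -> token,
     StringPattern -> token,
     SymbolPattern -> token,
     HeadedPattern -> token,
     (* keep the brackets so that a head is still required in front *)
     SequencePattern -> "[]"}]];
(* Repeat the pass until the string stops changing. *)
ReduceExpression[token_String][exp_String] := 
  FixedPoint[ReduceExpressionStep[token], exp];
(* The input is accepted if it reduces to a single token. *)
IsExpression[exp_String] := "$" == ReduceExpression["$"][exp]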
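
A short usage sketch, assuming the three definitions above have already been evaluated. The test strings and the comparison table are illustrative additions, not part of the original post.

```mathematica
(* Assumes ReduceExpressionStep, ReduceExpression and IsExpression
   from the post above are already defined in the session. *)
tests = {"f[x,y]", "f[g[1.5],h[]]", "f[x", "x1"};

(* Tabulate each input against IsExpression and the built-in SyntaxQ. *)
Grid[
 Prepend[{#, IsExpression[#], SyntaxQ[#]} & /@ tests,
  {"input", "IsExpression", "SyntaxQ"}],
 Frame -> All]

(* Show the intermediate strings of the reduction for one input. *)
FixedPointList[ReduceExpressionStep["$"], "f[g[1.5],h[]]"]
```

Tracing the rules by hand, the first two inputs should reduce to "$" and the last two should not. The "x1" case is where the two functions are expected to disagree, because the symbol pattern only matches alphabetic characters while SyntaxQ accepts digits in symbol names.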
POSTED BY: Eric Rimbey

This would be highly non-trivial to write: there are many different (compound) notations. See my post here: http://community.wolfram.com/groups/-/m/t/1070946

Unfortunately, there is no short and easy code.

POSTED BY: Sander Huisman

I am not interested in syntactic sugar. I want an emulator that is as close to symbols, square brackets, and commas as possible, and contains only Mathematica functions that cannot be expressed in terms of simpler Mathematica functions. I hope to send in an example that is not super trivial. A stupid example would be an emulator that returns False for every input string, which would be the correct answer much more than half the time.

POSTED BY: David Vasholz
Posted 2 years ago

It's not clear to me what you expect from an "emulator" (do you actually mean "parser"?). It's also not clear at what level you want to analyze the syntax. There is a lot of syntactic sugar available, and handling all of that would make the parser more complicated. Without the syntactic sugar, I think we just have symbols, square brackets, and commas.

Rather than re-implement the parser, I'd suggest you play around with FullForm and TreeForm (probably with some Hold sprinkled in now and then).
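
For instance, a few inputs one might try (the particular expressions are only illustrations):

```mathematica
(* Hold keeps the expression from evaluating, so FullForm shows
   the structure that the parser produced. *)
FullForm[Hold[a + b*c]]       (* Hold[Plus[a, Times[b, c]]] *)
FullForm[Hold[f /@ {1, 2}]]   (* Hold[Map[f, List[1, 2]]] *)
FullForm[Hold[x /. y -> z]]   (* Hold[ReplaceAll[x, Rule[y, z]]] *)

(* Without Hold, evaluation happens first and the structure is lost. *)
FullForm[1 + 1]               (* 2 *)

(* The same held expression drawn as a tree. *)
TreeForm[Hold[a + b*c]]
```

Each operator form turns into a head with square brackets and commas, which is the desugared view described above.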

But the syntax of expressions still just scratches the surface. The rules of evaluation and how attributes fit into those rules might be a more interesting/challenging topic.
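
A small sketch of how an attribute changes evaluation. The symbol h is hypothetical and used only for illustration.

```mathematica
(* Hold works because of its attributes, not because of special syntax. *)
Attributes[Hold]    (* {HoldAll, Protected} *)

(* Give a fresh symbol the HoldFirst attribute. *)
ClearAll[h];
SetAttributes[h, HoldFirst];

(* Only the first argument is left unevaluated. *)
h[1 + 1, 2 + 2]     (* h[1 + 1, 4] *)

(* Built-in heads carry attributes that affect evaluation as well. *)
Attributes[Plus]
```

Two expressions with identical syntax can therefore evaluate differently depending on the attributes of the head.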

A "deeper insight into the Wolfram Language" could mean a lot of things. Maybe it would be better to start with some explicit questions or examples that you find challenging.

POSTED BY: Eric Rimbey