WOLFRAM COMMUNITY

GROUPS: Wolfram Language, Natural Language Processing

WolframLanguageData plain text usage definitions, some problems.

Andrew Meit

Posted 4 years ago

In my effort to pull apart usage definition text blocks into lists, I have written some code to do so:

    getUsg2[symb1_] := StringSplit[
      StringReplace[
        WolframLanguageData[symb1, "PlaintextUsage"],
        {". " -> ".. ", "\\!\\(\\" -> "", ",AutoSpacing->False" -> "", "\\\"" -> "\""}],
      ". "];

However, I found a small list of symbols whose plaintext usage blocks still contain boxes. Evaluate the code below to see the problems:

    getUsg2@{"Begin", "BeginPackage", "CloudExpression", "CloudObject", "DeclarePackage", "DumpSave", "FindChannels", "Needs", "Save"}

I am attaching a notebook. Can someone please help me nail down these last 9 symbols to parse, or teach me what I need to know to figure it out? Thanks.

Attachments: Problematic Wolf...nb

2 Replies

Andrew Meit

Posted 4 years ago

I have decided to fix the malformed usage definitions rather than parse them; now I can revise my code more easily. I have reported the malformed definitions, along with my fixes, to their bug database. I also found out that not all symbols in System` can be found in WLD at this time, but what is missing will be addressed over time. At least I am no longer in box-parsing hell. ;-)

Attachments: fixed WLD usage.nb

Updating Name

Posted 4 years ago

If your goal is to have usage data in a particular (plaintext) format, and you only have 9 symbols that your function cannot parse, then just overload your function with ad hoc definitions for those symbols:

    getUsg2["Begin"] = {"Begin[\"context`\"] resets the current context."}

If you're worried about this approach because your textual documentation will no longer stay in sync for these 9 symbols, that's a legitimate concern, but weigh the cost against the benefit.

If you want to make your getUsg2 function absolutely bulletproof against malformed box strings, I just don't think that's feasible. You can't know all of the ways in which such box strings will be malformed.

A compromise would be to handle the particular malformations that you already know about. So, instead of hard-coding getUsg2["Begin"], you could analyze that particular malformation, implement a fix for that pattern, rerun getUsg2 to see which "bad" symbols remain (the fix might have cleaned up some of the others as well), and then repeat this process until no bad cases remain. For example:

    cleanUsageText[str_String] := StringReplace[
      str,
      {
        RegularExpression["\"\\W+StyleBox\\[([^]]+)]\""] :>
          StringDelete["$1", ("\\" | "(" | "")]
        (*, the ones you already had would go here*)
      }]

    getUsageSentences[sym_] := TextSentences[
      cleanUsageText[WolframLanguageData[sym, "PlaintextUsage"]]]

This cleans up the usage for Begin. It partially cleans up BeginPackage. You can either try to get this to work for both, or just move on to creating an ad hoc fix for BeginPackage. You may end up with the maximum of 9 special fixes for the 9 remaining problems, but oh well.
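The "rerun and see what remains" step could be sketched roughly as follows. The helper name boxResidueQ, the residue markers it searches for, and the sample strings are all assumptions made for illustration; the strings are made-up stand-ins, not actual WolframLanguageData output.

```wolfram
(* Hypothetical helper: does a usage string still contain box markup? *)
(* The markers are assumptions about what leftover residue looks like. *)
boxResidueQ[s_String] := StringContainsQ[s, "StyleBox" | "\\!\\("]

(* Made-up stand-ins for already-parsed usage text *)
parsed = <|
   "Begin" -> "Begin[\"\\!\\(StyleBox[context]\\)\"] resets the current context.",
   "Sin" -> "Sin[z] gives the sine of z."
   |>;

(* Symbols whose text still needs a new rule or an ad hoc definition *)
stillBad = Keys[Select[parsed, boxResidueQ]]
(* {"Begin"} *)
```

With real data, parsed would instead be built by mapping getUsg2 or getUsageSentences over the symbols of interest, and the process stops once stillBad is empty.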
