
WolframLanguageData plaintext usage definitions: some problems

Posted 3 years ago

To split usage-definition text blocks into lists of sentences, I wrote the following code:

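```wolfram
(* Split plaintext usage into sentences; ". " -> ".. " keeps each period after the split *)
getUsg2[symb1_] := StringSplit[
  StringReplace[WolframLanguageData[symb1, "PlaintextUsage"],
   {". " -> ".. ",
    "\\!\\(\\*" -> "", ",AutoSpacing->False" -> "", (* strip box residue *)
    "\\\"" -> "\""}],                                (* unescape \" to " *)
  ". "];
```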

However, I found a small set of symbols whose plaintext usage text still contains box syntax. The following shows the problems:

```wolfram
getUsg2@{"Begin", "BeginPackage", "CloudExpression", "CloudObject",
  "DeclarePackage", "DumpSave", "FindChannels", "Needs", "Save"}
```

I am attaching a notebook. Can someone help me parse these last 9 symbols, or explain how I could work it out myself?

Thanks

POSTED BY: Andrew Meit
Posted 3 years ago

I decided to fix the malformed usage definitions rather than parse them, and now I can improve my code. I am attaching a notebook. I have reported the malformed definitions, along with my fixes, to Wolfram's bug database. I also found that not every symbol in System` is in WolframLanguageData (WLD) yet, though the missing ones should be added over time. At least I am no longer in box-parsing hell. ;-)

POSTED BY: Andrew Meit
Posted 3 years ago

If your goal is to have usage data in a particular (plaintext) format, and there are only 9 symbols your function cannot parse, just overload the function with ad hoc definitions for those symbols:

```wolfram
getUsg2["Begin"] = {"Begin[\"context`\"] resets the current context."}
```
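This works because a definition with a literal argument is more specific than one with a pattern argument, so the Wolfram Language tries it first no matter which was entered first. A toy sketch with a hypothetical `usageFor` function (standing in for `getUsg2`) shows the precedence:

```wolfram
(* Hypothetical toy function: one general rule plus one ad hoc override *)
usageFor[sym_String] := {"general rule applied to " <> sym}
usageFor["Begin"] = {"Begin[\"context`\"] resets the current context."};

usageFor["Begin"]  (* the hard-coded override wins *)
usageFor["Plot"]   (* {"general rule applied to Plot"} *)
```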

If you're worried that hard-coded text for these 9 symbols will drift out of sync with the official documentation, that's a legitimate concern, but weigh the cost against the benefit.

If you want to make your getUsg2 function completely bullet-proof against malformed box strings, I just don't think that's feasible: you can't anticipate every way such box strings might be malformed.

A compromise would be to handle the particular malformations you already know about. So, instead of hard-coding getUsg2["Begin"], you could analyze that particular malformation, implement a fix for that pattern, rerun getUsg2 to see which "bad" symbols remain (the fix might clean up some of the others as well), and repeat until no bad cases remain.

For example:

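```wolfram
cleanUsageText[str_String] :=
 StringReplace[
  str,
  {
   (* "\\[" is a literal [ in the regex; ([^]]+) captures everything up to the next ] *)
   RegularExpression["\"\\W+StyleBox\\[([^]]+)]\""] :>
    StringDelete["$1", "\\" | "(" | "*"] (* captured text, minus \ ( and *, replaces the match *)
   (* , the rules you already had would go here *)
  }]
```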

```wolfram
getUsageSentences[sym_] :=
 TextSentences[cleanUsageText[WolframLanguageData[sym, "PlaintextUsage"]]]
```

This cleans up the usage for Begin and partially cleans up BeginPackage. You can either try to get it to work for both, or just move on and write an ad hoc fix for BeginPackage. You may end up with as many as 9 special fixes for the 9 remaining problems, but oh well.
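The "rerun and see which bad symbols remain" loop can be partly automated. The sketch below is an assumption-laden helper, not something from this thread: `boxResidueQ` and `remainingBadSymbols` are hypothetical names, and treating a `\!\(` or `\)` marker or a head ending in `Box[` as "residue" is a guess at what leftover box syntax looks like. It would also flag a legitimate usage string that happens to mention, say, `StyleBox[`.

```wolfram
(* Hypothetical predicate: True if a usage string still shows box residue,
   i.e. a \!\( or \) marker, or a head like StyleBox[ *)
boxResidueQ[str_String] :=
 StringContainsQ[str, "\\!\\(" | "\\)" | (LetterCharacter .. ~~ "Box[")]
(* Non-strings, e.g. Missing[...] for symbols absent from WLD, count as clean *)
boxResidueQ[_] := False

(* Hypothetical driver: apply a string-to-string cleaning function to each
   symbol's plaintext usage and keep the symbols that still have residue *)
remainingBadSymbols[clean_, syms_List] :=
 Select[syms, boxResidueQ[clean[WolframLanguageData[#, "PlaintextUsage"]]] &]
```

For example, `remainingBadSymbols[Identity, syms]` gives the raw baseline for a list of symbol names `syms`, and `remainingBadSymbols[cleanUsageText, syms]` shows what is left after the current rules. You would add a rule and rerun until it returns `{}`.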

POSTED BY: Updating Name