Neil, wow, that's a cool idea and very different from mine.
It appears that you are parsing LaTeX so I think your code breaks if the square "]" is a "}" or if the Typesetting only has an opening "{" and no closing one.
I'm not sure I understand correctly, but in this case the "{" and "}" are written as "\{" and "\}". My code should handle this situation with the first test in the following "Which":
Which[
i > 1 && chars[[i - 1]] == "\\", ,
chars[[i]] == "{", sum = sum + 1,
chars[[i]] == "}", sum = sum - 1
];
The only problem I can imagine is a comment inside the option, since a comment does not have to follow any syntax rules. But I know that the options contain no comments, so this shouldn't be a problem for my work.
The code above avoids looping (looping is generally not "Mathematica Friendly")
Do you mean that I should try not to use "While"? Is this for performance reasons? I compared our two functions on a bigger file (attached):
In[39]:= Timing[findOption[text, "\\def\\opa"];]
Out[39]= {0.015625, Null}
In[40]:= Timing[findOption2[text, "\\def\\opa"];]
Out[40]= {25.9219, Null}
I think that in bigger files with lots of "{" and "}" the variable "possible" in your function explodes and makes it very slow, so this solution is not useful for my situation. But thank you anyway, I really like seeing different ideas.
I've now done the work with a workaround instead of using regular expressions:
findOption[text_String, option_String] := Module[
{pos, chars, sum, i},
pos = StringPosition[text, option <> "{"];
If[Length[pos] == 0, Return[False], pos = First[pos]];
chars = Characters[StringTake[text, {pos[[2]] + 1, -1}]];
sum = 0; i = 1;
While[
sum != -1 && i <= Length[chars],
Which[
i > 1 && chars[[i - 1]] == "\\", ,
chars[[i]] == "{", sum = sum + 1,
chars[[i]] == "}", sum = sum - 1
];
i++];
Return[pos + {0, i}]
]
This now works pretty well:
In[88]:= text
Out[88]= "\\input{mystandard}
\\begin{document}
% bla bla bla { { { } } }
\\def\\opx{$\\left\\{\\cfrac{x}{y}\\right]$}
% bla bla bla { { { } } }
\\opx
\\end{document}"
In[89]:= pos = findOption[text, "\\def\\opx"]
Out[89]= {73, 111}
In[90]:= StringTake[text, pos]
Out[90]= "\\def\\opx{$\\left\\{\\cfrac{x}{y}\\right]$}"
So I'm happy with how it works :)
Thank you all for your help, and sorry again for my bad and unclear examples.
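On the question of avoiding "While": the same counting idea can be written without an explicit loop. The following is only a sketch, not a function from this thread; the names findOptionNoLoop and closingBraceOffset are made up for illustration. It assigns +1 to every unescaped "{", -1 to every unescaped "}", takes the running sum with Accumulate, and looks for the first place where the sum reaches -1. Like findOption, it treats any brace directly preceded by a backslash as escaped, and unlike findOption it returns False when no closing brace is found.

```wolfram
(* Hypothetical helper: offset of the brace that closes an already opened group,
   or Missing["NotFound"] if the group is never closed *)
closingBraceOffset[s_String] := Module[{chars, escaped, steps},
  chars = Characters[s];
  (* True where the preceding character is a backslash *)
  escaped = (# === "\\") & /@ Most[Prepend[chars, ""]];
  (* +1 for an unescaped "{", -1 for an unescaped "}", 0 for everything else *)
  steps = MapThread[
    Which[#2, 0, #1 === "{", 1, #1 === "}", -1, True, 0] &,
    {chars, escaped}];
  (* the running depth first hits -1 at the closing brace *)
  FirstPosition[Accumulate[steps], -1, Missing["NotFound"], {1}]
]

(* Hypothetical loop-free variant of findOption *)
findOptionNoLoop[text_String, option_String] := Module[{pos, close},
  pos = StringPosition[text, option <> "{", 1];
  If[pos === {},
    False,
    pos = First[pos];
    close = closingBraceOffset[StringDrop[text, pos[[2]]]];
    If[MissingQ[close], False, {pos[[1]], pos[[2]] + First[close]}]
  ]
]

findOptionNoLoop["\\def\\opc{Z\\\"{u}rich}", "\\def\\opc"]
(* {1, 20} *)
```

The returned span ends exactly at the closing brace, so it can be passed to StringTake or StringReplacePart directly. Whether this is faster than the While version on large files would have to be measured; it scans the whole remainder of the text instead of stopping at the closing brace.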
|
|
Juerg, I think the easiest way to do this is to go back to your original example and add Longest to the pattern and simplify it.
In[1]:= string2 = "abcde(fgh(ij)k)lmn"
justTheInsides = StringCases[string2, Longest["e(" ~~ x___ ~~ ")"] -> x][[1]]
Out[1]= "abcde(fgh(ij)k)lmn"
Out[2]= "fgh(ij)k"
Now you can do the German umlaut without any trouble -- you can even go back to your original approach of finding the position and using StringTake (in which case you do not need to name the string "x"):
pos2 = First[StringPosition[string2, Longest["e(" ~~ ___ ~~ ")"]]]
StringTake[string2, pos2 + {2, -1}]
Now on umlauts the code still works:
In[106]:= string3 = "\\def\\opc{Z\\\"{u}rich}"
StringCases[string3, Longest["opc{" ~~ x___ ~~ "}"] -> x][[1]]
Out[106]= "\\def\\opc{Z\\\"{u}rich}"
Out[107]= "Z\\\"{u}rich"
Regards, Neil
|
|
Thank you, Neil. Good idea, but if the closing bracket is not the last one in the string, it doesn't work anymore:
In[82]:= string = "abcde(fgh(ij)k)lmn(opq)rs"
justTheInsides =
StringCases[string, Longest["e(" ~~ x___ ~~ ")"] -> x][[1]]
Out[82]= "abcde(fgh(ij)k)lmn(opq)rs"
Out[83]= "fgh(ij)k)lmn(opq"
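Longest cannot help here because it has no notion of nesting depth: it simply extends the match to the last ")" in the string. One possible alternative is a recursive regular expression. The sketch below assumes that RegularExpression accepts the PCRE subroutine syntax "(?1)", which re-enters capture group 1; this is an assumption that was not tested in this thread, so please check it on your version before relying on it. The name balancedInsides is made up for illustration.

```wolfram
string = "abcde(fgh(ij)k)lmn(opq)rs";

(* Group 1 is the balanced content after "e(":
   either a character that is not a parenthesis,
   or a complete "( ... )" pair whose content is matched by group 1 again via (?1) *)
balancedInsides = RegularExpression["e\\(((?:[^()]|\\((?1)\\))*)\\)"];

(* "$1" returns only the content of group 1, without the outer parentheses *)
StringCases[string, balancedInsides -> "$1"]
```

The intent is that the match stops at the ")" that closes the "(" following "e", so the later "(opq)" is no longer swallowed. Escaped parentheses are not handled in this sketch.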
|
|
Hello Henrik & Gianluca, thank you very much for your answers. Sorry, my example was not very accurate. I tried to simplify it, but that was not a good idea. The string is not an expression, it's one of thousands of text files. These files contain, for example, a part with the following text:
\def\opa{2018}
\def\opb{Winter}
\def\opn{a}
\def\opc{Z\"{u}rich}
\def\opd{1}
I want to read out the location Z\"{u}rich and replace \def\opc{Z\"{u}rich} with \def\opLocation{Z\"{u}rich}. Henrik's solution doesn't work for my problem. And I think Gianluca's solution doesn't help either, but I don't really understand it yet and have to think about it first. Sorry for the bad example!
|
|
Hi Juerg,
I want to read out the location Z\"{u}rich and replace \def\opc{Z\"{u}rich} with \def\opLocation{Z\"{u}rich}.
This appears to be a different thing (but nevertheless simplifying examples is generally a good idea!) and could be done simply like so:
text = "\\def\\opa{2018}
\\def\\opb{Winter}
\\def\\opn{a}
\\def\\opc{Z\\\"{u}rich}
\\def\\opd{1}";
StringPosition[text, "\\opc{Z\\\"{u}rich}"]
StringReplace[text, "\\opc{Z\\\"{u}rich}" -> "\\opLocation{Z\\\"{u}rich}"]
Or am I still misunderstanding your problem? Regards -- Henrik
|
|
The problem is that the location in "\def\opc{location}" could be anything. "\def\opc{Z\"{u}rich}" was only an example, which shows that the location can contain brackets too, so it's not so easy to find out which one is the closing bracket. I think there should be an easy way with regular expressions, but I don't know much about them and have to learn them first. As a workaround I could look up "\def\opc{" only, split the rest of the string into characters, count "{" as +1 and "}" as -1, and stop when the sum reaches -1.
text = "\\def\\opa{2018}
\\def\\opb{Winter}
\\def\\opn{a}
\\def\\opc{Z\\\"{u}rich}
\\def\\opd{1}";
pos1 = First[StringPosition[text, "\\def\\opc{"]];
chars = Characters[StringTake[text, {pos1[[2]] + 1, -1}]];
sum = 0; i = 1;
While[
sum != -1 && i <= Length[chars],
Which[
chars[[i]] == "{", sum = sum + 1,
chars[[i]] == "}", sum = sum - 1
];
i++];
location = StringTake[text, pos1 + {9, i - 2}]
(* Out location: "Z\\\"{u}rich" *)
text1 = StringReplacePart[text,
"\\def\\opLocation{" <> location <> "}", pos1 + {0, i - 1}]
(* Out text1:
"\\def\\opa{2018}
\\def\\opb{Winter}
\\def\\opn{a}
\\def\\opLocation{Z\\\"{u}rich}
\\def\\opd{1}"
*)
If I don't find a solution with regular expressions, I'll go with this workaround, but it's not really nice.
|
|
Well, it seems the German umlauts are the problem, and in particular the brackets in them. How about getting rid of them as a very first step, then doing all the manipulation/searching/etc. on the data, and as a last step putting them back in (if necessary):
text = "\\def\\opa{2018}
\\def\\opb{Winter}
\\def\\opn{a}
\\def\\opc{Z\\\"{u}rich}
\\def\\opd{1}";
umlautRule = {"\\\"{a}" -> "$$$a", "\\\"{o}" -> "$$$o", "\\\"{u}" -> "$$$u", "\\\"{s}" -> "$$$s"};
invUmlautRule = Reverse /@ umlautRule;
text1 = StringReplace[text, umlautRule];
(* data manipulation: *) result0 = text1;
result = StringReplace[result0, invUmlautRule];
The programming should be nicer then.
|
|
In the example it's a German umlaut, but there are other options that could contain any LaTeX code, for example:
"\\def\\opx{\\frac{x}{e^{x-1}}}"
So it's not possible to replace the "inner brackets" temporarily.
|
|
Here is my hack:
myString = "abcde(fgh(ij)k)lmn";
subsequences = StringSplit[myString, "(" | ")"];
replacements =
Thread[subsequences -> Map[ToString, Range[Length[subsequences]]]];
inverseReplacements = Thread[Range[Length[subsequences]] -> subsequences];
StringJoin["{",
StringSplit[myString, x : "(" | ")" :> x] /.
replacements /. {"(" -> ",{", ")" -> "},"}, "}"];
% // ToExpression;
% /. inverseReplacements
Cases[%, _List, {1, Infinity}];
Map[StringJoin[Flatten[#]] &, %]
If I were sure that the string does not contain curly braces, it would be half as long. The idea is to convert the string to a WL nested list expression delimited by curly braces. Then it is easy to extract sublists with Cases.
|
|
Hi Juerg, I am sure there is a much more elegant solution, but maybe this is in some way helpful:
string2 = "abcde(fgh(ij)k)lmn";
expr = MakeExpression[RowBox[Characters[string2]], StandardForm];
Level[expr, {Depth[expr] - 1}]
(* Out: {i,j} *)
Level[expr, {Depth[expr] - 2}]
(* Out: {f,g,h,i j,k} *)
Regards -- Henrik
|
|