
Find email addresses in text using TextCases? "Recursion limit problem"

POSTED BY: l van Veen
15 Replies

The bug is fixed in version 11.3.

POSTED BY: l van Veen

Hi Sander, I submitted a case. In my opinion this is indeed a bug. I'll post the outcome.

POSTED BY: l van Veen
POSTED BY: Sander Huisman

It seems to me the pattern used is "wrong". When I change it to

f[x_] := StringCases[x, 
  (WordCharacter | "_" | "%" | "+" | "-" | ".") .. ~~
    Verbatim["@"] ~~ (WordCharacter | "." | "-") .. ~~ Verbatim["."] ~~
    Repeated[LetterCharacter, {2, 4}], Overlaps -> False]

it seems to work like a charm.

POSTED BY: l van Veen

That would allow for email addresses starting with a . which is not allowed as far as I know...

POSTED BY: Sander Huisman

You're right. But the addresses have already been validated (otherwise I wouldn't have received them). What if I force a word character first? That gives the same result and resolves your point.

f[x_] := StringCases[x, 
  WordCharacter ~~ (WordCharacter | "_" | "%" | "+" | "-" | ".") .. ~~
    Verbatim["@"] ~~ (WordCharacter | "." | "-") .. ~~ Verbatim["."] ~~
    Repeated[LetterCharacter, {2, 4}], Overlaps -> False]
POSTED BY: l van Veen

Well, I think you can now still have multiple dots in a row. I think the original string pattern (StringExpression) is correct; either the conversion to RegularExpression goes wrong, or the evaluation of the regular expression does. I would indeed call this a bug. Do you mind sending it in as product feedback?
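For what it's worth, one way to rule out both a leading dot and consecutive dots is to write the local part as dot-separated runs of allowed characters. This is only a sketch under that assumption, not the pattern TextCases uses internally; the names atom, emailPattern and g are made up for illustration.

```wolfram
(* Hypothetical helper names; a sketch, not the built-in "EmailAddress" pattern. *)

(* A run of characters allowed between dots in the local part. *)
atom = (WordCharacter | "_" | "%" | "+" | "-") ..;

(* Local part: one run, then zero or more groups of "." followed by a run,
   so within a match a dot is never first, last, or doubled before the "@". *)
emailPattern = atom ~~ ("." ~~ atom) ... ~~ "@" ~~
   (WordCharacter | "." | "-") .. ~~ "." ~~ Repeated[LetterCharacter, {2, 4}];

g[x_String] := StringCases[x, emailPattern, Overlaps -> False]

g["Lab_Wolfram_Interest_Group <Lab_Wolfram_Interest_Group@groups.wolfram.com>"]
```

Because every repeated group has to start with a literal dot, there is only one way to split a given local part into groups. Note that StringCases looks for substrings, so in running text it can still pick out the valid tail of an otherwise invalid address.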

POSTED BY: Sander Huisman

This is a strange but annoying problem! I noticed you can do a stack trace. Is this the pattern used to find mail addresses?

[Screenshot of the stack trace]

So even when I use the standard StringCases I run into the same recursion limit problem. I would think this must be a bug?

str = "Lab_Wolfram_Interest_Group <Lab_Wolfram_Interest_Group@groups.wolfram.com>";
StringCases[str, 
  ((WordCharacter | "_" | "%" | "+" | "-") .. ~~ Repeated[Verbatim["."], {0, 1}]) .. ~~
    Verbatim["@"] ~~ (WordCharacter | "." | "-") .. ~~ Verbatim["."] ~~
    Repeated[LetterCharacter, {2, 4}], Overlaps -> False]
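To see what regular expression such a string pattern is turned into, something like the following might help. It relies on StringPattern`PatternConvert, which is undocumented, so both its availability and the shape of its result are assumptions that may differ between versions; patt is just an illustrative name. Note the inner .. nested inside an outer .. in the local part, which is the construct worth looking at in the generated expression.

```wolfram
(* StringPattern`PatternConvert is undocumented; its existence and output format
   are assumptions here and may change between versions. *)
patt = ((WordCharacter | "_" | "%" | "+" | "-") .. ~~ Repeated[".", {0, 1}]) .. ~~
   "@" ~~ (WordCharacter | "." | "-") .. ~~ "." ~~ Repeated[LetterCharacter, {2, 4}];

(* The first element of the result is expected to be the regular-expression string. *)
First[StringPattern`PatternConvert[patt]]
```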
POSTED BY: l van Veen
POSTED BY: Sander Huisman

Hi Sander, great to know! By the way, how do you know all this stuff? I can't find it in any book. You should write one :)

POSTED BY: l van Veen

Well, I don't know these things by heart, but you can do:

Needs["GeneralUtilities`"]
PrintDefinitions@NaturalLanguageProcessing`TextPosition`PackagePrivate`iTextPosition

to see the internals of TextPosition. Click on the various links you see in that document to go deeper into the code, and so on...

POSTED BY: Sander Huisman

Would splitting the string be acceptable?

Flatten[TextCases[#, "EmailAddress"] & /@ StringSplit["Lab_Wolfram_Interest_Group <Lab_Wolfram_Interest_Group@groups.wolfram.com>"], 1]
POSTED BY: Pedro Fonseca
POSTED BY: Sander Huisman

If you follow the command with your input:

str="Lab_Wolfram_Interest_Group <Lab_Wolfram_Interest_Group@groups.wolfram.com>"
TextCases[str,"EmailAddress"]

calls:

TextCases[str,{"EmailAddress"}]

which calls:

NaturalLanguageProcessing`iTextCases[str,{"EmailAddress"}->"String"]

which calls:

TextPosition[str,{"EmailAddress"}]
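If you want to check this chain yourself without reading the definitions, a filtered Trace is one option. This is a rough sketch: whether the internal calls actually show up depends on the version and on how the functions are implemented.

```wolfram
str = "Lab_Wolfram_Interest_Group <Lab_Wolfram_Interest_Group@groups.wolfram.com>";

(* Keep only the evaluation steps whose expression is a TextPosition call.
   TraceInternal -> True asks Trace to include internally generated evaluations too. *)
Trace[TextCases[str, "EmailAddress"], _TextPosition, TraceInternal -> True]
```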
POSTED BY: Sander Huisman
POSTED BY: Sander Huisman