Avoid DeleteStopwords improperly breaking hyphenated words?

POSTED BY: Konstantin Nosov
5 Replies
Posted 7 years ago

To get the list of built-in stop words, use

WordList["Stopwords"]

or

WordData[All, "Stopwords"]
POSTED BY: Rohit Namjoshi
POSTED BY: Marco Thiel
Posted 7 years ago

Hi Konstantin,

I completely agree, which is why I suggested reporting it to Wolfram Support as a bug.

POSTED BY: Rohit Namjoshi

Thanks, Rohit

But I think removing stop words should not break words apart. If "follow-up" is not a stop word, DeleteStopwords should leave it untouched.

POSTED BY: Konstantin Nosov
Posted 7 years ago

That looks like a bug. I would report it to Wolfram Support. One workaround would be to remove trailing hyphens.

sentence = "through the mechanism of follow-up of living patients 
the natural history of various diseases of military-medical 
importance";
sentence // TextWords // DeleteStopwords // StringReplace["-" ~~ EndOfString -> ""]

(* {"mechanism", "follow", "living", "patients", "natural", "history", "various", "diseases", "military-medical", "importance"} *)
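
If you would rather keep "follow-up" intact, another possible workaround is to skip DeleteStopwords and filter whole tokens against the stop-word list yourself. This is only a sketch: it assumes that WordList["Stopwords"] is close enough to the list DeleteStopwords uses internally, which I have not verified, and `stopwords` and `kept` are just illustrative names.

```wolfram
sentence = "through the mechanism of follow-up of living patients the natural history of various diseases of military-medical importance";

(* Assumption: this list approximates what DeleteStopwords uses internally. *)
stopwords = WordList["Stopwords"];

(* Keep a token unless the whole token, lowercased, is a stop word. *)
(* Tokens are never split, so hyphenated words pass through unchanged. *)
kept = Select[TextWords[sentence], ! MemberQ[stopwords, ToLowerCase[#]] &]
```

Because each token is compared as a whole, a hyphenated word is dropped only if that exact string appears in the stop-word list.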
POSTED BY: Rohit Namjoshi