WOLFRAM COMMUNITY

How to keep DeleteStopwords from improperly breaking hyphenated words?

Konstantin Nosov, V. N. Karazin Kharkov National University

Posted 7 years ago

POSTED BY: Konstantin Nosov

5 Replies

Rohit Namjoshi

Posted 7 years ago

To get the list of built-in stop words, use WordList["Stopwords"] or WordData[All, "Stopwords"].
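As a minimal sketch of how that list could be used here, the snippet below checks whether individual tokens are stop words. The variable name stopwords is only illustrative, and no particular results are assumed since the built-in list may differ between versions.

```wolfram
(* Fetch the built-in stop word list mentioned above *)
stopwords = WordList["Stopwords"];

(* How many entries there are, and a quick look at the first few *)
Length[stopwords]
Take[stopwords, UpTo[10]]

(* Check individual tokens: is "up" listed? Is "follow-up" listed? *)
MemberQ[stopwords, "up"]
MemberQ[stopwords, "follow-up"]
```

If "up" is in the list but "follow-up" is not, that would be consistent with the hyphenated word being split rather than treated as a single token.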

POSTED BY: Rohit Namjoshi

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 7 years ago

POSTED BY: Marco Thiel

Rohit Namjoshi

Posted 7 years ago

Hi Konstantin, I completely agree, which is why I suggested reporting it to Wolfram Support as a bug.

POSTED BY: Rohit Namjoshi

Konstantin Nosov, V. N. Karazin Kharkov National University

Posted 7 years ago

Thanks, Rohit. But I think stop-word removal should not split words. If "follow-up" is not a stop word, DeleteStopwords should leave it untouched.

POSTED BY: Konstantin Nosov

Rohit Namjoshi

Posted 7 years ago

That looks like a bug. I would report it to Wolfram Support. One workaround would be to remove trailing hyphens.

sentence = "through the mechanism of follow-up of living patients the natural history of various diseases of military-medical importance";
sentence // TextWords // DeleteStopwords // StringReplace["-" ~~ EndOfString -> ""]

(* {"mechanism", "follow", "living", "patients", "natural", "history", "various", "diseases", "military-medical", "importance"} *)
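If the goal is to keep "follow-up" whole rather than trim it to "follow", one possible alternative is to filter the tokens yourself and drop a token only when the entire token is a stop word. This is a sketch, not a confirmed fix: deleteWholeStopwords is a hypothetical helper name, and it assumes TextWords keeps hyphenated words as single tokens, as the "military-medical" entry in the output above suggests.

```wolfram
sentence = "through the mechanism of follow-up of living patients the natural history of various diseases of military-medical importance";

(* Built-in stop word list *)
stopwords = WordList["Stopwords"];

(* Hypothetical helper: remove a token only if the whole token,
   lowercased, appears in the stop word list *)
deleteWholeStopwords[words_List] :=
  Select[words, ! MemberQ[stopwords, ToLowerCase[#]] &];

(* Tokenize first, then filter whole tokens instead of calling DeleteStopwords *)
sentence // TextWords // deleteWholeStopwords
```

Because the comparison is made on complete tokens, a hyphenated word is removed only if it appears in the stop word list as a whole.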

POSTED BY: Rohit Namjoshi
