Delete lines in a Wordlist.txt that contain specified chars

Posted 4 years ago

Hello community, I started learning Wolfram Mathematica v12.0 a few days ago and now I want to solve a little problem.

There is a Wordlist.txt with German, Chinese and some other signs/symbols in it. Now I want to remove every line/word in it that contains anything other than “abc..., ABC..., 123..., and !?”.

I did many exercises and one of them was about strings, but I don’t understand how to put those things together yet.

Thanks for some help, with best regards :3

P.S. Sorry about my English, I haven't practiced in a very long time :0

POSTED BY: Chris Bart
9 Replies

Something like this, perhaps:

text = {"aaabcdabce", "uuu", "bbABCmmm", "123$77", "3333", "ffeg"}
Select[text, Not[StringFreeQ[#, {"abc", "ABC", "123"}]] &]
Select[text, StringFreeQ[#, {"abc", "ABC", "123"}] &]
POSTED BY: Hans Dolhaine
Posted 4 years ago

Hi Hans Dolhaine, first of all many thanks but that's not exactly what I meant.

I want to remove Chinese/Arabic characters from a text file. Only numbers and letters from the German alphabet should be kept. So to put it simply, there should only be a-z, A-Z, 0-9 (no umlauts) and "!?".

So e.g. Wordlist.txt before ------> Monkey should! Banana? Dog 今 天 Mother 指

Wordlist.txt after -------> Monkey should! Banana? Dog

I think I picked a somewhat difficult exercise for a beginner, but it is hard for me to let go of things that I set out to do.

Kind regards Christian

POSTED BY: Chris Bart
Posted 4 years ago
words = {"Monkey should!", "Banana?", "Dog", "今 天" , "Mother 指"};
includeChars = {CharacterRange["A", "Z"], CharacterRange["a", "z"], 
    CharacterRange["0", "9"], "!", "?", " "} // Flatten;
words // Select[StringFreeQ[#, Except[includeChars]] &]

(* {"Monkey should!", "Banana?", "Dog"} *)
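The same whitelist test can also be written as a regular expression. This is only a sketch of an alternative, not part of the answer above; the character class mirrors includeChars, and the name onlyAllowedQ is made up for illustration.

```wolfram
words = {"Monkey should!", "Banana?", "Dog", "今 天", "Mother 指"};

(* onlyAllowedQ is a hypothetical helper name. StringMatchQ tests the whole
   string, so one disallowed character anywhere makes the test fail. *)
onlyAllowedQ[s_String] := StringMatchQ[s, RegularExpression["[A-Za-z0-9!? ]*"]];

Select[words, onlyAllowedQ]

(* {"Monkey should!", "Banana?", "Dog"} *)
```

Using `*` rather than `+` in the pattern keeps empty strings, which matches how StringFreeQ treats them.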
POSTED BY: Rohit Namjoshi
Posted 4 years ago

Oh this won't be easy…

So my first idea was something like:

Take[WordList[Wordlist.txt], StringDelete[{"ÿ","þ", "\[Not]", "Ø", "\.b2", "©", "yY","\[PlusMinus]"}]]

The code looks crazy when I try to copy and paste it, so please look at the picture.

Maybe a problem with extra symbols / Chinese chars?

So this only works with “normal” characters or numbers, and you have to say which characters you want to delete. But I want to say which characters should stay in the file. Also, it does not delete the whole string, only the characters you put in the list.
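To make that difference concrete, here is a minimal sketch that contrasts deleting characters with dropping whole lines. The sample lines come from the example earlier in the thread; the name allowed and the decision to permit spaces are assumptions made for illustration.

```wolfram
lines = {"Monkey should!", "Banana?", "Dog", "今 天", "Mother 指"};

(* Characters that may stay: a-z, A-Z, 0-9, "!", "?" and space (space is an assumption) *)
allowed = Join[CharacterRange["a", "z"], CharacterRange["A", "Z"],
   CharacterRange["0", "9"], {"!", "?", " "}];

(* Character level: removes every disallowed character but keeps all five lines *)
StringDelete[lines, Except[allowed]]

(* {"Monkey should!", "Banana?", "Dog", " ", "Mother "} *)

(* Line level: drops every line that contains at least one disallowed character *)
Select[lines, StringFreeQ[#, Except[allowed]] &]

(* {"Monkey should!", "Banana?", "Dog"} *)
```

In both cases the list says what may stay, not what must go; Except turns the whitelist into a pattern for "any other character".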

So as you can see I'm not experienced with Mathematica, but I'm willing to learn :D

POSTED BY: Chris Bart
Posted 4 years ago

Hi Christian,

I made up some data

SeedRandom[1];
words = RandomWord["CommonWords", 20, Language -> "German"] // 
  Join[#, {"树", "指", "都", "鱼", "刻", "朵", "今天"}] &

includeChars = {CharacterRange["A", "Z"], CharacterRange["a", "z"], 
   CharacterRange["0", "9"], "!"} // Flatten;

words // Select[! StringContainsQ[#, Except[includeChars]] &]

(*
{"Verhandlungsgeschick", "baggern", "Separation", "jure", 
"Rechenschritt", "Kriegsbereit", "Riesenteleskop", "Statusbit", 
"Dreieck", "cineastisch", "demselben", "Andalusier"}
*)
POSTED BY: Rohit Namjoshi
Posted 4 years ago

Hello Rohit, it's not 100% what I want, but pretty close.

Is it possible to take a Wordlist.txt, run the code on it, and save the changes back to the Wordlist.txt?

This would be a solution I can agree with for the moment :)

As I said to Dolhaine, maybe this question was too complex for a beginner like me. But I can't sleep until a problem is solved :D

OK, I don't understand everything you did yet, but a lot of practice will clear things up ;)

Thank you for helping me out.

POSTED BY: Chris Bart
Posted 4 years ago

Hi Christian,

I created a sample file wordlist.txt (see attached).

includeChars = {CharacterRange["A", "Z"], CharacterRange["a", "z"], 
    CharacterRange["0", "9"], "!", "?", " "} // Flatten;

wordList = Import["~/wordlist.txt", "Data"]

filtered = wordList // Select[StringFreeQ[#, Except[includeChars]] &]

Export["~/filteredList.txt", filtered]

The file filteredList.txt contains only the lines made up entirely of characters in includeChars.
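If the changes should go back into the same file, as asked earlier in the thread, one possible sketch is to export to the path that was imported. The location under $HomeDirectory and the use of the "Lines" import element are assumptions here, not something the thread confirms. Overwriting discards the original contents, so keeping a copy of the file first is sensible.

```wolfram
(* Hypothetical location; adjust to wherever Wordlist.txt actually lives *)
path = FileNameJoin[{$HomeDirectory, "Wordlist.txt"}];

includeChars = Join[CharacterRange["A", "Z"], CharacterRange["a", "z"],
   CharacterRange["0", "9"], {"!", "?", " "}];

(* Read the file as a list of strings, one per line *)
lines = Import[path, "Lines"];

(* Keep only the lines made up entirely of allowed characters *)
filtered = Select[lines, StringFreeQ[#, Except[includeChars]] &];

(* Write the kept lines back to the same path, one per line *)
Export[path, StringRiffle[filtered, "\n"], "Text"]
```

If the non-Latin characters come in garbled, the CharacterEncoding option of Import may need to be set; the right value depends on how the file was saved.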

POSTED BY: Rohit Namjoshi
Posted 4 years ago

Hello Rohit,

Yes this is exactly what I was thinking of and it works great. Thank you for your time and effort.

Have a nice day.

Greetings,

Christian

POSTED BY: Chris Bart

Please give an example that can be worked on

POSTED BY: l van Veen