Get a list of unique words in a text document?

Posted 4 years ago

This may be very trivial but I'm just starting with Mathematica and I don't see how to do it.

I'm using ImageIdentify to generate content guesses for a large number of photos.

I create a text file with an entry for each photo that looks like this:

file path,{bird, 0.95, sparrow, 0.8, flying bird, 0.9}

I use that data to add the image content guesses to each file's comment field so that I can search for all files that ImageIdentify labels as, say, pigeons.

So I'd like a list of all the names (in this case bird, sparrow, and flying bird) without duplicates.

When I use TextWords it just spits out everything: numbers, file paths, etc.

Thanks for any help!

POSTED BY: trinko
4 Replies
Posted 4 years ago

Thanks to you guys pointing me in the right direction, I figured out a solution:

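(* Remove all spaces and both braces. Note: this also turns "flying bird" into "flyingbird". *)
datext = StringReplace[datext, " " -> ""];
datext = StringReplace[datext, "{" -> ""];
datext = StringReplace[datext, "}" -> ""];
(* Split the text into a flat list of entries at the commas. *)
dalist = StringSplit[datext, ","];
(* Drop every entry containing a period: the scores and the file paths with an extension. *)
dalist = Select[dalist, Not[StringContainsQ[#, "."]] &]
(* Remove repeated names. *)
dalist = DeleteDuplicates[dalist]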
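
The steps above remove every space, so a two-word label such as "flying bird" comes out as "flyingbird". A possible variant that trims only the whitespace around each entry is sketched below. The sample text and its paths are made up for illustration, and the sketch assumes that no path contains a comma and that every score and every path contains a period.

```mathematica
(* Hypothetical sample text in the format described in this thread. *)
datext = "path/a.jpg,{bird, 0.95, sparrow, 0.8, flying bird, 0.9},path/b.jpg,{bird, 0.9, pigeon, 0.7}";

(* Remove the braces, split at the commas, and trim only surrounding whitespace. *)
items = StringTrim /@ StringSplit[StringDelete[datext, "{" | "}"], ","];

(* Keep the entries without a period, then remove duplicates. *)
labels = DeleteDuplicates[Select[items, StringFreeQ[#, "."] &]]
(* {"bird", "sparrow", "flying bird", "pigeon"} *)
```

A score written without a decimal point (for example 1) or a path without an extension would slip through the period filter, so it is worth checking the result against a few entries.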
POSTED BY: trinko
Posted 4 years ago

I'm sorry I wasn't clear. The data is text, not a list, and it has the form "Filepath1, {bird, 0.09, flying bird, 0.8, big bird, 0.3},Filepath2,{bird, 0.09, flying bird, 0.8, big bird, 0.3}".

So I need to know how to get rid of the file paths and how to turn the text into a list.
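
One way to approach both steps at once is to extract only the text inside each pair of braces, which leaves the file paths behind, and then split each group into entries. The following is a minimal sketch under the assumption that labels and scores strictly alternate inside the braces and that labels contain no commas; the variable names are illustrative.

```mathematica
text = "Filepath1, {bird, 0.09, flying bird, 0.8, big bird, 0.3},Filepath2,{bird, 0.09, flying bird, 0.8, big bird, 0.3}";

(* Extract the contents of each {...} group; the file paths are discarded. *)
groups = StringCases[text, "{" ~~ Shortest[body__] ~~ "}" :> body];

(* Split each group at the commas, trim, and keep the odd-positioned entries (the labels). *)
labels = DeleteDuplicates[
  Flatten[(StringTrim /@ StringSplit[#, ","])[[1 ;; -1 ;; 2]] & /@ groups]]
(* {"bird", "flying bird", "big bird"} *)
```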

Thanks for the help so far though!

POSTED BY: trinko
Posted 4 years ago

Hi Trinko,

Assuming you always have three birds per list:

birdData =
  {{"bird", 0.95, "sparrow", 0.8, "flying bird", 0.9},
   {"duck", 0.95, "flying bird", 0.8, "goose", 0.9},
   {"duck", 0.95, "flying bird", 0.8, "swan", 0.9}};

birdData[[All, {1, 3, 5}]] // Flatten // DeleteDuplicates
(* {"bird", "sparrow", "flying bird", "duck", "goose", "swan"} *)

birdData[[All, {1, 3, 5}]] // Flatten // Counts
(* <|"bird" -> 1, "sparrow" -> 1, "flying bird" -> 3, "duck" -> 2,  "goose" -> 1, "swan" -> 1|> *)

If the number of birds varies from list to list, the part specification needs to pick every odd-positioned element, e.g.

birdData[[All, 1 ;; -1 ;; 2]]
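
As an illustration of the varying-length case, the sketch below applies the same span to each sublist separately with Map, which avoids relying on how Part treats sublists of different lengths. The `raggedData` values are made up for the example.

```mathematica
(* Hypothetical data with a different number of label/score pairs per list. *)
raggedData = {
   {"bird", 0.95, "sparrow", 0.8},
   {"duck", 0.95, "flying bird", 0.8, "goose", 0.9},
   {"swan", 0.7}};

(* Take every odd-positioned element of each sublist, then flatten and deduplicate. *)
labels = DeleteDuplicates[Flatten[#[[1 ;; -1 ;; 2]] & /@ raggedData]]
(* {"bird", "sparrow", "duck", "flying bird", "goose", "swan"} *)
```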
POSTED BY: Rohit Namjoshi

One of many ways to do it:

list0 = {"bird", 0.95, "sparrow", 0.8, "flying bird", 0.9, "sparrow", 0.3, "bird", 42};
DeleteDuplicates[Select[list0, StringQ]]
(*  Out:  {"bird","sparrow","flying bird"}  *)
POSTED BY: Henrik Schachner