
Unexpected output from WordCloud[ ] using list of strings

Posted 3 years ago

Hi! I extracted book titles from a CSV file. For now I am testing with only 6 titles (all from a specific research area). When I ran WordCloud on the 6 titles, I did not get a word cloud image. Instead, the 6 titles were just listed horizontally in the image. What did I do wrong? Thank you so much for your time!

data1 = Import["/Documents//Test.csv", {"Data", All, 5}] (* column 5 *)

WordCloud[data1]
POSTED BY: Yuening Zhang
11 Replies
Posted 3 years ago

Thank you, Rohit. Can you please check your LinkedIn?

POSTED BY: Yuening Zhang
Posted 3 years ago
POSTED BY: Yuening Zhang
Posted 3 years ago

Hi Yuening,

In the Mathematica notebook interface, strings are not enclosed in quotes in output cells. If you want to see the quotes, use InputForm:

data1 = {"title1", "title 2", "title 3"}
(* {title1, title 2, title 3} *)  (* No quotes displayed *)

InputForm@data1
(* {"title1", "title 2", "title 3"} *)  (* Quotes are displayed *)

WordCloud@StringRiffle@data1

I am not sure what you mean by "does not work". Were you expecting "title 2" to be treated as a single element rather than as "title" and "2"? If so, remove the StringRiffle. With long titles, though, that is probably not what you want, because it gives the result you originally asked about.
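
To make the difference concrete, here is a minimal sketch using the same three placeholder strings as above (not your actual data). It only shows what StringRiffle does to the list and how the two kinds of input differ; the rendered clouds will look different on each run.

```mathematica
(* Placeholder titles, the same illustrative strings used earlier in this reply *)
titles = {"title1", "title 2", "title 3"};

(* StringRiffle joins the list into a single string, separated by spaces *)
joined = StringRiffle[titles]
(* InputForm of the result: "title1 title 2 title 3" *)

(* List of strings: each whole string is one element of the cloud *)
WordCloud[titles]

(* Single string: the text is split into words, and the words are the elements *)
WordCloud[joined]
```

With real titles, the first form places each full title in the cloud as one item, while the second sizes individual words by how often they occur.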

POSTED BY: Rohit Namjoshi
Posted 3 years ago

Hi Rohit, I am still learning how to do text analysis. So I am testing with a few titles. These are in fact dissertation titles. My purpose is to identify research trends with text analysis. Hope my explanation makes sense. Thank you!

POSTED BY: Yuening Zhang
Posted 3 years ago

Rohit, thank you so much for your time!

POSTED BY: Yuening Zhang
Posted 3 years ago

Hi Yuening,

You're welcome. I am curious about why you are analyzing word frequency in book titles. Could you share some details?

POSTED BY: Rohit Namjoshi
Posted 3 years ago

May I ask a follow-up question? I am aware that in WordCloud, the size of an element is based on its number of occurrences. Is there a way to list the 5 or 10 most frequent elements/words? Thank you!

POSTED BY: Yuening Zhang
Posted 3 years ago

Using the same s as in my earlier reply:

s // TextWords // Flatten // Counts // TakeLargest[5]
(* <|"and" -> 6, "Liquid" -> 3, "Crystals" -> 3, "to" -> 2, "Using" -> 2|> *)

By default, WordCloud removes stop words. To do the same when counting, apply DeleteStopwords:

s // TextWords // Flatten // DeleteStopwords // Counts // TakeLargest[5]
(* <|"Liquid" -> 3, "Crystals" -> 3, "Light" -> 2, "Using" -> 2, "liquid" -> 2|> *)
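
Note that these counts are case-sensitive, which is why "Liquid" and "liquid" show up as separate entries. One possible refinement, sketched here on three of the titles from this thread, is to lowercase the text before counting. ToLowerCase is an addition for illustration rather than part of the pipeline above, and the exact tokens depend on how TextWords splits the text.

```mathematica
(* Three of the titles from this thread, kept short for illustration *)
titles = {
  "Design and Synthesis of Novel Liquid Crystals and Organic Semiconductors",
  "Responsive liquid crystal films and fibers",
  "Defect structure and dynamics in liquid crystals"
};

(* Join into one string, lowercase it, split into words, drop stop words, count *)
counts = titles // StringRiffle // ToLowerCase // TextWords // DeleteStopwords // Counts;

(* The five most frequent words, as an association of word -> count *)
TakeLargest[counts, 5]
```

Singular and plural forms such as "crystal" and "crystals" are still counted separately in this sketch.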
POSTED BY: Rohit Namjoshi
Posted 3 years ago

Thank you very much for your help, Rohit!

POSTED BY: Updating Name
Posted 3 years ago

Now the problem is clear. It is a list of strings. Convert it to a single string:

s = {
  "Artifical Light-Driven Chiral Molecular Switches Based on Halogenated and Cyclic Azo-Binaphthyl Derivatives",
  "Using Light to Study Liquid Crystals and Using Liquid Crystals to Control Light",
  "Design and Synthesis of Novel Liquid Crystals and Organic Semiconductors",
  "Responsive liquid crystal films and fibers",
  "NEMATIC LIQUID CRYSTAL GUEST-HOST SYSTEM FOR EYEWEAR AND RANDOM LASER APPLICATIONS",
  "Defect structure and dynamics in liquid crystals"
}

WordCloud@StringRiffle@s
POSTED BY: Rohit Namjoshi
Posted 3 years ago

Hi Rohit, Thank you! I am sorry. I sent the image because when I copied and pasted, the preview looked different from what I have in the program. Here it is:

{"Artifical Light-Driven Chiral Molecular Switches Based on \
Halogenated and Cyclic Azo-Binaphthyl Derivatives", "Using Light to \
Study Liquid Crystals and Using Liquid Crystals to Control Light", \
"Design and Synthesis of Novel Liquid Crystals and Organic \
Semiconductors", "Responsive liquid crystal films and fibers", \
"NEMATIC LIQUID CRYSTAL GUEST-HOST SYSTEM FOR EYEWEAR AND RANDOM \
LASER APPLICATIONS", "Defect structure and dynamics in liquid \
crystals"}
POSTED BY: Yuening Zhang