[WSC21] Creating word clouds of character names from works of fiction
Christine Tsu
Posted
1 year ago
Creating word clouds of character names from works of fiction
by
Christine Tsu
Introduction
Characters provide the driving force of works of fiction, adding a sense of relatability, personality, and emotion as well as forming a medium through which readers can experience the story. Using a program to identify character names in works of fiction can help reveal trends in the ways characters are written, and visualizations can enhance a reader’s understanding of and connection with the text. My project uses proper nouns and key words, such as "said," "asked," and "believed," to identify character names in works of fiction. It displays those names in a word cloud based on the frequency of each name and also creates a graph of the character relationships, with thicker edges indicating a closer relationship.
Identifying Proper Nouns
This specific code will work for any book entities that have the "Plaintext" property, but most of the code could easily be adapted to work for any plaintext that the user inputs. To begin finding the character names in a work of fiction, I first identified the proper nouns in the work.
Getting the plaintext of a book:
In[]:= getBook[title_] := EntityValue[Interpreter["Book"][title], "Plaintext"]
Identifying the proper nouns in a text:
In[]:= getProperNouns[book_] := TextCases[book, "ProperNoun"] // DeleteDuplicates
Testing it on Alice’s Adventures in Wonderland by Lewis Carroll:
In[]:= getProperNouns[getBook["Alice's Adventures in Wonderland"]]
Out[]= {Alice, White, ORANGE MARMALADE, Dinah, DRINK ME, II, English, Poor Alice, Duchess, Dear, O Mouse, William, Conqueror, Dodo, Lory, Eaglet, III, A CAUCUS-RACE, Ahem, Edwin, Morcar, Earls, Mercia, Northumbria, Stigand, Canterbury, Edgar Atheling, Prizes, Mine, Canary, White Rabbit, Mary Ann, Run, W., Pat, Bill, Nay, Brandy, ADVICE FROM A CATERPILLAR, Caterpillar, Pigeon, Ugh, Serpent, VI, Fish-Footman, croquet, Frog-Footman, Queen, Footman, Cheshire-Cat, Cheshire-Cats, Cheshire-Puss, Cat, March Hare, Hatter, Dormouse, CROQUET, Miss, First, Kings, Queens, Knave, Hearts, King, Majesty, Herald, March, Pepper, ALICE}
Using Dialogue Tags
For the most part, only characters are able to speak, so dialogue tags (such as “he said” or “she exclaimed”) will help identify character names and distinguish them from other proper nouns (names of places, months, holidays, etc.).
I didn’t want the article “the” to stay in the text because it is not actually part of someone’s name. For example, I wanted to pull out “Hatter” instead of “the Hatter.” I split the text at whitespace and punctuation characters, which left me with a list of words and spaces that I could then use to identify the locations of dialogue words, and subsequently, the locations of character names.
Removing "the" and splitting the plaintext at whitespace and punctuation characters:
In[]:= getText[title_] := StringSplit[StringReplace[getBook[title], "the " -> ""], {Whitespace, PunctuationCharacter}];
I wanted to catch two-word names, such as "Mary Jones" and "Ms. Jones," as well as one-word names. However, because the text is split at both the period and the space, names with abbreviations, like Ms. Jones, now appear with an empty entry between the abbreviation and the last name. To format all two-word names in the same way, I removed the empty entry between “Ms” and “Jones” in the list.
Example of splitting a string with "Mary Jones" and "Ms. Jones":
In[]:= splitString = StringSplit["Mary Jones and Ms. Jones", {Whitespace, PunctuationCharacter}]
Out[]= {Mary, Jones, and, Ms, , Jones}
Deleting the space after an abbreviation:
In[]:= Delete[splitString, Position[splitString, "Ms"] + 1]
Out[]= {Mary, Jones, and, Ms, Jones}
I applied this logic to create a function that will do the same for the entire text.
A function that deletes the space after abbreviations:
In[]:= shortenAbbreviations[text_] := Module[{abbreviationIndices = Position[text, "Mrs" | "Mr" | "Ms" | "Dr" | "Mx"]},
  Delete[text, abbreviationIndices + 1]];
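As a minimal sketch of why adding 1 to the positions works, the following traces the same steps on a made-up sentence. The sample string and the names tokens and emptyPositions are illustrative assumptions, not part of the original notebook.

```wolfram
(* Hypothetical sample sentence, split the same way as in getText. *)
tokens = StringSplit["Mr. Craven and Dr. Craven met Mary",
   {Whitespace, PunctuationCharacter}];
(* tokens is {"Mr", "", "Craven", "and", "Dr", "", "Craven", "met", "Mary"} *)

(* Position returns {{1}, {5}}; adding 1 threads over the nested list,
   giving {{2}, {6}}, the positions of the empty entries. *)
emptyPositions = Position[tokens, "Mr" | "Dr"] + 1;

Delete[tokens, emptyPositions]
(* {"Mr", "Craven", "and", "Dr", "Craven", "met", "Mary"} *)
```

Because Delete accepts a list of positions, all of the empty entries are removed in one call rather than in a loop.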
Next, I needed to find the indices of dialogue words (like "said," "demanded," "exclaimed," etc.). I caught two-word names as well as one-word names by checking if there was an uppercase word located at an index either two above or two below the index of the dialogue word. If not, I just took the single words on either side of the speech word.
A function that finds the indices of speech words and puts the names around them into a list:
In[]:= findSpeechWords[text_] := Module[{speechIndices = Flatten[Position[text, SpeechWords]], speechWords},
  (* SpeechWords: the pattern of dialogue words such as "said", defined elsewhere *)
  speechWords = Flatten[Table[
      Which[
       (* two capitalized words after the dialogue word *)
       StringStartsQ[text[[speechIndices[[i]] + 2]], ToUpperCase[Alphabet[]]] && StringStartsQ[text[[speechIndices[[i]] + 1]], ToUpperCase[Alphabet[]]],
       StringJoin[text[[speechIndices[[i]] + 1]], " ", text[[speechIndices[[i]] + 2]]],
       (* two capitalized words before the dialogue word *)
       StringStartsQ[text[[speechIndices[[i]] - 2]], ToUpperCase[Alphabet[]]] && StringStartsQ[text[[speechIndices[[i]] - 1]], ToUpperCase[Alphabet[]]],
       StringJoin[text[[speechIndices[[i]] - 2]], " ", text[[speechIndices[[i]] - 1]]],
       (* otherwise, the single words on either side *)
       True, {text[[speechIndices[[i]] - 1]], text[[speechIndices[[i]] + 1]]}],
      {i, Length[speechIndices]}]] // DeleteDuplicates]
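The lookup around a dialogue word can be traced on a short token list. This is a simplified sketch, not the function above: it only looks at the words after the dialogue word, and it adds a bounds check so that a dialogue word near the end of the list does not index past it. The token list and the helper names capitalizedQ and nameAfter are hypothetical.

```wolfram
(* Hypothetical token list, as produced by splitting a sentence. *)
tokens = {"nonsense", "said", "Mary", "Jones", "and", "Colin", "said", "nothing"};

(* True when a token starts with an uppercase letter (same test as in the post). *)
capitalizedQ[word_String] := StringStartsQ[word, ToUpperCase[Alphabet[]]];

(* Name following position k: two capitalized words, one, or nothing. *)
nameAfter[list_List, k_Integer] := Which[
   k + 2 <= Length[list] && capitalizedQ[list[[k + 1]]] && capitalizedQ[list[[k + 2]]],
   StringJoin[list[[k + 1]], " ", list[[k + 2]]],
   k + 1 <= Length[list] && capitalizedQ[list[[k + 1]]],
   list[[k + 1]],
   True, Nothing];

speechPositions = Flatten[Position[tokens, "said"]];  (* {2, 7} *)
nameAfter[tokens, #] & /@ speechPositions
(* {"Mary Jones"} *)
```

Returning Nothing for the non-matching case drops those entries from the result automatically, so no separate cleanup step is needed in this variant.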
Since I removed the periods after abbreviations like "Ms.," I put them back in by adding a line of code to the end of the function above.
A function that finds the indices of speech words, puts the words around them into a list, and adds the periods back to abbreviations:
In[]:= findSpeechWords[text_] := Module[{speechIndices = Flatten[Position[text, SpeechWords]], speechWords},
  speechWords = Flatten[Table[
      Which[
       StringStartsQ[text[[speechIndices[[i]] + 2]], ToUpperCase[Alphabet[]]] && StringStartsQ[text[[speechIndices[[i]] + 1]], ToUpperCase[Alphabet[]]],
       StringJoin[text[[speechIndices[[i]] + 1]], " ", text[[speechIndices[[i]] + 2]]],
       StringStartsQ[text[[speechIndices[[i]] - 2]], ToUpperCase[Alphabet[]]] && StringStartsQ[text[[speechIndices[[i]] - 1]], ToUpperCase[Alphabet[]]],
       StringJoin[text[[speechIndices[[i]] - 2]], " ", text[[speechIndices[[i]] - 1]]],
       True, {text[[speechIndices[[i]] - 1]], text[[speechIndices[[i]] + 1]]}],
      {i, Length[speechIndices]}]] // DeleteDuplicates;
  (* put the period back after abbreviations: "Ms Jones" becomes "Ms. Jones" *)
  speechWords = If[StringContainsQ[#, "Mrs" | "Mr" | "Ms" | "Dr" | "Mx"], StringReplace[#, " " -> ". "], #] & /@ speechWords]
Testing the function on Alice’s Adventures in Wonderland:
In[]:= findSpeechWords[shortenAbbreviations[getText["Alice's Adventures in Wonderland"]]]
Out[]= {, Alice, she, and, anxiously, this, these, it, nothing, Mouse, Lory, Duck, rather, Dodo, Eaglet, voices, in, Fury, to, cunning, voice, hear, now, master, you, Caterpillar, very, hastily, had, Pigeon, Footman, two, Duchess, She, last, Cat, they, out, he, was, aloud, March Hare, Dormouse, Hatter, no, Seven, Queen, severely, a, Rabbit, Somebody, White Rabbit, King, cook, who, Knave, her}
Using Other Key Verbs
Aside from dialogue tags, I also used other verbs that tend to be exclusively used for characters (like "thought" or "believed"). The same method can be used to create a list of words that appear before these verbs.
A function that creates a list of words that appear before specific verbs:
In[]:= findOtherVerbs[text_] := Module[{verbIndices = Flatten[Position[text, "bought" | "buys" | "heard" | "hears" | "thought" | "thinks" | "sat" | "sits" | "believed" | "believes"]], verbWords},
  verbWords = DeleteCases[Flatten[Table[
       Which[
        (* two capitalized words before the verb *)
        StringStartsQ[text[[verbIndices[[i]] - 2]], ToUpperCase[Alphabet[]]] && StringStartsQ[text[[verbIndices[[i]] - 1]], ToUpperCase[Alphabet[]]],
        StringJoin[text[[verbIndices[[i]] - 2]], " ", text[[verbIndices[[i]] - 1]]],
        (* one capitalized word before the verb *)
        StringStartsQ[text[[verbIndices[[i]] - 1]], ToUpperCase[Alphabet[]]],
        text[[verbIndices[[i]] - 1]]],
       {i, Length[verbIndices]}]], Null] // DeleteDuplicates;
  verbWords = If[StringContainsQ[#, "Mrs" | "Mr" | "Ms" | "Dr" | "Mx"], StringReplace[#, " " -> ". "], #] & /@ verbWords];
Testing the function on Pride and Prejudice:
In[]:= findOtherVerbs[shortenAbbreviations[getText["Pride and Prejudice"]]]
Out[]= {Mr. Bingley, I, Elizabeth, Mrs. Hurst, He, Maria, Mrs. Collins, You, Lydia, Mrs. Gardiner, Mr. Bennet, Colonel Forster, Jane, She, Mrs. Bennet}
Removing Duplicates
From the work above, I gathered a list of proper nouns, words from dialogue tags, and words from key verbs. I first combined the words from dialogue tags and words from key verbs to catch as many character names as possible. Then, I found the intersection of this new list with the list of proper nouns.
A function that uses all three lists to create a new list:
In[]:= getIntersection[wordList1_, wordList2_, properNounList_] := Flatten[Intersection[properNounList, Union[wordList1, wordList2]]];
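To illustrate how the union and intersection filter the candidates, here is a small sketch with made-up lists. The three lists are hypothetical stand-ins for the dialogue-tag words, the key-verb words, and the proper nouns.

```wolfram
(* Hypothetical candidate lists. *)
speechNames = {"Mary", "she", "Colin"};
verbNames = {"Mary", "Dickon", "It"};
properNouns = {"Colin", "Dickon", "Mary", "Yorkshire"};

(* Union pools both candidate lists; Intersection keeps only proper nouns. *)
Intersection[properNouns, Union[speechNames, verbNames]]
(* {"Colin", "Dickon", "Mary"} *)
```

Words such as "she" and "It" drop out because they are not proper nouns, and "Yorkshire" drops out because it never appears next to a dialogue word or key verb.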
Testing it on The Secret Garden:
In[]:= shortenedText = shortenAbbreviations[getText["The Secret Garden"]];
getIntersection[findSpeechWords[shortenedText], findOtherVerbs[shortenedText], getProperNouns[getBook["The Secret Garden"]]]
Out[]= {Archibald Craven, Basil, Ben, Ben Weatherstaff, Colin, Colonel McGrew, Dickon, Dr. Craven, Martha, Mary, Master Craven, Mem Sahib, Mester Colin, Miss Mary, Mistress Mary, Mother, Mr. Craven, Mr. Roach, Mrs. Medlock, Mrs. Sowerby, Nut, Rajah, Roach, robin, Susan Sowerby, Weatherstaff}
There is some overlap here. For instance, "Mary," "Mary Lennox," "Miss Mary," and "Mistress Mary" all refer to the same person. To remove these duplicates, I first separated the names into a list of one-word names and a list of two-word names.
Separating a list of character names into one-word and two-word names:
In[]:= twoWordNames = {}; oneWordNames = {};
If[StringContainsQ[#, __ ~~ " " ~~ __], twoWordNames = Append[twoWordNames, #], oneWordNames = Append[oneWordNames, #]] & /@ CharacterNames;
oneWordNames
twoWordNames
Out[]= {Basil, Ben, Colin, Dickon, Martha, Mary, Mother, Nut, Rajah, Roach, robin, Weatherstaff}
Out[]= {Archibald Craven, Ben Weatherstaff, Colonel McGrew, Dr. Craven, Master Craven, Mem Sahib, Mester Colin, Miss Mary, Mistress Mary, Mr. Craven, Mr. Pitcher, Mr. Roach, Mrs. Crawford, Mrs. Medlock, Mrs. Sowerby, Susan Sowerby}
I went through each item in the list of twoWordNames and checked if part of it was already included in oneWordNames (for example, "Miss Mary" includes "Mary"). Then, I collected each group of duplicates in a list.
Collecting the groups of duplicate characters in a list:
In[]:= duplicateCharacters = DeleteCases[Flatten[Table[Which[StringCount[i, j] != 0 && i != j, {i, j}], {i, twoWordNames}, {j, oneWordNames}], 1], Null]
Out[]= {{Ben Weatherstaff, Ben}, {Ben Weatherstaff, Weatherstaff}, {Mester Colin, Colin}, {Miss Mary, Mary}, {Mistress Mary, Mary}, {Mr. Roach, Roach}}
I started to create a list of all of the characters by first adding the characters that were not included in the duplicate list.
Collecting the character names that are not duplicates:
In[]:= allCharacters = Complement[CharacterNames, Flatten[duplicateCharacters]]
Out[]= {Archibald Craven, Basil, Colonel McGrew, Dickon, Dr. Craven, Martha, Master Craven, Mem Sahib, Mother, Mr. Craven, Mr. Pitcher, Mrs. Crawford, Mrs. Medlock, Mrs. Sowerby, Nut, Rajah, robin, Susan Sowerby}
I chose the shortest item from each list of duplicate characters (for example, “Mary” instead of “Miss Mary”) and added it to the list of all of the characters.
Add the shorter name from each of the duplicate character lists to the list of all characters:
In[]:= Table[allCharacters = Append[allCharacters, i[[Flatten[Position[StringLength[i], Min[StringLength[i]]]]]]], {i, duplicateCharacters}];
allCharacters
Out[]= {Archibald Craven, Basil, Colonel McGrew, Dickon, Dr. Craven, Martha, Master Craven, Mem Sahib, Mother, Mr. Craven, Mr. Pitcher, Mrs. Crawford, Mrs. Medlock, Mrs. Sowerby, Nut, Rajah, robin, Susan Sowerby, {Ben}, {Weatherstaff}, {Colin}, {Mary}, {Mary}, {Roach}}
Organize the list and delete duplicates:
In[]:= allCharacters = Sort[Flatten[allCharacters // DeleteDuplicates]]
Out[]= {Archibald Craven, Basil, Ben, Colin, Colonel McGrew, Dickon, Dr. Craven, Martha, Mary, Master Craven, Mem Sahib, Mother, Mr. Craven, Mr. Pitcher, Mrs. Crawford, Mrs. Medlock, Mrs. Sowerby, Nut, Rajah, Roach, robin, Susan Sowerby, Weatherstaff}
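Picking the shortest name from each duplicate group can also be written with MinimalBy, which avoids the nested lists that need flattening afterward. This is an alternative sketch rather than the code used above, and the sample groups are a hypothetical excerpt.

```wolfram
(* Hypothetical excerpt of duplicate groups. *)
duplicateGroups = {{"Miss Mary", "Mary"}, {"Mr. Roach", "Roach"}, {"Mistress Mary", "Mary"}};

(* MinimalBy returns the elements with the smallest string length; take the first. *)
shortestNames = First[MinimalBy[#, StringLength]] & /@ duplicateGroups;

DeleteDuplicates[shortestNames]
(* {"Mary", "Roach"} *)
```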
Putting It All Together
I wrapped all of the steps above in a function that removes duplicates in a list of characters. One issue is that the function isn't able to tell if names like "Mrs. Sowerby" and "Susan Sowerby" refer to the same person or not. I decided to leave the duplicates in rather than only including one name that has "Sowerby" in it because it is possible that two people have the same last name (for instance, if they are related to each other).
A function that removes duplicates in a list of characters:
In[]:= removeDuplicates[characterList_] := Module[{twoWordNames = {}, oneWordNames = {}, duplicateCharacters = {}, allCharacters},
  (* separate one-word and two-word names *)
  If[StringContainsQ[#, __ ~~ " " ~~ __], twoWordNames = Append[twoWordNames, #], oneWordNames = Append[oneWordNames, #]] & /@ characterList;
  (* pair each two-word name with any one-word name it contains *)
  duplicateCharacters = DeleteCases[Flatten[Table[Which[StringCount[i, j] != 0 && i != j, {i, j}], {i, twoWordNames}, {j, oneWordNames}], 1], Null];
  (* keep the names that are not duplicates, then add the shortest name from each pair *)
  allCharacters = Complement[characterList, Flatten[duplicateCharacters]];
  Table[allCharacters = Append[allCharacters, i[[Flatten[Position[StringLength[i], Min[StringLength[i]]]]]]], {i, duplicateCharacters}];
  allCharacters = Sort[Flatten[allCharacters // DeleteDuplicates]]];
Creating the Final List
In my final character list, I decided not to include words that were only prepositions, determiners, interjections, and pronouns, all of which would most likely not be character names. In case some character names had different capitalization styles, I also removed duplicates after changing the entire list to lower case. I reformatted the list after that to make sure that the capitalization style was correct again.
A function that cleans the final character list:
In[]:= finalCharacterList[allCharacters_] := Module[{finalCharacters = {}},
  (* drop words that are only prepositions, determiners, interjections, or pronouns *)
  Table[If[PartOfSpeech[i] =!= {Entity["GrammaticalUnit", "Preposition"]} && PartOfSpeech[i] =!= {Entity["GrammaticalUnit", "Determiner"]} && PartOfSpeech[i] =!= {Entity["GrammaticalUnit", "Interjection"]} && PartOfSpeech[i] =!= {Entity["GrammaticalUnit", "Pronoun"]}, finalCharacters = Append[finalCharacters, i]], {i, allCharacters}];
  (* remove case-insensitive duplicates, then restore capitalization *)
  finalCharacters = Capitalize[DeleteDuplicates[ToLowerCase[finalCharacters]], "AllWords"];
  (* restore the capital after "Mc", as in "McGrew" *)
  If[StringContainsQ[#, "Mc"], StringReplace[#, "Mc" ~~ x_ -> "Mc" ~~ ToUpperCase[x]], #] & /@ finalCharacters];
Testing the function on The Secret Garden:
In[]:= finalCharacterList[AllCharacters]
Out[]= {Archibald Craven, Basil, Ben, Colin, Colonel McGrew, Dickon, Dr. Craven, Martha, Mary, Master Craven, Mem Sahib, Mother, Mr. Craven, Mr. Pitcher, Mrs. Crawford, Mrs. Medlock, Mrs. Sowerby, Nut, Rajah, Roach, Robin, Susan Sowerby, Weatherstaff}
Creating a Word Cloud of the Character Names
I used a word cloud to display the character names. Since character names that appear a greater number of times in the text are larger in the word cloud, it provides a visual of the importance of different characters.
First-person narratives present a challenge because the narrator's name may not appear. For instance, if Mary were the narrator of The Secret Garden, it is unlikely that "said Mary" or "Mary thought" would appear in the text. As a temporary workaround, I decided to create a function that tests whether the work was written in the first person. If it was, a caption under the word cloud lets the user know that the narrator's name may not appear in the cloud.
Testing If the Work Was Written in the First Person
My function first deletes the dialogue in the work of fiction. This way, the program will not have to sift through dialogue that may contain the word "I." Next, the function counts the number of times that the word "I" appears in the work (not including the dialogue) and divides this number by the total word count to come up with a percentage. Through testing, I found that if the percentage is over 1.5%, the work of fiction tends to be in the first person.
A function that tests if the work of fiction was written in the first person:
In[]:= firstPersonTest[book_] := Module[{noDialogue = StringDelete[book, Shortest["\"" ~~ ___ ~~ "\""]]},
  If[StringCount[noDialogue, " " ~~ "I" ~~ " "]/WordCount[noDialogue] >= 0.015,
   "This work of fiction was likely written in the first person, so the narrator's name may not appear in the word cloud and graph.",
   ""]];
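The two steps of the test, deleting dialogue and then counting "I", can be traced on a short sample. The sample string below is made up for illustration; it uses straight double quotes, which is what the pattern above matches.

```wolfram
(* Hypothetical three-sentence sample with one line of dialogue. *)
sample = "I walked home. \"I am late,\" said Tom. Then I slept.";

(* Step 1: delete everything between each pair of straight double quotes. *)
noDialogue = StringDelete[sample, Shortest["\"" ~~ ___ ~~ "\""]]
(* "I walked home.  said Tom. Then I slept." *)

(* Step 2: count " I " with a space on both sides. *)
StringCount[noDialogue, " I "]
(* 1 *)

(* Share of the words that are "I", compared against the 1.5% threshold. *)
N[StringCount[noDialogue, " I "]/WordCount[noDialogue]] >= 0.015
```

Note that the "I" at the very start of the sample is not counted, because the pattern requires a space before it, so the count is a slight underestimate.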
Testing it on a book written in the first person (The Time Machine) and a book not written in the first person (The Secret Garden):
In[]:= firstPersonTest[getBook["The Time Machine"]]
Out[]= This work of fiction was likely written in the first person, so the narrator's name may not appear in the word cloud and graph.
In[]:= firstPersonTest[getBook["The Secret Garden"]]
Out[]=
Displaying the Word Cloud
I wrote a function that uses the final list of characters to create a word cloud based on how often each character name appeared in the text. The word cloud also includes a caption below it, which will display the first-person message above if the work was written in the first person.
A function that creates a word cloud of the character names:
In[]:= createWordCloud[book_, characterList_, str_, caption_] := Labeled[WordCloud[WordFrequency[book, characterList, IgnoreCase -> True], SequenceIcon (* iconized options from the original notebook *)], Style[caption, "SmallText"]];
Testing it on The Secret Garden:
In[]:= createWordCloud[getBook["The Secret Garden"], FinalCharacterList, "The Secret Garden", firstPersonTest[getBook["The Secret Garden"]]]
Out[]= [word cloud of the character names in The Secret Garden]
Creating a Graph of the Character Relationships
My graph provides a visualization of how strong/weak character relationships are by correlating edge thickness with how often two character names appear together. Counting the number of times two character names appear in the same chapter would be a good way to measure this. However, different texts have different formatting for chapters, so it would be more complex to separate different books by chapter. I instead decided to count how many times two character names appear in the same 30-paragraph section.
A function that separates the plaintext into 30-paragraph sections:
In[]:= getNewText[book_] := Partition[TextCases[book, "Paragraph"], UpTo[30]]
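The effect of UpTo is easiest to see on a small list. In this sketch the seven placeholder strings stand in for paragraphs, and the section size is 3 instead of 30; both are illustrative assumptions.

```wolfram
(* Seven placeholder "paragraphs". *)
paragraphs = Table["Paragraph " <> ToString[n], {n, 7}];

(* UpTo[3] lets the last section be shorter instead of dropping the leftover. *)
sections = Partition[paragraphs, UpTo[3]];

Length /@ sections
(* {3, 3, 1} *)
```

Without UpTo, Partition[paragraphs, 3] would discard the seventh paragraph, so the end of a book could be lost.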
Counting How Many Times Two Character Names Appear in the Same Section
Creating a list of character names for each section:
In[]:= charactersInSections = Table[If[StringCount[StringJoin[i], j] > 0, j, ## &[]], {i, getNewText[getBook["The Secret Garden"]]}, {j, FinalCharacterList}] // Short
Out[]//Short= {{Mary, Mem Sahib}, {Archibald Craven, Basil, Colonel McGrew, Mary, Mrs. Crawford, Mrs. Medlock}, <<71>>, {Ben, Colin, Mrs. Medlock}}
Placing the lists of character names that include more than one name in a new list:
In[]:= multipleCharacters = {};
DeleteCases[If[Length[#] > 1, multipleCharacters = Append[multipleCharacters, #]] & /@ charactersInSections, Null];
multipleCharacters // Short
Out[]//Short= {{{Mary, Mem Sahib}, {Archibald Craven, Basil, Colonel McGrew, Mary, Mrs. Crawford, Mrs. Medlock}, <<71>>, {Ben, Colin, Mrs. Medlock}}}
Count the number of times two specific character names appear in a sublist:
In[]:= Counts[Flatten[Subsets[#, {2}] & /@ multipleCharacters, 1]] // Short
Out[]//Short= <|{Mary, Mem Sahib} -> 2, {Archibald Craven, Basil} -> 1, <<137>>, {Archibald Craven, Weatherstaff} -> 1, {Mr. Craven, Rajah} -> 1|>
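The pair counting can be checked by hand on a few small sections. In this sketch the section lists are made up, and Select is used in place of the Append loop as a shorter way to keep sections with more than one name.

```wolfram
(* Hypothetical character lists for three sections. *)
sections = {{"Mary", "Colin", "Dickon"}, {"Mary", "Colin"}, {"Ben"}};

(* Keep only sections in which at least two names appear. *)
multi = Select[sections, Length[#] > 1 &];

(* Subsets[..., {2}] lists every pair in a section; Counts tallies the pairs. *)
Counts[Flatten[Subsets[#, {2}] & /@ multi, 1]]
(* <|{"Mary", "Colin"} -> 2, {"Mary", "Dickon"} -> 1, {"Colin", "Dickon"} -> 1|> *)
```

Because each section's names are listed in the same order, a pair such as Mary and Colin always appears as {"Mary", "Colin"} and never reversed, so the counts for a pair are not split across two keys.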
I created a function to do the three steps above.
A function that counts how many times two characters appear in the same section:
In[]:= characterCounts[newText_, finalCharacters_] := Module[{multipleCharacters = {}, charactersInSections},
  (* list the character names that appear in each section *)
  charactersInSections = Table[If[StringCount[StringJoin[i], j] > 0, j, ## &[]], {i, newText}, {j, finalCharacters}];
  (* keep the sections with more than one name *)
  DeleteCases[If[Length[#] > 1, multipleCharacters = Append[multipleCharacters, #]] & /@ charactersInSections, Null];
  (* count how often each pair of names shares a section *)
  Counts[Flatten[Subsets[#, {2}] & /@ multipleCharacters, 1]]]
Testing it on The Secret Garden:
In[]:= characterCounts[getNewText[getBook["The Secret Garden"]], FinalCharacterList] // Short
Out[]//Short= <|{Mary, Mem Sahib} -> 2, {Archibald Craven, Basil} -> 1, <<137>>, {Archibald Craven, Weatherstaff} -> 1, {Mr. Craven, Rajah} -> 1|>
Converting the Counts to Edge Weights for the Graph
I wanted to scale the edge weights of my graph so that character relationships with greater counts would have larger weights. I decided to standardize the scale by setting the maximum value to 7 and adjusting the other values accordingly.
A function that converts the counts to a format that will set the edge weights of the graph:
In[]:= setWeights[counts_] := Module[{value = Max[Values[counts]]/7, weights},
  weights = Values[counts]/value;
  AssociationThread[Table[i[[1]] \[UndirectedEdge] i[[2]], {i, Keys[counts]}], weights]];
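The scaling can be worked through with small numbers. The counts below are made up; the arithmetic follows the description above, where the largest count maps to a weight of 7 and every other count is scaled by the same factor.

```wolfram
(* Hypothetical pair counts. *)
counts = <|{"Mary", "Colin"} -> 14, {"Mary", "Dickon"} -> 7, {"Ben", "Colin"} -> 2|>;

(* The largest count divided by 7 gives the scale factor: 14/7 = 2. *)
scale = Max[Values[counts]]/7;

(* Dividing every count by the factor puts the maximum at exactly 7. *)
weights = Values[counts]/scale
(* {7, 7/2, 1} *)

(* Turn each pair into an undirected edge and attach its weight. *)
edges = UndirectedEdge @@@ Keys[counts];
AssociationThread[edges, weights]
```

The weights stay exact rationals such as 7/2 rather than decimals, which matches the fractional weights that appear in the output for the full book.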
Testing it on The Secret Garden:
In[]:= setWeights[CharacterCounts] // Short
Out[]//Short= <|Mary \[UndirectedEdge] Mem Sahib -> 14/53, Archibald Craven \[UndirectedEdge] Basil -> 7/53, <<137>>, Archibald Craven \[UndirectedEdge] Weatherstaff -> 7/53, Mr. Craven \[UndirectedEdge] Rajah