[WSC21] Structuring and exploring TEI marked-up texts
by Anika Karpurapu
Posted 1 year ago
Play scripts are widely available online, but they are often lengthy and difficult to navigate. Many of these texts are unstructured, so it can be hard to quickly find what you are looking for. Some play scripts are marked up in XML-based formats such as TEI, which makes them somewhat easier to traverse, but these texts are crowded with tags and descriptors, and it takes a while to become accustomed to how everything is formatted. The goal of my project is to implement additional forms of text segmentation, specifically for plays. Given a marked-up play, I process it into a structured, hierarchical format so that it can be interpreted more easily. With this newly formatted text, I can run analyses on different plays to find out, for example, who the most prevalent characters are or what the main themes of each act are.
Introduction
The Text Encoding Initiative (TEI) format is an XML-based markup standard. It is widely used for maintaining texts digitally and is organized in a more semantic manner. In TEI, everything from acts to individual words has its own tag, so every part of the text is accounted for.
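To make the structure concrete, here is a minimal sketch of what a TEI-like fragment looks like once it is imported into the Wolfram Language. The fragment string is a made-up example rather than an excerpt from the Folger file, and the variable names are only illustrative.

```wolfram
(* A tiny, hypothetical TEI-like fragment; real TEI files are far larger *)
fragment = "<sp who=\"#Sampson\"><w n=\"1.1.1\" lemma=\"Gregory\">Gregory</w><pc n=\"1.1.1\">,</pc></sp>";

(* ImportString with the "XML" format returns a symbolic XMLObject tree *)
tree = ImportString[fragment, "XML"];

(* Every tag becomes XMLElement[tag, attributes, children],
   so elements can be collected by pattern matching at any depth *)
words = Cases[tree, XMLElement["w", _, _], Infinity]
```

Because the imported tree is an ordinary symbolic expression, the same pattern-matching approach extends to any other tag name.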
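The tags mentioned in this project are as follows:
◼ <w>: a word
◼ <pc>: a punctuation character
◼ <c>: whitespace
◼ <lb>: a line beginning
◼ <stage>: a stage direction
◼ <head>: a heading
◼ <l>: a line
◼ <speaker>: a speaker label
◼ <p>: a paragraph of prose
◼ <sp>: a speech
◼ <div>: a division of the play, such as the prologue or an act
◼ <castItem>: a cast member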
Methodology
Data Importing
The text used in this project is TEI-formatted and was downloaded from The Folger Shakespeare Library.
Importing the play Romeo and Juliet from the cloud:
In[]:= playText = CloudImport[CloudObject["https://www.wolframcloud.com/obj/anika.karpurapu/playText"]]
Out[]//Short= XMLObject[Document][{XMLObject[ProcessingInstruction][xml-model, href="https://raw.githubusercontent.com/TEIC/TEI-Simple/master/teisimple.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"]}, XMLElement[TEI, {{http://ww…000/xmlns/, …}…}, {1, XMLElement[text, {}, 1]}], {}]
Text Processing Rules
By just looking at Romeo and Juliet in TEI format, it's difficult to figure out where acts begin and what lines are assigned to each character. In order to convert the play into a more legible and structured format, I created rules and patterns to process the data.
Word Processing
The wordProcessingRule deals with the most fundamental structure in our text: words. You can think of it as the base case of our recursive processing. In a TEI-formatted document, a single sentence is broken down into words, punctuation characters, whitespace, and line beginnings. Each of these pieces has its own element, so you must read down several lines in order to read and understand a sentence. The wordProcessingRule takes all of these element types into account and labels them so that the text is easier to parse later on.
Specifically, wordProcessingRule assigns the data into seven different categories, with the main categories being:
◼ Word: Either parses the <c> tag into whitespace or keeps track of the words and punctuation in a sentence.
◼ Type: Stores and renames the tag type into a more understandable label.
◼ ID: Each tag has a unique number/ID associated with it.
◼ Number: Each section in the play is broken down such that each corresponding group of tags has the same number.
◼ Lemma: Stores the root word(s) of a given word in the play.
If a certain data point doesn't fall into some of the categories, it is given a label of "Missing" for that specific property.
In[]:= wordProcessingRule =
  XMLElement[type : ("w" | "pc" | "c" | "lb"), attrib_List, word_] :>
   Module[{attassoc = Association @@ attrib, w = Replace[word, {ww_} :> ww]},
    <|
     "Word" -> Which[MatchQ[type, "c"], " ", True, w],
     "Type" -> Replace[type, {"w" -> "Word", "pc" -> "Punctuation", "c" -> "Whitespace", "lb" -> "LineBeginning"}],
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Number" -> Lookup[attassoc, "n", Missing["NotAvailable"]],
     "Lemma" -> Lookup[attassoc, "lemma", Missing["NotAvailable"]],
     "Analysis" -> Lookup[attassoc, "ana", Missing["NotAvailable"]],
     "Rendition" -> Lookup[attassoc, "rendition", Missing["NotAvailable"]]
    |>];
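The attribute lookups in this rule can be tried in isolation. The sketch below uses an attribute list shaped like the one attached to the word "Gregory" in the play; the variable names are only illustrative. The xml:id attribute is stored under a list-valued key, so it is wrapped in Key to stop Lookup from treating the list as two separate keys.

```wolfram
(* Attribute list in the shape produced by XML import; values taken from the "Gregory" element *)
attrib = {
   {"http://www.w3.org/XML/1998/namespace", "id"} -> "fs-rom-0002620",
   "n" -> "1.1.1",
   "lemma" -> "Gregory"
   };
attassoc = Association @@ attrib;

(* Key[...] marks the whole list as a single key *)
id = Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]]
(* "fs-rom-0002620" *)

(* An absent attribute falls back to the default value *)
rendition = Lookup[attassoc, "rendition", Missing["NotAvailable"]]
(* Missing["NotAvailable"] *)
```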
The following is an example of a sentence that we would apply the wordProcessingRule to. The sentence “Gregory, on my word we’ll not carry coals.” is broken down into several different elements. Each element represents one line beginning, word, whitespace, or punctuation mark from the sentence.
In[]:= exampleSentence = Cases[playText, XMLElement["p", {{"http://www.w3.org/XML/1998/namespace", "id"} -> "p-0015"}, _], Infinity]
Out[]= {XMLElement[p, {{http://www.w3.org/XML/1998/namespace, id} -> p-0015}, {
   XMLElement[lb, {{http://www.w3.org/XML/1998/namespace, id} -> ftln-0015, n -> 1.1.1}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002620, n -> 1.1.1, lemma -> Gregory, ana -> #n1-nn}, {Gregory}],
   XMLElement[pc, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002630, n -> 1.1.1}, {,}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002650, n -> 1.1.1, lemma -> on, ana -> #acp-p}, {on}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002670, n -> 1.1.1, lemma -> my, ana -> #po}, {my}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002690, n -> 1.1.1, lemma -> word, ana -> #n1}, {word}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002710, n -> 1.1.1, lemma -> we|will, ana -> #pns|vmb}, {we’ll}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002730, n -> 1.1.1, lemma -> not, ana -> #xx}, {not}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002750, n -> 1.1.1, lemma -> carry, ana -> #vvi}, {carry}],
   XMLElement[c, {}, {}],
   XMLElement[w, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002770, n -> 1.1.1, lemma -> coal, ana -> #n2}, {coals}],
   XMLElement[pc, {{http://www.w3.org/XML/1998/namespace, id} -> fs-rom-0002780, n -> 1.1.1}, {.}]}]}
Applying the wordProcessingRule to the previous sentence:
In[]:= exampleSentence /. wordProcessingRule
Out[]= {XMLElement[p, {{http://www.w3.org/XML/1998/namespace, id} -> p-0015}, {
   <|Word -> {}, Type -> LineBeginning, ID -> ftln-0015, Number -> 1.1.1, Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> Gregory, Type -> Word, ID -> fs-rom-0002620, Number -> 1.1.1, Lemma -> Gregory, Analysis -> #n1-nn, Rendition -> Missing[NotAvailable]|>,
   <|Word -> ,, Type -> Punctuation, ID -> fs-rom-0002630, Number -> 1.1.1, Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> on, Type -> Word, ID -> fs-rom-0002650, Number -> 1.1.1, Lemma -> on, Analysis -> #acp-p, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> my, Type -> Word, ID -> fs-rom-0002670, Number -> 1.1.1, Lemma -> my, Analysis -> #po, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> word, Type -> Word, ID -> fs-rom-0002690, Number -> 1.1.1, Lemma -> word, Analysis -> #n1, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> we’ll, Type -> Word, ID -> fs-rom-0002710, Number -> 1.1.1, Lemma -> we|will, Analysis -> #pns|vmb, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> not, Type -> Word, ID -> fs-rom-0002730, Number -> 1.1.1, Lemma -> not, Analysis -> #xx, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> carry, Type -> Word, ID -> fs-rom-0002750, Number -> 1.1.1, Lemma -> carry, Analysis -> #vvi, Rendition -> Missing[NotAvailable]|>,
   <|Word ->  , Type -> Whitespace, ID -> Missing[NotAvailable], Number -> Missing[NotAvailable], Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>,
   <|Word -> coals, Type -> Word, ID -> fs-rom-0002770, Number -> 1.1.1, Lemma -> coal, Analysis -> #n2, Rendition -> Missing[NotAvailable]|>,
   <|Word -> ., Type -> Punctuation, ID -> fs-rom-0002780, Number -> 1.1.1, Lemma -> Missing[NotAvailable], Analysis -> Missing[NotAvailable], Rendition -> Missing[NotAvailable]|>}]}
As we can see, the text looks a lot cleaner, with each attribute properly labeled. If we remove all of the “Missing” characteristics, we can get a clearer view of the attributes and have a better idea of how the rule is working.
In[]:= noMissing = DeleteCases[exampleSentence /. wordProcessingRule, Missing[_], Infinity]
Out[]= {XMLElement[p, {{http://www.w3.org/XML/1998/namespace, id} -> p-0015}, {
   <|Word -> {}, Type -> LineBeginning, ID -> ftln-0015, Number -> 1.1.1|>,
   <|Word -> Gregory, Type -> Word, ID -> fs-rom-0002620, Number -> 1.1.1, Lemma -> Gregory, Analysis -> #n1-nn|>,
   <|Word -> ,, Type -> Punctuation, ID -> fs-rom-0002630, Number -> 1.1.1|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> on, Type -> Word, ID -> fs-rom-0002650, Number -> 1.1.1, Lemma -> on, Analysis -> #acp-p|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> my, Type -> Word, ID -> fs-rom-0002670, Number -> 1.1.1, Lemma -> my, Analysis -> #po|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> word, Type -> Word, ID -> fs-rom-0002690, Number -> 1.1.1, Lemma -> word, Analysis -> #n1|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> we’ll, Type -> Word, ID -> fs-rom-0002710, Number -> 1.1.1, Lemma -> we|will, Analysis -> #pns|vmb|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> not, Type -> Word, ID -> fs-rom-0002730, Number -> 1.1.1, Lemma -> not, Analysis -> #xx|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> carry, Type -> Word, ID -> fs-rom-0002750, Number -> 1.1.1, Lemma -> carry, Analysis -> #vvi|>,
   <|Word ->  , Type -> Whitespace|>,
   <|Word -> coals, Type -> Word, ID -> fs-rom-0002770, Number -> 1.1.1, Lemma -> coal, Analysis -> #n2|>,
   <|Word -> ., Type -> Punctuation, ID -> fs-rom-0002780, Number -> 1.1.1|>}]}
Although we would not remove any of the Missing values when working with the text in practice, this view of the sentence allows us to clearly see how each line is broken down.
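For a flat list of processed entries, the built-in DeleteMissing offers an alternative way to get the same uncluttered view. This is only a sketch on two hand-written associations shaped like the rule's output; it is not the approach used in the post, which relies on DeleteCases.

```wolfram
(* Two hand-written entries in the shape produced by the word processing rule *)
entries = {
   <|"Word" -> "Gregory", "Type" -> "Word", "Lemma" -> "Gregory", "Rendition" -> Missing["NotAvailable"]|>,
   <|"Word" -> ",", "Type" -> "Punctuation", "Lemma" -> Missing["NotAvailable"], "Rendition" -> Missing["NotAvailable"]|>
   };

(* DeleteMissing drops every key whose value has head Missing *)
cleaned = DeleteMissing /@ entries
(* {<|"Word" -> "Gregory", "Type" -> "Word", "Lemma" -> "Gregory"|>,
    <|"Word" -> ",", "Type" -> "Punctuation"|>} *)
```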
Stage Directions
A play script consists of numerous stage directions, and each direction informs us about two main things—what characters are involved and a description of how the characters must proceed within a scene. In a TEI-formatted play, we have a <stage> tag to indicate that whatever information follows will be part of a stage direction.
One level beneath the <stage> tag, we will find our word processing tags (<w>, <pc>, <c>, <lb>). Thus, once we process our stage tag, we must go one level down and call the wordProcessingRule to fully process the given text. Whenever we need to go a couple of levels down and call a rule recursively, we place the referenced rule in the "Contents" attribute.
The stageDirectionRule processes any element with the <stage> tag.
In[]:= stageDirectionRule =
  XMLElement[type : "stage", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     (* xml:id is stored under the XML namespace key, as in the other rules *)
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Number" -> Lookup[attassoc, "n", Missing["NotAvailable"]],
     "Type" -> Lookup[attassoc, "type", Missing["NotAvailable"]],
     "Characters" -> Lookup[attassoc, "who", Missing["NotAvailable"], StringSplit],
     "Contents" -> (outType //. wordProcessingRule)
    |>];
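The way a parent rule hands its children to a lower-level rule through "Contents" can be seen on a toy element. The two rules below are deliberately simplified, hypothetical stand-ins for the real ones, and the stage direction is invented for illustration.

```wolfram
(* Simplified stand-in for the word-level rule: keeps only the text and the tag name *)
toyWordRule = XMLElement[type : ("w" | "pc"), _, {text_}] :> <|"Word" -> text, "Type" -> type|>;

(* Simplified stand-in for a parent rule: processes its children inside "Contents" *)
toyStageRule = XMLElement["stage", attrib_List, contents_] :>
   <|"Type" -> "Stage", "Contents" -> (contents /. toyWordRule)|>;

(* An invented stage direction with three children *)
toyStage = XMLElement["stage", {"type" -> "entrance"},
   {XMLElement["w", {}, {"Enter"}], XMLElement["w", {}, {"Romeo"}], XMLElement["pc", {}, {"."}]}];

processed = toyStage /. toyStageRule
(* <|"Type" -> "Stage", "Contents" -> {<|"Word" -> "Enter", "Type" -> "w"|>,
     <|"Word" -> "Romeo", "Type" -> "w"|>, <|"Word" -> ".", "Type" -> "pc"|>}|> *)
```

Because the right-hand side uses a delayed rule, the inner replacement only runs once the parent element has matched, which is what lets each rule handle exactly one level of the hierarchy.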
Headings
Another tag we use is called the <head> tag. Essentially, this tag acts as a heading and oftentimes contains information about what scene or act is coming up.
The headRule searches for all of the <head> tags and begins processing from that level.
In[]:= headRule =
  XMLElement[type : "head", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> Lookup[attassoc, "type", Missing["NotAvailable"]],
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Contents" -> (outType //. wordProcessingRule)
    |>];
Lines
Next, we have the lineRule, which accounts for the <l> tag. This tag represents each line in a play. If a certain character has multiple lines to speak in a row, each line will be broken down into <l> tags, and each <l> is further broken down into the word processing tags.
The lineRule processes every element with the <l> (line) tag.
In[]:= lineRule =
  XMLElement[type : "l", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> "Line",
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Number" -> Lookup[attassoc, "n", Missing["NotAvailable"]],
     "Contents" -> (outType //. wordProcessingRule)
    |>];
Speakers
The <speaker> tag in a play keeps track of the character(s) who speak next. A stage direction or a sentence usually follows a speaker tag, so to account for both of these possibilities, the speakerRule leads into both the stage direction and word processing rules.
The speakerRule starts processing from the <speaker> tag and works its way down from there.
In[]:= speakerRule =
  XMLElement[type : "speaker", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> "Speaker",
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Contents" -> (outType //. {stageDirectionRule, wordProcessingRule})
    |>];
Paragraphs
In a prose play, the <p> tag divides speech into paragraphs. Each paragraph is filled with sentences, so we must call the wordProcessingRule next to drill down further.
The pRule matches each <p> element and processes its contents.
In[]:= pRule =
  XMLElement[type : "p", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> "Paragraph",
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Contents" -> (outType //. wordProcessingRule)
    |>];
Speech
The next rule searches for the <sp> tag, which stands for speech. Along with paragraphs and lines, a single speech block contains stage directions and speakers. To account for all of these conditions, we make sure to call the line, speaker, paragraph, and stage direction rules after we process the information in the speech tag.
Here, the speechRule matches every <sp> element and processes the data following this pattern.
In[]:= speechRule =
  XMLElement[type : "sp", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> "Speech",
     "ID" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]],
     "Character" -> Lookup[attassoc, "who", Missing["NotAvailable"], StringSplit],
     "Contents" -> (outType //. {lineRule, speakerRule, pRule, stageDirectionRule})
    |>];
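The "who" attribute can name more than one character in a single space-separated string, which is why the rule passes it through StringSplit. A small sketch, using a made-up attribute value rather than one copied from the play:

```wolfram
(* Hypothetical value of a "who" attribute naming two characters *)
who = "#Romeo #Benvolio";

(* StringSplit with one argument splits on whitespace *)
speakers = StringSplit[who]
(* {"#Romeo", "#Benvolio"} *)

(* Optionally strip the leading "#" to get bare identifiers *)
names = StringDelete[#, "#"] & /@ speakers
(* {"Romeo", "Benvolio"} *)
```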
Main Rule
Finally, we have the mainRule. All of the previous rules culminate in this main rule, which serves as the entry point for processing a play. The highest-level tag in a play is the <div> tag; the prologue and each act have this tag. Below it, we have more tags ranging from speech to stage directions to headings, so those rules are called recursively in the "Contents" section.
The main rule below takes care of processing the play.
In[]:= mainRule =
  XMLElement[type : "div", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib, w = outType},
    <|
     "Type" -> Lookup[attassoc, "type", Missing["NotAvailable"]],
     "Number" -> Lookup[attassoc, "n", Missing["NotAvailable"]],
     "Contents" -> (outType //. {speechRule, stageDirectionRule, headRule})
    |>];
To see the main rule in action, we will navigate to Act 4 of the play and see how it breaks down each individual component.
In[]:= act4 = Cases[playText, XMLElement["div", {"type" -> "act", "n" -> "4"}, _], Infinity] /. mainRule
Out[]= Type -> act, Number -> 4, Contents -> {⋯1⋯}
Once again, if we view Act 4 without the Missing elements, we have a clear picture of what is going on.
In[]:= DeleteCases[act4, Missing[_], Infinity]
Out[]= Type -> act, Number -> 4, Contents -> Contents -> {Word -> ACT, Type -> Word, ID -> fs-rom-0380720, Word ->  , Type -> Whitespace, Word -> 4, Type -> Word, ID -> fs-rom-0380740}, ⋯4⋯, XMLElement div, {⋯1⋯}, Contents -> {Word -> Scene, Type -> Word, ID -> fs-rom-0428000, Word ->  , Type -> Whitespace, Word -> 5, Type -> Word, ID -> fs-rom-0428020}, ⋯1⋯, ⋯53⋯, Type -> Speech, ID -> sp-2731, Character -> ⋯1⋯, Contents -> Type -> Speaker, ID -> spk-2731, Contents -> Word -> SECOND, Type -> Word, ID -> fs-rom-0452470, ⋯1⋯, Word -> MUSICIAN, Type -> Word, ID -> fs-rom-0452490, ⋯1⋯, ⋯1⋯
Cast Rule
We can also create another rule to grab the character list from the play.
In[]:= castRule =
  XMLElement[type : "castItem", attrib_List, outType_] :>
   Module[{attassoc = Association @@ attrib},
    <|
     "Type" -> "CastMember",
     "Character" -> Lookup[attassoc, Key[{"http://www.w3.org/XML/1998/namespace", "id"}], Missing["NotAvailable"]]
    |>];
Exploration
Now that the play is nicely formatted, it is a lot easier to analyze the text. I created a couple of new functions and conducted some analyses of the text itself, along with the characters in the text.
Functions and Text Analysis
The following DisplayPrologue function can be used to display the prologue of any given TEI-formatted play. We can simply search for the "Contents" and "Word" labels under the "prologue" section in the processed text, and the prologue will be displayed.
In[]:= DisplayPrologue[x_] := (
    Flatten@Lookup[#, "Character"] ->
       StringJoin @@ Lookup[
         Flatten@Lookup[Flatten@Lookup[#, "Contents"], "Contents"],
         "Word", ""] & /@
     (Select[#, #Type === "Speech" &] & /@
       Lookup[
        (Cases[x, XMLElement["div", {"type" -> "prologue", "n" -> _}, _], Infinity] /. mainRule),
        "Contents"])
    )[[1]][[2]]
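The core of this function is the chain of nested Lookup calls that walks from a speech down to its words. The sketch below applies the same chain to one hand-written speech association; the data is invented and much smaller than real processed output.

```wolfram
(* A hand-written speech containing one line of three word-level entries *)
toySpeech = <|"Type" -> "Speech", "Contents" -> {
     <|"Type" -> "Line", "Contents" -> {
        <|"Word" -> "Two"|>, <|"Word" -> " "|>, <|"Word" -> "households"|>}|>}|>;

(* Speech -> lines -> words, flattening each level before joining the text *)
lines = Flatten@Lookup[toySpeech, "Contents"];
wordEntries = Flatten@Lookup[lines, "Contents"];
text = StringJoin @@ Lookup[wordEntries, "Word", ""]
(* "Two households" *)
```

The third argument of the final Lookup supplies an empty string for any entry without a "Word" key, so such entries do not break the join.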
In[]:= DisplayPrologue[playText]
Out[]= Two households, both alike in dignity (In fair Verona, where we lay our scene), From ancient grudge break to new mutiny, Where civil blood makes civil hands unclean. From forth the fatal loins of these two foes A pair of star-crossed lovers take their life; Whose misadventured piteous overthrows Doth with their death bury their parents’ strife. The fearful passage of their death-marked love And the continuance of their parents’ rage, Which, but their children’s end, naught could remove, Is now the two hours’ traffic of our stage; The which, if you with patient ears attend, What here shall mis