WOLFRAM COMMUNITY
GROUPS: Staff Picks, Business Analytics, Data Science, Mathematica, Graphics and Visualization, Import and Export, Wolfram Language, Machine Learning, Computational Linguistics, Natural Language Processing, Know-How
Ingest Airbnb reviews data
Anton Antonov, Accendo Data LLC
Posted 2 years ago | 6177 Views | 1 Reply | 4 Total Likes
Introduction
In this notebook we show how to ingest Airbnb reviews data.
(The data can be used to exemplify relevant machine learning algorithms.)
Sources: MathematicaForPrediction at GitHub; MathematicaForPrediction at WordPress.
Downloading the data
The data was downloaded from this site: http://insideairbnb.com/get-the-data.html
In[]:= WebImage["http://insideairbnb.com/get-the-data.html"]
Load data
In this section the data is imported. (You have to download the data first and replace the file names below with the corresponding local ones.)
In[]:= lsFileNames = FileNames["~/Datasets/AirBnB/*reviews.csv"]
Out[]= {~/Datasets/AirBnB/AirBnB-Austin-Texas-United-States-reviews.csv, ~/Datasets/AirBnB/AirBnB-Broward-County-Florida-United-States-reviews.csv}
In[]:= AbsoluteTiming[dsReviews = Join @@ Map[ResourceFunction[ResourceObject[]], lsFileNames];]
Out[]= {18.0559, Null}
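If the resource function used above is not available, the files can also be read with the built-in Import. The sketch below assumes each file has one header row with the six column names seen in this dataset. The temporary file, its single row and the helper name importReviews are hypothetical. They exist only to keep the example self-contained.

```mathematica
(* Hypothetical stand-in for one downloaded "*reviews.csv" file *)
file = FileNameJoin[{$TemporaryDirectory, "sample-reviews.csv"}];
Export[file,
  {{"listing_id", "id", "date", "reviewer_id", "reviewer_name", "comments"},
   {2265, 865, "2019-05-27", 3, "Rachel", "Great place!"}}, "CSV"];

(* Read one CSV file into a Dataset of associations keyed by the header row *)
importReviews[f_String] := Import[f, "Dataset", "HeaderLines" -> 1];

(* Same pattern as in the cell above: import each file, then join the rows *)
dsSample = Join @@ Map[importReviews, {file, file}];
Dimensions[dsSample]
```

For the real data, Map[importReviews, lsFileNames] would take the place of {file, file}.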
In[]:= Dimensions[dsReviews]
Out[]= {524759, 6}
In[]:= SeedRandom[342]; RandomSample[dsReviews, 12]
Out[]=
listing_id | id | date | reviewer_id | reviewer_name | comments
32761299 | 459784643 | 2019-05-27 | 260673176 | Rachel | This place and location was perfect! We were able to have walking distance to th⋱
5477413 | 82235987 | 2016-06-26 | 31965194 | Melody | Large, lovely home with great 10-min walking access to the beach and boardwalk,⋱
19721357 | 515312924 | 2019-08-22 | 758794 | Alex | Absolutely wonderful experience--beautiful, clean place to stay in a great locat⋱
17123406 | 351968299 | 2018-11-24 | 182520327 | Clint | The place is clean, spacious and located close to restaurants and shopping. Gabri⋱
2396602 | 308041001 | 2018-08-15 | 56276177 | Robbie | This home was the perfect accommodations for us! Everything was clean, spacious,⋱
69810 | 202962752 | 2017-10-13 | 28368446 | Alexandria | We were thrilled with the birds nest, the response, and the thoughtful touch. Th⋱
32743526 | 425639227 | 2019-03-18 | 1417982 | Robert | The place was pristine and very well appointed. Everything seemed new. The self⋱
17109314 | 135012128 | 2017-03-02 | 72579420 | Abraham | Lovely place great location nice hosts highly recommended,
22266927 | 318868306 | 2018-09-04 | 113565142 | Tawna | Perfect place for our first trip to Austin! Easy to find, clean, really cute dec⋱
17109314 | 226398361 | 2018-01-10 | 110497390 | Jess | Honestly we thought the place was great and the location was great, but the fact⋱
646392 | 14607779 | 2014-06-23 | 9920923 | Joel | My wife and I spent 5 nights at Kathleen's BNB. The room was very clean when we⋱
32920994 | 455451884 | 2019-05-19 | 36158427 | Bre | This was my second time staying with James in one of his campers. This stay was⋱
Summaries
Summary of all records:
In[]:= ResourceFunction["RecordsSummary"][dsReviews]
Out[]=
1 listing_id: Min 2265 | 1st Qu 5558324 | Median 15944361 | Mean 1.52976×10^7 | 3rd Qu 2.28841×10^7 | Max 38727516
2 id: Min 865 | 1st Qu 1.59769×10^8 | Median 291297296 | Mean 2.88211×10^8 | 3rd Qu 4.26657×10^8 | Max 533532956
3 date: 2019-05-27 1928 | 2019-07-07 1841 | 2019-03-31 1800 | 2019-03-17 1769 | 2019-04-14 1690 | 2019-04-28 1671 | (Other) 514060
4 reviewer_id: Min 3 | 1st Qu 2.467×10^7 | Median 67455091 | Mean 9.02347×10^7 | 3rd Qu 1.45028×10^8 | Max 296165050
5 reviewer_name: Michael 5133 | David 4405 | John 3836 | Sarah 3535 | Chris 3199 | Jessica 3050 | (Other) 501601
6 comments: "Great place!" 729 | "Great place" 638 | "Great place to stay!" 407 | "." 406 | "Great location!" 384 | (blank) 312 | (Other) 521883
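For a numeric column, the six statistics that RecordsSummary shows can be reproduced with built-in functions. This is a minimal sketch. The helper numericSummary is hypothetical, and the input vector is made up. RecordsSummary may also use a different quantile definition than the default Quantile used here.

```mathematica
(* Six-number summary of a numeric vector: Min, 1st Qu, Median, Mean, 3rd Qu, Max *)
numericSummary[x_List] := <|
   "Min" -> Min[x],
   "1st Qu" -> Quantile[x, 1/4],  (* default Quantile definition *)
   "Median" -> Median[x],
   "Mean" -> N[Mean[x]],
   "3rd Qu" -> Quantile[x, 3/4],
   "Max" -> Max[x]|>;

numericSummary[{2, 4, 4, 5, 10}]
```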
Summary of counts per unique “listing_id” value:
In[]:= ResourceFunction["RecordsSummary"][Query[GroupBy[#["listing_id"] &], Length]@dsReviews]
Out[]= column 1: Min 1 | 1st Qu 3 | Median 12 | Mean 32.0385 | 3rd Qu 37 | Max 856
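The same per-listing counts can be obtained with Counts, which tallies each distinct value of a list. This minimal sketch uses three hypothetical records as a stand-in for Normal[dsReviews] and checks that both formulations agree.

```mathematica
(* Hypothetical review records; only the keys used below are included *)
rows = {
   <|"listing_id" -> 2265, "reviewer_id" -> 3|>,
   <|"listing_id" -> 2265, "reviewer_id" -> 7|>,
   <|"listing_id" -> 5245, "reviewer_id" -> 3|>};

(* Group the records by listing ID, then take the length of each group *)
byGroup = Query[GroupBy[#["listing_id"] &], Length][rows];

(* Count the occurrences of each listing ID directly *)
byCounts = Counts[Lookup[rows, "listing_id"]];

byGroup === byCounts
```

Both keep the keys in order of first occurrence, so the two associations compare as identical.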
Summary of counts per unique “reviewer_id” value:
In[]:= ResourceFunction["RecordsSummary"][Query[GroupBy[#["reviewer_id"] &], Length]@dsReviews]
Out[]= column 1: Min 1 | 1st Qu 1 | Median 1 | Mean 1.15054 | 3rd Qu 1 | Max 80
Summary of counts per unique “id” value:
In[]:= ResourceFunction["RecordsSummary"][Query[GroupBy[#["id"] &], Length]@dsReviews]
Out[]= column 1: Min 1 | 1st Qu 1 | Median 1 | Mean 1 | 3rd Qu 1 | Max 1
Pareto Principle adherence
Here we see a typical manifestation of the Pareto principle in the distribution of the number of reviews per listing ID:
In[]:= ResourceFunction[ResourceObject[]][Values[Normal[Query[GroupBy[#["listing_id"] &], Length]@dsReviews]]]
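The Pareto principle can also be checked numerically. Sort the per-listing review counts in decreasing order, then compute the share of all reviews that the top 20% of listings account for. This is a minimal sketch with two hypothetical helpers, paretoShare and paretoCurve. On this data they would be applied to Values[Normal[Query[GroupBy[#["listing_id"] &], Length]@dsReviews]].

```mathematica
(* Share of the total contributed by the top fraction frac of the items *)
paretoShare[counts_List, frac_?NumericQ] :=
  Module[{sorted = ReverseSort[counts], k},
   k = Ceiling[frac Length[counts]];  (* number of items in the top fraction *)
   N[Total[Take[sorted, k]]/Total[counts]]];

(* Cumulative share curve: the i-th value is the share of the i largest items *)
paretoCurve[counts_List] := N[Accumulate[ReverseSort[counts]]/Total[counts]];

paretoShare[{1, 1, 1, 1, 96}, 1/5]
```

A result well above 1/5 for frac = 1/5 indicates the kind of concentration the Pareto principle describes.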
Here we see that many reviewers wrote only one review (within this dataset):
In[]:= ResourceFunction[ResourceObject[]][Values[Normal[Query[GroupBy[#["reviewer_id"] &], Length]@dsReviews]]]
In[]:= Count[Query[GroupBy[#["reviewer_id"] &], Length]@dsReviews, 1]
% / Dimensions[dsReviews][[1]] // N
Out[]= 411153
Out[]= 0.783508
Contingency matrix
Here we compute the contingency matrix for listings vs reviewers:
In[]:= AbsoluteTiming[matNReviews = ResourceFunction["CrossTabulate"][dsReviews[All, {"listing_id", "reviewer_id"}], "Sparse" -> True];]
Out[]= {7.10633, Null}
In[]:= MatrixPlot[matNReviews["SparseMatrix"], MaxPlotPoints -> 500]
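The contingency matrix can also be built with built-in functions alone. This minimal sketch assumes the input is a list of {listing_id, reviewer_id} pairs. The helper contingencyMatrix and its result keys are hypothetical. They only loosely mirror the "SparseMatrix" element used above.

```mathematica
contingencyMatrix[pairs_List] :=
  Module[{rowNames, colNames, rowIndex, colIndex, pairCounts},
   rowNames = DeleteDuplicates[pairs[[All, 1]]];
   colNames = DeleteDuplicates[pairs[[All, 2]]];
   (* map each ID to its row or column index *)
   rowIndex = AssociationThread[rowNames -> Range[Length[rowNames]]];
   colIndex = AssociationThread[colNames -> Range[Length[colNames]]];
   (* aggregate first: by default SparseArray keeps only one value per repeated position *)
   pairCounts = Counts[pairs];
   <|"SparseMatrix" -> SparseArray[
       KeyValueMap[{rowIndex[#1[[1]]], colIndex[#1[[2]]]} -> #2 &, pairCounts],
       {Length[rowNames], Length[colNames]}],
     "RowNames" -> rowNames, "ColumnNames" -> colNames|>];

res = contingencyMatrix[{{"a", "x"}, {"a", "x"}, {"b", "y"}, {"a", "y"}}];
Normal[res["SparseMatrix"]]
```

For this dataset, the pairs could be taken as Values /@ Normal[dsReviews[All, {"listing_id", "reviewer_id"}]].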
Latent Semantic Analysis
The Latent Semantic Analysis (LSA) computations below are done mostly to compute various text-derived statistics. (See [AA1, AA2].)
Text data
In this subsection we extract the text data from the dataset into an association whose keys are the review IDs and whose values are the review texts.
In[]:= stopWords = Complement[DictionaryLookup["*"], DeleteStopwords[DictionaryLookup["*"]]];
In[]:= aComments = Select[AssociationThread[Normal[dsReviews[All, "id"]], Normal[dsReviews[All, "comments"]]], StringQ];
Length[aComments]
Out[]= 524743
In[]:= ResourceFunction["RecordsSummary"][StringLength /@ Values[aComments]]
Out[]= column 1: Min 0 | 1st Qu 87 | Median 176 | Mean 240.977 | 3rd Qu 315 | Max 5779
In[]:= ResourceFunction["RecordsSummary"][Map[Length[Select[#, StringLength[#] > 0 &]] &, StringSplit[Values@aComments, PunctuationCharacter | WhitespaceCharacter]]]
Out[]= column 1: Min 0 | 1st Qu 15 | Median 31 | Mean 43.4932 | 3rd Qu 57 | Max 1054
Monad objects
Here we import the package [AA2]:
In[]:= Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MonadicProgramming/MonadicLatentSemanticAnalysis.m"]
» Importing from GitHub: StateMonadCodeGenerator.m
» Importing from GitHub: SSparseMatrix.m
» Importing from GitHub: DocumentTermMatrixConstruction.m
» Importing from GitHub: NonNegativeMatrixFactorization.m
» Importing from GitHub: IndependentComponentAnalysis.m
» Importing from GitHub: CrossTabulate.m
» Importing from GitHub: OutlierIdentifiers.m
» Importing from GitHub: MathematicaForPredictionUtilities.m
» Importing from GitHub: MosaicPlot.m
All reviews
In[]:= AbsoluteTiming[lsaComments = LSAMonUnit[aComments] ⟹ LSAMonMakeDocumentTermMatrix[{}, stopWords] ⟹ LSAMonApplyTermWeightFunctions["IDF", "None", "Cosine"];]
Out[]= {104.503, Null}
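LSAMonApplyTermWeightFunctions["IDF", "None", "Cosine"] re-weights the document-term matrix. Presumably its three arguments are a global term weight, a local term weight and a row normalization. This is a minimal sketch of that kind of weighting on a small dense matrix, built on three assumptions:

* IDF takes the textbook form Log[nDocs/df].
* The "None" local weight keeps the raw counts.
* "Cosine" normalizes each document row to unit length.

The name applyIDFCosine is hypothetical, and the package's exact formulas may differ.

```mathematica
(* Rows are documents, columns are terms, entries are raw term counts *)
applyIDFCosine[mat_?MatrixQ] := Module[{nDocs, df, idf, weighted},
   nDocs = Length[mat];
   df = Total[Unitize[mat]];          (* number of documents containing each term *)
   idf = N[Log[nDocs/df]];            (* global weight; assumes every df > 0 *)
   weighted = Map[# idf &, mat];      (* local weight "None": raw counts times IDF *)
   (* "Cosine": scale each non-zero row to unit Euclidean length *)
   Map[With[{n = Norm[#]}, If[n > 0, #/n, #]] &, weighted]];

applyIDFCosine[{{1, 1}, {1, 0}}]
```

The first term occurs in both toy documents, so its IDF weight is 0. That is why the second document, which contains only that term, becomes a zero row.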
Tighter reviews
For faster LSA computations (and better results) we want to use a tighter set of reviews.
Here we select the reviews with string lengths between 80 and 400 characters:
In[]:= AbsoluteTiming[lsaTightComments = LSAMonUnit[Select[aComments, 80 <= StringLength[#] <= 400 &]] ⟹ LSAMonMakeDocumentTermMatrix[{}, stopWords] ⟹ LSAMonApplyTermWeightFunctions["IDF", "None", "Cosine"];]
Out[]= {54.3238, Null}
Document × term statistics -- all reviews
In[]:= lsaComments ⟹ LSAMonEchoDocumentTermMatrixStatistics ⟹ LSAMonEchoDocumentTermMatrixStatistics["LogBase" -> 10];
» Context value "documentTermMatrix":
Dimensions: {524743, 111897}
Density: 0.000170987
Number of documents per term summary (#documents): Min 1 | 1st Qu 1 | Median 1 | Mean 89.7244 | 3rd Qu 5 | Max 248701
» Context value "documentTermMatrix":
Dimensions: {524743, 111897}
Density: 0.000170987
Log10 number of documents per term summary (#documents): Min 0. | 1st Qu 0. | Median 0. | Mean 0.444996 | 3rd Qu 0.69897 | Max 5.39568
Document × term statistics -- tight reviews
In[]:= lsaTightComments ⟹ LSAMonEchoDocumentTermMatrixStatistics ⟹ LSAMonEchoDocumentTermMatrixStatistics["LogBase" -> 10];
» Context value "documentTermMatrix":
Dimensions: {321579, 68352}
Density: 0.000244679
Number of documents per term summary (#documents): Min 1 | 1st Qu 1 | Median 1 | Mean 78.6836 | 3rd Qu 5 | Max 152351
» Context value "documentTermMatrix":
Dimensions: {321579, 68352}
Density: 0.000244679
Log10 number of documents per term summary (#documents): Min 0. | 1st Qu 0. | Median 0. | Mean 0.464513 | 3rd Qu 0.69897 | Max 5.18285
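The statistics echoed above follow directly from a sparse document-term matrix: the dimensions, the density, and the number of documents per term. This is a minimal sketch on a toy SparseArray. The helper docTermStatistics is hypothetical and is not a function of the package.

```mathematica
docTermStatistics[mat_SparseArray] := <|
   "Dimensions" -> Dimensions[mat],
   (* fraction of non-zero entries *)
   "Density" -> N[Length[mat["NonzeroPositions"]]/(Times @@ Dimensions[mat])],
   (* number of documents (rows) in which each term (column) occurs *)
   "DocumentsPerTerm" -> Normal[Total[Unitize[mat]]]|>;

stats = docTermStatistics[SparseArray[{{1, 0, 2}, {0, 3, 1}}]];
stats["DocumentsPerTerm"]
```

The Log10 variant corresponds to N[Log10[stats["DocumentsPerTerm"]]]. For example, a 3rd quartile of 5 documents gives Log10[5] ≈ 0.69897, matching the echoed summaries.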
Topics extraction (over tighter reviews)
In[]:= AbsoluteTiming[lsaTightComments = lsaTightComments ⟹ LSAMonExtractTopics["NumberOfTopics" -> 40, "MinNumberOfDocumentsPerTerm" -> 100, Method -> "SVD", MaxSteps -> 12];]
Out[]= {180.153, Null}
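With Method -> "SVD", topics are derived from a truncated singular value decomposition of the weighted document-term matrix. Below is a minimal sketch of that idea using the built-in SingularValueDecomposition. It makes two assumptions:

* Terms that occur in fewer than minDocs documents are dropped first, which is the role suggested by "MinNumberOfDocumentsPerTerm".
* Each topic's weights are scaled so that its top term has weight 1, as in the topics table below.

The helper extractTopicsSVD and its arguments are hypothetical. The package's actual algorithm may differ, and MaxSteps has no counterpart here.

```mathematica
(* mat: documents x terms matrix; terms: the column names *)
extractTopicsSVD[mat_?MatrixQ, terms_List, k_Integer, nTop_Integer, minDocs_Integer] :=
  Module[{keep, m, t, u, w, v},
   (* drop terms that occur in fewer than minDocs documents *)
   keep = Flatten[Position[Thread[Total[Unitize[mat]] >= minDocs], True]];
   m = N[mat[[All, keep]]]; t = terms[[keep]];
   (* k leading singular triplets; the columns of v are term loadings *)
   {u, w, v} = SingularValueDecomposition[m, k];
   Table[
    With[{load = v[[All, j]]},
     (* positions of the nTop largest absolute loadings, in decreasing order *)
     With[{idx = Reverse[Ordering[Abs[load], -nTop]]},
      (* {weight, term} pairs, scaled so that the top term has weight 1 *)
      Transpose[{load[[idx]]/load[[First[idx]]], t[[idx]]}]]],
    {j, k}]];

topics = extractTopicsSVD[{{2, 2, 0, 0}, {3, 3, 0, 0}, {0, 0, 1, 1}, {0, 0, 2, 2}},
   {"a", "b", "c", "d"}, 2, 2, 1];
topics[[All, All, 2]]
```

The toy matrix has two blocks of co-occurring terms, so each of the two extracted topics should pick up one block.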
In[]:= lsaTightComments ⟹ LSAMonEchoTopicsTable["NumberOfTableColumns" -> 8];
» topics table:
Topic 1: 1.000 great, 0.929 stay, 0.904 place, 0.842 location, 0.815 clean, 0.731 definitely, 0.728 austin, 0.693 recommend, 0.685 house, 0.683 nice, 0.652 host, 0.643 comfortable
Topic 6: 1.000 home, -0.758 recommend, -0.719 highly, 0.687 easy, 0.662 check, -0.373 nice, -0.345 beach, -0.330 place, 0.284 like, 0.276 felt, 0.263 away, -0.263 definitely
Topic 11: 1.000 really, -0.845 space, 0.760 enjoyed, -0.711 comfortable, 0.544 check, -0.536 bed, -0.512 room, 0.511 easy, 0.477 close, 0.412 staying, -0.378 super, -0.349 location
Topic 16: 1.000 close, 0.962 space, -0.894 nice, 0.777 really, -0.631 perfect, 0.617 enjoyed, -0.584 just, 0.573 downtown, -0.572 distance, 0.570 beautiful, -0.562 walking, -0.561 needed
Topic 21: 1.000 amazing, -0.764 nice, 0.667 quiet, 0.535 neighborhood, 0.428 really, -0.399 definitely, -0.370 apartment, 0.368 enjoyed, 0.351 needed, -0.346 austin, -0.334 time, -0.318 area
Topic 26: 1.000 awesome, 0.536 beautiful, 0.529 room, -0.514 nice, -0.449 time, -0.448 wonderful, 0.428 needed, 0.397 hosts, -0.377 super, 0.353 close,
-
0