[WSC21] Exploring methods of parsing user agent strings
Posted by Austin Geng, 1 year ago
Exploring methods of parsing User Agent strings
by Austin Geng
The objective of this project was to create a User Agent string parser. We begin by breaking up the string into a simpler, more abstract form. We then use the built-in Classify function with multiple forms of preprocessing. Lastly, we construct a graph of user agents based on their common components.
Introduction
User Agent strings are sent by a program communicating with the Web (such as a browser or web crawler bot) to provide information about itself to a server, such as its software, version, and operating system. This allows the server to send content appropriate to the limitations of the user agent.
However, it is up to the server to figure out what features each program supports, and so it may not understand a new or updated program. As a result, there has been a long history of browsers copying each other’s User Agent strings to “trick” servers into giving them the correct features. This has led to almost all modern browsers having convoluted User Agent strings; for example, Chrome pretends to be Safari, Mozilla, and Netscape and pretends to use the WebKit, KHTML, and Gecko layout engines.
Furthermore, although there are standards and conventions for the format of User Agent strings, these are sometimes ignored, and in these cases, anything beyond string-matching becomes tricky.
Although feature detection through the User Agent string is now discouraged, the information found within the string still has uses for operating system detection and web analytics.
Splitting up the string
An example (Chrome) user agent string:
In[]:= userAgentExample = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
The typical user agent string is composed of a list of products (software names, often followed by the version after a slash) and comments (parenthesized groups of other terms delimited by semicolons). We can break up these strings with relative ease.
Split the string by spaces (excluding those inside a comment):
tokens = StringSplit[userAgentExample, RegularExpression[" (?![^(]*\\))"]]
Out[]= {Mozilla/5.0, (X11; Linux x86_64), AppleWebKit/537.36, (KHTML, like Gecko), Chrome/51.0.2704.103, Safari/537.36}
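The lookahead in this pattern only checks whether a closing parenthesis follows before the next opening one, so it assumes that comments never contain parentheses of their own. Some strings in the sample data do (for example a device name such as MotoE2(4G-LTE)), and in that case the pattern also splits on the spaces that precede the inner group. A minimal sketch of a depth-counting alternative follows; splitTopLevel is a hypothetical helper, not part of the original post, and it assumes the string contains no newline characters.

```wolfram
(* Hypothetical helper (not from the original post): split on spaces that lie
   outside every parenthesized group, by tracking parenthesis depth. *)
splitTopLevel[s_String] := Module[{chars, depth, marked},
  chars = Characters[s];
  (* depth[[i]] is the nesting depth after reading character i *)
  depth = Rest@FoldList[#1 + Switch[#2, "(", 1, ")", -1, _, 0] &, 0, chars];
  (* replace each top-level space by a newline sentinel, then split on it *)
  marked = MapThread[If[#1 === " " && #2 == 0, "\n", #1] &, {chars, depth}];
  StringSplit[StringJoin[marked], "\n"]
];

nested = "Mozilla/5.0 (Linux; Android 5.0.2; MotoE2(4G-LTE) Build/LXI22.50-29.1) AppleWebKit/537.36";

splitTopLevel[nested]
(* {"Mozilla/5.0", "(Linux; Android 5.0.2; MotoE2(4G-LTE) Build/LXI22.50-29.1)", "AppleWebKit/537.36"} *)
```

For strings without nested parentheses, such as the Chrome example, this should give the same tokens as the regular expression.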
Group by whether or not a token is a comment:
grouped = GroupBy[tokens, StringMatchQ@RegularExpression["\\(.*\\)"]]
Out[]= <|False -> {Mozilla/5.0, AppleWebKit/537.36, Chrome/51.0.2704.103, Safari/537.36}, True -> {(X11; Linux x86_64), (KHTML, like Gecko)}|>
Split each product's name and version and convert to an association:
products = Association[Rule @@ PadRight[#, 2, Missing[]] & /@ (StringSplit[#, "/", All] & /@ Replace[grouped[False], _Missing -> {}])]
Out[]= <|Mozilla -> 5.0, AppleWebKit -> 537.36, Chrome -> 51.0.2704.103, Safari -> 537.36|>
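The PadRight step matters for product tokens that carry no version, such as the bare "Mobile" token that appears in mobile Chrome strings in the sample data. A small sketch of what happens to such a token; exampleTokens is a hypothetical stand-in for grouped[False]:

```wolfram
(* Hypothetical token list: "Mobile" has no "/version" part *)
exampleTokens = {"Chrome/39.0.2171.93", "Mobile", "Safari/537.36"};

(* same construction as in the post: split on "/", pad to a pair, turn into rules *)
Association[Rule @@ PadRight[#, 2, Missing[]] & /@ (StringSplit[#, "/", All] & /@ exampleTokens)]
(* <|"Chrome" -> "39.0.2171.93", "Mobile" -> Missing[], "Safari" -> "537.36"|> *)
```

Two caveats follow from the construction itself: a token containing more than one slash yields a Rule with three or more arguments, and an Association keeps only the last value when a product name is repeated.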
Split each comment by semicolons and remove parentheses:
comments = StringSplit[StringTake[#, {2, -2}], "; "] & /@ Replace[grouped[True], _Missing -> {}]
Out[]= {{X11, Linux x86_64}, {KHTML, like Gecko}}
Combine products and comments:
<|"Products" -> products, "Comments" -> comments|>
Out[]= <|Products -> <|Mozilla -> 5.0, AppleWebKit -> 537.36, Chrome -> 51.0.2704.103, Safari -> 537.36|>, Comments -> {{X11, Linux x86_64}, {KHTML, like Gecko}}|>
All together:
splitUserAgent[userAgent_String] := Module[{tokens, grouped, products, comments},
  tokens = StringSplit[userAgent, RegularExpression[" (?![^(]*\\))"]];
  grouped = GroupBy[tokens, StringMatchQ@RegularExpression["\\(.*\\)"]];
  products = Association[Rule @@ PadRight[#, 2, Missing[]] & /@ (StringSplit[#, "/", All] & /@ Replace[grouped[False], _Missing -> {}])];
  comments = StringSplit[StringTake[#, {2, -2}], "; "] & /@ Replace[grouped[True], _Missing -> {}];
  <|"Products" -> products, "Comments" -> comments|>
];
splitUserAgent[userAgentExample]
Out[]= <|Products -> <|Mozilla -> 5.0, AppleWebKit -> 537.36, Chrome -> 51.0.2704.103, Safari -> 537.36|>, Comments -> {{X11, Linux x86_64}, {KHTML, like Gecko}}|>
Interactive version with Manipulate:
Manipulate[
  DynamicModule[{split},
    split = splitUserAgent[userAgent];
    Row[Pane[#, BaselinePosition -> Top] & /@ Prepend[Dataset /@ split["Comments"], Dataset@split["Products"]], Spacer[10]]
  ],
  {{userAgent, userAgentExample, "User Agent"}, InputField[#, String, BaseStyle -> 12, FieldSize -> {{60., 60.}, {1., Infinity}}] &},
  SaveDefinitions -> True
]
Out[]= (interactive Manipulate panel; not reproduced)
Data gathering
I was fortunate enough to find a sample database of already-analyzed User Agent strings from the website whatismybrowser.com.
Import data:
In[]:= userAgentData = CloudImport@CloudObject@"https://www.wolframcloud.com/obj/yygengjunior/whatismybrowserSampleData";
Convert to Dataset and replace blank entries with Missing[]:
In[]:= userAgentDataset = Map[If[# === "", Missing[], #] &, Dataset[AssociationThread[First@userAgentData -> #] & /@ Rest@userAgentData], 2];
Random sample of entries:
RandomSample[userAgentDataset, 5]
Out[]=
id | user_agent | times_seen | simple_software_string | simple_sub_description_string | simple_operating_platform_string | software | software_name | software_name_code | software_version
6376 | Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.0.19) Gecko/2010031218 Firefox/3.0.19 | 105 | Firefox 3 on Mac OS X (Leopard) | — | — | Firefox 3 | Firefox | firefox | 3
3678 | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; ms-office) | 28 | Office on Windows 7 | — | — | Office | Office | office | —
10270 | Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0 | 7775 | Firefox 20 on Windows XP | — | — | Firefox 20 | Firefox | firefox | 20
6503 | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) | 992 | Internet Explorer 8 on Windows XP | — | — | Internet Explorer 8 | Internet Explorer | internet-explorer | 8
1090 | Mozilla/5.0 (Linux; U; Android 4.2.2; en-us; GT-P5100 Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30 | 48 | Android Browser 4 on Android (Jelly Bean) | — | Samsung GT-P5100 | Android Browser 4 | Android Browser | android-browser | 4
(columns 1–10 of 39)
Classifying
Many properties in the database look suitable to be treated as classes for Classify to train on. However, we first do some preprocessing on the user-agent string to remove unnecessary components and make it simpler for Classify.
Preprocessing Functions
We use three preprocessing methods: extracting the list of comments, extracting the list of products, and extracting both.
Get lists of comments and flatten to a single list:
getAllComments[userAgent_String] := Flatten@splitUserAgent[userAgent]["Comments"];
getAllComments[userAgentExample]
Out[]= {X11, Linux x86_64, KHTML, like Gecko}
Get product pairs and concatenate back to slash-separated form:
getAllProducts[userAgent_String] := KeyValueMap[If[MissingQ@#2, #1, #1 <> "/" <> #2] &, splitUserAgent[userAgent]["Products"]];
getAllProducts[userAgentExample]
Out[]= {Mozilla/5.0, AppleWebKit/537.36, Chrome/51.0.2704.103, Safari/537.36}
Get both comments and products:
getAllComponents[userAgent_String] := getAllComments@userAgent ⋃ getAllProducts@userAgent;
getAllComponents[userAgentExample]
Out[]= {AppleWebKit/537.36, Chrome/51.0.2704.103, KHTML, like Gecko, Linux x86_64, Mozilla/5.0, Safari/537.36, X11}
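Because the two lists are combined with Union, the result is a sorted set: the original order of the components is lost and repeated components are collapsed. A short sketch with hypothetical inputs (the duplicate is added only for illustration):

```wolfram
(* Hypothetical component lists; "Mozilla/5.0" is repeated on purpose *)
exampleComments = {"X11", "Linux x86_64", "KHTML, like Gecko"};
exampleProducts = {"Mozilla/5.0", "Safari/537.36", "Mozilla/5.0"};

(* Union sorts and removes duplicates *)
Union[exampleComments, exampleProducts]
(* {"KHTML, like Gecko", "Linux x86_64", "Mozilla/5.0", "Safari/537.36", "X11"} *)

(* Join keeps the original order and the duplicate *)
Join[exampleComments, exampleProducts]
```

Whether order carries useful signal for the classifier is not something the post tests; Join would be the variant to try if it does.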
Training Data
Through some experimental testing, we determine the best preprocessor for each property. The results roughly match up with where the information is found in the string: Software information is usually in a product, layout engine information can be either, and other information is usually in comments.
Note that some properties have been excluded because they are more complicated, are equivalent to another property, or were meant for human reading.
Association between preprocessors and the group of properties they work best with:
In[]:= classifyGroups = <|
  getAllComments -> {"operating_system_name", "operating_system_version", "operating_system_version_full", "operating_system_flavour", "operating_platform", "operating_platform_vendor_name", "software_type", "hardware_type", "hardware_sub_type"},
  getAllComponents -> {"software_version", "software_version_full", "layout_engine_name", "layout_engine_version"},
  getAllProducts -> {"software_name", "software_sub_type"}
|>;
We now convert this information to train a classifier for each property.
Get unprocessed pairs of user agents and a given property:
getRawExamples[prop_String] := Select[Rule @@@ Normal@userAgentDataset[[All, {"user_agent", prop}]], !MissingQ@#[[2]] &];
RandomSample[getRawExamples["operating_system_name"], 5]
Out[]= {
  Mozilla/5.0 (Linux; Android 4.4.2; A518 Build/MocorDroid) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36 -> Android,
  Mozilla/5.0 (Linux; Android 5.0.2; MotoE2(4G-LTE) Build/LXI22.50-29.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.93 Mobile Safari/537.36 -> Android,
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729) -> Windows,
  Mozilla/5.0 (Windows NT 6.3; rv:29.0) Gecko/20100101 Firefox/29.0 -> Windows,
  Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E) -> Windows
}
Process each pair's user agent:
getProcessedExamples[prop_String, processor_] := MapAt[processor, getRawExamples@prop, {All, 1}];
RandomSample[getProcessedExamples["operating_system_name", getAllComponents], 5]
Out[]= {
  {AppleWebKit/533.3, en-US, Firestorm-Releasex64, KHTML, like Gecko, Mozilla/5.0, Safari/533.3, SecondLife/4.7.3.47323, U, vintage skin, Windows, Windows NT 6.1} -> Windows,
  {compatible, GTB7.5, Mozilla/4.0, MSIE 8.0, .NET4.0C, .NET4.0E, .NET CLR 1.1.4322, .NET CLR 2.0.50727, .NET CLR 3.0.4506.2152, .NET CLR 3.5.30729, Trident/4.0, Windows NT 5.1} -> Windows,
  {compatible, Mozilla/5.0, MSIE 9.0, Trident/5.0, Windows NT 6.0, WOW64} -> Windows,
  {Android 5.1, AppleWebKit/537.36, Chrome/39.0.0.0, Google Nexus 7 - 5.1.0 - API 22 - 800x1280 Build/LMY47D, KHTML, like Gecko, Linux, Mozilla/5.0, Safari/537.36, Version/4.0} -> Android,
  {AppleWebKit/537.36, Chrome/44.0.2403.125, Dragon/44.5.7.268, KHTML, like Gecko, Mozilla/5.0, Safari/537.36, Windows NT 5.1} -> Windows
}
Map each property to its processed training data:
In[]:= classifyGroupsData = MapIndexed[<|"Property" -> #1, "Data" -> getProcessedExamples[#1, #2[[1]][[1]]]|> &, classifyGroups, {2}];
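The expression #2[[1]][[1]] relies on how MapIndexed reports positions inside an Association: at level 2 the index has the form {Key[k], i}, so the first part is Key[k] and its first part is the key itself (here, the preprocessor function). A small sketch with a hypothetical toy association of the same shape as classifyGroups:

```wolfram
(* Hypothetical toy association: key -> list of property names *)
toyGroups = <|"f" -> {"p1", "p2"}, "g" -> {"p3"}|>;

(* show the index that MapIndexed passes as the second argument at level 2 *)
MapIndexed[#2 &, toyGroups, {2}]
(* <|"f" -> {{Key["f"], 1}, {Key["f"], 2}}, "g" -> {{Key["g"], 1}}|> *)

(* #2[[1]][[1]] therefore recovers the key that each element belongs to *)
MapIndexed[#2[[1]][[1]] -> #1 &, toyGroups, {2}]
(* <|"f" -> {"f" -> "p1", "f" -> "p2"}, "g" -> {"g" -> "p3"}|> *)
```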
Distribute each preprocessor over its properties:
In[]:= classifyGroupsList = Flatten@Values@MapIndexed[Prepend[#1, "Preprocessor" -> #2[[1]][[1]]] &, classifyGroupsData, {2}];
Properties we want to know about each classifier:
In[]:= classifyProps = {"ClassifierFunction", "Accuracy", "AccuracyBaseline", "EvaluationTime", "BatchEvaluationTime"};
A function to train and test a classifier for a given set of training data:
In[]:= classifyTest[data : {___Rule}] := AssociationThread[classifyProps, ClassifierMeasurements[Classify@data, data, classifyProps]];
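One thing a reader may want to keep in mind: classifyTest passes the same data to both Classify and ClassifierMeasurements, so the reported accuracies are measured on examples the classifier was trained on. A minimal sketch of a held-out variant follows. The toy data, the 9/3 split, and the variable names are assumptions made for illustration, not part of the original post, and the toy inputs are plain strings rather than the component lists used above.

```wolfram
(* Hypothetical toy data: string -> class *)
toyData = {
  "Windows NT 5.1" -> "Windows", "Windows NT 6.0" -> "Windows",
  "Windows NT 6.1" -> "Windows", "Windows NT 6.3" -> "Windows",
  "Windows NT 10.0" -> "Windows", "Windows NT 6.2" -> "Windows",
  "Android 4.2.2" -> "Android", "Android 4.4.2" -> "Android",
  "Android 5.0.2" -> "Android", "Android 5.1" -> "Android",
  "Android 6.0" -> "Android", "Android 4.1.1" -> "Android"
};

SeedRandom[1];  (* make the shuffle reproducible *)
(* shuffle, then use the first 9 examples for training and the remaining 3 for testing *)
{toyTrain, toyTest} = TakeDrop[RandomSample[toyData], 9];

toyClassifier = Classify[toyTrain];
(* accuracy on examples the classifier has not seen *)
heldOutAccuracy = ClassifierMeasurements[toyClassifier, toyTest, "Accuracy"]
```

The same split could be applied inside classifyTest before training; the numbers in the results below were not produced this way.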
Results
Train and test a classifier on each entry:
classifyResults = Dataset@Association@ParallelMap[#Property -> Join[<|"Examples" -> Length@#Data, "Preprocessor" -> #Preprocessor|>, classifyTest[#Data]] &, classifyGroupsList]
Out[]= (Dataset of classifier results; not reproduced)
Note that since "not applicable" is treated as missing, some ClassifierFunctions output nonsense when they don't apply.
Example classification:
userAgentExample
#ClassifierFunction[#Preprocessor[userAgentExample], {"Decision", "TopProbabilities"}] & /@ classifyResults
Out[]= Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Out[]=
Property | Decision | TopProbabilities
operating_system_name | Linux | {Linux -> 0.997889}
operating_system_version | 10 | {10 -> 0.750936, 9 -> 0.187194}
operating_system_version_full | 10.1 | {10.1 -> 0.618106}
operating_system_flavour | Ubuntu | {Ubuntu -> 0.852983, Fedora -> 0.103621}
operating_platform | iPad | {iPad -> 0.955774}
operating_platform_vendor_name | Apple | {Apple -> 0.997215}
software_type | browser | {browser -> 0.999721}
hardware_type | computer | {computer -> 0.989073}
hardware_sub_type | tablet | {tablet -> 0.835627, phone -> 0.149216}
software_version | 43 | {… 12}
software_version_full | 45.0.2454.85 | {45.0.2454.85 -> 0.258279, 45.0.2454.84 -> 0.223283, 44.0.2403.157 -> 0.149869, 44.0.2403.155 -> 0.0951002, 44.0.2403.133 -> 0.0670966, 28.0.1500.94 -> 0.036073, 43.0.2357.93 -> 0.0267238}
layout_engine_name | Blink | {Blink -> 0.97819}
layout_engine_version | ["537", "36"] | {["537", "36"] -> 0.727072, [] -> 0.272928}
software_name | Chrome | {Chrome -> 0.736635, Opera -> 0.0812056, Samsung Browser -> 0.0752063}
software_sub_type | web-browser | {web-browser -> 0.937175}
Browser Spoofing Graph
We wish to find an efficient way to search through all browsers and programs for the one that most closely matches a given User Agent string. Since browsers typically copy another browser's User Agent string and attach their own product, we can create a directed graph of the relationships between the product names in User Agent strings, with each edge representing the products added between two User Agent strings.
Procedure
Since two programs may share the same products in their User Agent strings, we can have each vertex represent a set of products, and the label of each vertex lists the programs that use that set of products.
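A minimal sketch of how such vertices and labels could be derived, assuming a list of program -> products rules. The program names and product lists here are hypothetical, and sorting the products is one assumed way to make equal sets compare as equal.

```wolfram
(* Hypothetical input: each program with the product names found in its User Agent string *)
programProducts = {
  "BrowserA" -> {"Mozilla", "AppleWebKit", "Safari"},
  "BrowserB" -> {"Safari", "Mozilla", "AppleWebKit"},
  "BrowserC" -> {"Mozilla", "AppleWebKit", "Chrome", "Safari"}
};

(* group program names by their sorted product set:
   each key is a vertex, each value is that vertex's label *)
vertexPrograms = GroupBy[programProducts, (Sort[Last[#]] &) -> First]
(* <|{"AppleWebKit", "Mozilla", "Safari"} -> {"BrowserA", "BrowserB"},
     {"AppleWebKit", "Chrome", "Mozilla", "Safari"} -> {"BrowserC"}|> *)
```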
An example graph of the relationship between browsers:
exampleGraph = (embedded graph "examplegraph"; not reproduced)
Out[]= (graph image; not reproduced)
A formatted version with more information displayed and tooltips from hovering:
formatGraph[g_Graph] := Graph[g,
  ImageSize -> Full,
  VertexLabels -> (#[[1]] -> Placed[Tooltip[Style[Framed[#[[2]][[1]], FrameMargins -> 0], Background -> RGBColor[1, 1, 1, 0.75]], Column[{#[[1]], Multicolumn@Sort@#[[2]]}]], Center] & /@ AnnotationValue[g, VertexLabels]),
  EdgeLabels -> {e : _List \[DirectedEdge] _List :> Style[Complement[e[[2]], e[[1]]], Background -> RGBColor[1, 1, 1, 0.75]]},
  EdgeShapeFunction -> ({Arrowheads[{{Max[0.01*Norm@Subtract[#[[2]], #[[1]]], 0.01], 0.7}}], Arrow@#} &)
];
exampleGraph // formatGraph
Out[]= (formatted graph image; not reproduced)
To add a new program to the graph, we find the closest potential parents (incoming vertices) for the program's product list. These are the vertices whose products are a subset of the new program's products and none of whose children (outgoing vertices) are also such subsets, since any such child would be a closer parent.
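The criterion can be sketched without a Graph object, using a plain edge list. Everything in this block is a hypothetical illustration: the toy edges, the helper names toyChildren and closestParents, and the choice of SubsetQ to express the subset test are assumptions, not the post's implementation.

```wolfram
(* Hypothetical toy edges: parent product set -> child product set *)
toyEdges = {
  {"Mozilla"} -> {"Gecko", "Mozilla"},
  {"Gecko", "Mozilla"} -> {"Firefox", "Gecko", "Mozilla"},
  {"Gecko", "Mozilla"} -> {"Chrome", "Gecko", "Mozilla"}
};
toyVertices = Union[toyEdges[[All, 1]], toyEdges[[All, 2]]];

(* children of a vertex: targets of the edges that start at it *)
toyChildren[v_List] := Cases[toyEdges, (v -> c_) :> c];

(* a vertex is a closest parent if its products are a subset of the new
   program's products and none of its children are also such subsets *)
closestParents[new_List] := Select[toyVertices,
  Function[p, SubsetQ[new, p] && NoneTrue[toyChildren[p], SubsetQ[new, #] &]]];

closestParents[{"Opera", "Presto", "Version", "Mozilla", "Gecko", "Chrome"}]
(* {{"Chrome", "Gecko", "Mozilla"}} *)
```

Here {"Mozilla"} and {"Gecko", "Mozilla"} are both subsets of the new product list, but each has a child that is also a subset, so only the deepest match is returned.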
A fake example browser:
In[]:= exampleBrowser = "CoolBrowser" -> {"Opera", "Presto", "Version", "Mozilla", "Gecko", "Chrome"};
Get the children of a vertex:
In[]:= outgoingList[g_Graph, v_] := VertexOutComponent[g, {v}, {1}]
Find the parent vertices:
getParentVertices[g_Graph, v : {___String}] := Reap[
  DepthFirstScan[g, {}, {"PrevisitVertex" -> (If[Length@Complement[#, v] == 0 && AllTrue[outgoingList[g, #], Length@Complement[#, v] != 0 &], Sow[#]] &)}]
][[2]][[1]];
HighlightGraph[exampleGraph, getParentVertices[exampleGraph, exampleBrowser[[2]]]] // formatGraph