GROUPS: Staff Picks, Data Science, Image Processing, Graphics and Visualization, Wolfram Language, Machine Learning, Wolfram Summer School, Neural Networks
[WSS20] Curve OCR for "AP Calculus"-like "sketched" curves
José Antonio Fernández
Posted 2 years ago | 5742 Views | 2 Replies | 11 Total Likes
Introduction and aim
The goal of my project for this year's Wolfram Summer School was to perform function recognition on curves of the type that appear in AP tests (i.e., rough sketches described qualitatively), and to compute the typical results asked for in these exam questions: integrals and derivatives, limits, extremum seeking, etc. The methods we use are image processing and neural networks of different kinds, in an attempt to automatically extract the parts of the image relevant to this goal.
The problem in essence reduces to retrieving the image pixels that belong to the function curve. Those pixels will not match the coordinates of the actual graph, since the coordinate system in the image (with origin at the lower-left corner) is at a different scale, so we will also need to apply a transformation afterwards. This defines the second task: identify two points in the picture that belong to the graph, along with their counterparts transformed into plot coordinates. At first, those will be considered an input to the problem (as well as the breakpoints); later, attempts to extract them automatically will be explored.
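Concretely, assuming unrotated axes (as in all the examples below), this transformation is an axis-aligned affine map: given two reference points (x1, y1) and (x2, y2) in image coordinates with known plot-coordinate counterparts (x̂1, ŷ1) and (x̂2, ŷ2), each extracted point (x, y) maps to x̂ = x̂1 + (x - x1)(x̂2 - x̂1)/(x2 - x1) and ŷ = ŷ1 + (y - y1)(ŷ2 - ŷ1)/(y2 - y1).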
Image Processing approach
The graphical transformations applied here aim at discarding (or hiding) non-relevant information in the image, letting us choose the important parts out of what remains. These methods do not need tedious manual work, but they have one limitation: they are image-dependent, meaning they need fine parameter tuning for every different picture encountered, which in turn requires a certain technical knowledge from the user. Two examples are analyzed with this approach. We will first obtain two points of the graph by extracting information from the axes labels. (They need not be curve points; we just need to know their coordinates in both systems.) Afterwards, we will retrieve the curve pixel locations.
Example 1
In[]:= img = (* input image elided in this export *);
We first invert the colors and standardize the image margins to reduce the differences that can appear between input images:
plot = ImagePad[ImageCrop@ColorNegate@RemoveAlphaChannel@img, 8, Black]
Out[]=
In this particular case, axes labels are well delimited and are identified as independent image components:
components = MorphologicalComponents[Binarize@plot];
components // Colorize
Out[]=
We need to set the size of the components we are looking for (estimated to be between 2x10 and 18x24 in size):
axesDigits = ImageClip@Image@SelectComponents[components, (2 < #Width < 18 ∧ 10 < #Length < 24)&]
(* We don't set the minimum width higher since we could lose digits 1, which are narrow. *)
Out[]=
We will need their image representation (in order to have a network recognize them) and their centroids (in order to reconstruct points in the graph in image coordinates):
In[]:= digits = Values@ComponentMeasurements[Binarize@plot, {"Image", "Centroid"}, (2 < #Width < 18 ∧ 12 < #Length < 24)&];
We will assume there is at least one portion of the graph in the first quadrant and that every digit in the right half of the plane is an x-axis label. That implies the points identified in that part of the graph are the actual ones, with no negative signs missed in the process.
{xDigits, yDigits} = With[{d = #}, Select[digits, #〚2, d〛 > Part[ImageDimensions@img/2, d]&]]& /@ {1, 2}
(* Points falling on the right and upper half of the image, respectively. *)
Out[]= {{{(* digit image *), {266.722, 60.8611}}, {(* digit image *), {313.863, 60.275}}, {(* digit image *), {360.925, 60.9}}}, {{(* digit image *), {112.972, 223.181}}, {(* digit image *), {112.355, 177.37}}}}
We revert them to a normal appearance before passing them to a pre-trained net for recognition:
In[]:= clearedDigits = Map[ColorNegate[ImageCrop[#, {28, 28}]]&, {xDigits〚All, 1〛, yDigits〚All, 1〛}, {2}];
recognisedDigits = NetModel["LeNet Trained on MNIST Data"] /@ clearedDigits;
Associate each x-axis and y-axis digit with its horizontal and vertical coordinate, respectively (note we have assumed the numbers are aligned with the axes' ticks):
In[]:= digitPositions = Thread /@ Thread[recognisedDigits -> {xDigits〚All, 2, 1〛, yDigits〚All, 2, 2〛}];
We take two axes labels only (e.g. the first two), which will allow us to reconstruct the two necessary points:
{{p1Hat, p2Hat}, {p1, p2}} = Thread /@ (Through[{Keys, Values}[digitPositions〚All, ;;2〛]])
Out[]= {{{3, 3}, {4, 2}}, {{266.722, 223.181}, {313.863, 177.37}}}
where Keys gives us the axes labels, and Values the positions in the image; by threading them (respectively) we obtain the coordinate pairs.
In order to extract the curve points, we need to get rid of everything but the curve itself, and that is the next task.
The Top Hat Transform, with a thin horizontal row as structuring element, extracts white horizontal peaks:
TopHatTransform[plot, BoxMatrix[{0, 16}]]
Out[]=
The same transform, with a thin vertical row instead, extracts white vertical peaks:
TopHatTransform[plot, BoxMatrix[{16, 0}]]
Out[]=
By multiplying these two images (pixel-wise), together with the following one, we zero out every pixel except those of the curve itself:
ColorNegate[axesDigits]
Out[]=
curve = Binarize[TopHatTransform[plot, BoxMatrix[{0, 3}]] TopHatTransform[plot, BoxMatrix[{3, 0}]] ColorNegate[axesDigits]]
Out[]=
To retrieve the curve points we can just pattern-match the white pixels, which have a value of 1. That gives array indices, which need to be converted to image coordinates. These two steps are performed by the function labeledPoints (see end):
(points = labeledPoints[ImageDimensions@#, ImageData@#, 1]& @ curve;
HighlightImage[curve, points])
Out[]=
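labeledPoints itself is defined at the end of the original notebook and is not reproduced in this export. A minimal sketch of the behavior described above (my assumption, not the author's exact code) could be:

(* Hypothetical sketch of labeledPoints: find pixels whose value equals
   `label` in the image data, then convert their {row, column} array indices
   to image coordinates (origin at the lower-left corner; exact pixel-center
   offsets may differ by 1/2 from the author's version). *)
labeledPointsSketch[{w_, h_}, data_, label_] :=
  Position[data, label] /. {r_Integer, c_Integer} :> {c, h - r + 1}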
Furthermore, in cases where the breakpoints have a feature that distinguishes them from the background, they can be extracted. Here we have used an opening operation, which essentially deletes thin white lines (in our case, everything but the breakpoints):
breakPointComponents = Binarize@Opening[plot, DiskMatrix[4]]
Out[]=
breakPoints = First /@ convert[Values@ComponentMeasurements[breakPointComponents, "Centroid"], p1, p2, p1Hat, p2Hat]
(* The last four arguments do not change *)
Out[]= {-2.0085, -0.021943, 2.00652, 5.02115}
Once the points' coordinates are stored in a list we can use the function convert (see ref [1]; adapted), which receives a list of point coordinates (in image axes) as its first argument, the two required retrieved graph coordinate pairs p1 and p2, and their transformed pairs as the remaining arguments:
pHat = convert[points, p1, p2, p1Hat, p2Hat];
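convert is likewise defined at the end of the original notebook (adapted from ref [1]) and is not reproduced here. A minimal sketch, assuming the axes are unrotated so the map is the per-axis affine rescaling from the introduction:

(* Hypothetical sketch of convert: rescale image-coordinate points into plot
   coordinates, given two reference points p1, p2 (image coordinates) and
   their known plot-coordinate counterparts p1Hat, p2Hat. *)
convertSketch[points_, p1_, p2_, p1Hat_, p2Hat_] :=
  Module[{scale, offset},
    scale = (p2Hat - p1Hat)/(p2 - p1);   (* per-axis scale factors *)
    offset = p1Hat - scale p1;           (* per-axis offsets *)
    (scale # + offset)& /@ points]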
We can attempt to retrieve the symbolic expression using the built-in function FindFormula. As of now, this is expected to work well only if applied piecewise:
findFormulaPiecewise[pHat, breakPoints, SpecificityGoal -> "High", TargetFunctions -> {Times, Sqrt, Plus}, PerformanceGoal -> "Quality"]
Out[]= {-1.1558 - 1.04444 x, -1.12861 + 2.07011 x, -1.20044 + 20.9952 … 4.61868 x + 6.72622 … 1.19828 x, …} (* the third expression lost its 2-D formatting in this export *)
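findFormulaPiecewise is another helper defined at the end of the original notebook. A minimal sketch, assuming it simply splits the point list at the breakpoint abscissas and fits each stretch with FindFormula, passing any options through:

(* Hypothetical sketch of findFormulaPiecewise: split the recovered points at
   the breakpoint abscissas and fit each stretch separately. *)
findFormulaPiecewiseSketch[pts_, breaks_, opts___] :=
  With[{edges = Join[{-Infinity}, breaks, {Infinity}]},
    Table[FindFormula[Select[pts, edges〚i〛 <= First@# < edges〚i + 1〛 &], x, opts],
      {i, Length@edges - 1}]]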
AP-level functions typically include polynomials, so one choice initially considered for fitting the points was spline interpolation, fitting each piece separately. However, interpolation parameters do not work equally well for the different functions that can appear in each interval, so the user must select them appropriately. We will thus use FindFormula from now on.
Example 2
The purpose of the second example is to show the obstacles that stand in our path towards automation.
In[]:= img = ImageResize[(* input image elided in this export *), 500];
In[]:= plot = ImagePad[ImageCrop@ColorNegate@RemoveAlphaChannel@img, 8, Black]
Out[]=
As we start getting the components, we see there are numerous difficulties: double-digit numbers are segmented separately, the axes titles might mistakenly be taken as digits, etc.:
In[]:= components = MorphologicalComponents[Binarize@plot];
axesDigits = ImageClip@Image@SelectComponents[components, (2 < #Width < 18 ∧ 5 < #Length < 26)&];
ImageResize[GraphicsRow[{components // Colorize, axesDigits}], 1000]
Out[]=
so in this case the axes information is left as manual input.
Attempting again to isolate the curve (note we have needed to significantly change the parameters):
In[]:= curve = Binarize[TopHatTransform[plot, BoxMatrix[{0, 100}]] TopHatTransform[plot, BoxMatrix[{20, 0}]] ColorNegate[axesDigits], .4]
Out[]=
This has not been enough on its own this time, so we cope by extracting the larger components of that result:
In[]:= SelectComponents[MorphologicalComponents[Binarize@curve], 20 < #Length &] // Colorize
Out[]=
Another technique that might be attempted is Region Growing, directly over the cropped (not color-negated) image in this case:
In[]:= plot = ImagePad[ImageCrop@RemoveAlphaChannel@img, 8, White];
In[]:= curve = SetAlphaChannel[plot, ColorNegate[RegionBinarize[plot, {{230, 120}}, 1.65]]]
Out[]=
In[]:= SelectComponents[MorphologicalComponents[ColorNegate@Binarize[Rasterize@curve, .999]], 20 < #Length &] // Colorize
Out[]=
In this case, to extract the coordinates, we would need to manually select two points whose transformation is known since, as we have seen, retrieving information from the axes has proved difficult.
Region Growing works very well here, but only because the line weight of the function curve is notably thicker and darker than the background items, and we need to make sure we do not put the seed points right onto the function curve (otherwise that will be the first thing to disappear). It can in any case help us remove background elements so that the image components are more clearly defined, and we can then process them with the other techniques more effectively.
Neural Networks approach
We have generated synthetic datasets, each with a different type of training rule for the task towards which it is directed, in order to teach several networks different specialized tasks, breaking the problem down into simpler steps. We thus focus first on data generation; later we will show the different models explored.
Training Data Generation
To train the network we need example graphs, representative of those it will receive in operation. A training rule is an association between the original example image and what we target as the desired predicted output. For example, in the segmentation methods we need to tell the network what class every pixel within an input graph is; therefore the target consists of another image (or, more precisely, an array, called a mask) with the class number in the corresponding pixel place.
The process has been randomized as much as possible in order to create data that is as varied as possible, and the method used will allow us to generate specific types of graphs, and different kinds of masks, for these and other potential future methods.
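Concretely, once the rasterized graphs and their per-pixel masks are generated below, the training rules take a simple shape (a sketch using the variable names introduced in the following subsections):

(* Sketch: each training rule pairs a rasterized graph with its per-pixel
   class mask; `graphs` and `curveBinaryMasks` are built below. *)
trainingData = MapThread[Rule, {graphs, curveBinaryMasks}];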
Define image resolution:
In[]:= imgSize = {240, 160} (* for image segmentation *);
In[]:= imgSize = {384, 384} (* for object detection *);
A rich list of function families has been defined, from which we will later sample pieces to plot. They can be specified with parameters "xL" and "xR" as domain limits, so that they are defined within whatever interval they later happen to fall in:
In[]:= functions[xL_, xR_] := (* list of function families elided in this export *);
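The list itself is collapsed in this export. Judging from how it is indexed below (each entry a family name paired with a list of candidate expressions), a hypothetical sample, not the author's actual list, might read:

(* Hypothetical sample entries: each family is a name plus candidate
   expressions in x, re-parameterized randomly on every evaluation. *)
functionsSample[xL_, xR_] := {
  {"linear", {RandomReal[{-2, 2}] x + RandomReal[{-3, 3}]}},
  {"quadratic", {RandomReal[{-1, 1}] (x - RandomReal[{xL, xR}])^2 + RandomReal[{-2, 2}]}},
  {"sqrt", {RandomReal[{1, 3}] Sqrt[x - xL + RandomReal[{.1, 1}]]}}}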
In[]:= numberFunc = Length@functions[0, 0] + 1 (* number of function families contained in the list "functions" *);
Let’s associate a class number (integer) to the strings that designate each function family in the list of functions:
In[]:= funcMap = AssociationThread[Range[2, numberFunc] -> DownValues[functions]〚1, 2, All, 1〛];
The number of function pieces that can appear in a single graph, and the number of graphs, are defined a priori; respectively:
In[]:= p = 3; q = 300;
The domain of definition:
In[]:= d = 2 (* Minimum domain length of each piece. *);
x0 = RandomInteger[{-3, 1}, q];
xn = x0 + d + RandomInteger[{0, 5}, q];
For some methods, we’ll need to know which family each laid-out piece is:
In[]:= funcLayouts = Table[RandomChoice[Range[2, numberFunc], p], q];
The divisions where the pieces join together are also chosen randomly within the interval, with some care so as to avoid tiny stretches:
In[]:= d = (xn - x0)/(RandomReal[{1, 1.5}, q] p) (* Minimum separation between divisions. A denominator of p implies evenly-spaced divisions at distance d *);
divisions = With[{i = #}, FoldList[RandomReal@{#1 + d〚i〛, xn〚i〛 - d〚i〛 #2}&, x0〚i〛, Reverse@Range[p - 1]]~Join~{xn〚i〛}]& /@ Range[q] (* Inspired by answers in [2]. Reverse@Range[(p-1)-t] is the sequence of divisions remaining to be computed when the function in the first argument is applied for the t-th time; we need to leave room in the interval for these *);
and those divisions are partitioned into the domain of definition of each piece function:
In[]:= partitions = First@Flatten[Partition[divisions, {1, 2}, 1], {3}] (* intervals in which each piece is defined *);
The actual piecewise functions are laid out, picked from the corresponding family:
In[]:= pieceLayouts = MapThread[RandomChoice@functions[#2〚1〛, #2〚2〛]〚#1, 2〛&, {funcLayouts - 1, partitions}, 2];
We've laid out the functions randomly, and as a result most functions are discontinuous at all their breakpoints. Typical AP-test functions do have some discontinuities, but not to this extent. We'll fix this by shifting the laid-out pieces by appropriate additive constants, making them continuous at some randomly selected breakpoints; this is done by the function joinPiecewiseAt, defined at the end (a sketch follows the next input).
In[]:= If[p ≠ 1, pieceLayouts = MapThread[joinPiecewiseAt[#1, #2, Sort@RandomSample[Range[p - 1], RandomInteger@{1, p - 1}]]&, {pieceLayouts, divisions}]];
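A minimal sketch of joinPiecewiseAt's described behavior (my assumption, not the author's exact code): shift each successive piece by a constant so the values agree at the selected breakpoints.

(* Hypothetical sketch of joinPiecewiseAt: make the curve continuous at the
   breakpoints whose (ascending) indices are in `at`; divs〚i+1〛 is the
   breakpoint between pieces i and i+1. Other breakpoints keep their jumps. *)
joinPiecewiseAtSketch[pieces_, divs_, at_] :=
  Module[{out = pieces, gap},
    Do[
      gap = (out〚i〛 - out〚i + 1〛) /. x -> divs〚i + 1〛;
      out〚i + 1〛 += gap,
      {i, at}];
    out]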
The following variables are used in the generation of the training plots:
In[]:= {yMin, yMax} = {Min /@ #, Max /@ #}& @ MapThread[Table[#1, {x, #2〚1〛, #2〚2〛, .01}]&, {pieceLayouts, partitions}, 2] (* vertical axis' limits *);
t1 = Table[RandomChoice@Range[.0023, .011, .001], {q}] (* thicknesses of the functions' curves *);
We have defined some plotting options common to all graph styles, as randomized as possible, with tried-and-tested acceptable ranges so that they resemble those that typically appear in plots. The list is ever under construction and has grown so long that it has been placed at the end of the document.
To enrich the training data, some plots will be generated featuring large dots at the breakpoints, and some others even "callouts". As preparation:
In[]:= q1 = Floor[q/3]; q2 = Floor[2 q/3] - q1; q3 = q - (q2 + q1) (* there'll be q1 simple graphs, q2 with breakpoints, and q3 with callouts *);
breakPoints = DeleteDuplicates /@ Map[Point, Flatten[MapThread[{{#2〚1〛, #1 /. x -> #2〚1〛}, {#2〚2〛, #1 /. x -> #2〚2〛}}&, {pieceLayouts〚#〛, partitions〚#〛}]& @ #, 1]& /@ Range[q], {2}];
callouts = Apply[Callout[#, "(" <> ToString@Round@#〚1〛 <> ", " <> ToString@Round@#〚2〛 <> ")", Appearance -> None, Background -> None, CalloutMarker -> None]&, Take[breakPoints, -q3], {2}];
callouts = DeleteDuplicatesBy[#, #〚2〛&]& /@ callouts (* same-labeled callouts are treated as duplicates *);
Generation of the simplest graphs of the training set:
In[]:= options1 = {commonOptions@#, PlotTheme -> {"Grid"}, GridLinesStyle -> Directive[RandomChoice[{.8, .2, .1} -> {Plain, Dashed, Dotted}], RandomChoice[{.8, .2} -> {Black, GrayLevel@RandomReal@.8}], Thickness[t1〚#〛/RandomInteger@{3, 6}]]}& (* options specific to the simplest graphs *);
graphsSimplest = Show[ListLinePlot[MapThread[Table[{x, #1}, {x, #2〚1〛, #2〚2〛, .001}]&, {pieceLayouts〚#〛, partitions〚#〛}], PlotStyle -> Directive[Black, Thickness@t1〚#〛], options1@#], Method -> {"AxesInFront" -> axesInFront}]& /@ Range[1, q1];
Out[]= (* sample graphs elided *)
Generation of the graphs featuring highlighted breakpoints:
In[]:= options2 = {options1@#, Epilog -> {Directive@PointSize[RandomReal@{.008, .022} + t1〚#〛], breakPoints〚#〛}}& (* options specific to the graphs featuring breakpoints: same style as the simplest, plus breakpoints on top *);
graphsBreakPoints = Show[ListLinePlot[MapThread[Table[{x, #1}, {x, #2〚1〛, #2〚2〛, .001}]&, {pieceLayouts〚#〛, partitions〚#〛}], PlotStyle -> Directive[Black, Thickness@t1〚#〛], options2@#], Method -> {"AxesInFront" -> axesInFront}]& /@ Range[q1 + 1, q1 + q2];
Out[]= (* sample graphs elided *)
Generation of the graphs featuring both highlighted breakpoints and “callouts”, instead of grids:
In[]:= options3 = {commonOptions@#, Epilog -> {Directive@PointSize[RandomReal@{.008, .022} + t1〚#〛], breakPoints〚#〛}}& (* options specific to the graphs featuring callouts *);
auxCallouts = Show[ListLinePlot[MapThread[Table[{x, #1}, {x, #2〚1〛, #2〚2〛, .001}]&, {pieceLayouts〚#〛, partitions〚#〛}], PlotStyle -> Directive[Black, Thickness@t1〚#〛], options3@#], Method -> {"AxesInFront" -> axesInFront}]& /@ Range[q1 + q2 + 1, q] (* auxiliary graphs with the callouts not yet in place - used to construct the "templates" below *);
graphsCallouts = MapIndexed[Show[{#1, ListPlot[callouts〚First@#2〛, First@*Cases[Rule[TicksStyle, Directive@OrderlessPatternSequence[___, Rule[FontSize, s_], sr:Rule[FontSlant, _], fr:Rule[FontFamily, _]]] :> (LabelStyle -> Directive[IntegerPart@s, sr, fr])]@*Options@#1 (* callout font size and font slant matching the ticks' *), PlotMarkers -> Graphics@{}]}]&, auxCallouts] (* the graphs with the callouts in place *);
Out[]= (* sample graphs elided *)
All graphs are put together in one variable and they are rasterized:
In[]:= graphs = Rasterize /@ Flatten@{graphsSimplest, graphsBreakPoints, graphsCallouts};
Since borders contain no information, we crop them. (We will need to do this too as a preprocessing step for new images fed into the network.)
In[]:= cropAmount = BorderDimensions /@ graphs (* amount to crop the graphs. Saved separately to crop the masks identically. *);
graphs = MapIndexed[ImagePad[#1, -cropAmount〚#2〚1〛〛]&, graphs];
The images are resized to network dimensions:
In[]:= graphs = ImageResize[#, imgSize]& /@ graphs;
For our aim of masking the images, we will need some auxiliary images (or "templates"):
In[]:= templates = Show[#, Options@# /. Directive[d__] -> Directive[d, Opacity@0]]& /@ Flatten@{graphsSimplest, graphsBreakPoints, auxCallouts} (* we're just adding an extra opacity directive to turn the background transparent (everything except the function curves we are to identify) *);
Binary curve mask generation
The first task we will consider is segmenting the curve pixels out of the graphs, i.e. classifying every single pixel in the image as being background (class 1) or curve (class 2). Here we generate these target outputs for our training examples.
The mask templates we have defined go through the same three steps as the graphs above; now in one go:
In[]:= curveBinaryMasks = ImageResize[#, imgSize]& /@ MapIndexed[ImagePad[#1, -cropAmount〚#2〚1〛〛]&, Rasterize /@ templates];
They additionally need to be binarized, prior to putting the actual class number on each curve pixel:
In[]:= curveBinaryMasks = Binarize /@ curveBinaryMasks;
Out[]= (* sample masks elided *)
In[]:= curveBinaryMasks = ((# /. 0 -> 2)& @ ImageData@#)& /@ curveBinaryMasks;
Multiclass curve mask generation
The second approach explored to identify the piece functions tries to predict which family of functions each pixel belongs to. Here we are not just told whether a pixel is background or not; we seek to have the pixels labeled with their corresponding function family out of those defined in the array "functions" at the beginning. Afterwards we would just be left to find the parameters of the family.
The approach taken here to construct the masks is to plot every single piece separately, on the exact same background as originally (still made transparent); this allows us to put the different classes on the different stretches, and then reconstruct the full curve masked with its corresponding label per stretch, as targeted.
In[]:= multiclassTemplates = With[{i = #}, MapThread[Show[ListLinePlot[Table[{x, #1}, {x, #2〚1〛, #2〚2〛, .001}], Options[Flatten[{graphsSimplest, graphsBreakPoints, auxCallouts}]〚i〛] /. Directive[d__] -> Directive[d, Opacity@0], PlotStyle -> Directive[Black, Thickness@t1〚i〛]], Method -> {"AxesInFront" -> False}]&, {pieceLayouts〚#〛, partitions〚#〛}]]& /@ Range[q] (* We're plotting every piece separately, on the same background as the original picture, now made transparent. *);
We use the same three functions as for the original graphs and the binary masks; however here we have “p” separate templates per graph, so we need to map each of them at one deeper level:
In[]:= curveMulticlassMasks = Binarize /@ #& /@ (ImageResize[#, imgSize]& /@ #& /@ MapIndexed[ImagePad[#1, -cropAmount〚First@#2〛]&, Rasterize /@ #& /@ multiclassTemplates, {2}]);
To replace the curve pixels with the corresponding function class in every separated curve stretch, we first obtain (via ArrayRules) the positions of the curve pixels (by selecting the 0-valued pixel positions). Afterwards these positions are associated with their corresponding class (family) value, which will constitute the new array rules. Doing that for every piece leaves us with as many lists of array rules (or, equivalently, images) as pieces. Joining these lists gives a single array (or image) with all positions of the different stretches in it, and thus the full curve masked.
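A toy illustration of the mechanism (my own minimal example, taking 1 as background and 3 as the hypothetical family class):

(* Positions of 0-valued pixels become rules pointing at the class value;
   ArrayRules[m, 1] uses 1 (background) as the default, so only non-1
   entries appear as explicit rules. *)
m = {{1, 0}, {0, 1}};
Cases[ArrayRules[m, 1], Rule[{r_, c_}, 0] :> ({r, c} -> 3)]
(* -> {{1, 2} -> 3, {2, 1} -> 3} *)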
In[]:= curveMulticlassMasks = SparseArray[Join @@ MapThread[Thread[Cases[ArrayRules[ImageData@#1, 1], Rule[{x_, y_}, 0] :> {x, y}] -> #2]&, {curveMulticlassMasks〚#〛, funcLayouts〚#〛}], Reverse@imgSize, 1]& /@ Range[q];
In[]:= colorRules = Thread[Range@numberFunc -> Prepend[RandomColor[numberFunc - 1], White]];
Out[]= (* sample multiclass masks elided *)
Doing it this way solves the problems that would otherwise arise when curves overlap a little at the breakpoints, since SparseArray ignores all but the first of any repeated array rules. One of the two curves will then simply override the other on these tiny overlapping regions around the breakpoints.
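For instance, in a 2×2 toy array:

(* SparseArray keeps only the first rule given for a repeated position. *)
SparseArray[{{1, 1} -> 5, {1, 1} -> 9}, {2, 2}] // Normal
(* -> {{5, 0}, {0, 0}} *)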
Masks for the breakpoints
The next segmentation task will be an attempt to predict the breakpoints. We note it is not a usual segmentation problem: we could not hope to classify, this way, a handful of pixels out of the tens of thousands in the image. Thus, in order to place more importance on those tiny areas of interest (scarce, single pixels) and prevent them from being ignored, we have artificially placed large (pixel-richer) dots on them, regardless of how they look in the original plot.
We could argue this is in reality not an actual "OCR" task but more of an analytical one. We will see the points here are detected by how they look to the eye, and though this is in principle undesirable, they come in handy, and FindFormula will welcome them to do a better job approximating the curves.
Breakpoint masks will be created by laying the breakpoint dots after the whole graph has been made transparent (or "masked" out). So we need to turn the curve line from the templates transparent as well:
In[]:= transparentGraphs = templates /. {options:Except[_Line].., curves:__Line} -> {options, Opacity@.0, curves};
(* The curve's options cannot be accessed via Options, and are located together with the curve primitives themselves. This works because our graphs have no other expressions with head Line; otherwise we'd need to be more specific about where we're doing the replacement. *)
The large dots are placed on the breakpoint locations:
In[]:= breakPointMasks = Show[transparentGraphs〚#〛, Epilog -> {Directive@PointSize@0.05, breakPoints〚#〛}]& /@ Range[q];
And finally they go through the same steps as the curve masks above. In one go:
In[]:= breakPointMasks = Binarize /@ (ImageResize[#, imgSize]& /@ MapIndexed[ImagePad[#1, -cropAmount〚First@#2〛]&, Rasterize /@ breakPointMasks]);
Out[]= (* sample breakpoint masks elided *)
By having created the image masks including all the original background elements, made transparent, we ensure pixels will accurately match those they correspond to in the original image. (Callouts are not in there but are not a problem, since they’re superimposed onto the graph after the actual plot is generated.)
In[]:= breakPointMasks = ((# /. 0 -> 2)& @ ImageData@#)& /@ breakPointMasks;
Masks for the axes tick labels
We can further mask out everything in the plots except the boxes bounding the axes' digits, giving us an easy chance to perform what we have come to call "naive detection", still sticking to image segmentation. This may well not be a proper "object detection" method, but it happens to give very good results with a reasonably small network that does not require heavy training. Furthermore, it still captures a feature of proper "single-shot" detectors (the result is likewise given by a single forward pass over the whole image): the network reasons about the whole image, so objects are detected in context. This is very important, as we do not want to detect just any digits that happen to be laid out on the image, but strictly those that are tagged to the axes.
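At inference time, an assumed post-processing step (not shown in this section) would be to read the digit bounding boxes off the predicted mask with ComponentMeasurements and crop them for the MNIST-trained recognizer used earlier; predictedMask (a binarized image of the predicted label) and originalGraph are hypothetical names:

(* Assumed post-processing: connected components of the predicted tick-label
   mask yield digit bounding boxes, which are then cropped from the graph. *)
digitBoxes = Values@ComponentMeasurements[predictedMask, "BoundingBox"];
digitImages = ImageTrim[originalGraph, #]& /@ digitBoxes;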
To mask the tick labels (i.e. the axes digits) we need to override, specifically for the tick labels, the opacity and color directives in the fully transparent graphs we made above:
In[]:= tickLabelMasks = Show[#, Options@# /. Rule[TicksStyle, Directive@OrderlessPatternSequence[d__, Rule[Background, _]]] :> Rule[TicksStyle, Directive[d, Opacity@1, Background -> Black, FontColor -> Black (* black digits *), White (* white ticks *)]]]& /@ transparentGraphs;
They undergo the same processing as the rest of masks:
In[]:= tickLabelMasks = Binarize /@ (ImageResize[#, imgSize]& /@ MapIndexed[ImagePad[#1, -cropAmount〚First@#2〛]&, Rasterize /@ tickLabelMasks]);
Out[]= (* sample tick-label masks elided *)
In[]:= tickLabelMasks = ((# /. 0 -> 2)& @ ImageData@#)& /@ tickLabelMasks;
Segmentation Networks Training
Image Segmentation Results
Binary segmentation tasks
In[]:= colorRules = {1 -> White, 2 -> Black};
Out[]= NetChain[Input port: image, Output port: array (size: 160×240×2)]
Out[]= NetChain[Input port: image, Output port: array (size: 160×240×2)]
Out[]= NetChain[Input port: image, Output port: array (size: 160×240×2)]
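The "vanilla" chain from the documentation is not reproduced in this export; purely as an illustration (my assumption, not the author's exact net), a minimal per-pixel classification chain with the ports shown above could be:

(* Minimal sketch of a vanilla segmentation chain: image in, 160×240×2 array
   of per-pixel class probabilities out. *)
net = NetChain[{
    ConvolutionLayer[32, {3, 3}, "PaddingSize" -> 1], Ramp,
    ConvolutionLayer[32, {3, 3}, "PaddingSize" -> 1], Ramp,
    ConvolutionLayer[2, {1, 1}],       (* one channel per class *)
    TransposeLayer[{3, 1, 2}],         (* channels last: 160×240×2 *)
    SoftmaxLayer[]},
  "Input" -> NetEncoder[{"Image", {240, 160}, ColorSpace -> "Grayscale"}]]

Training such a chain would pair its output with CrossEntropyLossLayer["Index"] against the integer class masks generated above.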
Two segmentation network architectures have been tried: a vanilla one from the documentation, and a UNET architecture, already implemented and generated using the code from [3]. Performance on the synthetic datasets is already very good, so we skip those results and jump directly to some real-world data examples.
The UNET has performed better than the vanilla version in all cases. (The comparison is not fully fair, since it was trained for longer and on an updated dataset, but it showed more learning potential: it takes more time, yet learns more steadily, and already gave distinctly better results on the old datasets.)
Figures in order: exam plot, prediction using a vanilla net, prediction using UNET architecture.
Out[]= (* figures elided *)
We thus pick the UNET as our choice for all our segmentation tasks. Some more exam examples with it for all tasks:
(Figures in order: exam plot, curve extraction, predicted mask for label bounding boxes, predicted mask for breakpoints.)
Out[]= (* figures elided *)
Out[]=