[WIS22] Emotion classification from face datasets
by Diya Elizabeth, Indian Institute of Science Education and Research, Thiruvananthapuram
In this project, we use deep learning to build an emotion recognition convolutional neural network by customizing the EfficientNet model pretrained on the ImageNet dataset. We use the FER-2013 dataset available in the Wolfram Data Repository. The dataset covers seven classes of emotion: happy, sad, angry, surprise, disgust, fear, and neutral. We try out different methods to tackle class imbalance. This model could be applied in teaching autistic children, young children in general, and people who have difficulty reading facial expressions. It could also be used to train robots to interact more naturally with humans.
Data Exploration
The dataset used here is FER-2013. It has 35887 greyscale images of faces, each categorized into one of seven expression classes: happy, sad, angry, surprise, disgust, fear, and neutral. It was collected under challenging conditions, with varying lighting, different head poses, and variation in facial features due to ethnicity, age, gender, facial hair, and glasses. This variety is important so as not to build bias into our model.
Obtaining the data
ro = ResourceObject["FER-2013"]
data = ResourceData[ro];

Out[]= ResourceObject[Name: FER-2013 » Type: DataResource » Description: The Facial Expression Recognition 2013 (FER-2013) Dataset]
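Before summarizing the classes, it can help to eyeball a few examples. This is a minimal preview step, not in the original notebook, assuming data is the list of image -> label rules returned above:

In[]:= RandomSample[data, 3]  (* three random image -> emotion pairs from the dataset *)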
Overview of the data
classes = DeleteDuplicates[Values[data]];
dataPerClass = Map[Function[{class}, Select[data, SameQ[Values@#, class] &]], classes];

In[]:= Counts[Values[data]]
BarChart[Counts[Values[data]], ChartLabels -> Keys@Counts[Values[data]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes in the dataset", AxesLabel -> {Automatic, "Number of images"}]

Out[]= <|happiness -> 8989, neutral -> 6198, sadness -> 6077, anger -> 4953, surprise -> 4002, disgust -> 547, fear -> 5121|>

Out[]= (bar chart: distribution of classes in the dataset)
Data distribution of classes of train and test sets
We choose up to 4000 random images from each class for our sample dataset. 70% of this sample is chosen randomly as the training set and the remaining 30% as the test set. The function RandomSample ensures the data is drawn randomly, so the train and test sets accurately represent the original dataset.
In[]:= sampleData = Flatten@Map[RandomSample[#, UpTo[4000]] &, dataPerClass];
{train, test} = TakeDrop[RandomSample[sampleData], Floor[0.7*Length[sampleData]]];

In[]:= BarChart[Counts[Values[sampleData]], ChartLabels -> Keys@Counts[Values[sampleData]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes in the sample data", AxesLabel -> {Automatic, "Number of images"}]
BarChart[Counts[Values[train]], ChartLabels -> Keys@Counts[Values[train]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes in the train set", AxesLabel -> {Automatic, "Number of images"}]
BarChart[Counts[Values[test]], ChartLabels -> Keys@Counts[Values[test]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes in the test set", AxesLabel -> {Automatic, "Number of images"}]

Out[]= (three bar charts: class distributions of the sample data, the train set, and the test set)
Transfer Learning with EfficientNet
Rather than creating a new network from scratch, which may take days or weeks to train, we can use a pretrained model as a starting point and customize it for our needs. EfficientNet trained on the ImageNet dataset can be adapted to the FER-2013 dataset as shown below:
Creating the neural network
We start by loading the EfficientNet model pretrained on the ImageNet dataset.
In[]:= NetModel["EfficientNet Trained on ImageNet"]

Out[]= NetChain[
  Input                    image                     array (size: 3×224×224)
  stem_conv                ConvolutionLayer          array (size: 32×112×112)
  stem_bn                  BatchNormalizationLayer   array (size: 32×112×112)
  stem_activation          x LogisticSigmoid[x]      array (size: 32×112×112)
  block1a                  NetChain (6 nodes)        array (size: 16×112×112)
  block2a                  NetChain (9 nodes)        array (size: 24×56×56)
  block2b                  NetGraph (11 nodes)       array (size: 24×56×56)
  block3a                  NetChain (9 nodes)        array (size: 40×28×28)
  block3b                  NetGraph (11 nodes)       array (size: 40×28×28)
  block4a                  NetChain (9 nodes)        array (size: 80×14×14)
  block4b                  NetGraph (11 nodes)       array (size: 80×14×14)
  block4c                  NetGraph (11 nodes)       array (size: 80×14×14)
  block5a                  NetChain (9 nodes)        array (size: 112×14×14)
  block5b                  NetGraph (11 nodes)       array (size: 112×14×14)
  block5c                  NetGraph (11 nodes)       array (size: 112×14×14)
  block6a                  NetChain (9 nodes)        array (size: 192×7×7)
  block6b                  NetGraph (11 nodes)       array (size: 192×7×7)
  block6c                  NetGraph (11 nodes)       array (size: 192×7×7)
  block6d                  NetGraph (11 nodes)       array (size: 192×7×7)
  block7a                  NetChain (9 nodes)        array (size: 320×7×7)
  top_conv                 ConvolutionLayer          array (size: 1280×7×7)
  top_bn                   BatchNormalizationLayer   array (size: 1280×7×7)
  top_activation           x LogisticSigmoid[x]      array (size: 1280×7×7)
  avg_pool                 AggregationLayer          vector (size: 1280)
  top_dropout              DropoutLayer              vector (size: 1280)
  predictions              LinearLayer               vector (size: 1000)
  predictions_activation   SoftmaxLayer              vector (size: 1000)
  Output                   class
]
To train a new model, we can reuse this architecture: we remove the last linear and softmax layers and add our own layers as required to customize the net to our dataset.
In[]:= net = NetTake[NetModel["EfficientNet Trained on ImageNet"], {1, -3}]

Out[]= NetChain[Input port: image » Output port: vector (size: 1280)]
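As a quick sanity check (an illustrative step, assuming the train set defined above), applying the truncated net to a single image should yield the 1280-dimensional feature vector promised by its output port:

In[]:= net[First[Keys[train]]] // Dimensions  (* feature vector produced for one face image *)

Out[]= {1280}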
Customizing the neural network
After the last two layers of EfficientNet are removed, we construct a customized neural network by adding new layers. This allows the model to learn new features from our dataset.
In[]:= newNet = NetChain[<|"pretrainedNet" -> net, "linear1" -> LinearLayer[64], "dp" -> DropoutLayer[0.2], "bn" -> BatchNormalizationLayer[], "linear2" -> LinearLayer[7], "softmax" -> SoftmaxLayer[]|>, "Output" -> NetDecoder[{"Class", classes}]]

Out[]= NetChain[uninitialized » Input port: image » Output port: class]
The layers contribute as follows:
◼ LinearLayer: Adds a trainable fully connected layer to the network.
◼ DropoutLayer: Prevents overfitting by randomly deactivating neurons during training (see the short illustration after this list).
◼ BatchNormalizationLayer: Normalizes its input by learning the data mean and variance.
◼ SoftmaxLayer: Provides the activation function for multiclass classification.
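As a small illustration of the dropout behavior (not part of the original notebook): a DropoutLayer is inactive during ordinary evaluation and only drops values when the net is explicitly run in training mode.

In[]:= dp = DropoutLayer[0.5];
dp[{1., 2., 3., 4.}]  (* identity at evaluation time *)
dp[{1., 2., 3., 4.}, NetEvaluationMode -> "Train"]  (* randomly zeroes elements, rescaling survivors by 1/(1 - 0.5) *)

The first call returns the input unchanged; the second returns a random vector in which roughly half the entries are zero and the rest are doubled.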
Handling data imbalance using different methodologies
It is evident from the graphs that the emotion class ‘disgust’ has significantly less data than the other classes. Most machine learning algorithms assume that data is equally distributed among all classes in the dataset; hence, during training, the model develops a bias towards the majority classes. We will look at a few different ways to resolve this issue.
Undersampling
In the undersampling method, we remove excess samples from the majority classes so that all classes end up with about the same amount of data. Here, the class ‘disgust’ has only 547 samples. By taking up to 500 samples from each class for our sample data, all classes end up with the same amount of data.
In[]:= sampleData1 = Flatten@Map[RandomSample[#, UpTo[500]] &, dataPerClass];
BarChart[Counts[Values[sampleData1]], ChartLabels -> Keys@Counts[Values[sampleData1]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes for the sample data", AxesLabel -> {Automatic, "Number of images"}]

Out[]= (bar chart: class distribution of the undersampled data)

In[]:= {train1, test1} = TakeDrop[RandomSample[sampleData1], Floor[0.7*Length[sampleData1]]];
BarChart[Counts[Values[train1]], ChartLabels -> Keys@Counts[Values[train1]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes for the new train", AxesLabel -> {Automatic, "Number of images"}]
BarChart[Counts[Values[test1]], ChartLabels -> Keys@Counts[Values[test1]], ChartStyle -> "DarkRainbow", PlotLabel -> "Distribution of classes for the new test", AxesLabel -> {Automatic, "Number of images"}]

Out[]= (two bar charts: class distributions of the new train and test sets)
Training the net
The model trains with a batch size of 24 for 2 training rounds.
In[]:= trainedNet = NetTrain[newNet, train, ValidationSet -> test, BatchSize -> 24, MaxTrainingRounds -> 2]

Out[]= NetChain[Input port: image » Output port: class]
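Before computing aggregate measurements, the trained classifier can be applied directly to individual images. This is an illustrative usage sketch, not part of the original notebook; img here is just an arbitrary image from the test set:

In[]:= img = First[Keys[test]];
trainedNet[img]                           (* predicted emotion class *)
trainedNet[img, "TopProbabilities" -> 3]  (* three most probable classes with their probabilities *)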
Net Evaluation
To compare how our different methods perform, we compute the confusion matrix, F1 score, and accuracy of each.
1. Confusion matrix:
In[]:= NetMeasurements[trainedNet, test, "ConfusionMatrixPlot"]

Out[]= (confusion matrix plot: actual class vs. predicted class)
2. F1 score:
In[]:= NetMeasurements[trainedNet, test, "F1Score"]

Out[]= <|happiness -> 0.677316, neutral -> 0.440678, sadness -> 0.268199, anger -> 0.289256, surprise -> 0.705085, disgust -> 0.650602, fear -> 0.349835|>
3. Accuracy:
In[]:= NetMeasurements[trainedNet, test, "Accuracy"]

Out[]= 0.494286
Weighted Class Training
In this method, we weight the loss computed for different samples depending on whether they belong to the majority or minority classes. Here, we use the inverse of the number of samples: we weight each sample by the inverse of its class frequency and then normalize the weights over the seven classes. The sample weight for class c is

w_c = 7 (1/n_c) / \sum_{j=1}^{7} (1/n_j),

where c is the class number and n_c is the number of samples in class c.
In[]:= weightsOfSamples = 1/Counts[Values[train]];
weightsOfSamples = N[weightsOfSamples/Total[weightsOfSamples]*7]

Out[]= <|neutral -> 0.521368, happiness -> 0.517286, anger -> 0.517839, sadness -> 0.51363, fear -> 0.52381, surprise -> 0.519505, disgust -> 3.88656|>

In[]:= trainingWeights = Map[weightsOfSamples[#] &, Values@train];
Customizing the neural network
In[]:= weightedCrossEntropy = NetGraph[<|"time" -> ThreadingLayer[Times], "loss" -> CrossEntropyLossLayer["Index"]|>, {{NetPort["Weight"], "loss"} -> "time"}]

Out[]= NetGraph[Number of inputs: 3 » Output port: real]

In[]:= netWithLoss = NetGraph[{newNet, weightedCrossEntropy}, {1 -> NetPort[2, "Input"], 2 -> NetPort["Loss"]}, "Target" -> NetEncoder[{"Class", classes}]]

Out[]= NetGraph[uninitialized » Number of inputs: 3 » Loss port: real]
Training the net
netTrainedWithLoss = NetTrain[netWithLoss, <|"Weight" -> trainingWeights, "Input" -> Keys@train, "Target" -> Values@train|>, ValidationSet -> Scaled[0.3], BatchSize -> 24, MaxTrainingRounds -> 2];
trainedNet2 = NetReplacePart[NetExtract[netTrainedWithLoss, 1], {"Output" -> NetDecoder[{"Class", classes}], "Input" -> NetEncoder[{"Image", 224}]}]

Out[]= NetChain[Input port: image » Output port: class]
Net Evaluation
1. Confusion matrix:
NetMeasurements[trainedNet2, test, "ConfusionMatrixPlot"]

Out[]= (confusion matrix plot: actual class vs. predicted class)
2. F1 score:
NetMeasurements[trainedNet2, test, "F1Score"]

Out[]= <|happiness -> 0.71316, neutral -> 0.470343, sadness -> 0.443711, anger -> 0.441403, surprise -> 0.652563, disgust -> 0.388889, fear -> 0.162015|>
3. Accuracy:
NetMeasurements[trainedNet2, test, "Accuracy"]

Out[]= 0.498982
Focal Loss Training
Focal loss can be used when there is an extreme imbalance between the majority and minority classes. It is a modification of the alpha-balanced cross entropy loss

CE(p_t) = -\alpha_t \log(p_t),

where p_t \in (0, 1) is the predicted probability of the true class and \alpha_t is the weight for each class. A large class imbalance overwhelms the cross entropy loss and dominates the gradient. Though \alpha balances the importance of positive and negative examples, it does not differentiate between easy and hard examples. We add a modulating factor (1 - p_t)^{\gamma}, with a focusing parameter \gamma, to the cross entropy loss to produce the focal loss:

FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t).
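To see how the modulating factor suppresses easy examples, here is a quick numeric check of the formula above (illustrative only; \alpha = 0.25 and \gamma = 2 are the values used in the code below). A confident correct prediction (p_t = 0.9) contributes a loss about three orders of magnitude smaller than a hard example (p_t = 0.1):

In[]:= fl[p_, alpha_, gamma_] := -alpha*(1 - p)^gamma*Log[p];
{fl[0.9, 0.25, 2.0], fl[0.1, 0.25, 2.0]}

Out[]= {0.000263401, 0.466273}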
In[]:= focalLoss = With[{alpha = 0.25, gamma = 2.0}, NetGraph[{CrossEntropyLossLayer["Index"], FunctionLayer[alpha*Power[1 - Exp[-#], gamma]*# &]}, {{NetPort["Input"], NetPort["Target"]} -> 1, 1 -> 2, 2 -> NetPort["Output"]}]]
Training the net
trainedNet3 = NetTrain[newNet, train, ValidationSet -> Scaled[0.2], BatchSize -> 24, LossFunction -> focalLoss, MaxTrainingRounds -> 2]

Out[]= NetChain[Input port: image » Output port: class]
Net Evaluation
1. Confusion matrix:
NetMeasurements[trainedNet3, test, "ConfusionMatrixPlot"]

Out[]= (confusion matrix plot: actual class vs. predicted class)
2. F1 score:
NetMeasurements[trainedNet3, test, "F1Score"]

Out[]= <|happiness -> 0.768477, neutral -> 0.462827, sadness -> 0.464583, anger -> 0.519053, surprise -> 0.710257, disgust -> 0.322581, fear -> 0.414187|>
3. Accuracy:
NetMeasurements[trainedNet3, test, "Accuracy"]

Out[]= 0.558316
Concluding remarks
Comparing the results from the three methods used to address the class imbalance in the dataset, the accuracies obtained are as follows: undersampling gives 0.494286, weighted class training gives 0.498982, and focal loss gives 0.558316. Focal loss gave the highest accuracy and F1 scores, with the weighted class model performing next best. The accuracies could be improved in the future with more training rounds and additional network layers.
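For reference, the reported accuracies can be collected into a single comparison chart. This is a small summary snippet using only the numbers quoted above:

In[]:= accuracies = <|"Undersampling" -> 0.494286, "Weighted classes" -> 0.498982, "Focal loss" -> 0.558316|>;
BarChart[accuracies, ChartLabels -> Keys[accuracies], ChartStyle -> "DarkRainbow", PlotLabel -> "Test accuracy by method", AxesLabel -> {Automatic, "Accuracy"}]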
Keywords
◼
Undersampling
◼
Weighted sampling
◼
Focal Loss
Acknowledgment
There are many people I would like to thank for their support. I would first like to thank Siria Sadeddin for being my mentor; her immense help and guidance allowed me to learn many new skills that I used to complete this project. I would also like to express my sincere gratitude to Tuseeta Banerjee and Mads Bahrami for their time and consideration, to the TAs for being available whenever I was in trouble, and to Aravind Hanasoge and the entire Wolfram team for conducting this program.
References
◼
Sadeddin, Siria. “Face mask detection: classifying image data.” Wolfram Community. 2021.
https://communit