Background
Instagram has become one of the most populous online communities for sharing artwork. Aside from personal artist accounts, there are also many artwork reposting accounts that play an important part in introducing Instagram users to the world of contemporary art. It therefore makes sense to investigate how the aesthetic value of images influences their popularity on social media.
Data
The dataset consists of around 10,000 images from 8 popular contemporary art reposting accounts, downloaded with an open source Instagram PHP scraper. The metadata for each image, namely the number of comments and the number of likes, were also downloaded.
Classification
Two extractors were used for feature extraction. The first combined content and style layers taken from the "AdaIN-Style Trained on MS-COCO and Painter by Numbers Data" network, followed by a pooling layer. For comparison, the second was hand-defined using the Block function and several keys used in image processing. The two sets of features were then clustered using several functions, including FindClusters, Dendrogram, and ClusteringTree. Mapping the clustered features back to their images gave different sets of image clusters. Visually, the AdaIN extractor combined with FindClusters did the better job of grouping artworks by their composition of color and line. The clusters were also visualized with FeatureSpacePlot.
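The step of clustering feature vectors and mapping the clusters back to images can be sketched as follows. This is only an illustration in Python, not the code used in the project: it substitutes a plain k-means loop for FindClusters (whose method is chosen by the Wolfram Language and may differ), and the image names, the 3-dimensional feature vectors, and the helpers `kmeans` and `group_by_cluster` are all hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points, so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def group_by_cluster(image_ids, labels):
    """Map cluster labels back to the images the feature vectors came from."""
    groups = {}
    for image_id, lab in zip(image_ids, labels):
        groups.setdefault(lab, []).append(image_id)
    return groups

# Hypothetical toy features; a real extractor would output far longer vectors.
features = {
    "img_a": (0.9, 0.1, 0.1),
    "img_b": (0.1, 0.2, 0.9),
    "img_c": (0.8, 0.2, 0.1),
    "img_d": (0.2, 0.1, 0.8),
}
image_ids = list(features)
labels = kmeans([features[i] for i in image_ids], k=2)
clusters = group_by_cluster(image_ids, labels)
print(clusters)
```

With these toy vectors the two reddish images and the two bluish images end up in separate groups, which is the kind of result that was judged visually in the project.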
Popularity Analysis
To look for correlations between image features and metadata, the images were plotted on axes of like count and comment count. Beforehand, both counts were normalized by dividing them by the follower count of the posting account, since a linear pattern had been observed between the number of followers and the number of comments and likes. A long stripe of images appears near the x-axis, which results from many images having many likes but no comments. After studying the comment-count distributions of 3 specific accounts, it was concluded that these likes were produced by bots. After the posts of these 3 accounts were filtered out, many obvious horizontal stripes still appeared near the y-axis. Plots of each individual account made it clear that these stripes were caused by the normalization process.
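The normalization, and one plausible way it can produce horizontal stripes, can be sketched in Python as below. The account names, follower counts, and post numbers are invented for illustration, and `normalize` is a hypothetical helper. The point of the sketch is that comment counts are small integers, so after division by a single account's follower count they can only take a few discrete values, and every post sharing a value lines up at the same height.

```python
from collections import Counter

# Hypothetical posts: (account, likes, comments).
posts = [
    ("acct_small", 120, 0),
    ("acct_small", 80, 1),
    ("acct_small", 300, 1),
    ("acct_small", 50, 2),
    ("acct_large", 9000, 40),
    ("acct_large", 12000, 55),
]
# Hypothetical follower counts per account.
followers = {"acct_small": 1000, "acct_large": 100000}

def normalize(posts, followers):
    """Divide likes and comments by the follower count of the posting account."""
    normalized = []
    for account, likes, comments in posts:
        n = followers[account]
        normalized.append((account, likes / n, comments / n))
    return normalized

normalized = normalize(posts, followers)

# Count how many posts of one account share each normalized comment value.
levels = Counter(c for account, _, c in normalized if account == "acct_small")
for value in sorted(levels):
    print(value, levels[value])
```

Here the small account can only produce comment values that are multiples of 1/1000, so its posts fall on a handful of horizontal lines regardless of their like counts.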
Results
The neural network extractor provides a clear classification of images with respect to their color and structural composition, although many deviations remain. The like-comment plots show that, although the linear relation between like count and comment count is quite clear, there is no obvious pattern relating image features to like count or comment count: in different areas of the plots, the visual features of the images appear to be random. The most popular images according to the two measures also do not seem to have much in common.
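The strength of a linear relation such as the one between like count and comment count can be summarized with a Pearson correlation coefficient. The sketch below is not part of the original analysis; the `pearson` helper and the normalized values are hypothetical and chosen to be perfectly linear, so they say nothing about the real data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical normalized like and comment values for four posts.
likes = [0.05, 0.10, 0.15, 0.20]
comments = [0.001, 0.002, 0.003, 0.004]
r = pearson(likes, comments)
print(round(r, 6))
```

A value near 1 indicates a strong positive linear relation, while a value near 0 indicates none.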
Further Investigation
It is possible that visual features in general do not determine an image's popularity on social media. It would therefore be meaningful to investigate the relation between popularity and other metadata, such as caption length.