A human-in-the-loop approach to tracing visual pattern propagation in art history using deep computer vision methods.
Basic information
- Student name: Ludovica Schaerf
- Supervisors: Paul Guhennec (PhD), Frederic Kaplan (Prof.)
- Year: 2021-2022
About: About six years ago, the Replica project paved the way for computational studies of visual patterns in art history and made it possible for art historians to trace the propagation of patterns throughout the history of art. During the project, an image retrieval network was set up to discover artistic patterns given an input image. Although it serves monographic needs and targeted searches well, the network cannot propose discoveries on its own, since every search starts from a query image. In this thesis, we remove the need for an input image and instead create clusters of artworks that share a common pattern propagation. The clusters are further integrated into a 2D coordinate-based visualisation, which provides an organic view of the evolution of the patterns in art history.
In this effort, we demonstrate the effectiveness of fine-tuning deep learning models on a set of visual connections using a compound Hinge loss and a ResNeXt architecture. Moreover, we show that clustering the trained visual signatures with OPTICS yields high precision. We emphasise the importance of learning the clusters in a semi-supervised way, showing a qualitative and quantitative improvement over generic clustering methods. Furthermore, we close the loop of the semi-supervised clustering by annotating the new findings in the proposed clusters and retraining the model on them. In total, we add over 700 new images to the set of slightly over 1800 existing visual connections. In addition, we find examples of cross-domain, architectural, design and sketch-based patterns, which were previously outside the scope of the known visual connections.
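To illustrate the clustering step, the following is a minimal sketch of running OPTICS on a set of descriptors with scikit-learn. The data are synthetic 2D points standing in for the learned visual signatures, and `min_samples`, `eps` and the DBSCAN-style cluster extraction are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Synthetic stand-in for learned visual signatures (hypothetical data):
# two tight groups of 20 points each, plus one isolated point.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.05, size=(20, 2))
group_b = rng.normal(loc=5.0, scale=0.05, size=(20, 2))
outlier = np.array([[20.0, 20.0]])
signatures = np.vstack([group_a, group_b, outlier])

# Assumed settings: a point needs 5 neighbours (itself included) to be a
# core point, and clusters are cut from the reachability plot at eps=1.0.
clusterer = OPTICS(min_samples=5, cluster_method="dbscan", eps=1.0)
labels = clusterer.fit_predict(signatures)

# Label -1 marks points that OPTICS leaves unclustered (noise).
for cluster_id in sorted(set(labels.tolist())):
    members = np.flatnonzero(labels == cluster_id)
    name = "noise" if cluster_id == -1 else f"cluster {cluster_id}"
    print(f"{name}: {len(members)} points")
```

Points labelled -1 belong to no cluster, which matters for this use case: most artworks are not expected to take part in any pattern propagation, so a method that can leave points unassigned fits better than one that forces every image into a cluster.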
Research summary: The base architecture
We experiment with several architectures that were among the top competitors of ILSVRC, using their pre-trained weights as starting points for the retrieval model: ResNet, ResNeXt, and EfficientNet (\cite{resnet, xie_aggregated_2017, tan_efficientnet_2020}). The architectures and their weights are downloaded and imported using the \texttt{torchvision} module, and the models are then fine-tuned on the morphograph.
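Following the results of \cite{seguin_visual_2016, babenko2014neural}, we keep each of the architectures above only up to its last convolutional layer. We then append a global mean pooling layer, which averages every channel of the final feature map over its spatial positions to produce a fixed-length descriptor.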
Including the two additions to the Hinge loss, the final loss becomes:
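For orientation only, the base term can be sketched as a standard triplet Hinge loss, which pushes an anchor closer to a connected image (positive) than to an unconnected one (negative) by at least a margin. That the base term takes this triplet form is an assumption, the function name `triplet_hinge_loss` and the margin value are hypothetical, and the two additions mentioned above are not reproduced here.

```python
import torch


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Mean of max(0, margin + d(a, p) - d(a, n)) over a batch (illustrative)."""
    d_pos = (anchor - positive).norm(dim=1)  # Euclidean distance anchor-positive
    d_neg = (anchor - negative).norm(dim=1)  # Euclidean distance anchor-negative
    return torch.clamp(margin + d_pos - d_neg, min=0.0).mean()


anchor = torch.tensor([[0.0, 0.0]])
positive = torch.tensor([[0.0, 1.0]])       # distance 1.0 from the anchor
hard_negative = torch.tensor([[0.0, 1.5]])  # distance 1.5: violates the margin
easy_negative = torch.tensor([[0.0, 3.0]])  # distance 3.0: satisfies the margin

loss_hard = triplet_hinge_loss(anchor, positive, hard_negative)  # 1 + 1 - 1.5
loss_easy = triplet_hinge_loss(anchor, positive, easy_negative)  # clamped to 0

print(loss_hard.item(), loss_easy.item())
```

Triplets that already satisfy the margin contribute zero loss and zero gradient, so only the violating triplets drive the fine-tuning.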
Installation and Usage
- dependencies: platform, libraries (for Python include a requirements.txt file)
- compilation (if necessary)
- usage: how to run your code
License
semi-supervised-clustering-of-visual-signatures-of-artworks - Ludovica Schaerf
Copyright (c) 2022 EPFL
This program is licensed under the terms of the [license].