Title: Exploring the Effectiveness of AWS Lambda and Knative in a Serverless Web Crawler: A Comparative Study
Author: Davide Pruscini
Supervisor: Prof. Gianluigi Zavattaro
Co-supervisors: Eng. Emanuele Casadio, Dr. Matteo Trentin
Academic Year: 2022/2023
University: Alma Mater Studiorum - University of Bologna
Degree course: Computer Science
The Internet has become a key resource for accessing and sharing information. However, not all content found on it can be considered legitimate, and tools such as web crawlers can help search for violations. In this thesis, carried out in collaboration with Kopjra, we develop a web crawler application capable of automatically visiting a website, extracting its URLs, and indexing the HTML documents of its web pages so as to enable keyword searches. We compare two serverless implementations, based on AWS Lambda and Knative, with a third, microservice-based implementation that exploits the resources made available by Kubernetes. The crawler supports two search methodologies: HTTP requests or browser automation. To support the application, two microservices were developed, a backend and a frontend, and an Elasticsearch cluster was deployed, which is necessary for proper ingestion of the content of web pages. A series of tests makes it possible to compare the different implementations and understand the critical issues of each.
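As a rough illustration of the URL-extraction step mentioned in the abstract, the sketch below collects the absolute HTTP(S) links found in one HTML document using only the Python standard library. It is not taken from the thesis code: the names `LinkExtractor` and `extract_links` are hypothetical, and the actual crawler may work quite differently, especially in the browser-automation mode.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute http(s) links from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep only web links (skips mailto:, javascript:, ...) and avoid duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the de-duplicated links of one HTML document, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

In a crawler along these lines, the returned links would be queued for later visits while the page's HTML is sent on for indexing; how the thesis implementations schedule and distribute that work is not shown here.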
You can download or view the .pdf file of the thesis here. Please note that this file will never be updated.
This project is licensed under the CC BY-NC-ND 4.0 License - see the AMSLaurea page for details.
The template's skeleton was taken from jjocram/master-thesis.