The web scraper is implemented as an Azure Function that collects data from a web site (scrapes it), parses it into a model, and stores it in a database. In this project we scrape data about beaches in Croatia.
For testing purposes the functions are implemented as HTTP-triggered functions; in production they will run as timer-triggered functions (once a month).
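As a rough illustration, the two trigger styles might look like the sketch below (a minimal in-process C# sketch; the function names and the commented-out scrape call are illustrative, and the NCRONTAB expression `"0 0 2 1 * *"` is one way to express "once a month"):

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class GetMarkersExample
{
    // HTTP-triggered variant used for manual testing.
    [FunctionName("GetMarkersHttp")]
    public static IActionResult RunHttp(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
        ILogger log)
    {
        log.LogInformation("Manual scrape triggered over HTTP.");
        // ... scrape, parse and store ...
        return new OkResult();
    }

    // Timer-triggered variant for production:
    // NCRONTAB "0 0 2 1 * *" fires at 02:00 UTC on the first day of every month.
    [FunctionName("GetMarkersTimer")]
    public static void RunTimer(
        [TimerTrigger("0 0 2 1 * *")] TimerInfo timer,
        ILogger log)
    {
        log.LogInformation("Scheduled scrape started.");
        // ... scrape, parse and store ...
    }
}
```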
Limitations enforced by the Azure Web Apps platform
Limit name | Description | Free/Shared/Consumption limit | Basic+ limit
---|---|---|---
Threads | Number of threads | 512 | Unlimited (VM limit still applies)
Processes | Number of processes | 32 | Unlimited (VM limit still applies)
Connections | Number of bound sockets outstanding | 300 | Unlimited (VM limit still applies)
Named Pipes | Number of named pipes | 128 | 128
Listen Sockets | Number of listen sockets | 256 | 256
A Microsoft SQL database is used for storage, accessed through an ORM (object-relational mapping) with Entity Framework (Code First). In some cases we use raw SQL because we have to update more than 1,000 rows at once; to optimize connections and queries, these updates are performed through temporary tables.
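A minimal sketch of such a temporary-table update, assuming EF Core's `ExecuteSqlRawAsync` and illustrative table/column names (`Markers`, `#QualityUpdate`):

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class BulkUpdateHelper
{
    // Builds one SQL batch: fill a temp table with the new values,
    // then apply them with a single joined UPDATE instead of 1000+ row-by-row updates.
    public static async Task UpdateQualitiesAsync(
        DbContext context, IEnumerable<(int MarkerId, int Quality)> values)
    {
        var sql = new StringBuilder();
        sql.AppendLine("CREATE TABLE #QualityUpdate (MarkerId INT PRIMARY KEY, Quality INT);");

        // Values are numeric, so composing the INSERTs as text is safe here.
        foreach (var (markerId, quality) in values)
            sql.AppendLine($"INSERT INTO #QualityUpdate (MarkerId, Quality) VALUES ({markerId}, {quality});");

        sql.AppendLine("UPDATE m SET m.Quality = u.Quality");
        sql.AppendLine("FROM Markers m INNER JOIN #QualityUpdate u ON u.MarkerId = m.Id;");
        sql.AppendLine("DROP TABLE #QualityUpdate;");

        await context.Database.ExecuteSqlRawAsync(sql.ToString());
    }
}
```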
ScraperFunction is the main project, which contains:
- Containers - container builders
- Functions - Azure functions
  - GetMarkers - gets all markers and stores them in the database
  - GetQuality - gets the quality of a marker (or all markers) and stores it in the database
  - GetDetails - gets the details of a marker (type, vegetation, wind...)
- Modules - modules for dependency injection (see the sketch after this list)
- JSON settings
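The container builders and modules could be wired roughly as below. This sketch assumes the built-in Functions dependency injection (`FunctionsStartup`) rather than a third-party container, and the registered types (`ScraperContext`, `IMarkerService`, ...) and the `SqlConnectionString` setting name are illustrative, not the project's actual identifiers:

```csharp
using System;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(ScraperFunction.Startup))]

namespace ScraperFunction
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Connection string read from app settings (setting name is an assumption).
            var connectionString = Environment.GetEnvironmentVariable("SqlConnectionString");

            // DbContext from ScraperLib's DAL.
            builder.Services.AddDbContext<ScraperContext>(options =>
                options.UseSqlServer(connectionString));

            // Each module groups related registrations, e.g. the domain services.
            builder.Services.AddScoped<IMarkerService, MarkerService>();
            builder.Services.AddScoped<IQualityService, QualityService>();
        }
    }
}
```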
All database manipulation is implemented inside ScraperLib (a short sketch of this layering follows the list). ScraperLib contains:
- DAL - database access layer
- Models - database models
- DomainServices - services that manipulate the database (using Entity Framework)
- Interfaces - interfaces of the domain services
- DomainModels - models for the domain services
- ParseModels - models used during scraping, for parsing XML
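A condensed, illustrative view of how these pieces fit together (the entity, context, and service names are assumptions, not the actual ones in ScraperLib):

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Models - a database model (illustrative).
public class Marker
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Quality { get; set; }
}

// DAL - the database access layer built around a DbContext.
public class ScraperContext : DbContext
{
    public ScraperContext(DbContextOptions<ScraperContext> options) : base(options) { }
    public DbSet<Marker> Markers { get; set; }
}

// Interfaces - contract for a domain service.
public interface IMarkerService
{
    Task AddMarkerAsync(string name);
}

// DomainServices - the Entity Framework work happens here, behind the interface.
public class MarkerService : IMarkerService
{
    private readonly ScraperContext _context;

    public MarkerService(ScraperContext context) => _context = context;

    public async Task AddMarkerAsync(string name)
    {
        _context.Markers.Add(new Marker { Name = name });
        await _context.SaveChangesAsync();
    }
}
```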