
bpenovic/beacher-scraper


Web scraper

The web scraper is implemented as an Azure Function that collects data from a website (scrapes it), parses it into a model, and stores it in a database. This project scrapes data about beaches in Croatia.

Azure functions

For testing purposes, the functions are implemented as HTTP-triggered functions. In production they will run as timer-triggered functions (once a month).
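One way to make the switch from an HTTP trigger to a timer trigger cheap is to keep the scraping logic independent of the trigger. The sketch below is illustrative Python, not the project's code; `run_scrape`, `http_entry`, `timer_entry`, and the injected `fetch_markers` / `store_markers` callables are hypothetical names. The schedule string uses the six-field NCRONTAB format that Azure Functions timer triggers expect.

```python
from dataclasses import dataclass

# Azure Functions timer schedules use six-field NCRONTAB expressions:
# {second} {minute} {hour} {day} {month} {day-of-week}.
# This one fires at 00:00:00 on the first day of every month.
MONTHLY_SCHEDULE = "0 0 0 1 * *"


@dataclass
class ScrapeResult:
    markers_stored: int


def run_scrape(fetch_markers, store_markers) -> ScrapeResult:
    """Trigger-independent core: fetch markers, store them, report the count."""
    markers = fetch_markers()
    store_markers(markers)
    return ScrapeResult(markers_stored=len(markers))


def http_entry(fetch_markers, store_markers) -> dict:
    """What an HTTP-triggered wrapper might return while testing (hypothetical)."""
    result = run_scrape(fetch_markers, store_markers)
    return {"status": 200, "body": f"stored {result.markers_stored} markers"}


def timer_entry(fetch_markers, store_markers) -> None:
    """What a timer-triggered wrapper might do in production: run, return nothing."""
    run_scrape(fetch_markers, store_markers)
```

With this split, only the thin wrapper changes when moving from testing to production; the core stays the same.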

Limitations enforced by the Azure Web Apps platform

| Limit name | Description | Free/Shared/Consumption limit | Basic+ limit |
| --- | --- | --- | --- |
| Threads | Number of threads | 512 | Unlimited (VM limit still applies) |
| Processes | Number of processes | 32 | Unlimited (VM limit still applies) |
| Connections | Number of bound sockets outstanding | 300 | Unlimited (VM limit still applies) |
| Named Pipes | Number of named pipes | 128 | 128 |
| Listen Sockets | Number of listen sockets | 256 | 256 |
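Limits like these are one reason to cap how many requests a scraper keeps in flight. The following is a minimal Python sketch of that idea, not the project's implementation; `fetch_all`, `fetch_one`, and the value of `MAX_CONCURRENT_REQUESTS` are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed safety margin, well below the 300-connection limit in the table.
MAX_CONCURRENT_REQUESTS = 50


def fetch_all(urls, fetch_one, max_workers=MAX_CONCURRENT_REQUESTS):
    """Fetch every URL with at most `max_workers` requests in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch_one, urls))
```

Because the pool never runs more than `max_workers` tasks at a time, the number of open connections stays bounded regardless of how many URLs are queued.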

Database

Microsoft SQL Database is used for storage and is accessed through an ORM (object-relational mapping) with Entity Framework, Code First. In some cases we use raw SQL because we have to update more than 1000 rows in the database. To reduce the number of connections and queries, we perform these updates through temporary tables.
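The temporary-table pattern can be sketched as follows. This is an illustration in Python with SQLite, not the project's SQL Server code; the `markers` table, its `quality` column, and the `bulk_update_quality` helper are hypothetical. The idea is to load all new values into a staging table and then apply them with a single set-based `UPDATE` instead of one statement per row.

```python
import sqlite3


def bulk_update_quality(conn: sqlite3.Connection, updates: list[tuple[int, str]]) -> int:
    """Update many rows via a temporary staging table; return the rows changed."""
    cur = conn.cursor()
    # 1. Create a temporary table that lives only for this connection.
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, quality TEXT)")
    # 2. Load all new values into it in one batch.
    cur.executemany("INSERT INTO staging (id, quality) VALUES (?, ?)", updates)
    # 3. Apply every change with one set-based UPDATE.
    cur.execute(
        """
        UPDATE markers
        SET quality = (SELECT s.quality FROM staging AS s WHERE s.id = markers.id)
        WHERE id IN (SELECT id FROM staging)
        """
    )
    changed = cur.rowcount
    # 4. Clean up the staging table and commit.
    cur.execute("DROP TABLE temp.staging")
    conn.commit()
    return changed
```

On SQL Server the staging table would typically be a `#temp` table joined in an `UPDATE ... FROM` statement, but the three steps (stage, update in one statement, drop) are the same.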

ScraperFunction

ScraperFunction is the main project. It contains:

  • Containers - container builders
  • Functions - Azure functions
    • GetMarkers - gets all markers and stores them in the DB
    • GetQuality - gets the quality of one or more markers and stores it in the DB
    • GetDetails - gets the details of a marker (type, vegetation, wind...)
  • Modules - module entities for dependency injection
  • JSON settings

ScraperLib

All database manipulation is implemented inside ScraperLib. ScraperLib contains:

  • DAL - database access layer
  • Models - database models
  • DomainServices - services that manipulate the database (using Entity Framework)
    • Interfaces - interfaces of the domain services
  • DomainModels - models for the domain services
    • ParseModels - models used during scraping, for parsing XML
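The split between parse models and domain models can be illustrated with a short Python sketch. It is not the project's code: the XML shape, the attribute names, and the `MarkerParseModel`, `Marker`, `parse_markers`, and `to_domain` names are all assumptions. The parse model keeps values as raw strings exactly as scraped, and a separate step converts them to a typed model for the database layer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MarkerParseModel:
    """Raw values exactly as they appear in the scraped XML (all strings)."""
    id: str
    name: str
    lat: str
    lng: str


@dataclass
class Marker:
    """Typed model handed on to the database layer."""
    id: int
    name: str
    latitude: float
    longitude: float


def parse_markers(xml_text: str) -> list[MarkerParseModel]:
    """Read every <marker> element into a parse model, without converting types."""
    root = ET.fromstring(xml_text)
    return [
        MarkerParseModel(
            id=el.get("id", ""),
            name=el.get("name", ""),
            lat=el.get("lat", ""),
            lng=el.get("lng", ""),
        )
        for el in root.iter("marker")
    ]


def to_domain(raw: MarkerParseModel) -> Marker:
    """Convert raw strings to typed fields; raises ValueError on malformed numbers."""
    return Marker(int(raw.id), raw.name.strip(), float(raw.lat), float(raw.lng))
```

Keeping the two models separate means that a change in the scraped XML format only touches the parse step, not the database models.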
