Commit

Fix: update CA to brute scrape
kurtismassey committed Aug 23, 2024
1 parent 3127f8f commit aafd926
Showing 2 changed files with 6 additions and 4 deletions.
1 change: 1 addition & 0 deletions caddy_scraper/caddy_scraper.py
@@ -128,6 +128,7 @@ def recursive_crawler(self) -> List[str]:
links (List[str]): a list of links found on the pages.
"""
bs_transformer = BeautifulSoupTransformer()
+ cookie_dict = None
if "advisernet" in self.base_url:
cookie_dict = {
"Cookie": f".CitizensAdviceLogin={os.getenv(
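The added `cookie_dict = None` line binds the name before the `if "advisernet" in self.base_url:` branch, so the name exists for non-advisernet URLs as well. A minimal sketch of that pattern follows, assuming the value is read later regardless of the branch taken. The function name `build_cookie_header` and the environment variable `EXAMPLE_LOGIN_COOKIE` are hypothetical; the real variable name is elided in the diff above and is not reproduced here.

```python
import os
from typing import Dict, Optional


def build_cookie_header(base_url: str) -> Optional[Dict[str, str]]:
    """Return a Cookie header dict for advisernet URLs, otherwise None."""
    # Bind the name up front so it exists on every code path.
    # Without this line, a later read would raise UnboundLocalError
    # whenever the branch below is skipped.
    cookie_dict: Optional[Dict[str, str]] = None
    if "advisernet" in base_url:
        # EXAMPLE_LOGIN_COOKIE is a hypothetical name used only in this sketch.
        token = os.getenv("EXAMPLE_LOGIN_COOKIE", "")
        cookie_dict = {"Cookie": f".CitizensAdviceLogin={token}"}
    return cookie_dict
```

Callers can then test `if cookie_dict is not None` before attaching the header to a request.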
9 changes: 5 additions & 4 deletions caddy_scraper/scrape_config.json
@@ -2,14 +2,15 @@
{
"base_url": "https://www.citizensadvice.org.uk/",
"sitemap_url": "https://www.citizensadvice.org.uk/sitemap.xml",
- "crawling_method": "sitemap",
+ "crawling_method": "brute",
+ "scrape_depth": 4,
"output_dir": "citizensadvice_scrape"
},
{
"base_url": "https://www.citizensadvice.org.uk/advisernet",
- "crawling_method": "brute",
- "output_dir": "advisernet_scrape",
- "scrape_depth": 4
+ "crawling_method": "brute",
+ "scrape_depth": 4,
+ "output_dir": "advisernet_scrape"
},
{
"base_url": "https://www.gov.uk/",
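The config change switches the Citizens Advice entry from `"sitemap"` to `"brute"` and adds a `scrape_depth` alongside it. The sketch below shows one way such entries could be sanity-checked before a crawl. It rests on assumptions not confirmed by this commit: that the config file is a JSON array of objects, that a `"brute"` entry needs an integer `scrape_depth`, and that a `"sitemap"` entry needs a `sitemap_url`. The helper `validate_entry` is hypothetical and not part of `caddy_scraper`.

```python
import json
from typing import Any, Dict, List

# Two entries copied from the post-commit config, wrapped in an array
# (the surrounding array is an assumption about the file layout).
CONFIG_TEXT = """
[
  {
    "base_url": "https://www.citizensadvice.org.uk/",
    "sitemap_url": "https://www.citizensadvice.org.uk/sitemap.xml",
    "crawling_method": "brute",
    "scrape_depth": 4,
    "output_dir": "citizensadvice_scrape"
  },
  {
    "base_url": "https://www.citizensadvice.org.uk/advisernet",
    "crawling_method": "brute",
    "scrape_depth": 4,
    "output_dir": "advisernet_scrape"
  }
]
"""


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one config entry (empty if none)."""
    problems: List[str] = []
    method = entry.get("crawling_method")
    if method not in {"brute", "sitemap"}:
        problems.append(f"unknown crawling_method: {method!r}")
    # Assumed rule: a brute crawl needs a depth limit.
    if method == "brute" and not isinstance(entry.get("scrape_depth"), int):
        problems.append("brute crawl requires an integer scrape_depth")
    # Assumed rule: a sitemap crawl needs a sitemap to read.
    if method == "sitemap" and "sitemap_url" not in entry:
        problems.append("sitemap crawl requires sitemap_url")
    return problems


entries = json.loads(CONFIG_TEXT)
all_problems = [validate_entry(entry) for entry in entries]
```

Under these assumed rules, an entry set to `"brute"` without `scrape_depth` would be flagged, which matches the commit adding `scrape_depth` together with the method change.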
