crawling-cambridge-dictionary

cambridge.urls.py

  1. cmd: python cambridge.urls.py
  2. goal: collect all URLs listed under https://dictionary.cambridge.org/browse/english-chinese-traditional/
  3. implementation:
    1. build guideurls by appending each letter from a to z to https://dictionary.cambridge.org/browse/english-chinese-traditional/
    2. build extendurls by capturing the links on each guideurl page
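
A minimal sketch of the two steps above, using only the Python standard library. The per-letter URL pattern (`<base>a/`, `<base>b/`, ...), the prefix filter that decides which links count as extendurls, and the helper names (`build_guide_urls`, `extract_links`, `fetch`, `collect_extend_urls`) are assumptions for illustration, not necessarily what cambridge.urls.py does.

```python
import string
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE = "https://dictionary.cambridge.org/browse/english-chinese-traditional/"


def build_guide_urls(base=BASE):
    # Step 1: one browse page per letter a..z.
    # Assumption: the per-letter pages live at <base><letter>/.
    return [urljoin(base, letter + "/") for letter in string.ascii_lowercase]


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url, prefix=BASE, exclude=()):
    # Step 2: resolve every link on a guide page to an absolute URL and keep
    # the ones under `prefix` (an assumed filter), dropping duplicates and
    # anything in `exclude` (e.g. the a..z guide pages themselves).
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    absolute = (urljoin(page_url, href) for href in parser.hrefs)
    kept = (u for u in absolute if u.startswith(prefix) and u not in exclude)
    return list(dict.fromkeys(kept))  # order-preserving de-duplication


def fetch(url, timeout=30):
    # Hypothetical helper: plain GET with a browser-like User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_extend_urls():
    # Fetch each guide page and merge its links into one ordered, unique list.
    guide_urls = build_guide_urls()
    excluded = set(guide_urls) | {BASE}
    extend_urls = {}
    for guide in guide_urls:
        for link in extract_links(fetch(guide), guide, exclude=excluded):
            extend_urls[link] = None
    return list(extend_urls)
```

Calling `collect_extend_urls()` makes 26 network requests, one per guide page; `build_guide_urls` and `extract_links` work on plain strings, so they can be checked offline.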

cambridge.dictionary.py

  1. cmd: python cambridge.dictionary.py
  2. goal: crawl all pages of the Cambridge Dictionary
  3. implementation:
    1. access each of the extendurls and preprocess its HTML
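
A minimal sketch of this step, assuming "preprocess" means dropping `<script>`/`<style>` content and collapsing the remaining text into a single whitespace-normalized string. The README does not say what cambridge.dictionary.py actually extracts, so `preprocess_html`, `crawl`, the `fetch_fn` parameter, and the one-second delay between requests are all hypothetical.

```python
import re
import time
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def preprocess_html(html):
    # Assumed preprocessing: visible text only, whitespace collapsed.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def fetch(url, timeout=30):
    # Hypothetical helper: GET the page and decode it with its declared charset.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(extend_urls, delay=1.0, fetch_fn=fetch):
    # Yield (url, cleaned_text) for each page, pausing between requests.
    for i, url in enumerate(extend_urls):
        if i:
            time.sleep(delay)
        yield url, preprocess_html(fetch_fn(url))
```

`crawl` accepts any iterable of extendurls, such as the list produced by cambridge.urls.py, and takes a `fetch_fn` argument so the preprocessing can be exercised without network access.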