Scraping-to-labelling pipeline: pull tweets based on a location keyword, process them, and store them in a database. The database can then be queried to obtain tweets for labelling.

bdawton/twitter_scraping_labelling

twitter_scraping_labelling

Pulls tweets based on a location keyword and creates a labelling application for marking them as tourism-relevant or not!

Current status:

  • Pull tweets via an API, filtered by timeframe and keywords
  • Prefilter tweets: keep only Japanese-language tweets, or English tweets that explicitly mention tourist activities; remove emojis and mentions
  • Load into a database (SQLite or PostgreSQL)
  • Stratified time-based sampling for labelling
  • Working on: the labelling process, followed by a relevance model (in a separate repo)
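The prefiltering step listed above could look roughly like the following minimal sketch. It assumes each tweet is a dict with `lang` and `text` fields; the keyword list `TOURISM_KEYWORDS` and the helper names are hypothetical, since the README does not say which terms count as "explicitly mentioning tourist activities". The emoji pattern covers the main Unicode emoji blocks only and is an approximation, not an exhaustive matcher.

```python
import re
from typing import Iterable, Iterator

# Hypothetical keyword list; the README does not define the actual terms.
TOURISM_KEYWORDS = ["sightseeing", "tourist", "tourism", "travel", "trip",
                    "visit", "visited", "visiting", "hotel", "onsen"]
TOURISM_RE = re.compile(r"\b(?:" + "|".join(TOURISM_KEYWORDS) + r")\b", re.IGNORECASE)

# Twitter handles are 1-15 ASCII letters, digits or underscores.
MENTION_RE = re.compile(r"@[A-Za-z0-9_]{1,15}")

# Approximate emoji matcher (main emoji blocks, flags, VS-16, ZWJ); not exhaustive.
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended
    "\u2600-\u27BF"          # misc symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16, zero-width joiner
    "]+"
)


def clean_text(text: str) -> str:
    """Strip mentions and emojis, then collapse leftover whitespace."""
    text = MENTION_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    return " ".join(text.split())


def prefilter(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield cleaned Japanese tweets, plus English tweets with a tourism keyword."""
    for tweet in tweets:
        # Clean first so a mention like "@travel" cannot trigger a keyword match.
        cleaned = clean_text(tweet.get("text", ""))
        lang = tweet.get("lang")
        if lang == "ja" or (lang == "en" and TOURISM_RE.search(cleaned)):
            yield {**tweet, "text": cleaned}
```

Cleaning before the keyword check is a design choice: it keeps usernames from leaking into the relevance filter.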
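For the database step, a minimal SQLite sketch using Python's standard `sqlite3` module is shown below. The `tweets` table schema and the `load_tweets` helper are hypothetical; the README only says tweets are loaded into SQLite or PostgreSQL, not which columns are stored. Re-running a scrape is assumed to be common, so duplicate tweet IDs are skipped rather than raising an error.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema; the README does not specify the stored columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id         TEXT PRIMARY KEY,  -- tweet ID kept as a string
    created_at TEXT NOT NULL,     -- ISO 8601 timestamp
    lang       TEXT,
    text       TEXT NOT NULL
)
"""


def load_tweets(conn: sqlite3.Connection, tweets: Iterable[dict]) -> int:
    """Insert tweets, skipping IDs already stored; return the number of new rows."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, lang, text) "
            "VALUES (:id, :created_at, :lang, :text)",
            tweets,
        )
    # Ignored (duplicate) rows do not count towards total_changes.
    return conn.total_changes - before
```

In PostgreSQL the equivalent de-duplication would be `INSERT ... ON CONFLICT (id) DO NOTHING`.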
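Stratified time-based sampling, as listed above, can be sketched by grouping tweets into time buckets and drawing up to a fixed number from each, so bursty periods do not dominate the labelling set. This is a minimal sketch under assumptions: the calendar month as the stratum, a fixed per-stratum quota, and the `stratified_time_sample` name are all hypothetical, since the README does not state how the strata or sample sizes are chosen.

```python
import random
from collections import defaultdict
from datetime import datetime
from typing import Iterable


def stratified_time_sample(rows: Iterable[tuple[str, str]], per_stratum: int,
                           seed: int = 0) -> list[str]:
    """Sample up to `per_stratum` tweet IDs from each calendar month.

    `rows` holds (tweet_id, created_at) pairs with ISO 8601 timestamps,
    e.g. as returned by a SELECT on the tweets table. Monthly strata are
    an assumption, not the repository's documented choice.
    """
    strata: dict[str, list[str]] = defaultdict(list)
    for tweet_id, created_at in rows:
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
        ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
        strata[ts.strftime("%Y-%m")].append(tweet_id)

    rng = random.Random(seed)  # fixed seed -> reproducible labelling batches
    sample: list[str] = []
    for month in sorted(strata):
        ids = strata[month]
        sample.extend(rng.sample(ids, min(per_stratum, len(ids))))
    return sample
```

Months with fewer tweets than the quota are taken whole, so the result size is at most `per_stratum` times the number of months.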
