
A Framework for Tracking Legislators' Policy Agendas


Mohammad H. Forouhesh

Metodata Inc ®

April 25, 2022

This repository contains the implementation for the following paper:

Tracking Legislators’ Expressed Policy Agendas in Real Time

Table of Contents

  1. TODO
  2. A Brief Overview
    1. Introduction
    2. Main Problem
    3. Illustrative Example
    4. I/O
    5. Motivation
    6. Related Works
    7. Contributions
    8. Proposed Method
    9. Experiments
  3. Implementation details
  4. Reproducing Results

1) Tracking Legislators’ Expressed Policy Agendas in Real Time

TO-DO:

  • Summarizing the paper
  • Outlining the details of implementations
  • Implement Word2Vec
  • Training Word2Vec
  • Seed words
  • Classification heads
  • Results & Analysis
  • Tests & Coverage
  • Documentation
  • CI/CD
  • Smooth Installation

A Brief Overview

  • Introduction:

    This work aims to analyse the political orientation of legislators on salient policy issues through their temporally granular tweets, using word embeddings for feature extraction and a classifier to label all legislators’ past and current relevant tweets according to whether they express a particular issue position over time.
  • Main Problem:

    Is it possible to accurately analyse the temporal evolution of political orientation on salient issues by applying natural language processing techniques to users’ tweets?
    The issues of concern in this project are immigration and climate change.
  • Illustrative Example:

    Given a tweet about immigration policy, we first encode it using a Word2Vec-enhanced dictionary; its exclusive or inclusive stance can then be detected with a classifier. Furthermore, these results can be disaggregated to see whether the tweet was posted by a Republican or a Democrat.
  • I/O:

    • Input: Tweets (textual modality)
    • Output: Predicted stance on the salient political issue
  • Motivation:

    1. Using tweets to track shifts in legislators’ rhetoric is highly scalable. It can be used on any topic of interest, by any political actor with a Twitter account, in any country around the world, from the past decade or into the future.
    2. Twitter data has high temporal granularity.
  • Related (Previous) Works:

    Prior work is organised by legislators’ different communication channels, divided into 8 categories:

    1. Stump speeches: Fenno 1978
    2. Campaign mail: Golbeck, Grimes and Rogers 2010
    3. Television advertising: Lau, Sigelman and Rovner 2007
    4. Floor speeches: Martin and Vanberg 2008; Martin 2011; Quinn et al. 2010
    5. Press releases: Grimmer 2010; Grimmer, Westwood and Messing 2014; Klüver and Sagarzazu 2016
    6. Websites: Adler, Gent and Overmeyer 1998; Anstead and Chadwick 2008; Druckman, Kifer and Parkin 2009
    7. RSS feeds: Cormack 2013
    8. Social media posts: Gulati and Williams 2010; Barbera et al. 2018; Radford and Sinclair 2016; Shapiro et al. 2014; Lilleker and Koc-Michalska 2013
  • Contributions:

    1. Showing that a simple, transparent, and interpretable approach to tweet classification can achieve satisfactory levels of accuracy across diverse issues.
    2. Automating the process of updating and maintaining the model.
    3. Developing a dynamic, real-time, scalable method for tracking elected officials’ expressed policy positions through their tweets.
  • Proposed Method:

    • Stage I: (Feature Extraction)
      A Word2Vec-enhanced dictionary is used to encode the texts. In particular, a set of stemmed seed words is identified as relevant to the concept of interest; word embeddings are then used to identify other words in the data that are semantically related to these seed words (a minimal dictionary-expansion sketch follows this overview).
    • Stage II: Classification of political stance on salient issues.
      Choice of classifier: five-fold cross-validation is used to compare precision, recall, accuracy, balanced accuracy, and F1 scores and select the best-performing classifier among XGBoost, Naive Bayes, Elastic Net, and Lasso (a minimal model-selection sketch follows this overview).
  • Experiments:

    • Datasets:

      A dataset of their own making: tweets from all senators and the vast majority of House members were crawled via the Twitter API for the period of interest up to 2020, excluding legislators who left office or were elected for the first time.

    • Results:

      Word embeddings are trained on the entire corpus of legislators’ tweets. The Word2Vec dictionaries are limited to the 100 words most similar to the seed words, and overly general or irrelevant terms are omitted. The detailed results provided in the appendix are summarised in the table below:

    | Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
    |---|---|---|---|---|---|---|---|
    | Crawled Legislators' Tweets | Immigration (Exclusive or Not) | Naive Bayes | 0.885 | 0.853 | 0.921 | 0.813 | 0.738 |
    | | | XGBoost | 0.871 | 0.909 | 0.836 | 0.795 | 0.668 |
    | | | Elastic Net | 0.881 | 0.967 | 0.809 | 0.801 | 0.615 |
    | | | Lasso | 0.871 | 0.962 | 0.797 | 0.784 | 0.586 |
    | | Immigration (Inclusive or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
    | | | XGBoost | 0.888 | 0.916 | 0.861 | 0.828 | 0.746 |
    | | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
    | | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
    | | Climate Change (No Action or Not) | Naive Bayes | 0.889 | 0.874 | 0.904 | 0.827 | 0.742 |
    | | | XGBoost | 0.888 | 0.896 | 0.880 | 0.818 | 0.698 |
    | | | Elastic Net | 0.891 | 0.963 | 0.830 | 0.811 | 0.575 |
    | | | Lasso | 0.892 | 0.965 | 0.830 | 0.813 | 0.576 |
    | | Climate Change (Take Action or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
    | | | XGBoost | 0.678 | 0.694 | 0.662 | 0.736 | 0.729 |
    | | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
    | | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |
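
A minimal sketch of the Stage I dictionary expansion referenced above, assuming gensim is available; the corpus, seed words, and hyperparameters below are illustrative placeholders, not the paper's actual settings.

```python
# Stage I sketch: expand a stemmed seed-word list with its Word2Vec neighbours.
# `tweet_tokens` is a hypothetical placeholder corpus of stemmed, tokenized tweets.
from gensim.models import Word2Vec

tweet_tokens = [
    ["secure", "border", "illeg", "immigr"],
    ["climat", "chang", "clean", "energi"],
]

model = Word2Vec(sentences=tweet_tokens, vector_size=100, window=5,
                 min_count=1, workers=4)

seed_words = ["immigr", "border"]  # hypothetical stemmed seeds for the immigration issue

# Keep up to the 100 most similar words per seed (as in the results note above),
# then prune overly general or irrelevant terms by hand.
expanded = set(seed_words)
for seed in seed_words:
    if seed in model.wv:
        expanded.update(word for word, _ in model.wv.most_similar(seed, topn=100))

print(sorted(expanded))
```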
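A minimal sketch of the Stage II model selection, assuming scikit-learn and xgboost; the feature matrix and labels are synthetic placeholders, and Elastic Net / Lasso are approximated here with penalized logistic regression rather than the paper's exact specification.

```python
# Stage II sketch: compare candidate classifiers with five-fold cross-validation
# on precision, recall, accuracy, balanced accuracy, and F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 50))        # placeholder dictionary/embedding features
y = rng.integers(0, 2, 200)      # placeholder binary stance labels

candidates = {
    "Naive Bayes": GaussianNB(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "Elastic Net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, max_iter=5000),
    "Lasso": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
}
scoring = ["precision", "recall", "accuracy", "balanced_accuracy", "f1"]

# Report the mean of each cross-validated metric for every candidate model.
for name, clf in candidates.items():
    scores = cross_validate(clf, X, y, cv=5, scoring=scoring)
    summary = {metric: round(scores[f"test_{metric}"].mean(), 3) for metric in scoring}
    print(name, summary)
```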

Implementation details:

(Implementation flowchart: Mermaid diagram rendered via Kroki.)

Reproducing Results for XGB

| Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|---|---|
| Crawled Persian Tweets | JCPOA (Relevant or Not) | Naive Bayes | 0.845 | 0.901 | 0.792 | 0.843 | 0.839 |
| | | XGBoost | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| | | Passive Aggressive | 0.991 | 0.983 | 0.994 | 0.992 | 0.991 |
| | | Lasso | 0.988 | 0.985 | 0.983 | 0.984 | 0.987 |
| | Stock Market (Relevant or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
| | | XGBoost | 0.999 | 0.999 | 1.000 | 0.999 | 0.999 |
| | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
| | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
| | Vaccination (Relevant or Not) | Naive Bayes | 0.870 | 0.92 | 0.82 | 0.855 | 0.883 |
| | | XGBoost | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Passive Aggressive | 0.975 | 0.945 | 0.965 | 0.97 | 0.95 |
| | | Lasso | 0.971 | 0.955 | 0.973 | 0.970 | 0.959 |
| | Filtering (Relevant or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
| | | XGBoost | 0.950 | 0.951 | 0.958 | 0.954 | 0.950 |
| | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
| | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |