itisacloud/IDSTA22-Team-Heigit

IDSTA22-Team-Heigit

Proposal links

Introduction

"[E]verything is related to everything else, but near things are more related than distant things". Tobler's first law of geography was abruptly brought to the attention of the people of Europe, and Germany in particular, by Russia's attack on Ukraine in February 2022. Beyond the psychological impact of a war close to their own borders, the population is particularly affected by the strong dependence on Russian resources that results from geographical proximity.

In order to pre-empt an escalation of political tensions within the population, it is therefore useful to create an overview of the spatial distribution of residents who do or do not support the government's current direction, and of the events that have influenced public perception. As one of the best-known social media platforms that allows exporting both the content and the geolocation of a message, Twitter represents the best opportunity to analyze the influence of socio-economic factors and geopolitical events on social attitudes towards the war in Ukraine.

Tweets

To retrieve tweets sent from Germany that address Russia's attack on Ukraine, we used the R package academictwitteR. We were able to retrieve 106,000 tweets.

My Image

Most of the tweets were written in German. Nevertheless, we also identified tweets in Russian, Polish, Turkish and Ukrainian.

My Image

The sentiment score of the tweets is depicted below. We calculated the daily mean sentiment score.
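The daily aggregation can be sketched as follows. This is a minimal illustration, assuming each tweet is available as a pair of an ISO timestamp and a numeric sentiment score; the function name `daily_mean_sentiment` and the sample values are hypothetical and not taken from the project code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean_sentiment(tweets):
    """Group (ISO timestamp, score) pairs by calendar day and average each group."""
    scores_by_day = defaultdict(list)
    for created_at, score in tweets:
        day = datetime.fromisoformat(created_at).date().isoformat()
        scores_by_day[day].append(score)
    # Sort by day so the result can be plotted as a time series.
    return {day: mean(scores) for day, scores in sorted(scores_by_day.items())}


# Hypothetical sample data, not taken from the project dataset.
sample = [
    ("2022-02-24T08:15:00", -0.5),
    ("2022-02-24T19:40:00", -0.25),
    ("2022-02-25T10:05:00", 0.5),
]
daily = daily_mean_sentiment(sample)
```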

My Image

We carried out our analysis with this dataset.

Workflow

My Image

Documentation

Requirements

In order to run this project the following dependencies must be met.

Conda

Conda must be installed on the machine, and an environment based on the provided ENV.yml must be created. To create it, use the following command, replacing ENVNAME with a name of your choice.

conda env create -n ENVNAME --file code/ENV.yml

Npm

Running the front end requires npm and Node.js.

To start the front end, run the following commands:

cd frontend/webapp
npm install
npm update
npm run dev

Elasticsearch

An Elasticsearch connection must be provided. Note that the security setting xpack.security.enabled must currently be set to false in your elasticsearch.yml. The authors used Elasticsearch version 8.6.2.

Usage

The following command triggers the different functionalities. Pass exactly one of the listed modes.

python ./code/Python/main.py ["preprocess","process","api","bulk","full","full+bulk"] -c ./code/default.config

Note that a valid config file needs to be provided. For this project the code/default.config is used.

Preprocess

Starts the preprocessing, which cleans up the individual tweets and translates them into English.

Process

Applies the models for NER and sentiment classification and writes the results directly to the Elasticsearch index. If the index is missing, it will be created.

Api

Launches the API using uvicorn. Alternatively, the API can be started directly:

cd code/Python
uvicorn api:app --reload

Full

Executes the three steps above (preprocess, process, api).

Bulk

As a shortcut, the results of the preprocessing and processing steps have been stored as JSON. This command loads them directly into the configured index, without the need for processing.
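For readers unfamiliar with bulk loading, the sketch below shows the newline-delimited body that the Elasticsearch _bulk API expects. It is only an illustration: the helper `to_bulk_body`, the index name `tweets` and the document fields are assumptions, and the project's own bulk command may work differently.

```python
import json


def to_bulk_body(documents, index_name):
    """Build the newline-delimited body expected by the Elasticsearch _bulk API."""
    lines = []
    for doc in documents:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents and index name, for illustration only.
docs = [{"text": "example tweet", "sentiment": "negative"}]
body = to_bulk_body(docs, "tweets")
```

The resulting string would be sent to the `_bulk` endpoint with the content type `application/x-ndjson`.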

Full+Bulk

Triggers the bulk command and launches the API.
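The relationship between the modes described above can be summarized in a small sketch. This is not the project's main.py: the `MODE_STEPS` table, the `parse_args` helper and the long option `--config` are assumptions made for illustration, and only the mode names and the `-c` flag come from this README.

```python
import argparse

# Which steps each mode runs, following the descriptions in this README.
MODE_STEPS = {
    "preprocess": ["preprocess"],
    "process": ["process"],
    "api": ["api"],
    "bulk": ["bulk"],
    "full": ["preprocess", "process", "api"],
    "full+bulk": ["bulk", "api"],
}


def parse_args(argv=None):
    """Parse one mode plus the path to a config file."""
    parser = argparse.ArgumentParser(description="Run one pipeline mode.")
    parser.add_argument("mode", choices=sorted(MODE_STEPS))
    # The long form --config is hypothetical; the README only documents -c.
    parser.add_argument("-c", "--config", required=True, help="path to the config file")
    return parser.parse_args(argv)


args = parse_args(["full", "-c", "./code/default.config"])
steps = MODE_STEPS[args.mode]
```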

API Endpoint

The API offers one POST endpoint called /plot. The request requires a body with the following schema:

{
    "name":[
       "name"
    ],
    "layer":[
        "layer"
    ],
    "interval": "monthly" or "weekly" or "daily"
}    
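A request matching this schema could be built as sketched below. The helper `build_plot_request` is hypothetical, and the base URL assumes the API runs with uvicorn's default host and port (127.0.0.1:8000), which may differ in your setup. The placeholder values "name" and "layer" are copied from the schema above; valid values depend on the indexed data.

```python
import json
import urllib.request

ALLOWED_INTERVALS = {"monthly", "weekly", "daily"}


def build_plot_request(names, layers, interval, base_url="http://127.0.0.1:8000"):
    """Create (but do not send) a POST request for the /plot endpoint."""
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(ALLOWED_INTERVALS)}")
    payload = {"name": list(names), "layer": list(layers), "interval": interval}
    return urllib.request.Request(
        url=f"{base_url}/plot",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_plot_request(["name"], ["layer"], "weekly")
# To send it against a running API:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```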

The response consists of the following elements:

{
    "timestamps": [str] (a list of ISO-formatted date strings, YYYY-MM-DD)
    "sentiment": [[float,float]] (a list of lists with two values: the first is the share of tweets labeled positive, the second is the share of tweets labeled negative)
    "enteties": [{str:{"n":int,"sentiment":[float]}}] (a list of dictionaries with the entities used and some metadata)
    "plot":"str" (JSON object used to plot the data in the front end)
}
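On the client side, the timestamps and sentiment shares can be paired up as in this sketch. The helper `sentiment_rows` and all values in `example_response` are hypothetical; only the key names follow the documented schema.

```python
def sentiment_rows(response):
    """Pair each timestamp with its positive and negative share."""
    rows = []
    for day, (positive, negative) in zip(response["timestamps"], response["sentiment"]):
        rows.append({"date": day, "positive": positive, "negative": negative})
    return rows


# Hypothetical response values that follow the documented schema.
example_response = {
    "timestamps": ["2022-02-24", "2022-02-25"],
    "sentiment": [[0.2, 0.6], [0.3, 0.5]],
    "enteties": [{"Ukraine": {"n": 12, "sentiment": [0.25, 0.5]}}],
    "plot": "{}",
}
rows = sentiment_rows(example_response)
```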

Models used in this Project

For the sentiment analysis, the model cardiffnlp/twitter-roberta-base-sentiment-latest was used with the Hugging Face sentiment pipeline. For the NER analysis, the model dslim/bert-base-NER was used.
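Loading both models with the Hugging Face `transformers` library could look like the sketch below. It is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, the `aggregation_strategy` setting is our choice, and `label_shares` assumes that shares are computed over all tweets in a group, neutral ones included.

```python
def load_pipelines():
    """Load both models; requires the `transformers` package and a model download."""
    from transformers import pipeline

    sentiment = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    )
    # aggregation_strategy="simple" merges word pieces into whole entities.
    ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
    return sentiment, ner


def label_shares(results):
    """Return the (positive, negative) share among a list of pipeline results."""
    labels = [result["label"].lower() for result in results]
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    return labels.count("positive") / total, labels.count("negative") / total


# Hypothetical pipeline output in the shape [{"label": ..., "score": ...}, ...].
shares = label_shares([
    {"label": "positive", "score": 0.9},
    {"label": "negative", "score": 0.8},
    {"label": "negative", "score": 0.7},
    {"label": "neutral", "score": 0.6},
])
```

With the models loaded through `sentiment, ner = load_pipelines()`, calling `sentiment(list_of_texts)` returns one label dictionary per text, which can then be passed to `label_shares`.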

We originally started building our front end from the one provided in the lecture.
