Open Data of the Early AI-supported Response with Social Listening data (or EARS) for the World Health Organization (WHO)

citibeats-labs/who-ears

WHO Early AI-supported Response with Social Listening data

The WHO Early AI-supported Response with Social Listening Platform shows real-time information about how people are talking about COVID-19 online, so we can better manage as the infodemic and pandemic evolve.

More information about the initiative and an exploration of the data can be found at WHOinfodemic.citibeats.com.

More information about the methodology, data and definitions can be found at WHOinfodemic.citibeats.com/methodology.

The platform is powered by Citibeats, a text analytics platform specialized in social understanding. More information can be found on the methodology page or at www.citibeats.com.

Intended use

Listening to people's questions and concerns is an important way for health authorities to learn about what matters to communities in response to COVID-19. This social listening platform aims to show real-time information about how people are talking about COVID-19 online, so we can better manage as the COVID-19 infodemic and pandemic evolve.

While the Early AI-supported Response with Social Listening Platform website offers ready-made visualizations for easy exploration of the data, the aggregated and anonymized data is also made available to anyone wishing to integrate it into their own research or existing workflows.

This is available via the public API as well as this GitHub repository.

Any use of the data should use the citation 'World Health Organization, Early AI-supported Response with Social Listening'.

We welcome hearing how you are using the data, as well as feedback on how to improve the platform for your needs.

Origin of the data

The data has been obtained from public posts related to COVID-19, using the Twitter API and data aggregators of public sources such as forums, message boards, blogs and comments in news. Please keep in mind that this is only a sample of all COVID-19 conversations.

The data is updated each day with new posts.

NOTE: all data is subject to quality, technical, and ethical requirements before being added to the system.

More information about the data can be read here.

How to interpret the data

Since the civic situation is constantly evolving and social needs are wide-ranging, there is a need for real-time data to serve the decisions of key actors. At the same time, this data must be treated carefully, as the data query and categories are subject to change along with the conversation. This repository will be kept up-to-date with revised data, and changes noted here. Note that, as new sources are added to the system, records from earlier dates could change.
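Because earlier records can change, it may help to compare two downloaded snapshots of the same CSV. The following is a minimal sketch, not part of the repository: it assumes each row can be keyed by the `id` and `date` columns, the sample values and the `docs-1` column are made up for illustration, and `index_rows` and `revised_records` are hypothetical helper names.

```python
import csv
import io


def index_rows(text):
    """Index CSV rows by (country code, date)."""
    return {(row["id"], row["date"]): row for row in csv.DictReader(io.StringIO(text))}


def revised_records(old_text, new_text):
    """Return (changed, added): keys whose values differ, and keys only in the new snapshot."""
    old, new = index_rows(old_text), index_rows(new_text)
    changed = sorted(key for key in old.keys() & new.keys() if old[key] != new[key])
    added = sorted(new.keys() - old.keys())
    return changed, added


# Illustrative snapshots only; these are not real EARS values.
OLD = """id,date,name,docs-1
ESP,2021-08-01,Spain,200
ESP,2021-08-02,Spain,250
"""
NEW = """id,date,name,docs-1
ESP,2021-08-01,Spain,210
ESP,2021-08-02,Spain,250
ESP,2021-08-03,Spain,260
"""

changed, added = revised_records(OLD, NEW)
print("revised:", changed)
print("new:", added)
```

Rows reported as revised are candidates for re-running any analysis that used the older snapshot.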

Every analysis based on Internet data has to deal with representativeness. Not everyone is connected to the Internet, and not everyone shares their opinion.

Further information on the methodology, data and definitions can be read here.

CSV columns:

  • id: code of the country as defined by ISO 3166-1 alpha-3
  • date: day on which the raw data was published, in YYYY-MM-DD format (UTC timezone)
  • name: name of the country in English
  • docs-N: number of documents for the category N on a given day
  • docs-delta-percent-N: percentage change in the number of documents for the category N with respect to the previous day
  • docs-percent-N: percentage of documents in the category N relative to the documents in all categories on that day
  • docs-female-N: number of documents for the category N on a given day, which are estimated to be from females
  • docs-male-N: number of documents for the category N on a given day, which are estimated to be from males
  • docs-questions-N: number of documents for the category N on a given day, which were questions
  • docs-complaints-N: number of documents for the category N on a given day, which were complaints

NOTE: each row in the CSV corresponds to one day of data for a specific country.
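The column layout above can be read with the Python standard library alone. This is a minimal sketch under stated assumptions: it assumes `N` in the column names is an integer category number, it assumes `docs-delta-percent-N` is the usual relative change `(current - previous) / previous * 100` (the formula is not spelled out here), and the sample rows are made up. `split_column`, `load_rows` and `percent_change` are hypothetical helper names.

```python
import csv
import io
import re

# Illustrative sample only: the values and the category number "1" are made up.
SAMPLE_CSV = """id,date,name,docs-1,docs-delta-percent-1,docs-percent-1,docs-female-1,docs-male-1,docs-questions-1,docs-complaints-1
ESP,2021-08-01,Spain,200,0.0,10.0,80,100,20,30
ESP,2021-08-02,Spain,250,25.0,12.5,100,120,25,40
"""

# Matches e.g. "docs-3" or "docs-delta-percent-3" (assumes N is an integer).
COLUMN_PATTERN = re.compile(r"^(docs(?:-[a-z]+)*)-(\d+)$")


def split_column(column):
    """Split 'docs-female-3' into ('docs-female', 3); return None for id/date/name."""
    match = COLUMN_PATTERN.match(column)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def load_rows(text):
    """Return one dict per (country, day) row, with metrics nested by category number."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"id": raw["id"], "date": raw["date"], "name": raw["name"], "categories": {}}
        for column, value in raw.items():
            parsed = split_column(column)
            if parsed is None or value == "":
                continue
            metric, category = parsed
            row["categories"].setdefault(category, {})[metric] = float(value)
        rows.append(row)
    return rows


def percent_change(previous, current):
    """Day-over-day variation in percent; None when the previous count is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


rows = load_rows(SAMPLE_CSV)
first, second = rows[0]["categories"][1], rows[1]["categories"][1]
delta = percent_change(first["docs"], second["docs"])
print(rows[1]["id"], rows[1]["date"], delta)
```

Nesting the metrics by category keeps each row aligned with the one-row-per-country-per-day layout while making it easy to pull all metrics for a single category.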

Category definitions

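Categories are organized in two levels. Each Level 1 category below lists its guiding question, followed by its Level 2 categories and their definitions.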

The cause

How did the virus emerge and how is it spreading?

| Level 2 | Definition |
| --- | --- |
| The cause of the virus | Narratives about the origin of SARS-CoV-2. |
| Stigma about the spread | Stigma directed at people who are thought to be spreading the virus: racist expressions, attribution to poor people or immigrants. |
| Stigma about or by infected people | Stigma expressed about or by people who are or have been infected. |

The illness

What are the symptoms and how is it transmitted?

| Level 2 | Definition |
| --- | --- |
| Confirmed symptoms | Confirmed symptoms as defined by WHO, excluding longer-term symptoms. |
| Other discussed symptoms | Other discussed symptoms that have not yet been confirmed by WHO. |
| Prolonged symptoms | Reports on long COVID that may or may not be confirmed by WHO. |
| Modes of transmission | Modes of transmission confirmed and unconfirmed by WHO. This includes discussion of asymptomatic and pre-symptomatic transmission as well as possible ways the virus can be transmitted (for example, aerosols and fomites). |
| Transmission settings | Narratives about settings where transmission can be amplified: closed and semi-closed settings. |
| Immunity | General conversations on re-infection, confusion over immunity after infection or the possibility of being infected more than once. |
| COVID 19 Variants | Narratives and concerns about the development, spread and impact of new COVID-19 variants. |
| Demographic vulnerability & risks | Vulnerable and risk groups: the elderly; individuals with health conditions like lung or heart disease, diabetes or conditions that affect their immune system; pregnant women. |
| Impact on mental health | Anxiety, depression and other conditions arising from the pandemic situation. |

The treatment

How can it be treated or cured?

| Level 2 | Definition |
| --- | --- |
| Current treatment | Medical treatment as per WHO treatment recommendations. |
| COVID-19 vaccine | Narratives about the vaccine itself: efficacy, side effects, safety, etc. |
| Health care workers (HCW) and vaccine | Narratives by and about health care workers and the vaccine. |
| General vaccine discussion | Narratives about vaccines in general, including discussion about others or communities that have different opinions about vaccines; can include any vaccine concerns, not just COVID-19. |
| Science and R&D | Comments on new treatments and vaccines from research and development, and on evidence and scientific processes. |
| Non proven treatments | Discussion about treatments that are not proven to be effective (examples: sunlight, nutrition, herbal remedies, etc.). |
| Myths | Specific myths that WHO and partners have reacted to or taken steps to debunk (reference). |

The interventions

What is being done by government and health authorities and societal institutions?

| Level 2 | Definition |
| --- | --- |
| Testing | Any discussion about tests: everything from reliability to access to tests, types of tests, requirement to have tests, etc. |
| Contact tracing | Any discussion about the process, requirements and steps involved in contact tracing, including the use of technology. |
| Supportive care | Care given to patients in hospitals by medical personnel. |
| Vaccine distribution and policies on access | Narratives about distribution of, equity in, and access to the COVID-19 vaccine. |
| Personal measures | Individual protection measures recommended by governments/WHO, such as wearing masks, handwashing, social distancing, isolation when ill... |
| Measures in public settings | Measures implemented by governments in public settings: schools, workplaces, public transport... |
| Travel measures | Measures implemented or suggested by governments/WHO/population/private companies on travel: immunity passports, negative PCR or negative rapid test to enter a country, mandatory quarantine. |
| Immunity pass | Vaccine certificates, immunity / health passports, digital and hard copy, including implications for access to businesses, schools, and other services. |
| Reduction of movement | Measures implemented by governments related to movement reduction: lock-down at home, territory lock-down, etc. |
| Protection: medical equipment | Equipment for health workers: PPE advances and accessibility for the public. |
| Health Technology | Health technology used to treat patients: medicines, medical devices, vaccines, procedures and systems. |
| Digital health technology | Discussions about digital technology used to respond to the pandemic: electronic data exchange, electronic notices of passenger lists to health authorities, biometric data coming from wearables, proximity apps (App Covid). Includes people’s attitudes to data privacy and to the use of data for modelling and predictive analytics. |
| Pandemic Fatigue | Fatigue from interventions (lock-down, movement restrictions, masks…). |
| Faith | Narratives about faith and religion and COVID-19 (these narratives are recurring, usually around the time of religious holidays and outbreaks in faith-based settings). |
| Industry | Narratives about industry, unions and COVID-19. |
| Environment | Narratives about the environment and COVID-19. Some examples: shedding in the environment, waste water, air pollution as a secondary byproduct of lockdowns. |
| Inequalities & Human Rights | Narratives about social inequalities and their relation to COVID-19. |
| Civil Unrest | Narratives about civil unrest and COVID-19. |
| Youth | Narratives about youth, effects of the pandemic on them, or actions young people are taking. |

Type of information

What types of information are most engaging?

| Level 2 | Definition |
| --- | --- |
| Statistics & data | Conversations about facts, official statistics and data. |
| Misinformation | Conversations about misinformation. |
| Mis- and Disinformation | Conversations about mis- and disinformation. |
| Sources & influencers | Conversations about where people look for information. |

Updates in the taxonomy of categories

May 2021

  • New category “Variants” added
  • Added variant- and vaccine-related keywords to the boolean query for Twitter and web comments.

August 2021

  • Added new categories: Prolonged symptoms, Immunity, Immunity Pass
  • Merged the following categories:
    • Modes of Transmission, Asymptomatic transmission and pre-symptomatic transmission.
  • Expanded the scope of the following category:
    • Inequalities to include Human Rights

Contact

[email protected]
