20 Jul 01:01

echen102

a8a4c83

Release v2.56

This repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus (SARS-CoV-2) and COVID-19; collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 to 7/16/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
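
Because only Tweet IDs are released, it may help to know that Twitter Tweet IDs (Snowflake IDs) generally encode their own creation time: the bits above the lowest 22 hold milliseconds since a fixed epoch of 1288834974657 ms (4 November 2010, UTC). The minimal sketch below relies on that general property of Twitter IDs, which these release notes do not state, to check whether an ID was created within a release's date range. The helper names are hypothetical, treating both end dates as inclusive UTC days is an assumption, and an ID's creation time may differ from the time the Tweet was collected.

```python
from datetime import date, datetime, timezone

# Assumption: standard Twitter Snowflake layout, where the bits above the
# lowest 22 store milliseconds since this epoch (2010-11-04 UTC).
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Decode the creation time (UTC) embedded in a Snowflake Tweet ID."""
    ms_since_epoch = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms_since_epoch / 1000, tz=timezone.utc)


def created_within(tweet_id: int, first_day: date, last_day: date) -> bool:
    """Hypothetical helper: True if the ID's UTC creation date is in [first_day, last_day]."""
    created = tweet_id_to_datetime(tweet_id).date()
    return first_day <= created <= last_day


# Range stated for v2.56 (1/21/20 to 7/16/21), treated as inclusive by assumption.
V2_56_RANGE = (date(2020, 1, 21), date(2021, 7, 16))
```

For example, `created_within(tweet_id, *V2_56_RANGE)` can flag IDs whose creation date falls outside the window a given release describes, without hydrating them first.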

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking Social Media Discourse About the {COVID-19} Pandemic: Development of a Public Coronavirus {Twitter} Data Set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.56)

Number of Tweets: 1,698,506,096

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,074,502,603	63.26%
Spanish	es	215,097,101	12.66%
Portuguese	pt	75,746,301	4.46%
Indonesian	in	53,157,517	3.13%
French	fr	51,269,978	3.02%
Undefined	und	49,221,181	2.90%
German	de	30,264,473	1.78%
Thai	th	22,399,321	1.32%
Japanese	ja	21,047,039	1.24%
Italian	it	17,600,675	1.04%
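
Each percentage in the table is a language's Tweet count divided by the release's total Number of Tweets, times 100. A minimal sketch that recomputes the v2.56 column from the counts above; rounding to two decimals is an assumption about how the published figures were produced.

```python
# v2.56 counts transcribed from the table above (keys are the ISO column).
TOTAL_TWEETS = 1_698_506_096
LANGUAGE_COUNTS = {
    "en": 1_074_502_603,
    "es": 215_097_101,
    "pt": 75_746_301,
    "in": 53_157_517,
    "fr": 51_269_978,
    "und": 49_221_181,
    "de": 30_264_473,
    "th": 22_399_321,
    "ja": 21_047_039,
    "it": 17_600_675,
}


def share_percent(count: int, total: int = TOTAL_TWEETS) -> float:
    """Percentage of all Tweets in the release, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {code: share_percent(n) for code, n in LANGUAGE_COUNTS.items()}

# Tweets in languages outside the top 10, which the table does not list.
other = TOTAL_TWEETS - sum(LANGUAGE_COUNTS.values())
```

Comparing `shares` against the table is a quick check for transcription errors, and `other` shows how many Tweets fall outside the listed languages.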

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
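
When analysing Tweet volume over time, periods listed under Known Gaps will be under-represented. A minimal sketch that flags affected timestamps, assuming each window is a half-open UTC interval and that "24:00" means midnight at the end of that day. Because 3/2/2020 lists intermittent connectivity issues rather than a window, the sketch treats that whole UTC day as possibly incomplete, which is also an assumption. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

UTC = timezone.utc


def _window(month: int, day: int, start_hour: int, end_hour: int) -> tuple[datetime, datetime]:
    """Half-open [start, end) UTC interval on a 2020 date; end_hour=24 is the next midnight."""
    midnight = datetime(2020, month, day, tzinfo=UTC)
    return midnight + timedelta(hours=start_hour), midnight + timedelta(hours=end_hour)


# Windows transcribed from the Known Gaps table (dates are M/D/2020, times UTC).
KNOWN_GAPS = [
    _window(2, 1, 4, 9),
    _window(2, 8, 6, 7),
    _window(2, 22, 21, 24),
    _window(2, 23, 0, 24),
    _window(2, 24, 0, 4),
    _window(2, 25, 0, 3),
    _window(5, 14, 7, 8),
]

# 3/2/2020 has no fixed window, so (by assumption) the whole UTC day is flagged.
INTERMITTENT_DAYS = {date(2020, 3, 2)}


def in_known_gap(ts: datetime) -> bool:
    """True if a timezone-aware timestamp falls inside a listed gap window."""
    return any(start <= ts < end for start, end in KNOWN_GAPS)


def possibly_incomplete(ts: datetime) -> bool:
    """True if ts is in a gap window or on a day with intermittent issues (UTC)."""
    return in_known_gap(ts) or ts.astimezone(UTC).date() in INTERMITTENT_DAYS


def total_gap_hours() -> float:
    """Combined length of the listed windows, in hours."""
    total = sum((end - start for start, end in KNOWN_GAPS), timedelta())
    return total.total_seconds() / 3600
```

Under these assumptions the listed windows add up to 41 hours, 31 of which form one continuous outage from 2/22/2020 21:00 UTC to 2/24/2020 4:00 UTC. Timestamps must be timezone-aware; comparing naive datetimes with these intervals raises a `TypeError`.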

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

12 Jul 07:28

echen102

v2.55

8394c08

Release v2.55

This release contains Tweet IDs collected from 1/21/20 to 7/09/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.55)

Number of Tweets: 1,673,691,837

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,060,601,683	63.37%
Spanish	es	212,620,980	12.70%
Portuguese	pt	74,910,293	4.48%
Indonesian	in	50,697,844	3.03%
French	fr	50,158,951	3.00%
Undefined	und	48,311,608	2.89%
German	de	29,901,379	1.79%
Thai	th	21,234,821	1.27%
Japanese	ja	20,680,545	1.24%
Italian	it	17,403,232	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

05 Jul 10:21

echen102

v2.54

1377561

Release v2.54

This release contains Tweet IDs collected from 1/21/20 to 7/02/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.54)

Number of Tweets: 1,652,112,536

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,049,296,324	63.51%
Spanish	es	210,408,040	12.74%
Portuguese	pt	74,145,636	4.49%
French	fr	49,379,199	2.99%
Indonesian	in	48,288,476	2.92%
Undefined	und	47,610,031	2.88%
German	de	29,508,481	1.79%
Japanese	ja	20,339,194	1.23%
Thai	th	19,744,242	1.20%
Italian	it	17,247,577	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

01 Jul 02:00

echen102

v2.53

8096824

Release v2.53

This release contains Tweet IDs collected from 1/21/20 to 6/25/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.53)

Number of Tweets: 1,632,037,714

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,038,482,659	63.63%
Spanish	es	208,010,406	12.75%
Portuguese	pt	73,259,088	4.49%
French	fr	48,713,679	2.98%
Undefined	und	46,948,970	2.88%
Indonesian	in	46,544,400	2.85%
German	de	29,137,611	1.79%
Japanese	ja	19,976,024	1.22%
Thai	th	18,818,996	1.15%
Italian	it	17,095,745	1.05%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

21 Jun 23:34

echen102

v2.52

478b379

Release v2.52

This release contains Tweet IDs collected from 1/21/20 to 6/18/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.52)

Number of Tweets: 1,612,559,607

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,027,932,409	63.75%
Spanish	es	205,595,890	12.75%
Portuguese	pt	71,917,777	4.46%
French	fr	48,182,508	2.99%
Undefined	und	46,339,053	2.87%
Indonesian	in	44,972,264	2.79%
German	de	28,780,793	1.78%
Japanese	ja	19,578,553	1.21%
Thai	th	18,406,162	1.14%
Italian	it	16,940,217	1.05%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

14 Jun 08:40

echen102

v2.51

b5762e7

Release v2.51

This release contains Tweet IDs collected from 1/21/20 to 6/11/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.51)

Number of Tweets: 1,593,329,695

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,016,959,798	63.83%
Spanish	es	203,182,024	12.75%
Portuguese	pt	70,726,299	4.44%
French	fr	47,560,508	2.98%
Undefined	und	45,752,969	2.87%
Indonesian	in	43,955,936	2.76%
German	de	28,437,410	1.78%
Japanese	ja	19,186,485	1.20%
Thai	th	18,112,986	1.14%
Italian	it	16,749,711	1.05%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

07 Jun 19:38

echen102

v2.50

c7136b9

Release v2.50

This release contains Tweet IDs collected from 1/21/20 to 6/04/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.50)

Number of Tweets: 1,573,278,114

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	1,005,467,073	63.91%
Spanish	es	200,735,189	12.76%
Portuguese	pt	69,251,496	4.40%
French	fr	47,025,565	2.99%
Undefined	und	45,088,419	2.87%
Indonesian	in	43,124,801	2.74%
German	de	28,082,045	1.78%
Japanese	ja	18,807,048	1.20%
Thai	th	17,641,009	1.12%
Italian	it	16,565,048	1.05%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

31 May 20:36

echen102

v2.49

82a3840

Release v2.49

This release contains Tweet IDs collected from 1/21/20 to 5/28/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.49)

Number of Tweets: 1,549,449,110

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	992,110,473	64.03%
Spanish	es	197,848,137	12.77%
Portuguese	pt	67,787,853	4.37%
French	fr	46,489,632	3.00%
Undefined	und	44,339,531	2.86%
Indonesian	in	41,172,308	2.66%
German	de	27,703,802	1.79%
Japanese	ja	18,440,812	1.19%
Thai	th	17,251,109	1.11%
Italian	it	16,392,297	1.06%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

27 May 19:52

echen102

v2.48

b19d909

Release v2.48

This release contains Tweet IDs collected from 1/21/20 to 5/21/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.48)

Number of Tweets: 1,524,052,424

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	977,355,555	64.13%
Spanish	es	195,156,559	12.81%
Portuguese	pt	66,429,223	4.36%
French	fr	45,945,372	3.01%
Undefined	und	43,590,031	2.86%
Indonesian	in	39,106,175	2.57%
German	de	27,289,190	1.79%
Japanese	ja	18,055,774	1.18%
Thai	th	16,678,101	1.09%
Italian	it	16,221,023	1.06%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

17 May 21:12

echen102

v2.47

4c91a2a

Release v2.47

This release contains Tweet IDs collected from 1/21/20 to 5/14/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Statistics Summary (v2.47)

Number of Tweets: 1,497,893,426

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of Total Tweets
English	en	961,556,746	64.19%
Spanish	es	192,648,188	12.86%
Portuguese	pt	65,220,068	4.35%
French	fr	45,338,516	3.03%
Undefined	und	42,809,465	2.86%
Indonesian	in	37,303,016	2.49%
German	de	26,875,910	1.79%
Japanese	ja	17,706,197	1.18%
Thai	th	16,112,255	1.08%
Italian	it	16,056,421	1.07%
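
The Number of Tweets reported by releases v2.47 through v2.56 can be differenced to see how much the collection grew between consecutive releases. The stated end dates of these releases are one week apart, so each difference roughly corresponds to one week of collection, although that reading is an assumption rather than something the release notes state. A minimal sketch using the totals transcribed from this page; the `increments` helper is hypothetical.

```python
# "Number of Tweets" from each release's Statistics Summary, oldest first.
RELEASE_TOTALS = {
    "v2.47": 1_497_893_426,
    "v2.48": 1_524_052_424,
    "v2.49": 1_549_449_110,
    "v2.50": 1_573_278_114,
    "v2.51": 1_593_329_695,
    "v2.52": 1_612_559_607,
    "v2.53": 1_632_037_714,
    "v2.54": 1_652_112_536,
    "v2.55": 1_673_691_837,
    "v2.56": 1_698_506_096,
}


def increments(totals: dict[str, int]) -> dict[str, int]:
    """Map each release to its growth in Tweet count since the previous release."""
    versions = list(totals)  # dicts preserve insertion order (Python 3.7+)
    return {new: totals[new] - totals[old] for old, new in zip(versions, versions[1:])}


for version, added in increments(RELEASE_TOTALS).items():
    print(f"{version}: +{added:,}")
```

The differences sum to the overall growth from v2.47 to v2.56, which makes a simple consistency check on the transcribed totals.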

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Releases: echen102/COVID-19-TweetIDs