30 Jul 07:05

echen102

cae6175

Release v2.6

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2); collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 7/24/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health Surveill 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

Statistics Summary (v2.6)

Number of Tweets : 360,594,376

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	239,831,253	66.51%
Spanish	es	45,999,640	12.76%
Portuguese	pt	14,204,685	3.94%
Indonesian	in	9,695,719	2.69%
Undefined	und	8,955,350	2.48%
French	fr	7,679,924	2.13%
Japanese	ja	5,956,904	1.65%
Thai	th	4,293,378	1.19%
Hindi	hi	3,887,356	1.08%
Italian	it	3,208,807	0.89%
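
The percentage column can be reproduced from the counts. The following is a minimal sketch, assuming each percentage is the language count divided by the total number of Tweets and rounded to two decimals; the names `TOTAL_TWEETS`, `LANGUAGE_COUNTS`, and `share_percent` are illustrative and not part of the dataset tooling.

```python
# Counts copied from the v2.6 statistics summary.
TOTAL_TWEETS = 360_594_376
LANGUAGE_COUNTS = {
    "en": 239_831_253,
    "es": 45_999_640,
    "pt": 14_204_685,
    "in": 9_695_719,
    "und": 8_955_350,
    "fr": 7_679_924,
    "ja": 5_956_904,
    "th": 4_293_378,
    "hi": 3_887_356,
    "it": 3_208_807,
}


def share_percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage share per language code (hypothetical helper output).
shares = {iso: share_percent(n, TOTAL_TWEETS) for iso, n in LANGUAGE_COUNTS.items()}

# Tweets in languages outside the top 10, as a share of the total.
other_count = TOTAL_TWEETS - sum(LANGUAGE_COUNTS.values())
other_percent = share_percent(other_count, TOTAL_TWEETS)

for iso, pct in shares.items():
    print(f"{iso}\t{pct:.2f}%")
print(f"other\t{other_percent:.2f}%")
```

The same calculation applies to the earlier releases by swapping in their totals and counts.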

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
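
When analyzing Tweet volume over time, it may help to flag timestamps that fall inside these gaps. The sketch below transcribes the table into UTC intervals. It assumes the intervals are half-open (start inclusive, end exclusive) and treats 3/2/2020, which has no stated hours, as a whole day of uncertain coverage; `utc`, `KNOWN_GAPS`, `INTERMITTENT_DATES`, and `gap_status` are hypothetical names, not part of the dataset.

```python
from datetime import date, datetime, timedelta, timezone


def utc(month, day, hour):
    """Build a 2020 UTC timestamp; hour 24 means midnight at the end of that day."""
    return datetime(2020, month, day, tzinfo=timezone.utc) + timedelta(hours=hour)


# Intervals transcribed from the Known Gaps table (assumed half-open: [start, end)).
KNOWN_GAPS = [
    (utc(2, 1, 4), utc(2, 1, 9)),
    (utc(2, 8, 6), utc(2, 8, 7)),
    (utc(2, 22, 21), utc(2, 22, 24)),
    (utc(2, 23, 0), utc(2, 23, 24)),
    (utc(2, 24, 0), utc(2, 24, 4)),
    (utc(2, 25, 0), utc(2, 25, 3)),
    (utc(5, 14, 7), utc(5, 14, 8)),
]

# The table gives no hours for this date; assumption: flag the whole UTC day.
INTERMITTENT_DATES = {date(2020, 3, 2)}


def gap_status(ts):
    """Classify a timezone-aware timestamp as 'gap', 'intermittent', or 'ok'."""
    ts = ts.astimezone(timezone.utc)
    if any(start <= ts < end for start, end in KNOWN_GAPS):
        return "gap"
    if ts.date() in INTERMITTENT_DATES:
        return "intermittent"
    return "ok"


print(gap_status(datetime(2020, 2, 23, 12, 0, tzinfo=timezone.utc)))
print(gap_status(datetime(2020, 3, 2, 10, 0, tzinfo=timezone.utc)))
print(gap_status(datetime(2020, 4, 1, 0, 0, tzinfo=timezone.utc)))
```

Timestamps must be timezone-aware; a naive `datetime` would be interpreted in the local time zone by `astimezone`.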

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

20 Jul 07:50

echen102

v2.5

19d98df

Release v2.5

This release contains Tweet IDs collected from 1/21/20 - 7/17/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health Surveill 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

Statistics Summary (v2.5)

Number of Tweets : 330,683,492

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	219,846,790	66.48%
Spanish	es	41,951,902	12.69%
Portuguese	pt	12,771,927	3.86%
Indonesian	in	8,912,870	2.70%
Undefined	und	8,127,946	2.46%
French	fr	7,225,169	2.18%
Japanese	ja	5,629,746	1.70%
Thai	th	4,093,084	1.24%
Hindi	hi	3,517,176	1.06%
Italian	it	3,018,193	0.91%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

13 Jul 08:56

echen102

v2.4

6bf645f

Release v2.4

This release contains Tweet IDs collected from 1/21/20 - 7/10/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.4)

Number of Tweets : 302,377,492

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	199,789,377	66.07%
Spanish	es	38,477,687	12.73%
Portuguese	pt	11,865,812	3.92%
Indonesian	in	8,480,320	2.80%
Undefined	und	7,354,640	2.43%
French	fr	6,826,740	2.26%
Japanese	ja	5,327,080	1.76%
Thai	th	3,633,528	1.20%
Hindi	hi	3,213,590	1.06%
Turkish	tr	2,839,437	0.94%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

06 Jul 07:15

echen102

v2.3

1bfc7e4

Release v2.3

We have migrated our data collection to AWS, with upgraded computation and network specifications. This has enabled us to collect significantly more Tweets every hour, so each weekly upload from release v2.0 onward will contain more Tweet IDs than we were able to collect for previous releases. Please see the notes section in the README for further details.
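
The effect of the migration is visible in the cumulative totals reported on this page. As a rough sketch, the number of Tweet IDs added by each release can be derived by differencing consecutive totals; `RELEASE_TOTALS` and `added_per_release` are illustrative names, and the totals are copied from the "Number of Tweets" line of each statistics summary.

```python
# Cumulative "Number of Tweets" per release, oldest first.
RELEASE_TOTALS = [
    ("v1.10", 137_339_309),
    ("v1.11", 144_747_801),
    ("v1.12", 152_862_137),
    ("v2.0", 183_011_739),
    ("v2.1", 212_978_935),
    ("v2.2", 242_400_994),
    ("v2.3", 272_346_129),
    ("v2.4", 302_377_492),
    ("v2.5", 330_683_492),
    ("v2.6", 360_594_376),
]

# Tweet IDs added by each release relative to the one before it.
added_per_release = {
    version: total - previous_total
    for (_, previous_total), (version, total) in zip(RELEASE_TOTALS, RELEASE_TOTALS[1:])
}

for version, added in added_per_release.items():
    print(f"{version}\t{added:,}")
```

Under these figures, the v1.x releases each add roughly 7 to 8 million Tweet IDs, while releases from v2.0 onward each add roughly 28 to 30 million.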

This release contains Tweet IDs collected from 1/21/20 - 7/03/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.3)

Number of Tweets : 272,346,129

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	178,749,702	65.63%
Spanish	es	34,749,653	12.76%
Portuguese	pt	10,497,310	3.85%
Indonesian	in	7,936,066	2.91%
Undefined	und	6,531,125	2.40%
French	fr	6,368,869	2.34%
Japanese	ja	5,018,438	1.84%
Thai	th	3,508,870	1.29%
Hindi	hi	2,971,606	1.09%
Turkish	tr	2,690,310	0.99%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

29 Jun 07:34

echen102

v2.2

1677c05

Release v2.2

This release contains Tweet IDs collected from 1/21/20 - 6/26/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.2)

Number of Tweets : 242,400,994

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	157,266,557	64.88%
Spanish	es	31,245,760	12.89%
Portuguese	pt	9,418,332	3.89%
Indonesian	in	7,379,473	3.04%
French	fr	5,970,986	2.46%
Undefined	und	5,726,108	2.36%
Japanese	ja	4,706,208	1.94%
Thai	th	3,348,912	1.38%
Hindi	hi	2,656,781	1.10%
Turkish	tr	2,509,649	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

23 Jun 12:05

echen102

v2.1

94ad279

Release v2.1

This release contains Tweet IDs collected from 1/21/20 - 6/19/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.1)

Number of Tweets : 212,978,935

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	137,344,992	64.48%
Spanish	es	27,035,278	12.69%
Portuguese	pt	8,193,574	3.85%
Indonesian	in	6,777,050	3.18%
French	fr	5,504,403	2.58%
(undefined)	und	5,003,877	2.35%
Japanese	ja	4,384,617	2.06%
Thai	th	3,266,392	1.53%
Hindi	hi	2,349,801	1.10%
Italian	it	2,291,748	1.08%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

15 Jun 08:09

echen102

v2.0

5ff1736

Release v2.0

This release contains Tweet IDs collected from 1/21/20 - 6/12/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.0)

Number of Tweets : 183,011,739

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	117,868,338	64.40%
Spanish	es	22,395,793	12.24%
Portuguese	pt	6,900,098	3.77%
Indonesian	in	6,124,708	3.35%
French	fr	4,917,804	2.69%
(undefined)	und	4,242,198	2.32%
Japanese	ja	4,061,424	2.22%
Thai	th	3,154,030	1.72%
Italian	it	2,089,938	1.14%
Hindi	hi	2,008,659	1.10%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

08 Jun 11:05

echen102

v1.12

f62cbfc

Release v1.12

This release contains Tweet IDs collected from 1/21/20 - 6/05/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v1.12)

Number of Tweets : 152,862,137

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	99,753,283	65.26%
Spanish	es	17,678,687	11.57%
Indonesian	in	5,133,446	3.36%
Portuguese	pt	4,850,362	3.17%
French	fr	4,393,918	2.87%
Japanese	ja	3,670,726	2.40%
(undefined)	und	3,451,912	2.26%
Thai	th	2,991,427	1.96%
Italian	it	1,849,528	1.21%
Turkish	tr	1,577,658	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

01 Jun 09:56

echen102

v1.11

cc8143b

Release v1.11

This release contains Tweet IDs collected from 1/21/20 - 5/29/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v1.11)

Number of Tweets : 144,747,801

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	94,369,403	65.20%
Spanish	es	16,588,272	11.46%
Indonesian	in	4,914,741	3.40%
Portuguese	pt	4,522,335	3.12%
French	fr	4,241,157	2.93%
Japanese	ja	3,537,748	2.44%
(undefined)	und	3,279,442	2.27%
Thai	th	2,924,431	2.02%
Italian	it	1,782,514	1.23%
Turkish	tr	1,507,370	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

25 May 07:58

echen102

v1.10

282976b

Release v1.10

This release contains Tweet IDs collected from 1/21/20 - 5/22/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.10)

Number of Tweets : 137,339,309

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	89,446,467	65.13%
Spanish	es	15,651,897	11.40%
Indonesian	in	4,703,023	3.42%
Portuguese	pt	4,228,772	3.08%
French	fr	4,100,588	2.99%
Japanese	ja	3,365,432	2.45%
(undefined)	und	3,094,598	2.25%
Thai	th	2,887,217	2.10%
Italian	it	1,726,624	1.26%
Turkish	tr	1,452,652	1.06%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Releases: echen102/COVID-19-TweetIDs
