prodriguezsosa/Text-Data

Text-Data

This repository makes available text I have scraped, along with the code used to scrape and pre-process it.

Available files:

1. Hugo Chavez:
Type of text: political speeches and writings
Source: http://www.todochavezenlaweb.gob.ve (last accessed Aug-03-2018)
Link to Data (56.7 MB)

2. Rafael Caldera:
Type of text: political speeches and writings
Source: http://rafaelcaldera.com (last accessed Aug-03-2018)
Link to Data (694 KB)

3. US Presidential Debates (2016):
Type of text: debate transcripts
Source: http://www.presidency.ucsb.edu/debates.php (last accessed Aug-03-2018)
Link to Data (1.18 MB)

4. Speeches by US Presidential Candidates (2016):
Type of text: political speeches
Source: http://www.presidency.ucsb.edu/2016_election.php (last accessed Aug-03-2018)
Link to Data - Trump (431.88 KB)
Link to Data - Clinton (1.24 MB)
Link to Data - Sanders (340.38 KB)

5. Spanish Legislature (V - XII):
Type of text: legislature transcripts
Source: http://www.congreso.es/portal/page/portal/Congreso/Congreso/Publicaciones (last accessed Oct-08-2018)
Link to Data (714.5 MB)

6. German Legislature (Wahlperiode 14 - 19, through Oct-15-2018):
Type of text: legislature transcripts
Source: http://pdok.bundestag.de/index.php?q=plenarprotokoll&dart=Plenarprotokoll (last accessed Oct-15-2018)
Link to Data (497 MB)

7. Immigration in the Tennessean News (2010 - 2019):
Type of text: news headlines and content on the topic of immigration in the U.S.
Source: collected by the text-as-data team of the Research on Conflict and Collective Action (ROCCA) Lab (last updated Mar-2020)
Link to Repository and Data
