Tweet Archives Unleashed Toolkit (twut)

An open-source toolkit for analyzing line-oriented JSON data from the Twitter v1.1 API or flattened line-oriented JSON data from the Twitter v2 API using Apache Spark.

Dependencies

Java 8 or 11
Python 3
Apache Spark

Getting Started

To get started with twut, you can either have Spark resolve it directly from Maven, or download the release files and include them yourself: the JAR for the Spark shell, or the ZIP for PySpark.

Using the Spark Shell

To load twut into the Spark shell, pass its Maven coordinates with the --packages option:

$ spark-shell --packages "io.archivesunleashed:twut:1.1.0"

Alternatively, you can download the JAR file from the latest release and include it manually:

$ spark-shell --jars /path/to/twut-1.1.0-fatjar.jar

Using PySpark

For Python users, download the ZIP file from the latest release and include it in your PySpark environment:

$ pyspark --py-files /path/to/twut-1.1.0.zip

You will also need to set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables so that PySpark runs the Python 3 interpreter you intend.
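
As a minimal sketch of setting those two variables before launching pyspark, the lines below point both at an interpreter named python3. That name is an assumption: it only works if python3 is on your PATH, so substitute the full path to your own interpreter if it lives elsewhere.

```shell
# Assumption: "python3" resolves on PATH; replace with a full path if needed.
export PYSPARK_PYTHON=python3
# Use the same interpreter for the driver so driver and workers match.
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"
```

Exported this way, the variables last only for the current shell session; add the lines to your shell profile if you want them to persist.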

Documentation and Tutorials

Once you have built or downloaded twut, you can follow the basic set of recipes and tutorials in the project documentation.

License

Licensed under the Apache License, Version 2.0.

Acknowledgments

This work is primarily supported by the Andrew W. Mellon Foundation. Other financial and in-kind support comes from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, York University Libraries, Start Smart Labs, and the Faculty of Arts and David R. Cheriton School of Computer Science at the University of Waterloo.

Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.
