This is an attempt to create an analysis framework that is flexible enough to run across multiple servers.
- It uses MongoDB as the database.
- It uses Spark along with Anaconda as the analysis tool.
- It uses Celery for background workers (see the sketch below for how these pieces fit together).
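As a rough illustration of that setup, a Celery task might pull data and write it into MongoDB, leaving Spark free to read the same collection later. This is only a sketch: the broker URL, MongoDB host, and the `poll`/`tweets` names are assumptions, not the project's actual layout.

```python
# Hypothetical sketch of the Celery + MongoDB side of the pipeline.
# Broker URL, MongoDB host, and the "poll"/"tweets" names are assumptions.
from celery import Celery
from pymongo import MongoClient

app = Celery("poll", broker="amqp://localhost")        # assumed message broker
db = MongoClient("mongodb://localhost:27017")["poll"]  # assumed database name

@app.task
def store_tweet(tweet):
    """Persist one tweet document so that Spark jobs can analyse it later."""
    db.tweets.insert_one(tweet)
```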
It has only been tested on Ubuntu 14.04 and probably only supports Ubuntu.
Run `make setup develop` in the top directory and answer yes to everything it asks.
You need to restart the terminal after that.
Then start the Celery worker:

`celery -A poll.task.tasks worker --loglevel=info`

Alternatively, you can set it up as a startup service.
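Once a worker is running, tasks can also be queued from Python rather than through the command-line tools described next. The import path below comes from the `-A poll.task.tasks` argument above, but the task name `pull_tweets` and its keyword argument are assumptions used only to illustrate the standard Celery calling convention.

```python
# Hypothetical example of queueing a task programmatically; "pull_tweets"
# and its keyword argument are assumed names, not the project's actual API.
from poll.task.tasks import pull_tweets

# .delay() sends the task to the broker; the running worker picks it up.
result = pull_tweets.delay(keyword="Trump")
print(result.id)  # task id, useful for checking its state later
```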
Besides accessing Celery tasks or even the database directly, some tools are already provided to make life easier; use `-h` to see the details. For example:
- `accounts.py add --username xyz --consumer_key 123 --consumer_secret 456 --access_token_key 789 --access_token_secret abc`
- `tasks.py pull tweet --keyword Trump`
- `tasks.py pull timeline --screen_name realDonaldTrump`
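To confirm that pulled data actually landed in MongoDB, a quick query with pymongo is enough. The `poll` database and `tweets` collection names below are assumptions; adjust them to whatever the tools actually write to.

```python
# Sanity check that pulled tweets were stored; "poll" and "tweets" are
# assumed database/collection names.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["poll"]

print(db.tweets.count_documents({}))   # total number of stored tweets
for doc in db.tweets.find().limit(3):  # peek at a few documents
    print(doc.get("text", ""))
```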
For interactive use, run `pyspark`.
To run a script, use `spark-submit xyz.py`.
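A minimal script to pass to `spark-submit` could look like the sketch below. It assumes tweets live in a local MongoDB collection named `poll.tweets` (the same assumption as above) and simply counts the most frequent words in the tweet text; it pulls the documents through pymongo on the driver rather than a Spark-MongoDB connector, which is fine for small datasets.

```python
# xyz.py -- minimal sketch of a Spark job over tweets stored in MongoDB.
# Connection details and database/collection names are assumptions.
from pymongo import MongoClient
from pyspark import SparkContext

def load_texts():
    """Fetch tweet texts from MongoDB on the driver (fine for small datasets)."""
    db = MongoClient("mongodb://localhost:27017")["poll"]
    return [doc.get("text", "") for doc in db.tweets.find({}, {"text": 1})]

if __name__ == "__main__":
    sc = SparkContext(appName="tweet-wordcount")
    counts = (sc.parallelize(load_texts())
                .flatMap(lambda text: text.lower().split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    # Print the ten most common words.
    for word, count in counts.takeOrdered(10, key=lambda wc: -wc[1]):
        print(word, count)
    sc.stop()
```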