We ingest a large volume of streaming data from the NewsAPI. A Java Spring Boot application stores this requested data in a Kafka cluster; we then process it with Spark Streaming and persist the results in HBase and Hive.
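As a rough illustration of the collector half of this pipeline, here is a minimal sketch assuming spring-web's `RestTemplate` and spring-kafka's `KafkaTemplate`; the query, polling interval, and topic name (`news-articles`) are illustrative assumptions, not the project's actual values:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class NewsCollector {

    private final RestTemplate rest = new RestTemplate();

    @Autowired
    private KafkaTemplate<String, String> kafka;

    // Poll the NewsAPI on a fixed interval and forward the raw JSON to Kafka.
    // Requires @EnableScheduling on the Spring Boot application class.
    @Scheduled(fixedRate = 60_000)
    public void collect() {
        String url = "https://newsapi.org/v2/everything?q=bigdata&apiKey=YOUR_KEY";
        String json = rest.getForObject(url, String.class);
        kafka.send("news-articles", json); // topic name is an assumption
    }
}
```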
- collector: Fetches articles from the NewsAPI and publishes them to the Kafka cluster (see the sketch above)
- consumer: Reads the messages from Kafka, processes them with Spark Streaming, and saves the results to Hive & HBase (see the sketch after this list)
- start.sh: Script to start the project
- test.sh: Script to run the tests in both projects
- scripts.sh: Scripts to manage Kafka and stop the servers
- config.txt: Configuration to apply in the project (query to run, time intervals...)
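The consumer side could look roughly like the following, assuming Spark 3.1+ Structured Streaming with the spark-sql-kafka connector; the bootstrap servers, topic, checkpoint path, and table name are assumptions, and the HBase write is omitted for brevity:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class NewsConsumer {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("news-consumer")
                .enableHiveSupport() // needed to write into Hive tables
                .getOrCreate();

        // Read the raw JSON messages that the collector published to Kafka.
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "news-articles")
                .load()
                .selectExpr("CAST(value AS STRING) AS json");

        // Append each micro-batch to a table; a real job would also parse
        // the JSON and write to HBase, which is not shown here.
        StreamingQuery query = raw.writeStream()
                .option("checkpointLocation", "/tmp/news-ckpt") // required by Spark
                .toTable("news_articles");                      // hypothetical table name

        query.awaitTermination();
    }
}
```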
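config.txt might look something like this; the key names below are placeholders, not the project's real ones:

```
query=bitcoin
interval.seconds=60
```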
Get into the project folder and run:
./test.sh
./start.sh
- Documentation v1 (updated): Google Slides
- Documentation v2: Google Doc