Earthquake Analysis with PySpark

This project analyzes earthquake data using PySpark SQL and DataFrames.

Data

The data is stored in a MySQL database table named neic_earthquakes with the following schema:

CREATE TABLE IF NOT EXISTS neic_earthquakes (
    `Date` DATE,
    `Time` TIME,
    `Latitude` DECIMAL(10, 6),
    `Longitude` DECIMAL(10, 6),
    `Type` VARCHAR(255),
    `Depth` DECIMAL(10, 2),
    `Depth Error` DECIMAL(10, 2),
    `Depth Seismic Stations` INT,
    `Magnitude` DECIMAL(3, 1),
    `Magnitude Type` VARCHAR(255),
    `Magnitude Error` DECIMAL(3, 1),
    `Magnitude Seismic Stations` INT,
    `Azimuthal Gap` DECIMAL(5, 2),
    `Horizontal Distance` DECIMAL(10, 2),
    `Horizontal Error` DECIMAL(10, 2),
    `Root Mean Square` DECIMAL(5, 2),
    `ID` VARCHAR(255),
    `Source` VARCHAR(255),
    `Location Source` VARCHAR(255),
    `Magnitude Source` VARCHAR(255),
    `Status` VARCHAR(255)
);
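
Several column names in the schema contain spaces, so any statement that loads rows has to backtick-quote them. The following is a minimal sketch of how an upload step might build a parameterized INSERT from the CSV header. It is not the contents of csv_to_mysql_upload.py: `build_insert`, `read_rows`, and `upload_csv` are hypothetical helpers, the sample row is made up, and the sketch assumes that the CSV header names match the table columns, that dates are already in a format MySQL accepts, and that `connection` is a DB-API connection such as one returned by mysql.connector.connect.

```python
import csv
import io


def build_insert(table, columns):
    """Return a parameterized INSERT with backtick-quoted identifiers."""
    quoted = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({quoted}) VALUES ({placeholders})"


def read_rows(handle):
    """Return (header, rows) from an open CSV file; empty strings become None (SQL NULL)."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = [tuple(v if v != "" else None for v in row) for row in reader]
    return header, rows


def upload_csv(connection, handle, table):
    """Insert every CSV row through a DB-API connection (assumed, e.g. mysql.connector)."""
    header, rows = read_rows(handle)
    cursor = connection.cursor()
    cursor.executemany(build_insert(table, header), rows)
    connection.commit()
    cursor.close()
    return len(rows)


# Tiny in-memory sample; the values are made up for illustration.
sample = io.StringIO("Date,Depth Error,Magnitude\n1965-01-02,,6.0\n")
header, rows = read_rows(sample)
statement = build_insert("neic_earthquakes", header)
```

Using placeholders instead of string formatting for the values leaves quoting and NULL handling to the database driver.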

Requirements

To run this code, you need:

Python 3
PySpark 3.0+
pandas
MySQL Connector/Python module
MySQL Connector/J JAR - https://dev.mysql.com/downloads/connector/j/

To install the Python modules, run: pip install -r requirements.txt

Configuration

Important: Update the following constants in the code to match your database configuration:

host = "localhost"
user = ""
password = ""
database = "aidetic"  # database name
csv_file = "../database.csv"  # path to the CSV file
table_name = "neic_earthquakes"
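
As an illustration of how these constants could feed Spark's JDBC reader, here is a hedged sketch. `jdbc_options` and `read_table` are hypothetical helpers, not functions from this project, and the port 3306 is an assumption (the MySQL default). The driver class is the one shipped with Connector/J 8.

```python
def jdbc_options(host, user, password, database, table_name, port=3306):
    """Build the option dict for Spark's JDBC data source (port is an assumption)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",
        "dbtable": table_name,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load the table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Same values as the constants above; no Spark session is needed to build them.
opts = jdbc_options("localhost", "", "", "aidetic", "neic_earthquakes")
```

Calling `read_table(spark, opts)` only works when the Connector/J JAR is on the Spark classpath, for example through the --jars option of spark-submit.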

Running the Code

To execute the PySpark script for this analysis:

Ensure you meet all the requirements and have completed the configuration steps above.
Navigate to the project directory: /src/
To upload the data from the CSV file to the MySQL table, run: python3 csv_to_mysql_upload.py
To read the data and execute queries:
1. To execute all queries, run: spark-submit --jars ../mysql-connector-j-8.2.0/mysql-connector-j-8.2.0.jar spark_df_queries.py --all all
2. To execute a specific query, run: spark-submit --jars ../mysql-connector-j-8.2.0/mysql-connector-j-8.2.0.jar spark_df_queries.py --questionNum 1
3. Query 2 defaults to the year 2015. To use a different year, append this argument to either command above: --yearOfInterest 1995
The query results are then displayed.
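
The command-line flags above could be parsed along these lines. This is only a sketch with argparse, not the parser used in spark_df_queries.py: `build_parser` and `selected_questions` are hypothetical names, and the total of nine queries is an assumption based on the nine questions listed in this README.

```python
import argparse


def build_parser():
    """Parser accepting the three flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Run earthquake queries")
    parser.add_argument("--all", help="run every query (the commands above pass the value 'all')")
    parser.add_argument("--questionNum", type=int, help="run a single query by number")
    parser.add_argument("--yearOfInterest", type=int, default=2015, help="year used by Query 2")
    return parser


def selected_questions(args, total=9):
    """Return the query numbers to run; total=9 is an assumption."""
    if args.all is not None:
        return list(range(1, total + 1))
    if args.questionNum is not None:
        return [args.questionNum]
    return []


# Equivalent to: spark_df_queries.py --questionNum 2 --yearOfInterest 1995
args = build_parser().parse_args(["--questionNum", "2", "--yearOfInterest", "1995"])
```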

Questions answered through Analysis

How does the day of the week affect the number of earthquakes?
What is the relation between the day of the month and the number of earthquakes in a given year?
What does the average monthly frequency of earthquakes from 1965 to 2016 tell us?
What is the relation between the year and the number of earthquakes in that year?
How has the average earthquake magnitude varied over the years?
How does the year impact the standard deviation of the earthquakes?
Does geographic location have anything to do with earthquakes?
Where do earthquakes occur very frequently?
What is the relation between Magnitude, Magnitude Type, Status, and Root Mean Square of the earthquakes?
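
As an example of how one of these questions might be expressed in Spark SQL, the sketch below counts earthquakes per day of the week. It is an assumption about the approach, not the project's actual query: `day_of_week_query` and `run_day_of_week` are hypothetical helpers, and the view name `earthquakes` is made up. Spark's `dayofweek` returns 1 for Sunday through 7 for Saturday.

```python
def day_of_week_query(view_name):
    """Return Spark SQL that counts rows per day of the week in the given view."""
    return (
        "SELECT dayofweek(`Date`) AS day_of_week, COUNT(*) AS earthquakes "
        f"FROM {view_name} "
        "GROUP BY dayofweek(`Date`) "
        "ORDER BY day_of_week"
    )


def run_day_of_week(spark, df, view_name="earthquakes"):
    """Register `df` as a temporary view and run the query on a SparkSession."""
    df.createOrReplaceTempView(view_name)
    return spark.sql(day_of_week_query(view_name))


query = day_of_week_query("earthquakes")
```

With a DataFrame `df` loaded from the neic_earthquakes table, `run_day_of_week(spark, df).show()` would print one row per day of the week.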

Repository: bhanu-kanamarlapudi/EarthquakeAnalysis-PySpark