This project was made possible by the UrbanSound8K dataset, available on Kaggle.com.
This project was built with Python, JavaScript, and PostgreSQL, and is hosted on AWS on an EC2 instance.
The backend uses Flask, the mobile app is built with React Native and Expo, and the front end was written in React.js.
- Exploratory Data Analysis
- An Overview of the Data (Clean vs Live)
- Machine Learning
- Convolutional Neural Network
- Implementation
- Mobile App & Live Audio Feed
- Future Work
- So What? How to Improve?
“The city's current three-year ShotSpotter contract is worth $33 million.” -ABC7 Chicago
The training data consists of the UrbanSound8K dataset plus ~500 .wav files labeled "other", which lets the model predict "Noise" when the sound picked up matches none of the other classes.
To work with the audio data, all the .wav files were converted to mel spectrograms; you can find out more about their meaning and importance through this article. In short, mel spectrograms capture features of audio that humans cannot perceive directly because of the way we process sound: a short-time Fourier transform followed by a mel-scale filter bank turns the waveform into an image. Audio can be hard to work with directly, but converting it to an image extracts these important features and makes classification much easier.
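As a minimal sketch of that conversion, assuming librosa and illustrative parameter choices (sample rate, FFT size, number of mel bands are not necessarily the project's settings), one .wav file can be turned into a spectrogram image like this:

```python
# Sketch: convert a .wav file into a mel spectrogram image.
# Parameters (sr, n_fft, hop_length, n_mels, figure size) are illustrative assumptions.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def wav_to_mel_image(wav_path: str, out_path: str) -> None:
    y, sr = librosa.load(wav_path, sr=22050)                      # load audio at a fixed sample rate
    S = librosa.feature.melspectrogram(y=y, sr=sr,
                                       n_fft=2048, hop_length=512, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)                     # convert power to decibels
    plt.figure(figsize=(2, 2))
    librosa.display.specshow(S_db, sr=sr, hop_length=512)
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()

# wav_to_mel_image("some_clip.wav", "some_clip.png")  # placeholder file names
```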
Live data was collected through the app: every recorded .wav file is saved into the backend/temp folder and then classified from there.
+ Air Conditioner
- Car Horn
+ Children Playing
- Dog Bark
- Drilling
+ Engine Idling
- Gun Shot
- Jackhammer
- Siren
+ Street Music
+ Noise
- Danger
+ Not Danger
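As a rough illustration (not project code), that grouping can be expressed as a lookup from predicted label to a danger flag; the exact label strings are assumptions:

```python
# Illustrative mapping of predicted labels to the danger / not-danger grouping above;
# label spellings are assumptions, not the project's exact strings.
DANGER_LABELS = {"car_horn", "dog_bark", "drilling", "gun_shot", "jackhammer", "siren"}

def is_danger(label: str) -> bool:
    """Return True for the classes treated as danger above."""
    return label in DANGER_LABELS

print(is_danger("gun_shot"), is_danger("street_music"))  # True False
```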
Note: The X-axis represents Time
Principal component analysis (PCA) is a dimensionality reduction technique; here it keeps 95% of the variance in the data matrix while shrinking its size, so downstream operations run faster and more efficiently.
Note: Each color represents a different component (a different class)
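A minimal sketch of that step with scikit-learn (an assumed tool, with placeholder data shapes) looks like this:

```python
# Sketch of the PCA step: flatten each spectrogram image into a vector and keep
# enough components to explain 95% of the variance. Shapes are placeholder assumptions.
import numpy as np
from sklearn.decomposition import PCA

# X: (n_samples, height * width) matrix of flattened spectrogram images (placeholder data)
X = np.random.rand(500, 200 * 200).astype(np.float32)

pca = PCA(n_components=0.95)        # keep 95% of the explained variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```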
- 1,401,979 params
- 3 Conv blocks, each with:
  - 2 Conv layers
  - Max Pool
- Flatten
- Dense (11 outputs)
Image Resizing | Batch Size | Callbacks |
---|---|---|
200 x 200 | 32 | LR on Plateau, Early Stopping |

Metric | Classification | Validation Accuracy |
---|---|---|
Accuracy | Softmax | 88.1 % |
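The architecture and training setup can be sketched roughly as below in Keras; the framework, filter counts, optimizer, and loss are assumptions and will not reproduce the exact 1,401,979-parameter model, but the 200 x 200 input, 11-class softmax output, batch size, and callbacks follow the tables above.

```python
# Hedged sketch of a CNN matching the description above: 3 conv blocks
# (2 Conv layers + max pooling each), flatten, dense softmax over 11 classes.
import tensorflow as tf
from tensorflow.keras import layers, callbacks

def build_model(input_shape=(200, 200, 3), n_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):                                   # three conv blocks (filter counts assumed)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
cbs = [callbacks.ReduceLROnPlateau(patience=3),                    # LR on plateau
       callbacks.EarlyStopping(patience=5, restore_best_weights=True)]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=32, epochs=50, callbacks=cbs)
```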
The model was deployed on the server; live audio is converted to Base64-encoded strings and sent to the server for classification and prediction. The app uses sessions, keyed on username and phone number, to "authenticate" users. Users can then add their "emergency contacts" through the app, which are stored in a PostgreSQL database. Upon detecting danger, the model sends real-time notifications to the endangered user's emergency contacts; these notifications are delivered either as in-app notifications to registered "emergency contacts" or as text messages through Twilio's SMS API.
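A rough sketch of what that endpoint could look like is shown below; the route name, JSON fields, and helper functions (`predict_label`, `notify_contacts`) are placeholders, not the project's actual code.

```python
# Sketch of the classification endpoint: the app posts Base64-encoded audio,
# the server writes it into backend/temp, classifies it, and notifies contacts on danger.
import base64, os, uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
TEMP_DIR = "backend/temp"
DANGER_LABELS = {"car_horn", "dog_bark", "drilling", "gun_shot", "jackhammer", "siren"}

def predict_label(wav_path):
    # Placeholder for the real CNN inference step (spectrogram conversion + model.predict).
    return "noise"

def notify_contacts(username, label):
    # Placeholder: look up the user's emergency contacts in PostgreSQL and send
    # an in-app notification or an SMS via Twilio, e.g.
    #   Client(sid, token).messages.create(to=..., from_=..., body=f"Danger detected: {label}")
    pass

@app.route("/classify", methods=["POST"])
def classify():
    payload = request.get_json()
    os.makedirs(TEMP_DIR, exist_ok=True)
    wav_path = os.path.join(TEMP_DIR, f"{uuid.uuid4()}.wav")
    with open(wav_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_b64"]))            # decode the Base64 audio
    label = predict_label(wav_path)
    if label in DANGER_LABELS:
        notify_contacts(payload.get("username"), label)
    return jsonify({"label": label, "danger": label in DANGER_LABELS})
```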
The landing page for the Flask app is available through this link.
The app is only available through Expo Go at this point; for a demo video, click here.
In the end, the "so what?" is to save money and increase safety around the city. This project simply serves as an example of what is doable in a "bare minimum" scenario and of what could be achieved in the future: installing systems like this in Amazon's Ring cameras or in devices around the city could help increase security and decrease expenditures citywide.
Other ways to train models
An idea that occurred to me is to train on "danger" vs. "not danger" and classify on that basis to increase recall, even at the cost of more false positives. Other improvements would be adding more sounds to what is considered danger, and applying noise reduction to incoming audio so the model can capture the good features of the sound; a quick sketch of that preprocessing step is shown below.
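As a quick sketch of that noise-reduction idea, assuming the noisereduce package (not something the project currently uses) and placeholder file paths:

```python
# Sketch only: denoise an incoming clip before converting it to a mel spectrogram.
# The noisereduce package and the file paths here are assumptions.
import librosa
import noisereduce as nr
import soundfile as sf

y, sr = librosa.load("backend/temp/incoming.wav", sr=22050)
y_clean = nr.reduce_noise(y=y, sr=sr)          # spectral-gating noise reduction
sf.write("backend/temp/incoming_clean.wav", y_clean, sr)
```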