
dannyyy-jimenez/ThirdCapstone

Phonetix Icon

Phonetix

Danger Reported by Machine Learning

This project was made possible by the UrbanSound8K dataset, available on Kaggle.com.

Phonetix App


This project was created using Python, JavaScript, and PostgreSQL. It is hosted on AWS in an EC2 instance.

The backend uses Flask, and the mobile app is built with React Native and Expo. The front end was programmed using React.js.


Table of Contents

  1. Exploratory Data Analysis
  • An Overview of the Data (Clean vs Live)
  2. Machine Learning
  • Convolutional Neural Network
  3. Implementation
  • Mobile App & Live Audio Feed
  4. Future Work
  • So What? How to Improve?

“The city's current three-year ShotSpotter contract is worth $33 million.” -ABC7 Chicago


Exploratory Data Analysis

An Overview of the Data (Clean vs Live)

The dataset consists of UrbanSound8K plus ~500 .wav files classified as "other", which let the model predict "Noise" when the sound picked up matches none of the other classified sounds.

To work with the audio data, all the .wav files were converted to mel spectrograms; find out more about their meaning and importance through this article. In summary, a mel spectrogram shows how the energy of a sound is distributed across frequencies over time, with the frequency axis placed on the mel scale, which approximates how humans perceive pitch. The conversion uses a short-time Fourier transform followed by a mel filter bank, and the result is stored as an image. Raw audio can be hard to work with, but converting it to an image exposes important features and makes classification easier.
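
To make the mel scale concrete, here is a minimal sketch of the Hz-to-mel mapping and of how mel bands are spaced. It assumes the common HTK-style formula; the formula, the band count, and the function names are illustrative assumptions, not details taken from this project.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mels (HTK-style formula; assumed, not confirmed by the project)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> list:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # Hypothetical settings: 8 bands between 0 Hz and 8 kHz.
    for center in mel_band_centers(8, 0.0, 8000.0):
        print(f"{center:8.1f} Hz")
```

The bands come out narrow at low frequencies and wide at high frequencies, which is the property that makes the mel axis track perceived pitch.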

Live data was collected through the app: every recorded .wav file was saved to the backend/temp folder and then classified according to its content.

Classifications?
+ Air Conditioner
- Car Horn
+ Children Playing
- Dog Bark
- Drilling
+ Engine Idling
- Gun Shot
- Jackhammer
- Siren
+ Street Music
+ Noise


- Danger
+ Not Danger
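
The grouping above can be written as a simple lookup. This is a sketch only: the class names and their danger / not danger grouping come from the list, while the set and function names are hypothetical.

```python
# Grouping taken from the list above; identifiers are illustrative.
DANGER_CLASSES = {
    "Car Horn", "Dog Bark", "Drilling", "Gun Shot", "Jackhammer", "Siren",
}
NOT_DANGER_CLASSES = {
    "Air Conditioner", "Children Playing", "Engine Idling", "Street Music", "Noise",
}


def is_danger(label: str) -> bool:
    """Return True if the predicted class belongs to the danger group."""
    if label in DANGER_CLASSES:
        return True
    if label in NOT_DANGER_CLASSES:
        return False
    raise ValueError(f"unknown class: {label!r}")


if __name__ == "__main__":
    for name in ("Gun Shot", "Street Music"):
        print(name, "->", "Danger" if is_danger(name) else "Not Danger")
```
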
Sample Mel Spectrogram of a Gun Shot

Mel Spectrogram

Note: The X-axis represents Time

Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction technique. Here it keeps enough components to retain 95% of the variance in the data while shrinking the matrix, so that later operations run faster and more efficiently.

PCA

Note: Each color represents a different class
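
As an illustration of the 95% rule, the sketch below picks the number of components from a list of per-component variances (eigenvalues of the covariance matrix). The function name and the sample values are hypothetical; the project's actual PCA code is not shown in this README.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


if __name__ == "__main__":
    # Made-up variances for five components.
    print(components_for_variance([6.0, 3.0, 0.7, 0.2, 0.1]))
```

scikit-learn's `PCA` accepts a fractional `n_components` (for example `0.95`) to make the same selection; whether this project used it is not stated.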


Machine Learning

Convolutional Neural Network

  • 1,401,979 Params
  • 3 Conv Blocks
    • Max Pool
    • 2 Conv
  • Flatten
  • Dense (11 Outputs)
| Image Resizing | Batch Size | Callbacks |
| --- | --- | --- |
| 200 x 200 | 32 | LR on Plateau, Early Stopping |

| Metric | Classification | Validation Accuracy |
| --- | --- | --- |
| Accuracy | Softmax | 88.1 % |
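
The last step of the network, a dense layer with 11 outputs followed by softmax, can be sketched in plain Python. The class order below is an assumption (it follows the list in the data overview), and the input scores are made up; the real model's output order is not documented here.

```python
import math

# Assumed output order; the trained model may order its classes differently.
CLASS_NAMES = [
    "Air Conditioner", "Car Horn", "Children Playing", "Dog Bark", "Drilling",
    "Engine Idling", "Gun Shot", "Jackhammer", "Siren", "Street Music", "Noise",
]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most likely class name and its probability."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected one score per class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


if __name__ == "__main__":
    example_scores = [0.0] * 11
    example_scores[6] = 5.0  # made-up scores favoring index 6
    print(predict(example_scores))
```
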

Implementation

Mobile App & Live Audio Feed

The model is deployed on the server. Live audio is converted to Base64-encoded strings and sent to the server for classification. The app uses sessions, keyed by username and phone number, to "authenticate" users. Users can then add their "emergency contacts" through the app; these are stored in a PostgreSQL database. When the model detects danger, real-time notifications are sent to the endangered user's emergency contacts, either as in-app notifications to registered "emergency contacts" or as SMS messages through Twilio's SMS API.
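
The Base64 step can be sketched with the Python standard library. This is an illustration under stated assumptions: the helper names and the generated test tone are hypothetical, and the app's real request format and endpoint are not shown in this README.

```python
import base64
import io
import math
import struct
import wave


def make_test_wav(duration_s=0.1, sample_rate=8000, freq_hz=440.0) -> bytes:
    """Build a short mono 16-bit sine-wave .wav in memory (stand-in for a recording)."""
    n_frames = int(duration_s * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


def encode_audio(wav_bytes: bytes) -> str:
    """Client side: raw .wav bytes to a Base64 string that is safe to send as text."""
    return base64.b64encode(wav_bytes).decode("ascii")


def decode_audio(payload: str) -> bytes:
    """Server side: Base64 string back to the original .wav bytes."""
    return base64.b64decode(payload, validate=True)


if __name__ == "__main__":
    original = make_test_wav()
    payload = encode_audio(original)
    print(len(original), "bytes ->", len(payload), "Base64 characters")
    print("round trip ok:", decode_audio(payload) == original)
```

Base64 grows the payload by roughly a third, which is the price of sending binary audio through a text-based request body.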

The landing page for the Flask app is available through this link

Weblink

At this point the app is only available through Expo Go. For a demo video, click here

App


Future Work

So What? How to Improve?

In the end, the "so what?" is to save money and increase safety around the city. This project simply serves as an example of what is doable in the "bare minimum" scenario and what can be achieved in the future. Systems such as this could perhaps be installed in Ring cameras powered by Amazon, or in devices around the city, to help increase security and decrease expenditures citywide.

Other ways to train models

One idea that occurred to me is to train on "danger" vs "not danger" and classify on that basis to increase recall, that is, to miss fewer real danger events (fewer false negatives). Other improvements include adding more sounds to what is considered danger and applying noise reduction to the incoming audio so that the useful features of the sound can be captured.
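
A rough sketch of how the binary framing could be evaluated: collapse each class into "danger" or "not danger", then measure recall on the danger group, meaning the share of real danger events that were caught. The function names and the sample labels are hypothetical; no such experiment is reported in this README.

```python
# Danger grouping from the classification list; identifiers are illustrative.
DANGER = {"Car Horn", "Dog Bark", "Drilling", "Gun Shot", "Jackhammer", "Siren"}


def to_binary(label: str) -> str:
    """Collapse a class name into 'danger' or 'not danger'."""
    return "danger" if label in DANGER else "not danger"


def danger_recall(true_labels, predicted_labels) -> float:
    """Recall for the danger group: caught danger events / all real danger events."""
    pairs = [(to_binary(t), to_binary(p)) for t, p in zip(true_labels, predicted_labels)]
    caught = sum(1 for t, p in pairs if t == "danger" and p == "danger")
    missed = sum(1 for t, p in pairs if t == "danger" and p != "danger")
    if caught + missed == 0:
        raise ValueError("no danger examples to evaluate")
    return caught / (caught + missed)


if __name__ == "__main__":
    # Made-up labels: the Siren is missed, while Dog Bark predicted as Siren
    # still counts as caught because both fall in the danger group.
    truth = ["Gun Shot", "Siren", "Noise", "Dog Bark"]
    guess = ["Gun Shot", "Noise", "Noise", "Siren"]
    print(f"danger recall: {danger_recall(truth, guess):.2f}")
```

Confusions between two danger classes stop counting as errors in this framing, which is one reason recall on danger can rise.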

Thank you
