This project was made possible by the UrbanSound8K dataset, available on Kaggle.com.
This project was built with Python, JavaScript, and PostgreSQL, and is hosted on AWS on an EC2 instance.
The backend uses Flask, the mobile app is built with React Native and Expo, and the front end was written in React.js.
- Exploratory Data Analysis
- An Overview of the Data (Clean vs Live)
- Machine Learning
- Convolutional Neural Network
- Implementation
- Mobile App & Live Audio Feed
- Future Work
- So What? How to Improve?
“The city's current three-year ShotSpotter contract is worth $33 million.” -ABC7 Chicago
The training data consists of the UrbanSound8K dataset plus ~500 .wav files labeled "other", which lets the model predict "Noise" when the sound picked up matches none of the other classes.
To work with the audio data, all the .wav files were converted to mel spectrograms; you can find out more about their meaning and importance through this article. In short, mel spectrograms capture features of audio that humans cannot perceive directly because of the way we process sound: a short-time Fourier transform followed by a mel-scale filter bank turns the waveform into an image. Audio can be hard to work with directly, but converting it to an image extracts these important features and makes classification much easier.
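As a minimal sketch of that conversion, assuming librosa and illustrative parameter choices (sample rate, FFT size, number of mel bands are not necessarily the project's settings), one .wav file can be turned into a spectrogram image like this:

```python
# Sketch: convert a .wav file into a mel spectrogram image.
# Parameters (sr, n_fft, hop_length, n_mels, figure size) are illustrative assumptions.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def wav_to_mel_image(wav_path: str, out_path: str) -> None:
    y, sr = librosa.load(wav_path, sr=22050)                      # load audio at a fixed sample rate
    S = librosa.feature.melspectrogram(y=y, sr=sr,
                                       n_fft=2048, hop_length=512, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)                     # convert power to decibels
    plt.figure(figsize=(2, 2))
    librosa.display.specshow(S_db, sr=sr, hop_length=512)
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()

# wav_to_mel_image("some_clip.wav", "some_clip.png")  # placeholder file names
```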
Live data was collected through the app: every recorded .wav file is saved into the backend/temp folder and then classified from there.
+ Air Conditioner
- Car Horn
+ Children Playing
- Dog Bark
- Drilling
+ Engine Idling
- Gun Shot
- Jackhammer
- Siren
+ Street Music
+ Noise
- Danger
+ Not Danger
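As a rough illustration (not project code), that grouping can be expressed as a lookup from predicted label to a danger flag; the exact label strings are assumptions:

```python
# Illustrative mapping of predicted labels to the danger / not-danger grouping above;
# label spellings are assumptions, not the project's exact strings.
DANGER_LABELS = {"car_horn", "dog_bark", "drilling", "gun_shot", "jackhammer", "siren"}

def is_danger(label: str) -> bool:
    """Return True for the classes treated as danger above."""
    return label in DANGER_LABELS

print(is_danger("gun_shot"), is_danger("street_music"))  # True False
```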
Note: The X-axis represents Time
Principal component analysis (PCA) is a dimensionality reduction technique; here it keeps 95% of the variance in the data matrix while shrinking its size, so downstream operations run faster and more efficiently.
Note: Each color represents a different component (a different class)
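A minimal sketch of that step with scikit-learn (an assumed tool, with placeholder data shapes) looks like this:

```python
# Sketch of the PCA step: flatten each spectrogram image into a vector and keep
# enough components to explain 95% of the variance. Shapes are placeholder assumptions.
import numpy as np
from sklearn.decomposition import PCA

# X: (n_samples, height * width) matrix of flattened spectrogram images (placeholder data)
X = np.random.rand(500, 200 * 200).astype(np.float32)

pca = PCA(n_components=0.95)        # keep 95% of the explained variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```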
- 1,401,979 params
- 3 Conv blocks, each with:
  - 2 Conv layers
  - Max Pool
- Flatten
- Dense (11 outputs)
Image Resizing | Batch Size | Callbacks |
---|---|---|
200 x 200 | 32 | LR on Plateau, Early Stopping |

Metric | Classification | Validation Accuracy |
---|---|---|
Accuracy | Softmax | 88.1 % |
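The architecture and training setup can be sketched roughly as below in Keras; the framework, filter counts, optimizer, and loss are assumptions and will not reproduce the exact 1,401,979-parameter model, but the 200 x 200 input, 11-class softmax output, batch size, and callbacks follow the tables above.

```python
# Hedged sketch of a CNN matching the description above: 3 conv blocks
# (2 Conv layers + max pooling each), flatten, dense softmax over 11 classes.
import tensorflow as tf
from tensorflow.keras import layers, callbacks

def build_model(input_shape=(200, 200, 3), n_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):                                   # three conv blocks (filter counts assumed)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
cbs = [callbacks.ReduceLROnPlateau(patience=3),                    # LR on plateau
       callbacks.EarlyStopping(patience=5, restore_best_weights=True)]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=32, epochs=50, callbacks=cbs)
```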
The model was deployed on the server; live audio is converted to Base64-encoded strings and sent to the server for classification and prediction. The app uses sessions, keyed on username and phone number, to "authenticate" users. Users can then add their "emergency contacts" through the app, which are stored in a PostgreSQL database. Upon detecting danger, the model sends real-time notifications to the endangered user's emergency contacts; these notifications are delivered either as in-app notifications to registered "emergency contacts" or as text messages through Twilio's SMS API.
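A rough sketch of what that endpoint could look like is shown below; the route name, JSON fields, and helper functions (`predict_label`, `notify_contacts`) are placeholders, not the project's actual code.

```python
# Sketch of the classification endpoint: the app posts Base64-encoded audio,
# the server writes it into backend/temp, classifies it, and notifies contacts on danger.
import base64, os, uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
TEMP_DIR = "backend/temp"
DANGER_LABELS = {"car_horn", "dog_bark", "drilling", "gun_shot", "jackhammer", "siren"}

def predict_label(wav_path):
    # Placeholder for the real CNN inference step (spectrogram conversion + model.predict).
    return "noise"

def notify_contacts(username, label):
    # Placeholder: look up the user's emergency contacts in PostgreSQL and send
    # an in-app notification or an SMS via Twilio, e.g.
    #   Client(sid, token).messages.create(to=..., from_=..., body=f"Danger detected: {label}")
    pass

@app.route("/classify", methods=["POST"])
def classify():
    payload = request.get_json()
    os.makedirs(TEMP_DIR, exist_ok=True)
    wav_path = os.path.join(TEMP_DIR, f"{uuid.uuid4()}.wav")
    with open(wav_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_b64"]))            # decode the Base64 audio
    label = predict_label(wav_path)
    if label in DANGER_LABELS:
        notify_contacts(payload.get("username"), label)
    return jsonify({"label": label, "danger": label in DANGER_LABELS})
```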
The landing page for the Flask app is available through this link.
The app is only available through Expo Go at this point; for a demo video, click here.
In the end, the "so what?" is to save money and increase safety around the city. This project simply serves as an example of what is doable in a "bare minimum" scenario and of what could be achieved in the future: installing systems like this in Amazon's Ring cameras or in devices around the city could help increase security and decrease expenditures citywide.
Other ways to train models
An idea that occurred to me is to train on "danger" vs. "not danger" and classify on that basis to increase recall, even at the cost of more false positives. Other improvements would be adding more sounds to what is considered danger, and applying noise reduction to incoming audio so the model can capture the good features of the sound; a quick sketch of that preprocessing step is shown below.
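As a quick sketch of that noise-reduction idea, assuming the noisereduce package (not something the project currently uses) and placeholder file paths:

```python
# Sketch only: denoise an incoming clip before converting it to a mel spectrogram.
# The noisereduce package and the file paths here are assumptions.
import librosa
import noisereduce as nr
import soundfile as sf

y, sr = librosa.load("backend/temp/incoming.wav", sr=22050)
y_clean = nr.reduce_noise(y=y, sr=sr)          # spectral-gating noise reduction
sf.write("backend/temp/incoming_clean.wav", y_clean, sr)
```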