
Speech recognition using Wit.Ai, Nordic nrf52, ST microphone, PDM/PCM, Serial communication and Python. Humidity sensor data notified via Bluetooth Low Energy.


TIT8/BLE-sensor_PDM-microphone


Speech Recognition with Wit.AI 👂 + 🎤

Given that the Arduino Nano 33 BLE Sense is built on Mbed OS and routes microphone audio through a PDM-to-PCM chain into memory via DMA transfers (see the nRF52840 documentation), I've used the default RTOS thread to capture microphone input and send the audio samples to a Python receiver for processing. The receiver responds to voice commands by turning the light in my bedroom on and off.

All of this while also sending data to connected BLE devices (using another RTOS thread). 🚀

Humidity Sensor via BLE 📻

I'm beginning to grasp concepts like RSSI and advertising, services and characteristics, default and custom UUIDs, GAP and GATT, as well as peripheral/server and central/client roles in Bluetooth Low Energy, all through the lens of the Arduino Nano 33 BLE Sense.

Here, I'm utilizing the Notify feature of BLE, reminiscent of pub-sub protocols like MQTT. Referencing the Arduino documentation and the specifications outlined for the Environmental Sensing Service 📡 can provide further insight.

Every time the humidity changes by at least 1%, new data is written to the BLE characteristic, and any connected device can read the value.
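The notify-on-change rule above can be sketched in a few lines. This is a hypothetical illustration of the logic (function names are mine, not from the repo); on the Arduino the "send" step would be a `writeValue` on the humidity characteristic.

```python
# Sketch of the notify-on-change rule: only publish a new value when
# humidity has moved by at least `threshold` percent since the last send.

def should_notify(last_sent: float, current: float, threshold: float = 1.0) -> bool:
    """Return True when the humidity delta crosses the threshold."""
    return abs(current - last_sent) >= threshold

def notified_values(readings, threshold: float = 1.0):
    """Walk a stream of readings and collect only the values that get sent."""
    sent = []
    last = None
    for value in readings:
        if last is None or should_notify(last, value, threshold):
            sent.append(value)  # on the Arduino: humidityChar.writeValue(...)
            last = value
    return sent
```

This keeps BLE traffic (and subscriber wake-ups) proportional to meaningful changes rather than to the sensor's sampling rate.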

What better way to learn something new than with Arduino? 💪

Mbed OS Structure

Two threads are employed: one for BLE and another for PDM. The PDM thread is assigned the higher priority and has exclusive use of the Serial port, so the BLE thread never touches it (strictly speaking, this isn't required for correct operation).

I've opted to run the PDM task within the main loop provided by the Arduino framework (at osPriorityNormal), while the BLE task operates on a separate thread at osPriorityBelowNormal. Leveraging the CMSIS-RTOS abstraction API (RTX is the kernel) provided by Mbed enables priority configuration. Additionally, before the main loop restarts, control is yielded to the scheduler to ensure proper execution (the default RTOS tick frequency is 1kHz).

Internally, the PDM library employs a circular buffer, which is repeatedly copied into the user's sample buffer and from there into the Serial buffer. Arduino-Mbed's Serial API uses an async-writer approach: each Serial.write call copies the buffer into separately allocated memory, so timing discrepancies between the microphone ADC and Serial writing are not a concern (see here for more).
I decided not to process the incoming ADC/DMA data on the Arduino Nano itself (explained here), but perhaps I can try it in the future. That would reduce the load on the CPU receiving data over Serial, since the serial link would carry less data, but it would increase CPU usage on the nRF52 and stress its constrained SRAM.

A Python script is employed to receive and process the samples, generating an audio file, interfacing with Wit.AI, and controlling the bedroom light (as demonstrated in a previous project) upon detection of specific keywords. Also take a look at the Go program: with Go's astonishingly simple concurrency model, I achieved a significant efficiency improvement (about 0.5% average CPU load).
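The receiver's "generate an audio file" step boils down to wrapping the raw PCM stream in a WAV container. Below is a minimal sketch (not the repo's actual script), assuming 16-bit mono PCM at 16 kHz, which is the Arduino PDM library's default; only the WAV-writing part is shown, with the serial read indicated in a comment since it needs real hardware and pyserial.

```python
import wave

def write_wav(pcm_bytes: bytes, path: str, rate: int = 16000) -> None:
    """Wrap raw little-endian 16-bit mono PCM in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono microphone
        wav.setsampwidth(2)   # 16-bit samples from the PDM-to-PCM filter
        wav.setframerate(rate)
        wav.writeframes(pcm_bytes)

# In the real receiver the bytes come from the serial port, e.g. with pyserial:
#   with serial.Serial("/dev/ttyACM0", 115200) as port:
#       pcm = port.read(rate * 2 * seconds)   # 2 bytes per sample
# The resulting WAV file (or raw audio) is then sent to the Wit.Ai speech API.
```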

References

Here, you can find my initial reference for using Mbed OS with Arduino, and here is the forum where you can ask questions (this link will actually take you to an issue that I found interesting to illustrate the simplicity of Mbed OS).

If you don't know what an operating system does, this video is for you.

Requirements

How to test the BLE task?

Utilize nRF Connect on a smartphone, a Python script on PC/Mac, or an ESP32 if available.
And, by the way, no, I'm not sending floats directly; but remember this discussion and this example in the future.
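Since the value on the wire isn't a raw float, a test client has to decode it. As a sketch, assuming the peripheral follows the standard Environmental Sensing encoding for the Humidity characteristic (uint16, little-endian, units of 0.01 %RH; adjust if the sketch packs the value differently):

```python
import struct

# Standard 16-bit UUID 0x2A6F (Humidity) expanded to the Bluetooth base UUID.
HUMIDITY_CHAR_UUID = "00002a6f-0000-1000-8000-00805f9b34fb"

def decode_humidity(data: bytes) -> float:
    """Decode an ESS Humidity value: uint16, little-endian, 0.01 %RH per bit."""
    (raw,) = struct.unpack("<H", data)
    return raw / 100.0

# With the bleak library, a PC/Mac central could subscribe roughly like this:
#   async with BleakClient(address) as client:
#       await client.start_notify(
#           HUMIDITY_CHAR_UUID,
#           lambda _char, data: print(decode_humidity(data)))
```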

Pros and cons of speech recognition task

➕ This approach is working very well: collecting audio samples, sending them over serial to a host that manages the connection with the Wit.Ai API and takes action based on the transcribed text. Wit.Ai's models, trained by Meta, perform incredibly well, and you can also train it to recognize your own keywords better. So it's very reliable.

➖/➕ However, latency remains the primary concern. For instance, when I say "accendi luce," it takes 1-2 seconds before the light turns on (although, without errors, it consistently does turn on! 🦾). Nevertheless, from my testing, it executes actions so swiftly that I hesitate to consider it slow. As with any system dealing with human-time-asynchronous events/inputs, there's little that can be done to further reduce the action time.

➖ Data privacy is a concern; hence, the Python receiver I developed sends data to the Wit.Ai API only when the captured audio reaches a sufficiently high volume (by design: I speak to the Arduino at a volume I wouldn't normally use with my brother in the room), rather than through continuous streaming.
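The volume gate can be implemented as a simple RMS threshold on each captured chunk. This is a hedged sketch of the idea (the actual receiver's implementation and threshold may differ; the value 1000 here is an arbitrary placeholder):

```python
import math
import struct

def rms(pcm_bytes: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM."""
    n = len(pcm_bytes) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, pcm_bytes[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def loud_enough(pcm_bytes: bytes, threshold: float = 1000.0) -> bool:
    """Gate: only chunks above the threshold get forwarded to the Wit.Ai API."""
    return rms(pcm_bytes) >= threshold
```

Anything below the threshold stays on the local machine, so background conversation is never uploaded.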

The two minuses can be improved: if you want low latency and quick responses to voice input, you need to move the inference/transcription onto the device, either the host of the serial connection or the Arduino Nano 33 BLE Sense itself.

  1. The best alternative I've achieved with success is hosted on GitHub, thanks to Edge Impulse for rapid prototyping. The speech recognition process runs on the nRF52840. However, upon reviewing their documentation and the generated code, I realize I can learn more (and perhaps borrow 🤪) about offline voice recognition. This is a completely different approach, offering more privacy and faster responses, but achieving reliability requires a significant amount of time to train the model for inference, and the results are slightly inferior to "Wit.Ai + Python."

  2. An intermediate solution could be to keep the speech recognition on external hardware instead of the nRF52840, utilizing an offline speech recognition engine such as PocketSphinx (tested and not working, I'm still a beginner) or Vosk (a bit tricky to get started with, and this demo can be helpful, though it hasn't proven entirely reliable in my tests). But I think transcribers as good as Wit.Ai are difficult to find for this application, at least for the Italian language. And there's a big BUT if you want to use this solution with my Python-receiver code: without proper hardware, it can actually increase latency.

  3. You can also train your own neural network with TensorFlow to recognize specific words and run it on the external hardware. TensorFlow has a good starting point with pre-trained models. Now (4 months after trying only points 1 and 2) I understand the world of classification problems better, and this solution can be more effective than 2.

So, life is full of trade-offs. We're lucky to have more than one solution 🤥.

Goals 😎

  • Direct use of the official nRF SDK.
  • Learn more about Mbed OS (coming from FreeRTOS, I'm flabbergasted by how good it is) ❤️
  • Attempting to replicate functionality on ESP32 via ESP-IDF BLE library.   [DONE ✔️]
  • Offline speech recognition via a pre-trained machine learning model using TensorFlow (TinyML).   [Started 👷]
