This project implements a Long Short-Term Memory (LSTM) model, converted to TensorFlow Lite (TFLite) format, to predict future temperature values based on real-time data received over a TCP connection. Using a sliding window approach, the model uses past temperature data to generate future predictions. The results are then saved in a CSV file.
-
Data Collection: The script listens for incoming temperature data via a TCP socket connection. Each node sends temperature readings and labels in the format
node_name;temperature_label;temperature_value
. -
Preprocessing:
- Each node's temperature data is normalized using a MinMaxScaler.
- Data is stored in sliding windows containing a fixed number of past temperature readings (20 time steps).
-
LSTM Model:
- A pre-trained LSTM model is loaded to perform predictions in TFLite format.
- For each node, once enough data is collected, the model predicts the next temperature value based on the sliding window of past temperatures.
-
Prediction Output:
- The predicted temperature is unscaled and saved to a CSV file along with the actual temperature, label, node ID, and timestamp.
- The model used is an LSTM network trained on historical temperature data, which is then converted to TFLite format for optimized inference on edge devices or limited-resource environments.
# Example of how the LSTM model might be structured
model = Sequential()
model.add(LSTM(128, input_shape=(20,1)))
model.add(Dense(1)) # Output layer for predicting the temperature
To run the script, you will need the following dependencies installed:
- Python 3.x
- NumPy for numerical operations (
numpy==1.26.3
) - Pandas for data manipulation (
pandas==2.1.4
) - h5py for handling HDF5 files (
h5py==3.10.0
) - ujson for fast JSON processing (
ujson==5.4.0
) - TensorFlow for TFLite model inference (
tensorflow==2.16.2
) - Scikit-learn for data preprocessing (
scikit-learn==1.2.2
)
You can install all dependencies with:
pip install numpy==1.26.3 pandas==2.1.4 h5py==3.10.0 ujson==5.4.0 tensorflow==2.16.2 scikit-learn==1.2.2
This script is designed to run in a containerized environment alongside another container responsible for reading, sending data, and creating the TCP server.
The pre-built Docker image is available at pit836/lstm:latest
, and the docker-compose.yml file can spin up the container that performs these operations.
All prediction results are saved in a CSV file (temperature_predictions.csv
), located in the shared volume dbFiles
, mapped to the directory: /dbFiles:/usr/src/app/dist/dbFiles
.
Ensure the LSTM model in TFLite format (lstm.tflite
) is in the working directory. The script will load this model using the TensorFlow Lite interpreter.
The script establishes a TCP connection to receive real-time data from nodes.
- Set the correct
TCP_IP
andTCP_PORT
in the script. - The incoming data should be in the format:
node_name;temperature_label;temperature_value
To start the script, run:
python temperature_prediction.py
This will:
- Continuously listen for data from the TCP server.
- Perform predictions using the LSTM model whenever enough data is collected for a node.
- Save the predictions and other relevant data into a CSV file.
The incoming data is expected to follow this format:
NODE_01;T1;22.5
NODE_02;T0;19.8
...
The predicted temperatures are saved in a CSV file (temperature_predictions.csv
) with the following columns:
- ID: Unique identifier for each node's prediction.
- Node_Name: Name of the node that sent the data.
- Temperature_Actual: Actual temperature value received.
- Temperature_Future: Predicted future temperature.
- Label: Temperature label (e.g., "T_actual").
- Timestamp: The timestamp of the prediction.
Example of output in CSV format:
ID, Node_Name, Temperature_Actual, Temperature_Future, Label, Timestamp
1, NODE_01, 22.5, 23.1, T1, 1701234567
2, NODE_02, 19.8, 20.3, T0, 1701234570
By default, the predictions are saved in the following path:
/usr/src/app/dist/dbFiles/temperature_predictions.csv
- Model Training: The accuracy of the predictions depends on the quality of the pre-trained LSTM model. The current script only performs inference with a TFLite model and does not retrain it.
- Real-Time Data: The script assumes continuous real-time data input from multiple nodes. If data transmission stops, the script will wait for new data.
- TCP Connection: The TCP server and client setup must be configured to ensure data is sent and received correctly.
The container works with a usage of a maximum of 20% of CPU and 260 MB of memory.
The predictions are quite precise, with a RMSE of ????
This project is licensed under the MIT License.
Feel free to open an issue or submit a pull request if you'd like to contribute to the project.