-
car-price-predictor Public
Predicting Car Prices with FastAPI, Streamlit, MLflow, Kafka, and Debezium: A Practical Demonstration
-
open-source-data Public
This repository contains structured datasets in various categories
-
llm-rag-mtl-public-hospital Public
This project develops a Retrieval-Augmented Generation (RAG) model that answers questions using public Google reviews of hospitals in Montreal
Python Updated May 5, 2024 -
-
-Google-Analytics-360 Public
Welcome to the Google Analytics 360 Dataset Project! This repository is designed for anyone interested in working with realistic Google Analytics data. Whether you're a data scientist, a student, o…
Python Updated Mar 30, 2024 -
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed wor…
-
eventmusic Public
EventMusic Producer is a Dockerized application designed to read data and output them to a Kafka topic, using Avro schemas for data serialization. It integrates seamlessly with Kafka and the Schema…
-
We are thrilled to announce our new PoC project aimed at providing a complete real-time extraction, transformation, and exposure architecture for the new provincial transportation systems.
Python Other Updated Feb 11, 2024 -
-
MongoElasticMigrator Public
This tool migrates data from MongoDB collections to Elasticsearch indices. It's built using Rust and supports configurable migrations.
-
build_api_devops_pipeline Public
-
Scalable-RSS-Feed-Pipeline Public
In this article, we'll walk through how to build a scalable ETL pipeline using Apache Airflow, Kafka, Python, MongoDB, and Flask
-
investissement Public
Jenkins Delta pipeline
-
ModernDataEngineerPipeline Public
Building a Robust Data Pipeline: Integrating Proxy Rotation, Kafka, MongoDB, Redis, Logstash, Elasticsearch, and MinIO for Efficient Web Scraping
-
docSearch Public
Our project is a testament to this need, offering a comprehensive solution that combines modern technologies and architectures to create a powerful document search engine. This engine is not just a…
-
myUberEats_dataPipeline Public
Building a Modern Uber Eats Data Pipeline
-
Dynamic Snake Game: Unleashing Real-Time Streaming Analytics with Redis, Kafka, Flink, ClickHouse & Chart.js in an Online Snake Game via Flask API
-
Gmail-to-MongoDB-Script Public
This script facilitates the automation of fetching emails from a user's Gmail account and storing them into a MongoDB database. The emails fetched are filtered by specific labels such as Promotions…
-
realtime-race-mapper Public
In this rendition, Elastic and Kibana have been replaced with the powerful Splunk, MQTT has been swapped out for ActiveMQ, and instead of the traditional Kafka, we’ve integrated Confluent Cloud.
HCL Updated Oct 22, 2023 -
-
terraform_snowflake_devops Public
Develop a scalable and secure data infrastructure, and integrate diverse data sources into Snowflake.
-
The objective of this guide is to demonstrate how to automate the deployment of a data pipeline on AWS using Terraform. The pipeline will utilize AWS services such as Lambda, Glue, Crawler, Redshif…
-
Real-time flight status data pipeline built with technologies such as Kafka, Schema Registry, Avro, GraphQL, Postgres, and React.
-
In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO, and Apache Spark.
-
modern-data-pipeline Public
Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, dbt, Mage AI, and Dash.