🇧🇷 Versão em Português aqui.
This repository contains an application designed for processing CNPJ data (the Brazilian equivalent of a business tax identification number). It's built using the Laravel framework for PHP and utilizes Docker for easy setup and deployment. The application handles large CSV files, processes them, and stores the data in a MySQL/PostgreSQL database for further analysis.
The Receita Federal data files can be downloaded here (last updated on 2024-05-15).
- Process large CSV files with CNPJ data.
- Store processed data in a MySQL/PostgreSQL database.
- Redis integration for performance optimization.
- Nginx as a reverse proxy for the web server.
- Containerized setup with Docker and Docker Compose.
/cnpj-extractor
│
├── /docker
│   ├── docker-compose.yml
│   ├── Dockerfile.app
│   └── /nginx
│       └── default.conf
│
├── /src
│   ├── /app
│   ├── .env.example
│   ├── ...
│
└── /data
- /docker: Docker configuration files.
- /src: Laravel application source code.
- /data: Receita Federal data zip files.
Before you begin, ensure you have met the following requirements:
- Docker and Docker Compose installed on your machine.
- Basic knowledge of Laravel, Docker, and PostgreSQL.
To set up the project for development, follow these steps:
- Clone the repository:
git clone https://github.com/jeffersonsalvador/cnpj-extractor.git
cd cnpj-extractor
- Navigate to the docker directory and start the services:
cd docker
make up
This will build and run the following services:
- app: The Laravel application.
- postgres: The PostgreSQL database.
- redis: The Redis server.
Once the containers are up and running, you can:
- Access the application via http://localhost:8080.
- Connect to the database using the credentials provided in the .env file.
- Monitor the Redis instance on port 6379.
To process CNPJ data:
- Place your CSV files in the designated directory (as mentioned in the application documentation).
- Use the application's web interface (not finished yet) or CLI commands to start the processing.
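In the /docker folder, run the following commands. The first opens a shell inside the application container and the second starts the processing: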
make cnpj-app
php artisan process:cnpj
The zip files will be processed and their records stored in Redis. To process the queue in Redis, run the command:
php artisan queue:work
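To give a rough idea of what reading the zip files involves, here is a minimal, language-neutral sketch in Python. It is not the project's PHP code. The helper name `iter_rows`, the semicolon delimiter, and the Latin-1 encoding are assumptions made for illustration, and the sample rows are invented. The point is that rows are yielded one at a time, so a large file never has to fit in memory.

```python
import csv
import io
import zipfile


def iter_rows(zip_source, delimiter=";", encoding="latin-1"):
    """Yield CSV rows one at a time from every member of a zip archive.

    The delimiter and encoding defaults are assumptions, not values
    taken from the project.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            with archive.open(member) as raw:
                # Decode the member lazily instead of reading it all at once.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                yield from csv.reader(text, delimiter=delimiter)


# Build a tiny in-memory archive so the sketch runs without real data files.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "sample.csv",
        '"0001";"EXAMPLE CITY A"\n"0002";"EXAMPLE CITY B"\n',
    )
demo.seek(0)

rows = list(iter_rows(demo))
```

In a real run, `zip_source` would be the path of a file placed in the data directory rather than an in-memory buffer.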
To build and run the application, use the following Makefile commands:
- make up-terminal: starts the services needed to run the data import script from the terminal.
- make up: starts the containers and the web application (in development).
Other useful commands:
- make down: stops and removes the containers.
- make restart: restarts the containers.
In the /docker configuration folder, run make cnpj-app to enter bash mode, then run php artisan migrate to create the tables in the database.
In this project, Redis is used as a temporary data store during the processing of CSV files. Redis offers fast in-memory storage, which improves performance when dealing with large volumes of data.
During the processing of CSV files:
- Each record is normalized and serialized as JSON.
- The records are temporarily stored in Redis in a list called processed_records_{$type}.
After processing:
- Data is read from Redis.
- The records are deserialized and batch-inserted into the database configured in the .env file.
Buffering in Redis keeps CSV parsing separate from database writes, so the database receives batched inserts rather than one insert per record, which reduces its load when importing large volumes of records.
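The buffer-then-batch flow described above can be sketched as follows. This is a language-neutral illustration in Python, not the project's Laravel code. A `deque` stands in for the Redis list, SQLite stands in for the configured database, and the function names, the `batch_size` value, the table columns, and the sample rows are all assumptions made for the example.

```python
import json
import sqlite3
from collections import defaultdict, deque

# Stand-in for Redis lists (append ~ RPUSH, popleft ~ LPOP).
# The real project talks to a Redis server instead.
fake_redis = defaultdict(deque)


def buffer_record(record_type, record):
    """Normalize a record, serialize it as JSON, and append it to the list."""
    normalized = {key: value.strip() for key, value in record.items()}
    key = f"processed_records_{record_type}"
    fake_redis[key].append(json.dumps(normalized))


def flush_cities(conn, batch_size=2):
    """Drain the cities list in batches and insert each batch in one call."""
    key = "processed_records_cities"
    inserted = 0
    while fake_redis[key]:
        batch = []
        while fake_redis[key] and len(batch) < batch_size:
            batch.append(json.loads(fake_redis[key].popleft()))
        # One executemany call and one commit per batch, not per record.
        conn.executemany(
            "INSERT INTO cities (code, name) VALUES (:code, :name)", batch
        )
        conn.commit()
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (code TEXT, name TEXT)")

# Invented sample rows with stray whitespace to show the normalization step.
samples = [("0001", " EXAMPLE CITY A "), ("0002", "EXAMPLE CITY B"), ("0003", "EXAMPLE CITY C ")]
for code, name in samples:
    buffer_record("cities", {"code": code, "name": name})

total = flush_cities(conn)
```

With three buffered records and a batch size of two, the loop performs two inserts (two rows, then one) and leaves the list empty.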
Table | Records | Size |
---|---|---|
cities | 5,571 | 600 KB |
cnaes | 1,359 | 248 KB |
companies | 57,707,950 | 11 GB |
countries | 255 | 64 KB |
establishments | 45,200,973 | 17 GB |
legal_natures | 90 | 56 KB |
partners | 23,084,108 | 4.48 GB |
partners_qualifications | 68 | 24 KB |
simples | 38,960,381 | 4.71 GB |
Distributed under the MIT License. See LICENSE for more information.
Jefferson Costa
Project Link: https://github.com/jeffersonsalvador/cnpj-extractor