Welcome to the AWS Glue ETL Boilerplate repository! This is an example AWS Glue application that uses the Serverless Framework to deploy infrastructure and allows local development with AWS Glue Libs, Spark, Jupyter Notebook, and more. It includes jobs using Python Shell and PySpark.
Are you ready to supercharge your ETL development with AWS Glue? This repository is here to help you quickly set up, develop, and deploy AWS Glue jobs. Streamline your ETL pipelines, harness the power of AWS Glue Libs and Spark, and unlock efficient local development.
Check out the Use Case Scenario to learn more about the motivation behind this example!
- Full AWS Glue Setup: Deploy Glue jobs using Python Shell Script and PySpark.
- Flexible Local Development: Choose between using VSCode + Remote Containers or Docker Compose.
- Comprehensive Documentation: Easy-to-follow guides for development and deployment.
- Reusable Examples: Builds on multiple existing examples to provide a well-rounded solution.
- Serverless Framework: Use the Serverless Framework to deploy AWS Glue jobs and other resources.
To quickly start a project using this example, follow these steps:
npx serverless install -u https://github.com/nanlabs/aws-glue-etl-boilerplate -n my-project
This boilerplate combines the best practices from the following examples of ours:
- Serverless Glue example - Deploy AWS Glue jobs using the Serverless Framework.
- AWS Glue docker example - Run AWS Glue jobs locally using Docker Compose.
- VSCode DevContainer example - Run AWS Glue jobs locally using VSCode + Remote Containers.
Choose your preferred local development setup! To use VSCode + Remote Containers:
- Install Docker
- Install VSCode
- Install the Remote Development extension
- Clone this repository
- Create your application within a container (see gif below)
Once the container is running inside VSCode, you can run the Glue jobs locally as follows:
# Run PySpark job
glue-spark-submit jobs/pyspark_hello_world.py --JOB_NAME job_example --CUSTOM_ARGUMENT custom_value
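The command above passes `--JOB_NAME` and `--CUSTOM_ARGUMENT` to the job script. The contents of `jobs/pyspark_hello_world.py` are not shown here, so the following is only a minimal sketch of how a Glue job might read those arguments. It uses `getResolvedOptions` from `awsglue.utils`, which is available in the Glue runtime and the AWS Glue Libs container. The `resolve_job_args` and `describe_run` helpers, and the `argparse` fallback for environments without `awsglue`, are assumptions made for illustration.

```python
import argparse
from typing import Dict, List, Sequence


def resolve_job_args(argv: Sequence[str], names: List[str]) -> Dict[str, str]:
    """Return the values of the named ``--NAME value`` job arguments.

    Hypothetical helper: argv is expected to look like sys.argv,
    with the script path in position 0.
    """
    try:
        # Present inside the Glue runtime and the AWS Glue Libs container.
        from awsglue.utils import getResolvedOptions
    except ImportError:
        # Fallback (an assumption of this sketch) so the same script also
        # runs where awsglue is not installed, e.g. in plain unit tests.
        parser = argparse.ArgumentParser(allow_abbrev=False)
        for name in names:
            parser.add_argument(f"--{name}", required=True)
        # Ignore any extra arguments the launcher may append.
        known, _unknown = parser.parse_known_args(list(argv[1:]))
        resolved = vars(known)
    else:
        resolved = getResolvedOptions(list(argv), names)
    # Keep only the arguments that were explicitly requested.
    return {name: resolved[name] for name in names}


def describe_run(args: Dict[str, str]) -> str:
    """Build a short log line describing the current job run."""
    return (
        f"Running {args['JOB_NAME']} "
        f"with CUSTOM_ARGUMENT={args['CUSTOM_ARGUMENT']}"
    )
```

In a job script, this could be called as `args = resolve_job_args(sys.argv, ["JOB_NAME", "CUSTOM_ARGUMENT"])` (after `import sys`) before creating the Spark or Glue context, so that a missing argument fails fast.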
Refer to the development documentation for detailed steps to set up a local development environment using Docker Compose.
We utilize the Serverless Framework to deploy AWS Glue jobs and other resources. For deployment instructions, check out the deployment documentation.
You can find detailed implementation notes in the Implementation Notes document.
Empowering Threat Intelligence with our AWS Glue ETL Boilerplate
Imagine the scenario:
Objective: Your organization is on a mission to bolster its threat intelligence capabilities by creating a robust data lake that aggregates and analyzes data from various Open Source Intelligence (OSINT) sources. The goal is to enhance security operations and proactively identify potential threats.
Challenge: Traditional threat intelligence methods lack the agility and scalability needed to process the massive influx of data from OSINT sources. Manual data collection and analysis are time-consuming, making it difficult to stay ahead of emerging threats.
Solution: Introducing our AWS Glue ETL Boilerplate, a solution that combines AWS Glue, the Serverless Framework, and efficient local development techniques. This comprehensive example demonstrates how to build an end-to-end data lake tailored for threat intelligence operations.
Key Features and Benefits:
🔒 Enhanced Security Operations: By centralizing data from OSINT sources, your security team gains a consolidated view of potential threats. Real-time analysis enables quicker responses to emerging incidents.
⚙️ Flexible ETL Infrastructure: The Serverless Framework empowers you to deploy AWS Glue jobs seamlessly, adapting to varying data sources and formats. This flexibility ensures smooth data integration.
💡 Efficient Local Development: Develop and refine your threat intelligence pipeline locally using VSCode + Remote Containers or Docker Compose. Rapid iteration and testing significantly expedite deployment.
📈 Scalability for Data Growth: As your OSINT data volume expands, the solution effortlessly scales to accommodate increasing demands. This ensures your threat intelligence efforts remain effective and up-to-date.
📚 Comprehensive Documentation: A wealth of documentation guides your team through each step – from initial setup to deployment – ensuring successful implementation.