Donations:
Pix: [email protected]
--
.Re-watch the videos from the AWS Partner Certification Readiness - AWS Partner Network lectures given by Kevin Zook, Brady Smith and Andy Kroll.
........ Data ingestion is about streaming data, database migration, and how to get data into Redshift.
........ Focus on ECS, EKS, Fargate, Lambda and Glue (Glue is one of the most important topics of the exam).
........ Step Functions, pipeline processes, SNS, SQS; some questions in the exam may touch on and ask about your SQL skills.
........ As part of applying programming concepts, you may be asked about AWS CDK and CloudFormation.
........ Choose a data store like S3 or Lake Formation; you may also store data in a database like Redshift.
........ You will use S3 lifecycle policies, for example: if you have data that is 60 days old, how should it be handled? (a small sketch follows)
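........ A minimal sketch, assuming a hypothetical bucket name and prefix, of a lifecycle rule that archives 60-day-old objects and expires them after a year (boto3):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix: transition objects under raw/ to Glacier at 60 days,
# delete them at 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-out-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```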
........ How do you ensure the ACCURACY of DATA? How do you track the lineage of ingested data, its transformations and its lifecycle?
........ At this point you have ingested and stored data; now it is time to process it and extract insights from the data.
........ Focus on STEP FUNCTIONS, AWS GLUE and AWS SAGEMAKER; this exam does not focus on Machine Learning, but SageMaker can process data.
........ The exam is more focused on DATA than on analytics, but you have to look at ATHENA and QUICKSIGHT.
........ To ensure data quality you have to use AWS GLUE and AWS GLUE DATABREW, for example: empty fields, missing data in a data set, etc.
........ At this point you know about ingestion, storage, the operational aspect and how to process data.
........ Example of governance: PII identified by AWS MACIE and how to integrate it with LAKE FORMATION.
The summary of these 4 domains is high-level, concise content; use the exam guide to dive deep into their aspects.
Learn more ==> https://partners.awscloud.com/rs/302-CJJ-746/images/Program%20Guide_APCR_DEA_NAMER.pdf
........ You have to firmly understand the foundational concepts of AWS Cloud; a lot of this content will help you consolidate and retain them.
...The video will cover: AWS Compute, AWS Networking, AWS Storage, AWS Databases, the Three V's of DATA and AWS services per "V".
......... Example: AWS ap-southeast-1 (Singapore Region) has ap-southeast-1a, ap-southeast-1b and ap-southeast-1c.
......... POP (point of presence) enables delivering content like data, videos, apps and APIs globally with low latency and higher speed.
......... Remember the pricing models for EC2: On-Demand, Savings Plans, Dedicated Hosts, Spot Instances and Reserved Instances.
......... An AMI is kind of like a "container image": not the same thing, but related and similar in some ways.
......... CONTAINERS: contain all the code, runtime, system libraries, dependencies and configuration required for the app to run.
......... Multiple containers can run on the same OS, sharing its resources. Container engines run the images.
......... An abstraction we can use: a container is like a zip file, with many types of files within.
......... Container orchestration automates scheduling, deployment, networking, scaling, health monitoring and management of your containers.
......... Orchestration is scheduling and deployment: ECS (Elastic Container Service) and EKS (Elastic Kubernetes Service) both do the exact same thing, which is "orchestration".
......... For HOSTS you can choose EC2 or Fargate: with EC2 you have to manage the VMs, whereas with Fargate the VMs are managed by AWS (serverless).
......... Fargate scales to 16 vCPU and 120 GB memory per task to run DATA processing workloads; it is the AWS serverless solution for containers.
......... Fleet Management: monitors the health and availability of your EC2 fleet. It can replace impaired instances and balance capacity across AZs.
......... Scheduled auto scaling lets you scale the fleet up or down ahead of known load changes, while dynamic scaling automates the process.
......... Amazon EC2 Auto Scaling Groups act like a "thermostat" that senses when to scale your EC2 fleet up or down.
......... ELB (Elastic Load Balancer) receives incoming traffic and distributes the requests across EC2 instances and AZs.
......... How a Lambda function or an EC2 instance can process information inside an S3 bucket, and how you set up that access,
......... like routing, subnets, endpoints and the security around that; how you could integrate this and allow compute to access the storage.
......... You have to understand S3 concepts, like: how to protect data, how to integrate a Lambda function, how data gets into S3, how the S3 lifecycle works (a small sketch follows).
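......... A minimal sketch, assuming a hypothetical Lambda function triggered by an S3 event notification, of compute reading an object from storage; the function's execution role would need s3:GetObject on the bucket:

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Standard S3 event notification fields: bucket name and object key of the new object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return {"bytes_read": len(body)}
```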
......... MemoryDB for Redis: ULTRA FAST performance with microsecond reads, millisecond writes, scalability and enterprise security.
......... USAGE of MemoryDB >> simplifying the architecture by combining database and cache, workloads that need ultra-fast performance, Redis data structures.
......... Amazon KEYSPACES (for Apache Cassandra): compatible with the Cassandra Query Language, fully managed by AWS.
......... Amazon DocumentDB is a MongoDB-compatible database: it stores, queries and indexes JSON data natively as a fully managed document database service.
......... Volume: total amount of data coming into the solution; analogy: a lot of water being ingested or just a little...
......... Variety: count and type of data sources in the solution; analogy: types of water sources, lakes, rain, rivers... like PDFs, images, JSON docs, videos.
......... Velocity: speed of data flowing through to be processed; analogy: a fast-flowing river versus a slow-moving stream, like processing in real time or in batch jobs.
......... Veracity: degree to which data is accurate, precise and trusted; analogy: you have clean water and dirty water; data coming in can have missing values.
......... Value: ability to extract meaningful information from the data stored; analogy: there is more value in clean water, and the same goes for data versus dirty data.
......... All the analogies relate the 5 V's to thinking about WATER being ingested into a lake or a reservoir.
........ Amazon DynamoDB: serverless KEY-VALUE and document database that delivers single-digit millisecond performance.
........ Amazon EMR: for petabyte-scale data processing, interactive analytics and Machine Learning using open-source frameworks.
......... Amazon MSK: ingest and process streaming data in REAL TIME with fully managed APACHE KAFKA.
........ Streaming data can be fanned out to multiple consumers like EC2, Lambda, Spark or Amazon EMR, and Amazon Managed Service for Apache Flink.
........ Amazon Kinesis is about how to ingest data, in real time or near real time, and how to process streaming data.
........ Remember the 15-minute limitation of Lambda when deciding in the exam whether to discard or choose Lambda answers.
........
........Question 1:
...........An Amazon Kinesis application is trying to read data from a Kinesis data stream. However, the read data call is rejected.
...........The following error message is displayed: ProvisionedThroughputExceededException.
...........
...........Which combination of steps will resolve the error ? (select TWO)
...........
...........A. Configure enhanced fan-out on the stream.
...........B. Enable enhanced monitoring on the stream
...........C. Increase the size of the GetRecords requests.
...........D. Increase the number of shards within the stream to provide enough capacity for the read data calls.
...........E. Make the application retry to read data from the stream.
...........Answer: "D" and "E"
...........
...........
........Question 2:
...........A company is collecting data that is generated by its users for analysis by using an Amazon S3 data lake.
...........Some of the data being collected and stored in Amazon S3 includes personally identifiable information (PII).
...........
...........The company wants a data engineer to design an automated solution to identify new and existing data that needs PII to be masked
...........before analysis is performed. Additionally, the data engineer must provide an overview of the data identified.
...........The task of masking the data will be handled by an application already created in the AWS account.
...........The data engineer needs to design a solution that can invoke this application in real time when PII is found.
...........
...........Which solution will meet these requirements with the LEAST operational overhead?
...........
...........A. Create an AWS Lambda function to analyze data for PII. Configure notification settings on the S3 bucket to invoke the Lambda function
........... when a new object is uploaded.
...........B. Configure notification settings on the S3 bucket. Configure an Amazon EventBridge rule for the default events bus for new object uploads.
........... Set the masking application as the target for the rule.
...........C. Enable Amazon Macie in the AWS account. Create an Amazon EventBridge rule for the default event bus for Macie findings. Set the masking
........... application as the target for the rule.
...........D. Enable Amazon Macie in the AWS account. Create an AWS Lambda function to run on a schedule to poll Macie findings and invoke the masking application.
...........Answer: "C"
...........
...........
........Question 3:
...........A company has data in an on-premises NFS file share. The company plans to migrate to AWS. The company uses the data for data analysis.
...........The company has written AWS Lambda functions to analyze the data.
...........The company wants to continue to use NFS for the file system that Lambda accesses. The data must be shared across all concurrently running Lambda functions.
...........
...........Which solution should the company use for this data migration ?
...........
...........A. Migrate the data into the local storage for each Lambda function. Use the local storage for data access.
...........B. Migrate the data to Amazon Elastic Block Store (Amazon EBS) volumes. Access the EBS volumes from the Lambda functions.
...........C. Migrate the data to Amazon DynamoDB. Ensure the Lambda functions have permissions to access the table.
...........D. Migrate the data to Amazon Elastic File System (Amazon EFS). Configure the Lambda functions to mount the file system.
...........Answer: "D"
...........
...........
...........
......Domain 1: Data Ingestion and Transformation
......Batch and Stream Processing Architectures
........Batch processing is the method computers use to periodically complete high-volume, repetitive data jobs.
........Stream processing requires ingesting a sequence of data, and incrementally updating metrics, reports and summary statistics in response to each arriving data record.
........Better for real-time monitoring and response.
........
......AWS Services for Stream Processing
........Amazon Kinesis Data Analytics and KDA Studio (work with both MSK and Kinesis Data Streams)
........Amazon MSK Connect, to sink data from Amazon MSK
........Kinesis Data Firehose from Kinesis Data Streams, Amazon CloudWatch, etc
........AWS Lambda
........Amazon EMR
........AWS Glue
........Custom consumers
........Streaming data options on AWS: Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon Managed Service for Apache Flink and Amazon Managed Streaming for Apache Kafka
........For the exam, focus more on Amazon Kinesis Data Streams.
........Flow of data: a bunch of PRODUCERS send DATA to Amazon Kinesis Data Streams; the ingested data is temporarily stored until a consumer comes and pulls it down for processing;
........after that the DATA can be stored somewhere or used to generate a visualization.
........Kinesis Data Streams is about ingesting and processing that data.
........Kinesis Data Firehose is about storing the data.
........So if you need to process the data, use Kinesis Data Streams; if you intend to store the data, go with Kinesis Firehose.
........If you need something as close to real time as possible, use Amazon Kinesis Data Streams; data ingested into the stream stays there for 24 hours and up to 365 days.
........In Kinesis Firehose, producers send data to Kinesis Firehose and the DATA is added to a "buffer" with a max size of 128 MB.
........The delivery of that DATA to a consumer, for example S3, occurs when the buffer is filled with 128 MB or after 900 seconds.
........You can call Kinesis Firehose "near real-time".
........With Kinesis Data Streams you can't do transformation and conversion of data.
........With Kinesis Firehose you can do transformation and conversion using AWS Glue or AWS Lambda.
........For data compression: you can't compress data in the stream with Amazon Kinesis Data Streams, but you can compress with GZIP, Snappy or ZIP with Kinesis Firehose (a small sketch of both services follows).
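........A minimal sketch, assuming hypothetical stream and delivery stream names, of sending the same record to Kinesis Data Streams (real-time, needs a partition key) and to Kinesis Data Firehose (buffered, near real-time):

```python
import boto3
import json

record = json.dumps({"sensor_id": "s-1", "temp_c": 21.4}).encode()

# Real-time ingestion into a data stream (consumers pull and process it).
kinesis = boto3.client("kinesis")
kinesis.put_record(StreamName="sensor-stream", Data=record, PartitionKey="s-1")

# Near real-time delivery: Firehose buffers the record and writes it to the destination (e.g. S3).
firehose = boto3.client("firehose")
firehose.put_record(DeliveryStreamName="sensor-to-s3", Record={"Data": record})
```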
........ETL - Extract, Transform and Load: the process of combining data from multiple sources in a large repository called a data warehouse.
........ETL uses a set of business rules to clean, organize and prepare raw data for storage, data analytics and Machine Learning (ML).
........AWS Glue is the main service you use to do ETL.
........AWS Glue can use JDBC or ODBC to connect to data sources outside AWS.
........Orchestrate data Pipelines on AWS
........Use AWS Step Functions and Amazon Managed Workflows for Apache Airflow (MWAA) to simplify ETL workflow management.
........AWS Step Functions workflow types.
........Standard workflows can run for up to one year and have exactly-once workflow execution.
........Express workflows can run for up to five minutes and have at-least-once workflow execution.
........Express is more suitable for high-event-rate and streaming data processing.
........For long-running, auditable workflows where you need debugging and execution history, use Standard workflows (a small sketch follows).
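........A minimal sketch, assuming a hypothetical role ARN and a trivial one-state definition, of creating an Express state machine with boto3 (omit type or pass "STANDARD" for a Standard workflow):

```python
import boto3
import json

# Trivial Amazon States Language definition: a single Pass state.
definition = {
    "StartAt": "PassThrough",
    "States": {"PassThrough": {"Type": "Pass", "End": True}},
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="streaming-etl-express",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/sfn-exec-role",  # hypothetical role
    type="EXPRESS",  # high-event-rate / streaming; use "STANDARD" for long-running, auditable runs
)
```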
........
........CI/CD - Continuous Integration and Continuous Delivery
........AWS CodePipeline: AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy
........
........AWS Lambda
........To scale up Lambda functions you need to add more memory.
........Lambda execution models: Synchronous (push), Asynchronous (event-based) and Stream (poll-based).
........Lambda maximum invocation timeout limit is 15 minutes.
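........A minimal sketch, assuming a hypothetical function name, of the two knobs mentioned above: more memory (which also buys more CPU) and the timeout, which is capped at 900 seconds (15 minutes):

```python
import boto3

lam = boto3.client("lambda")

lam.update_function_configuration(
    FunctionName="transform-records",  # hypothetical function
    MemorySize=1024,                   # MB; raising memory also raises the CPU share
    Timeout=900,                       # seconds; 15 minutes is the hard maximum
)
```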
........
........You will not get in-depth questions about CDK and CloudFormation on the exam, but you have to know them.
........CloudFormation is the foundation of Infrastructure as Code.
........With AWS CDK you can write code in programming languages, for example Python,
........and CDK will call CloudFormation to deploy your code (a small sketch follows).
........AWS SAM: Serverless Application Model.
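........A minimal CDK (Python) sketch, assuming CDK v2 and a hypothetical stack with a single S3 bucket: `cdk synth` turns this app into a CloudFormation template, and `cdk deploy` provisions it:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataPlatformStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # One versioned bucket; CDK synthesizes this into CloudFormation resources.
        s3.Bucket(self, "RawDataBucket", versioned=True)

app = App()
DataPlatformStack(app, "DataPlatformStack")
app.synth()
```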
........
........Apache Parquet and Apache ORC file formats
........are more compressible and faster to query (a small sketch follows).
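........A minimal sketch, assuming pandas with the pyarrow engine installed, of converting a small tabular dataset to Parquet; columnar formats compress better and scan faster in services like Athena and Redshift Spectrum:

```python
import pandas as pd

# Tiny in-memory dataset standing in for ingested CSV/JSON records.
df = pd.DataFrame({"id": [1, 2, 3], "amount": [10.5, 7.2, 3.9]})

# Columnar, compressed output; requires the pyarrow (or fastparquet) engine.
df.to_parquet("sales.parquet", compression="snappy")
```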
........
........Domain 1 - Exam Questions!
........
........Question 1 - Data Lake Automation
........A legal firm manages various on-premises servers containing documents in text, PDF and CSV files. These contain confidential information related to contracts, lawsuits and customer data.
........The firm aims to migrate and centralize data into a data lake.
........The company wants to implement automated processes for sensitive data verification and segregation, including long-term storage of findings for auditing purposes.
........Which tasks would achieve this goal with minimum effort ? (Select 2 options)
........
........A. Utilize Amazon EFS for data extraction, transformation and loading into the Data Lake.
........B. Configure Amazon S3 bucket policies to segregate sensitive and non-sensitive data.
........C. Deploy Amazon Macie to automatically discover, classify and protect sensitive data.
........D. Use AWS Glue DataBrew to automatically discover, classify and protect sensitive data.
........Answer: "B" and "C"
........
........
Tip: Glue DataBrew is more of a visual data preparation tool.
........
........
........Question 2 - Data Warehouse ingestion
........
........A company is collecting data from sensors located around the world.
........The data is collected and stored in an S3 bucket as JSON files. A data engineer needs to load this data into their data warehouse running in an Amazon Redshift cluster.
........Once the data is loaded, it will be queried and used to create visualizations.
........Which solution would meet the requirements with the least operational overhead ?
........
........A. Create a Glue workflow to convert the data to ORC format. Then use the Redshift COPY command to load the data directly into the cluster using the VARCHAR data type.
........B. Create a Glue workflow to convert the data to PARQUET format. Then use the Redshift COPY command to load the data directly into the cluster using the SUPER data type.
........C. Use the Redshift COPY command to load the data directly into the cluster using the VARCHAR data type.
........D. Use the Redshift COPY command to load the data directly into the cluster using the SUPER data type.
........Answer: "D", With the introduction of the SUPER data type, AMAZON REDSHIFT provides a rapid and flexible way to ingest JSON data and query it without the need to impose a schema.
........You can now load it directly into AMAZON Redshift without ETL.
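........A minimal sketch of answer "D", assuming a hypothetical cluster, table and IAM role, issuing the DDL and the COPY through the Redshift Data API so the JSON documents land in a single SUPER column with no ETL:

```python
import boto3

rsd = boto3.client("redshift-data")

statements = [
    # One SUPER column holds the whole JSON document.
    "CREATE TABLE sensor_raw (payload SUPER);",
    # COPY straight from S3; 'noshred' keeps each JSON object intact in the SUPER column.
    "COPY sensor_raw FROM 's3://sensor-landing/json/' "
    "IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-copy-role' "
    "FORMAT JSON 'noshred';",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",  # hypothetical cluster
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )
```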
........
........
........Question 3 - Hive Metastore Configuration
........
........A company is using a persistent Amazon EMR cluster to process vast amounts of data and store them as external
........tables in an S3 bucket. The Data Analyst must launch several transient EMR clusters to access the same tables
........simultaneously. However, the metadata about the Amazon S3 external tables is stored and defined on the persistent cluster.
........Which of the following is the most efficient way to expose the Hive metastore with minimal effort?
........
........A. Configure Hive to use an Amazon DynamoDB as its metastore.
........B. Configure Hive to use an External MySQL Database as its metastore.
........C. Configure Hive to use the AWS Glue Data Catalog as its metastore.
........D. Configure Hive to use Amazon Aurora as its metastore.
........Answer: "C" - What is a Metastore ? is "data" about the data, like what is the schema, what is the data types for the data ?
........AWS Glue Data Catalog is a metastore, it's purpose built for that.
........Using Amazon EMR you can configure Hive to use the AWS Glue Data Catalog as its metastore, this is recommended configuration when you
........require a persistent metastore or a metastore shared by different clusters, services, applications or AWS accounts.
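........A minimal sketch, assuming hypothetical cluster settings; the hive-site classification below is my reading of the EMR docs for pointing Hive at the Glue Data Catalog, so verify it against the current documentation:

```python
import boto3

emr = boto3.client("emr")

emr.run_job_flow(
    Name="transient-analysis-cluster",  # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Hive"}],
    Instances={
        "InstanceCount": 3,
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
    },
    # Point Hive's metastore client at the Glue Data Catalog so every cluster shares the same tables.
    Configurations=[{
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```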
........
........
........Question 4 - Infrastructure as Code
........
........A data engineer is using the AWS Cloud Development Kit (CDK) to create a repeatable deployment for their data platform.
........The data engineer has written the code and is ready to provision the resources into their AWS environment.
........Which command will turn the CDK code into a CloudFormation template?
........
........A. cdk diff
........B. cdk deploy
........C. cdk bootstrap
........D. cdk synth
........Answer: "D" - the sequence using CDK is: init, bootstrap, synth and deploy.
........
........
........Question 5 - SQL
........
........
........A data engineer is working with an Amazon RDS database instance in an AWS environment.
........The database contains a table named "employees" with the following columns:
........
........- employee_id (INT, Primary Key)
........- first_name (VARCHAR(50))
........- last_name (VARCHAR(50))
........- department (VARCHAR(50))
........
........They need to retrieve a list of all employees' first names and last names from the "employees" table.
........Which of the following SQL queries would correctly accomplish this task?
........
........A. SELECT first_name, last_name FROM employees;
........B. GET first_name, last_name FROM employees;
........C. RETRIEVE first_name, last_name FROM employees;
........D. SELECT employees FROM first_name, last_name;
........Answer: "A"
........
https://www.linkedin.com/posts/f%C3%A1bio-samuel-dos-santos-canedo-2708b533_aws-dea-dataengineerassociate-activity-7197287819017367553-ja8D?utm_source=share&utm_medium=member_desktop
........For the AWS Data Engineer Associate, you need to focus on AWS Redshift, AWS Glue, AWS S3, AWS Lake Formation, AWS Kinesis Data Streams, AWS Kinesis Firehose, and setting up data pipelines.
........13% out of an overall 200 questions answered right. A low rate, as expected. Keep studying.
# 2025-01-13 - Video 05 - Content Review: Development with AWS Services
### Amazon Athena, Amazon QuickSight, Serverless Analytics and AWS Hadoop Fundamentals.
###### Choosing a data store, understanding data cataloging systems, managing the lifecycle of data, and designing data models and schema evolution.
###### Review Amazon S3 Storage Types: S3 Intelligent-Tiering, S3 Standard, S3 Standard-IA, S3 Glacier, S3 Glacier Deep Archive, S3 One Zone-IA.
###### Review Amazon EFS (Elastic File System) => using the NFS protocol, it is a file system designed for multiple Lambda functions or EC2 instances
to connect to one central file share; it grows and shrinks as data is added or deleted.
###### Review Amazon EBS (Elastic Block Store): storage directly attached to your EC2 instance.
### AWS Transfer Family: SFTP, FTPS, FTP and AS2 Protocols.
### Amazon Redshift: a DATA WAREHOUSE
###### DC2 = Dense Compute node: the cluster is made of a Leader Node (the brain, which handles queries and computing) and Compute Nodes. All the intelligence stays in the leader node.
###### Data is actually stored in the Compute Nodes, which have the muscle; compute nodes in turn have slices; a slice is just an allocation of RAM and storage and is where the computation actually happens.
###### You have to know that with these clusters of DC2 nodes you have to scale compute and storage together.
###### Amazon Redshift cluster architecture - RA3.
###### In RA3 you have a leader node and compute nodes, but the difference is that you can scale compute and storage separately, because RMS (Redshift Managed Storage) manages it all for you.
###### RA3 is recommended because it has a lot of benefits compared to DC2.
###### Redshift Spectrum can run queries on S3; Redshift Federated Query can run queries on RDS.
###### Columnar Data Storage.
###### RDS ->> OLTP ->> CSV ........ or ........ REDSHIFT ->> OLAP ->> Parquet
######
### Data Cataloging Services: AWS GLUE
####### The Data Catalog is basically an index of where the data is located and what the schema of the data is.
####### The data catalog is metadata, the data about the data.
####### AWS Glue crawlers: connect to a data store, extract the schema of your data, then populate the Data Catalog (a small sketch follows this list).
####### AWS Glue triggers: Scheduled, Conditional and On-Demand.
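####### A minimal crawler sketch, assuming a hypothetical S3 path, IAM role and catalog database, using boto3:

```python
import boto3

glue = boto3.client("glue")

# Crawl an S3 prefix nightly, infer the schema and write the tables into the Data Catalog.
glue.create_crawler(
    Name="raw-sales-crawler",
    Role="arn:aws:iam::111122223333:role/glue-crawler-role",
    DatabaseName="raw_sales",
    Targets={"S3Targets": [{"Path": "s3://sales-landing/raw/"}]},
    Schedule="cron(0 2 * * ? *)",  # optional; crawlers can also run on demand
)
glue.start_crawler(Name="raw-sales-crawler")
```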
####### How do I load data from S3 into Redshift? You do that with the Redshift COPY command; for a bunch of files, use a manifest file.
####### Amazon S3 Lifecycle: review how data moves between S3 storage classes.
####### Amazon S3 Versioning: once you turn versioning on, when a change to an object occurs, another copy of the object is created and stored.
####### Amazon DynamoDB - Time to Live (TTL) (a small sketch follows this list).
####### Amazon Redshift distribution styles: AUTO, EVEN, KEY and ALL.
####### AUTO assigns an optimal distribution style based on the size of the table data; when the table grows larger, Amazon REDSHIFT might change the distribution style, say to KEY.
####### EVEN is appropriate when a table doesn't participate in joins or when there isn't a clear choice between KEY and ALL distribution.
####### With KEY, the rows are distributed according to the values in one column.
####### With ALL, an entire COPY of the table is on every node; because of this it is not great for large datasets, but it works for small datasets.
####### Look at the Redshift documentation for AMAZON REDSHIFT SCHEMA DESIGN.
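####### A minimal TTL sketch, assuming a hypothetical table and attribute name; items whose epoch timestamp in "expires_at" is in the past are deleted automatically:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_time_to_live(
    TableName="session-events",  # hypothetical table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)
```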
#### AWS Database Migration Service
####### A trusted way to migrate 1M+ databases with minimal downtime.
####### AWS DMS and the AWS Schema Conversion Tool (SCT): migrations that need schema conversion will use AWS SCT.
EdN - Live Escola da Nuvem Bootcamp IA para Startups ONE Brasil - Hello, Oracle ONE! Proz - Arquitetos na Nuvem
2025-01-15 Soft Skills Oracle ONE! 2025-01-16 Soft Skills Oracle ONE! 2025-01-17 Video lesson Sebrae AWS 2025-01-18 Finished the SEBRAE / AWS course
---////
This README.md is in development