
Data-Engineer-Associate Amazon Web Services AWS Certified Data Engineer - Associate (DEA-C01) Free Practice Exam Questions (2026 Updated)

Prepare effectively for your Amazon Web Services Data-Engineer-Associate AWS Certified Data Engineer - Associate (DEA-C01) certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2026, ensuring you have the most current resources to build confidence and succeed on your first attempt.

A company stores time-series data that is collected from streaming services in an Amazon S3 bucket. The company must ensure that only workloads that are deployed within the company's VPC can access the data.

Which solution will meet this requirement?

A.

Create an S3 bucket policy that uses a condition to allow access only to traffic that originates from the company's VPC.

B.

Apply a security group to the S3 bucket that allows connections only from the company's VPC CIDR block.

C.

Define an IAM policy that denies access to all users unless the request originates from within the company's VPC.

D.

Use a network ACL on the VPC subnets to allow only specific resources to access the S3 bucket.
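
For context, the bucket-policy approach in option A typically relies on an S3 gateway VPC endpoint and a condition key such as aws:SourceVpce or aws:SourceVpc. The following is a minimal sketch, assuming a hypothetical bucket name and VPC endpoint ID, of applying such a policy with boto3.

```python
import json
import boto3

# Hypothetical names used only for illustration.
BUCKET = "example-timeseries-bucket"
VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"  # assumed S3 gateway endpoint ID

# Deny all S3 actions on the bucket unless the request arrives through the VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": VPC_ENDPOINT_ID}
            },
        }
    ],
}

# Apply the policy; the caller needs s3:PutBucketPolicy permission.
boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

Security groups and network ACLs attach to VPC resources rather than to S3 buckets, which is why a condition-based bucket policy is the usual way to express this kind of restriction.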

A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account. A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow.

Which log type should the data engineer use to diagnose the cause of the failure?

A.

YourEnvironmentName-WebServer

B.

YourEnvironmentName-Scheduler

C.

YourEnvironmentName-DAGProcessing

D.

YourEnvironmentName-Task
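
As a point of reference, Amazon MWAA sends each enabled Apache Airflow log category to its own CloudWatch Logs log group, and task-level failures surface in the task logs. The sketch below, which assumes the environment name MyAirflowEnv and the default airflow-<EnvironmentName>-<LogType> log group naming, pulls the most recent task log events with boto3.

```python
import boto3

logs = boto3.client("logs")

# Assumed environment name; MWAA log groups typically follow the
# pattern "airflow-<EnvironmentName>-<LogType>".
ENV_NAME = "MyAirflowEnv"
LOG_GROUP = f"airflow-{ENV_NAME}-Task"

# Look at the most recently active task log stream for the failure's traceback.
streams = logs.describe_log_streams(
    logGroupName=LOG_GROUP,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)
for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])
```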

A company stores customer data in an Amazon S3 bucket. Multiple teams in the company want to use the customer data for downstream analysis. The company needs to ensure that the teams do not have access to personally identifiable information (PII) about the customers.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Use Amazon Macie to create and run a sensitive data discovery job to detect and remove PII.

B.

Use S3 Object Lambda to access the data, and use Amazon Comprehend to detect and remove PII.

C.

Use Amazon Kinesis Data Firehose and Amazon Comprehend to detect and remove PII.

D.

Use an AWS Glue DataBrew job to store the PII data in a second S3 bucket. Perform analysis on the data that remains in the original S3 bucket.
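
To illustrate the S3 Object Lambda pattern in option B, the sketch below redacts PII on the fly as objects are read: the handler fetches the requested object, asks Amazon Comprehend to detect PII entities, masks them, and returns the transformed body. The redaction style and the assumption of small UTF-8 text objects are illustrative only.

```python
import urllib.request

import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")


def handler(event, context):
    # S3 Object Lambda provides a presigned URL for the original object
    # plus a route and token used to return the transformed result.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read().decode("utf-8")

    # Detect PII entities (assumes English text small enough for a single call).
    entities = comprehend.detect_pii_entities(Text=original, LanguageCode="en")

    # Mask detected spans from the end of the text backward so offsets stay valid.
    redacted = original
    for ent in sorted(entities["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
        redacted = redacted[: ent["BeginOffset"]] + "****" + redacted[ent["EndOffset"] :]

    # Return the redacted object to the requester.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=redacted.encode("utf-8"),
    )
    return {"status_code": 200}
```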

A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes.

A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day. The data engineer must ingest only the changed data into the data lake.

Which solution will capture the changed data MOST cost-effectively?

A.

Create an AWS Lambda function to identify the changes between the previous data and the current data. Configure the Lambda function to ingest the changes into the data lake.

B.

Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.

C.

Use an open source data lake format to merge the data source with the S3 data lake to insert the new data and update the existing data.

D.

Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.
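
To make the open table format approach in option C concrete, here is a minimal PySpark sketch that merges a daily JSON snapshot into an Apache Iceberg table with MERGE INTO, so only new and changed rows are written. The catalog, table, key column, and S3 path are assumptions.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session already configured with an Apache Iceberg catalog
# named "glue_catalog" (for example, on AWS Glue or Amazon EMR).
spark = SparkSession.builder.appName("daily-cdc-merge").getOrCreate()

# Load today's full snapshot from the data source; path is illustrative.
snapshot = spark.read.json("s3://example-source-bucket/snapshots/latest/")
snapshot.createOrReplaceTempView("daily_snapshot")

# Upsert: update rows whose values changed, insert rows that are new.
# Table and key column names are illustrative.
spark.sql("""
    MERGE INTO glue_catalog.lake.events AS target
    USING daily_snapshot AS source
    ON target.event_id = source.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```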

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets.

To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outdated versions that are present in the S3 buckets.

Which solution will meet these requirements with the LEAST operational effort?

A.

Use AWS CLI to gather the information.

B.

Use Amazon S3 Inventory reports to gather the information.

C.

Use the Amazon S3 Storage Lens dashboard to gather the information.

D.

Use AWS usage reports for Amazon S3 to gather the information.
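
For comparison, the scripted approach in option A would look roughly like the sketch below, which pages through one bucket's in-progress multipart uploads and noncurrent versions; repeating and scheduling this across many buckets is the operational effort that S3 Storage Lens and S3 Inventory reports are designed to absorb. The bucket name is illustrative.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-marketing-bucket"  # illustrative bucket name

# In-progress (incomplete) multipart uploads.
for page in s3.get_paginator("list_multipart_uploads").paginate(Bucket=BUCKET):
    for upload in page.get("Uploads", []):
        print("incomplete upload:", upload["Key"], upload["Initiated"])

# Noncurrent (outdated) object versions.
for page in s3.get_paginator("list_object_versions").paginate(Bucket=BUCKET):
    for version in page.get("Versions", []):
        if not version["IsLatest"]:
            print("old version:", version["Key"], version["VersionId"])
```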

A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time. The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log data.

Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?

A.

Set up an Amazon Data Firehose delivery stream to send data to a Redshift provisioned cluster table.

B.

Set up an Amazon Data Firehose delivery stream to send data to Amazon S3. Configure a Redshift provisioned cluster to load data every minute.

C.

Configure Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to send data directly to a Redshift provisioned cluster table.

D.

Use Amazon Redshift streaming ingestion from Kinesis Data Streams to present data as a materialized view.
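
To illustrate the streaming ingestion approach in option D, the sketch below issues the usual two statements through the Redshift Data API: an external schema that maps to Kinesis Data Streams and a materialized view over the stream. The cluster, IAM role, stream name, and the assumption that the payload is UTF-8 JSON are illustrative.

```python
import boto3

# Run the streaming-ingestion DDL through the Redshift Data API;
# cluster, database, user, IAM role, and stream name are illustrative.
redshift_data = boto3.client("redshift-data")

DDL = """
CREATE EXTERNAL SCHEMA kinesis_logs
FROM KINESIS
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftStreamingRole';

CREATE MATERIALIZED VIEW log_events AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       refresh_time,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_logs."application-log-stream";
"""

# The Data API runs one statement per call, so split the script on semicolons.
for statement in DDL.split(";"):
    if statement.strip():
        redshift_data.execute_statement(
            ClusterIdentifier="example-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=statement.strip() + ";",
        )
```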

A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.

A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.

Which solution will meet this requirement with the LEAST operational effort?

A.

Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.

B.

Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.

C.

Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.

D.

Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
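
As a sketch of the crawler-plus-Athena route in option B, once the file is cataloged the distinct-customer count is a single SQL statement. The database, table, column names, and query result location below are assumptions.

```python
import boto3

athena = boto3.client("athena")

# Concatenate first and last name, then count the distinct values.
# Database, table, column names, and result location are illustrative.
QUERY = """
SELECT COUNT(DISTINCT first_name || ' ' || last_name) AS distinct_customers
FROM customer_db.daily_customers
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "customer_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```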

A company plans to use Amazon Kinesis Data Firehose to store data in Amazon S3. The source data consists of 2 MB .csv files. The company must convert the .csv files to JSON format and must store the files in Apache Parquet format.

Which solution will meet these requirements with the LEAST development effort?

A.

Use Kinesis Data Firehose to convert the .csv files to JSON. Use an AWS Lambda function to store the files in Parquet format.

B.

Use Kinesis Data Firehose to convert the .csv files to JSON and to store the files in Parquet format.

C.

Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON and stores the files in Parquet format.

D.

Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON. Use Kinesis Data Firehose to store the files in Parquet format.
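
For context, Kinesis Data Firehose record format conversion can write incoming JSON out as Apache Parquet, while the .csv-to-JSON step is typically handled by a transformation Lambda function that Firehose invokes. A minimal sketch of such a transform handler, with an assumed column layout, is shown below.

```python
import base64
import csv
import io
import json


def handler(event, context):
    """Kinesis Data Firehose transformation Lambda: convert .csv records to JSON."""
    output = []
    for record in event["records"]:
        # Records arrive base64-encoded.
        text = base64.b64decode(record["data"]).decode("utf-8")

        # Assumed column layout, for illustration only.
        rows = csv.DictReader(io.StringIO(text), fieldnames=["id", "timestamp", "value"])
        json_lines = "".join(json.dumps(row) + "\n" for row in rows)

        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(json_lines.encode("utf-8")).decode("utf-8"),
            }
        )
    return {"records": output}
```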

A company receives a data file from a partner each day in an Amazon S3 bucket. The company uses a daily AWS Glue extract, transform, and load (ETL) pipeline to clean and transform each data file. The output of the ETL pipeline is written to a CSV file named Daily.csv in a second S3 bucket.

Occasionally, the daily data file is empty or is missing values for required fields. When the file is missing data, the company can use the previous day's CSV file.

A data engineer needs to ensure that the previous day's data file is overwritten only if the new daily file is complete and valid.

Which solution will meet these requirements with the LEAST effort?

A.

Invoke an AWS Lambda function to check the file for missing data and to fill in missing values in required fields.

B.

Configure the AWS Glue ETL pipeline to use AWS Glue Data Quality rules. Develop rules in Data Quality Definition Language (DQDL) to check for missing values in required fields and for empty files.

C.

Use AWS Glue Studio to change the code in the ETL pipeline to fill in any missing values in the required fields with the most common values for each field.

D.

Run a SQL query in Amazon Athena to read the CSV file and drop missing rows. Copy the corrected CSV file to the second S3 bucket.
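
To show what the AWS Glue Data Quality rules in option B might look like, the sketch below registers a small DQDL ruleset that fails when the file is empty or when required fields contain missing values. The ruleset name, catalog database, table, and column names are assumptions.

```python
import boto3

glue = boto3.client("glue")

# DQDL: fail the evaluation if the file is empty or required fields are incomplete.
RULESET = """
Rules = [
    RowCount > 0,
    IsComplete "customer_id",
    IsComplete "order_date",
    Completeness "email" >= 0.95
]
"""

glue.create_data_quality_ruleset(
    Name="daily-file-validation",      # illustrative ruleset name
    Ruleset=RULESET,
    TargetTable={
        "DatabaseName": "partner_db",  # illustrative catalog database
        "TableName": "daily_file",     # illustrative catalog table
    },
)
```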

A company uses AWS Glue Apache Spark jobs to handle extract, transform, and load (ETL) workloads. The company has enabled logging and monitoring for all AWS Glue jobs. One of the AWS Glue jobs begins to fail. A data engineer investigates the error and wants to examine metrics for all individual stages within the job.

How can the data engineer access the stage metrics?

A.

Examine the AWS Glue job and stage details in the Spark UI.

B.

Examine the AWS Glue job and stage metrics in Amazon CloudWatch.

C.

Examine the AWS Glue job and stage logs in AWS CloudTrail logs.

D.

Examine the AWS Glue job and stage details by using the job run insights feature.
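
For context on option A, the Spark UI can surface per-stage details for a Glue job when Spark event logging is enabled. The sketch below starts a job run with the relevant special parameters; the job name and the S3 path for the event logs are assumptions.

```python
import boto3

glue = boto3.client("glue")

# Start a run with Spark UI event logging enabled so per-stage details
# can be examined; the job name and log path are illustrative.
glue.start_job_run(
    JobName="daily-etl-job",
    Arguments={
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": "s3://example-spark-logs/daily-etl-job/",
    },
)
```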

A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles. More than 100 users access the data warehouse every day.

The company wants to control user access to the objects based on each user's job role, permissions, and how sensitive the data is.

Which solution will meet these requirements?

A.

Use the role-based access control (RBAC) feature of Amazon Redshift.

B.

Use the row-level security (RLS) feature of Amazon Redshift.

C.

Use the column-level security (CLS) feature of Amazon Redshift.

D.

Use dynamic data masking policies in Amazon Redshift.
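
To make the role-based access control option concrete, the sketch below shows the typical Amazon Redshift SQL: create a role, grant it object privileges, and grant the role to a user. The cluster, database, role, schema, and user names are assumptions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Role-based access control: privileges attach to roles, and users receive roles.
# Cluster, database, role, schema, and user names are illustrative.
STATEMENTS = [
    "CREATE ROLE sales_analyst;",
    "GRANT USAGE ON SCHEMA sales TO ROLE sales_analyst;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA sales TO ROLE sales_analyst;",
    "GRANT ROLE sales_analyst TO alice;",
]

for sql in STATEMENTS:
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )
```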

Files from multiple data sources arrive in an Amazon S3 bucket on a regular basis. A data engineer wants to ingest new files into Amazon Redshift in near real time when the new files arrive in the S3 bucket.

Which solution will meet these requirements?

A.

Use the query editor v2 to schedule a COPY command to load new files into Amazon Redshift.

B.

Use the zero-ETL integration between Amazon Aurora and Amazon Redshift to load new files into Amazon Redshift.

C.

Use AWS Glue job bookmarks to extract, transform, and load (ETL) new files into Amazon Redshift.

D.

Use S3 Event Notifications to invoke an AWS Lambda function that loads new files into Amazon Redshift.
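
As a sketch of option D, the Lambda handler below is triggered by an S3 event notification and issues a COPY for each newly arrived object through the Redshift Data API. The cluster, database, target table, and IAM role are assumptions, and in practice they would come from configuration rather than constants.

```python
import urllib.parse

import boto3

redshift_data = boto3.client("redshift-data")

# Illustrative targets; in practice these would come from environment variables.
CLUSTER = "example-cluster"
DATABASE = "dev"
DB_USER = "awsuser"
COPY_ROLE = "arn:aws:iam::111122223333:role/RedshiftCopyRole"


def handler(event, context):
    # An S3 event notification can carry several records per invocation.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        sql = (
            f"COPY staging.raw_files FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{COPY_ROLE}' FORMAT AS CSV IGNOREHEADER 1;"
        )
        redshift_data.execute_statement(
            ClusterIdentifier=CLUSTER,
            Database=DATABASE,
            DbUser=DB_USER,
            Sql=sql,
        )
```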
