📥 Ingesting and Processing Data - PDE Practice Questions

Build data ingestion pipelines using Dataflow, Dataproc, Cloud Data Fusion, and Transfer Service.

PDE Ingesting & Processing Question Bank (4 Questions)

Browse all 4 practice questions covering Ingesting and Processing Data for the PDE certification exam. Each question includes the full answer and a detailed explanation to help you understand the concepts.

  1. Question 1: Ingesting and Processing Data

    When should you use Cloud Data Fusion instead of writing custom Dataflow pipelines?

    A. Always use custom Dataflow
    B. When citizen data engineers need a visual, code-free ETL/ELT tool: Data Fusion provides a drag-and-drop UI with pre-built connectors for common sources and transformations
    C. Never use Data Fusion
    D. Data Fusion replaces Dataflow entirely
    Correct Answer: B
    Explanation:

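    Cloud Data Fusion is a visual ETL tool built on the open-source CDAP project. Use it when non-developer users need to build pipelines, when the work consists of standard transformations (join, filter, aggregate), or when you want its 200+ pre-built connectors (SAP, Salesforce, databases). Choose Dataflow instead for custom logic, streaming, and code-based pipelines. Data Fusion does not run on Dataflow: it executes pipelines on Dataproc (Spark), by default on ephemeral clusters that it provisions per run. Editions: Basic (batch), Enterprise (streaming, replication, lineage).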

  2. Question 2: Ingesting and Processing Data

    What are the core concepts of the Apache Beam programming model used by Dataflow?

    A. Map, Reduce, Filter
    B. Pipeline, PCollection (distributed dataset), PTransform (processing step), I/O connectors: a unified model that works for both batch and streaming with the same code
    C. Tables and queries
    D. Tasks and workers
    Correct Answer: B
    Explanation:

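    The Apache Beam model has four core parts. A Pipeline is the overall data processing job. A PCollection is an immutable distributed dataset, bounded for batch and unbounded for streaming. A PTransform is a processing step, such as ParDo, GroupByKey, Combine, or Flatten. I/O connectors read from sources and write to sinks. Beam pipelines run on multiple runners, including Dataflow (GCP), Spark, and Flink, so the key advantage is portability: write once, run on any supported runner. For streaming, windowing and triggers determine how unbounded data is grouped and when results are emitted.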
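
To make the transform vocabulary concrete, here is a minimal sketch of a word count expressed as a chain of ParDo-like, GroupByKey-like, and Combine-like steps. It is plain Python over in-memory lists, not the Beam SDK: the function names (`par_do`, `group_by_key`, `combine_values`, `word_count`) are illustrative stand-ins chosen for this example, and a real pipeline would use the SDK's own transforms and a runner such as Dataflow.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def par_do(elements: Iterable, fn: Callable) -> List:
    """ParDo-like step: fn may emit zero or more outputs per input element."""
    return [out for element in elements for out in fn(element)]


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """GroupByKey-like step: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_values(grouped: Dict[str, List[int]], combiner: Callable) -> Dict[str, int]:
    """Combine-like step: reduce each key's values to a single result."""
    return {key: combiner(values) for key, values in grouped.items()}


def split_words(line: str):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three steps, mirroring how transforms are chained in a pipeline."""
    pairs = par_do(lines, split_words)
    grouped = group_by_key(pairs)
    return combine_values(grouped, sum)


if __name__ == "__main__":
    print(word_count(["batch and streaming", "streaming data"]))
```

The point of the sketch is the shape of the computation: each step takes a collection and returns a new one without mutating its input, which is the same contract PTransforms have with PCollections.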

  3. Question 3: Ingesting and Processing Data

    A team has existing Spark jobs running on an on-premises Hadoop cluster. What is the recommended approach to migrate to Google Cloud?

    A. Rewrite everything in Dataflow
    B. Use Dataproc: migrate Spark jobs with minimal changes, store data in Cloud Storage (HDFS-compatible connector), and use ephemeral clusters for cost optimization
    C. Use Compute Engine VMs with Hadoop installed
    D. Use Cloud Functions for Spark
    Correct Answer: B
    Explanation:

    Dataproc suits Spark migration because it needs minimal code changes: hdfs:// paths become gs:// paths. Store data in Cloud Storage rather than HDFS so that it outlives any cluster. The key pattern is the ephemeral cluster: create the cluster, run the job, then delete the cluster, which avoids idle costs. Related features: Dataproc Serverless (fully managed, no cluster management), Workflow Templates (orchestrate multi-step Spark jobs), and Dataproc Metastore (managed Hive Metastore for table metadata).
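
The path change mentioned above is often the bulk of the code edit in a migration. As a rough sketch, assuming the HDFS directory layout is copied unchanged into a single bucket, a helper like the following could rewrite job arguments. The function name `hdfs_to_gcs`, the bucket `example-bucket`, and the sample path are all hypothetical.

```python
from urllib.parse import urlparse


def hdfs_to_gcs(uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to a gs:// URI in the given bucket.

    Assumes the directory layout under the HDFS root is mirrored as-is
    in the bucket. URIs with any other scheme are returned unchanged.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        return uri
    # Drop the namenode host/port; keep only the file path.
    return f"gs://{bucket}/{parsed.path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical input path used only for illustration.
    print(hdfs_to_gcs("hdfs://namenode:8020/warehouse/events/part-0000.parquet", "example-bucket"))
```

Because the Spark code itself reads and writes by URI, swapping the scheme and root is frequently all a job needs, which is why the exam treats Dataproc as the low-friction migration path.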

  4. Question 4: Designing Data Processing Systems

    When should you use Dataproc instead of Dataflow?

    A. Always
    B. For existing Hadoop/Spark workloads, when you need Spark-specific libraries, or when the team has Spark expertise
    C. For streaming only
    D. For SQL queries only
    Correct Answer: B
    Explanation:

    Dataproc is ideal for migrating existing Hadoop/Spark jobs, leveraging Spark ML/GraphX libraries, or when teams have strong Spark expertise. Dataflow is better for new pipelines with unified batch/stream.
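
The selection rules from these questions can be written down as a small decision helper. This is only a study aid under stated assumptions: the `Workload` fields and the `suggest_service` function are hypothetical names, and the order in which the rules are checked is an assumption made for this sketch, not an official decision tree.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical scenario flags drawn from the explanations above."""
    has_existing_spark_or_hadoop_jobs: bool = False
    needs_spark_libraries: bool = False
    team_has_spark_expertise: bool = False
    needs_visual_code_free_etl: bool = False


def suggest_service(workload: Workload) -> str:
    """Return a first-guess service for a scenario (rule order is an assumption)."""
    if (
        workload.has_existing_spark_or_hadoop_jobs
        or workload.needs_spark_libraries
        or workload.team_has_spark_expertise
    ):
        return "Dataproc"
    if workload.needs_visual_code_free_etl:
        return "Cloud Data Fusion"
    # Default for new pipelines with unified batch/stream processing.
    return "Dataflow"


if __name__ == "__main__":
    print(suggest_service(Workload(has_existing_spark_or_hadoop_jobs=True)))
    print(suggest_service(Workload(needs_visual_code_free_etl=True)))
    print(suggest_service(Workload()))
```

Real exam scenarios add constraints such as cost, latency, and governance, so treat the result as a starting hypothesis to check against the full prompt.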

Key Ingesting & Processing Concepts for PDE

ingestion, dataflow, apache beam, dataproc, spark, data fusion, transfer service

PDE Ingesting & Processing Exam Tips

Ingesting and Processing Data questions in PDE are typically scenario-based. Focus on service-level decision making aligned to official exam objectives. Priority concepts: ingestion, dataflow, apache beam, dataproc, spark, data fusion.

What PDE Expects

  • Select the most practical, secure, and scalable answer for the stated scenario.
  • Ingesting & Processing scenarios for PDE are frequently mapped to Domain 2 (~19%), so read the objective carefully before picking controls or architecture.
  • Expect multi-service scenarios where Ingesting & Processing interacts with IAM, networking, storage, or observability patterns rather than appearing as an isolated service question.
  • When two options are both technically valid, prefer the choice that best aligns with the exam's operational scope (Professional) and managed-service best practices.

High-Value Ingesting & Processing Concepts

  • Know the core Ingesting & Processing building blocks cold: ingestion, dataflow, apache beam, dataproc.
  • Review the edge-case features and limits for spark, data fusion; these details are commonly used to differentiate answer choices.
  • Practice service-integration reasoning: how Ingesting & Processing pairs with Data Processing and Analysis in real deployment patterns.
  • For PDE, explain why the chosen Ingesting & Processing design meets reliability, security, and cost expectations better than the alternatives.

Common PDE Traps

  • Watch for answers that partially solve the requirement but miss operational constraints.
  • Questions in Ingesting and Processing Data often include distractors that look correct for Ingesting & Processing but violate least-privilege, durability, or availability requirements.
  • Avoid picking options purely by feature name; validate data path, failure handling, and governance impact before answering.
  • If the prompt hints at automation or repeatability, eliminate manual-only operational answers first.

Fast Review Checklist

  • Can you compare at least two Ingesting & Processing implementation paths and justify which one best fits the scenario?
  • Can you map the chosen answer back to Ingesting and Processing Data (~19%) outcomes for PDE?
  • Can you explain security and access boundaries for Ingesting & Processing without relying on default-open assumptions?
  • Can you describe how Ingesting & Processing integrates with Data Processing and Analysis during failure, scaling, and monitoring events?
