📊 Applying Site Reliability Engineering Principles - PCDOE Practice Questions

Apply SRE concepts: SLOs, SLIs, error budgets, toil reduction, and reliability engineering practices.

11 Questions Available
1 Exam Domain

PCDOE SRE Principles Question Bank (11 Questions)

Browse all 11 practice questions covering Applying Site Reliability Engineering Principles for the PCDOE certification exam. Each question includes the full answer and a detailed explanation to help you understand the concepts.

  1. Question 1: Applying Site Reliability Engineering Principles

    How do you define and implement SLIs and SLOs for a Cloud Run service?

    A. Monitor only uptime
    B. Define SLIs (latency p99, error rate, availability), set SLO targets (e.g., 99.9% availability), create error budgets, and configure alerts when the error budget burn rate is high
    C. Use default monitoring only
    D. SLOs are not relevant for cloud services

    Correct Answer: B
    Explanation:

    SLI/SLO implementation: 1) SLIs: measurable indicators — availability (successful requests / total requests), latency (p50, p99), throughput. 2) SLOs: targets — 99.9% availability = 8.76h downtime/year. 3) Error budget: 1 - SLO = allowed unreliability (0.1% = ~43 min/month). 4) Burn rate: how fast error budget is consumed. 5) Cloud Monitoring: SLO monitoring service (create SLO, track burn rate, alert). 6) Action: freeze deployments when budget exhausted.
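The downtime figures quoted above follow directly from the error budget formula. A minimal sketch of that arithmetic, assuming a purely time-based availability SLO (the function name is illustrative, not part of any Google Cloud API):

```python
def allowed_downtime_minutes(slo: float, window_days: float) -> float:
    """Return the downtime, in minutes, that an availability SLO permits over a window."""
    error_budget = 1.0 - slo  # e.g. 0.999 -> 0.001
    minutes_in_window = window_days * 24 * 60
    return error_budget * minutes_in_window


# Hypothetical helper for the yearly figure, expressed in hours.
def allowed_downtime_hours_per_year(slo: float) -> float:
    return allowed_downtime_minutes(slo, 365) / 60
```

For a 99.9% SLO this gives roughly 43.2 minutes over a 30-day window and roughly 8.76 hours over a 365-day year, matching the numbers in the explanation.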

  2. Question 2: Applying Site Reliability Engineering Practices to a Service

    What is the relationship between SLI, SLO, SLA, and Error Budget?

    A. They are all the same
    B. SLI measures performance, SLO sets internal target, SLA is external contractual commitment, Error Budget = 100% - SLO
    C. SLA is internal, SLO is external
    D. Error Budget replaces SLO

    Correct Answer: B
    Explanation:

    SLI (indicator) measures actual performance. SLO (objective) sets the internal target (e.g., 99.9%). SLA (agreement) is the external contract. Error Budget (1-SLO) is the tolerable unreliability.

  3. Question 3: Managing Incidents and Post-Mortems

    How do you configure alerts based on SLO error budget consumption?

    A. Alert when any error occurs
    B. Multi-window, multi-burn-rate alerts: fast burn (2% of budget in 1h → page), slow burn (5% of budget in 6h → ticket). This avoids alert fatigue while catching real issues
    C. Alert only when SLO is breached
    D. Use fixed thresholds instead of SLOs

    Correct Answer: B
    Explanation:

    SLO-based alerting: 1) Fast burn: consuming 2% of 30-day budget in 1 hour (14.4x normal rate) → immediate page. 2) Slow burn: consuming 5% of budget in 6 hours (6x normal rate) → urgent ticket. 3) Window: shorter windows catch acute problems, longer windows catch gradual degradation. 4) Implementation: Cloud Monitoring SLO service, or Prometheus recording rules. 5) Benefit: alerts correlate with user impact (not just component health). 6) Reduces alert fatigue: only alert when users are affected.
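The 14.4x and 6x figures come from dividing the share of budget consumed by the share of the SLO period that has elapsed. A minimal sketch of that calculation and of a two-tier alert decision, assuming a 30-day SLO period; the function names and the simplified one-window-per-tier check are illustrative assumptions, not the Cloud Monitoring API:

```python
FAST_BURN_THRESHOLD = 14.4  # 2% of a 30-day budget in 1 hour
SLOW_BURN_THRESHOLD = 6.0   # 5% of a 30-day budget in 6 hours


def burn_rate(budget_fraction: float, window_hours: float,
              slo_period_days: float = 30) -> float:
    """Burn rate that consumes `budget_fraction` of the budget within `window_hours`."""
    period_hours = slo_period_days * 24
    return budget_fraction * period_hours / window_hours


def alert_action(error_rate_1h: float, error_rate_6h: float, slo: float) -> str:
    """Map observed error rates over two windows to an action (simplified)."""
    budget = 1.0 - slo
    # Observed burn rate = observed error rate / error rate the SLO allows.
    if error_rate_1h / budget >= FAST_BURN_THRESHOLD:
        return "page"
    if error_rate_6h / budget >= SLOW_BURN_THRESHOLD:
        return "ticket"
    return "none"
```

Production setups usually also require a shorter confirmation window per tier before firing; that refinement is omitted here to keep the arithmetic visible.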

  4. Question 4: Applying Site Reliability Engineering Principles

    How do you determine the appropriate reliability target (SLO) for a service?

    A. Always target 100% availability
    B. Balance user expectations, business impact of downtime, cost of additional reliability, and dependency reliability; 100% is wrong because it prevents innovation
    C. Match competitor's uptime
    D. Use the cloud provider's SLA

    Correct Answer: B
    Explanation:

    Setting SLOs: 1) User expectations: payment service needs higher reliability than a blog. 2) Business impact: revenue loss per hour of downtime. 3) Cost curve: 99.9% → 99.99% costs significantly more (redundancy, testing, staffing). 4) Dependencies: can't be more reliable than your least reliable dependency. 5) 100% is wrong: no room for updates, experiments, or feature development. 6) Starting: measure current reliability, set SLO slightly above. 7) Review: quarterly adjustment based on business needs.
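The dependency point can be made concrete. A minimal sketch, assuming every dependency is a hard (serial) dependency and that failures are independent; under those assumptions the achievable availability is the product of the individual availabilities, which is always at or below the weakest one:

```python
from math import prod
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a request path that needs every dependency to be up.

    Assumes independent failures and no fallback or caching in front of any dependency.
    """
    return prod(availabilities)
```

For example, a service that is itself 99.9% available and calls two hard dependencies at 99.9% and 99.95% lands at about 99.75%, so a 99.9% SLO would not be realistic without redundancy or graceful degradation.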

  5. Question 5: Applying Site Reliability in Practice

    How do you implement database reliability practices as an SRE?

    A. DBA handles database reliability
    B. Automated backups with tested restores, connection pooling, query performance monitoring, failover testing, schema migration CI/CD, and database SLOs (query latency, availability)
    C. Managed databases handle everything
    D. Database reliability is different from SRE

    Correct Answer: B
    Explanation:

    Database reliability: 1) Backups: automated + PITR + regular restore drills. 2) HA: Cloud SQL regional HA, Spanner multi-region. 3) Monitoring: query latency (p99), connection count, replication lag, storage. 4) SLOs: database availability (99.95%), query latency p99 <100ms. 5) Connection: pooling, proxy, max connection management. 6) Schema: CI/CD migrations (Flyway, tested in staging). 7) Failover: regular failover drills. 8) Capacity: growth planning, autoscaling storage.
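A latency SLO such as "query latency p99 <100ms" needs a concrete percentile definition. A minimal sketch using the nearest-rank method on a list of sampled query latencies; the function names are hypothetical, and real monitoring systems typically estimate percentiles from histograms rather than raw samples:

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Return the pct-th percentile (1-100) using the nearest-rank method."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms: Sequence[float], target_ms: float = 100.0) -> bool:
    """True when the sampled p99 latency is below the target."""
    return percentile_nearest_rank(latencies_ms, 99) < target_ms
```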

  6. Question 6: Applying SRE Principles

    What is the relationship between SLO and error budget?

    A. No relationship
    B. Error budget = 1 - SLO target. If the SLO is 99.9%, the error budget is 0.1% (43.2 min per 30-day month). Error budget is consumed by downtime and errors, balancing reliability investment with feature velocity.
    C. Error budget is unlimited
    D. SLO replaces error budget

    Correct Answer: B
    Explanation:

    Error budget: quantifies acceptable unreliability. SLO 99.9% → 0.1% budget → 43.2 min/month downtime allowed. Budget remaining: deploy features, take risks. Budget exhausted: freeze deployments, focus on reliability. Benefits: data-driven reliability vs. velocity decisions, shared accountability (dev + ops), and progressive rollouts (consume budget gradually). Review: weekly/monthly error budget reports.
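The "budget remaining versus budget exhausted" decision can be sketched for a request-based SLO. This is a minimal illustration under the assumption that the budget is counted in failed requests; the function names and the two-state policy are hypothetical simplifications:

```python
def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        # No traffic yet, or a 100% SLO: treat any failure as exhausting the budget.
        return 1.0 if failed_requests == 0 else -1.0
    return 1.0 - failed_requests / allowed_failures


def release_policy(remaining: float) -> str:
    """Apply the policy from the explanation: ship while budget remains, else freeze."""
    return "ship features" if remaining > 0 else "freeze deployments"
```

With a 99.9% SLO and one million requests, about 1,000 failures are allowed; 400 failures leave roughly 60% of the budget, while 1,500 failures overspend it and trigger the freeze.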

  7. Question 7: Applying Site Reliability Engineering Principles

    What is the relationship between SLA, SLO, and SLI?

    A. They are the same thing
    B. SLI is the measurement (metric), SLO is the internal target (objective), SLA is the external contract (with consequences); SLI feeds SLO, which informs SLA
    C. SLA is set first, then SLO and SLI
    D. Only SLA matters

    Correct Answer: B
    Explanation:

    SLI → SLO → SLA: 1) SLI (indicator): concrete metric (request latency, error rate, availability), measured from monitoring. 2) SLO (objective): internal target for the SLI (99.9% availability, p99 <200ms), set by engineering and product. 3) SLA (agreement): external contract with customers (e.g., 99.5% availability with credits). The SLA target is looser than the SLO, which leaves a buffer. Example: SLI = success rate, SLO = 99.95% (internal), SLA = 99.9% (contractual). The gap leaves room for maintenance and experimentation.
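The buffer between SLO and SLA can be shown with the example targets above (SLO 99.95%, SLA 99.9%). A minimal sketch, assuming a request-based availability SLI; the function names and status strings are illustrative:

```python
def availability_sli(successful: int, total: int) -> float:
    """Availability SLI: successful requests divided by total requests."""
    return successful / total if total else 1.0


def service_status(sli: float, slo: float = 0.9995, sla: float = 0.999) -> str:
    """Compare a measured SLI against the internal SLO and the looser external SLA."""
    if sli >= slo:
        return "meeting SLO"
    if sli >= sla:
        return "SLO missed, SLA still met"
    return "SLA breached"
```

The middle state is the point of the buffer: the team gets an internal warning and time to react before any contractual consequence applies.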

  8. Question 8: Applying Site Reliability Engineering Practices to a Service

    What is an SLO (Service Level Objective) in SRE practices?

    A. A contractual agreement with customers
    B. A target value for a service level indicator (e.g., 99.9% availability) that defines acceptable reliability
    C. A budget allocation for service operations
    D. A team structure definition

    Correct Answer: B
    Explanation:

    SLOs define target reliability levels (measured by SLIs) that a service should achieve. They balance reliability goals with feature development velocity through error budgets.

  9. Question 9: Applying Site Reliability Engineering Practices to a Service

    What is toil in SRE and how do you reduce it?

    A. Important manual work
    B. Repetitive, automatable, tactical operational work that scales linearly with service growth; reduce through automation, self-service, and elimination
    C. All operational work
    D. Cannot be reduced

    Correct Answer: B
    Explanation:

    Toil: manual, repetitive, automatable, no lasting value, scales with service (e.g., manual deployments, ticket-driven changes). Reduce: automate (scripts, IaC), self-service (portals, APIs), eliminate (design for zero-touch operations). SRE target: <50% time on toil.
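The under-50% target is easy to track once work is categorized. A minimal sketch, assuming the team logs hours per activity and labels which activities count as toil; the activity names and function name are hypothetical:

```python
from typing import Mapping, Set


def toil_fraction(hours_by_activity: Mapping[str, float], toil_activities: Set[str]) -> float:
    """Share of logged hours spent on activities labeled as toil."""
    total = sum(hours_by_activity.values())
    toil = sum(h for name, h in hours_by_activity.items() if name in toil_activities)
    return toil / total if total else 0.0


# Illustrative week for one engineer (made-up numbers).
week = {"manual deployments": 12, "ticket-driven changes": 10, "automation work": 18}
toil_labels = {"manual deployments", "ticket-driven changes"}
over_target = toil_fraction(week, toil_labels) >= 0.5
```

In this made-up week, 22 of 40 hours are toil, so the engineer is over the target and automation of the deployment or ticket workflow would be the first candidate.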

  10. Question 10: Applying Site Reliability Engineering Practices to a Service

    What are SLIs, SLOs, and SLAs?

    A. Same concept
    B. SLI: metric measuring service level (e.g., latency p99). SLO: target for SLI (99.9% requests <200ms). SLA: contractual commitment with consequences for missing SLO.
    C. SLI is the contract
    D. SLA is a metric

    Correct Answer: B
    Explanation:

    SLI (indicator): quantitative measure — availability (successful requests / total), latency (% under threshold), correctness. SLO (objective): target — 99.9% availability over 28 days. SLA (agreement): business contract — if SLO missed, customer gets credits. SLI measures what matters to users.
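The "latency (% under threshold)" form of SLI is a ratio of good events to all events. A minimal sketch over raw latency samples, assuming a 200 ms threshold for illustration; the function name is hypothetical:

```python
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float = 200.0) -> float:
    """Fraction of requests served strictly faster than the threshold."""
    if not latencies_ms:
        return 1.0  # no traffic: nothing violated the threshold
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)
```

Expressing latency this way puts it on the same 0-to-1 scale as availability, so the same SLO target and error budget arithmetic applies to both.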

  11. Question 11: Applying SRE Principles

    What is the difference between SLI, SLO, and SLA?

    A. All the same
    B. SLI (Service Level Indicator): measurable metric (e.g., the percentage of requests served in under 200 ms). SLO (Service Level Objective): target value for SLI (e.g., 99.9% availability). SLA (Service Level Agreement): contractual commitment with consequences.
    C. SLA is internal only
    D. SLI is a contract

    Correct Answer: B
    Explanation:

    SLI: quantitative measure of service behavior (latency, availability, error rate, throughput). SLO: target for SLI (99.9% of requests succeed). SLA: contractual SLO with consequences (credits, penalties). Relationship: SLI (what you measure) → SLO (what you target, usually higher than SLA) → SLA (what you promise customers). Best practice: SLO should be stricter than SLA (buffer for maintenance, incidents).

Key SRE Principles Concepts for PCDOE

sre, slo, sli, error budget, toil, reliability, incident management

PCDOE SRE Principles Exam Tips

Applying Site Reliability Engineering Principles questions in PCDOE are typically scenario-based. Focus on service-level decision making aligned to official exam objectives. Priority concepts: sre, slo, sli, error budget, toil, reliability.

What PCDOE Expects

  • Select the most practical, secure, and scalable answer for the stated scenario.
  • SRE Principles scenarios for PCDOE are frequently mapped to Domain 1 (~20%), so read the objective carefully before picking controls or architecture.
  • Expect multi-service scenarios where SRE Principles interacts with IAM, networking, storage, or observability patterns rather than appearing as an isolated service question.
  • When two options are both technically valid, prefer the choice that best aligns with the exam's operational scope (Professional) and managed-service best practices.

High-Value SRE Principles Concepts

  • Know the core SRE Principles building blocks cold: sre, slo, sli, error budget.
  • Review the edge cases and limits around toil and reliability; these details are commonly used to differentiate answer choices.
  • Practice service-integration reasoning: how SRE Principles pairs with CI/CD Pipelines and Service Performance in real deployment patterns.
  • For PCDOE, explain why the chosen SRE Principles design meets reliability, security, and cost expectations better than the alternatives.

Common PCDOE Traps

  • Watch for answers that partially solve the requirement but miss operational constraints.
  • Questions in SRE Principles often include distractors that look correct but violate least-privilege, durability, or availability requirements.
  • Avoid picking options purely by feature name; validate data path, failure handling, and governance impact before answering.
  • If the prompt hints at automation or repeatability, eliminate manual-only operational answers first.

Fast Review Checklist

  • Can you compare at least two SRE Principles implementation paths and justify which one best fits the scenario?
  • Can you map the chosen answer back to SRE Principles (~20%) outcomes for PCDOE?
  • Can you explain security and access boundaries for SRE Principles without relying on default-open assumptions?
  • Can you describe how SRE Principles integrates with CI/CD Pipelines and Service Performance during failure, scaling, and monitoring events?
