mohammed firdous

systems design

Two practical system design cases with constraints, measurable targets, trade-offs, and implementation decisions.

Pull Request Review Queue

Problem: Engineers needed fast feedback on pull requests, but synchronous checks were slowing merges.

Impact Snapshot

  • Feedback SLA: < 2 minutes average
  • Durability goal: 0 dropped jobs during spikes
  • Auditability: 100% traceable review comments

Architecture Flow

Git webhooks → Job queue → Worker pool → Rules + model service → Review results store

Requirements / Constraints

  • Keep average feedback time under 2 minutes.
  • Handle release-day spikes without dropping jobs.
  • Keep a clear audit trail for every review comment.

Role & Decisions

  • Designed queue topology and worker concurrency boundaries.
  • Introduced repository-priority lanes to reduce merge-day tail latency.
  • Defined retry strategy for oversized diffs to protect feedback SLA.
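
A minimal sketch of how priority lanes and a size-aware retry limit could fit together. The lane names, the 5,000-line cutoff, and the attempt budgets are assumptions made for illustration; none of them come from the production system.

```python
import heapq
import itertools
from dataclasses import dataclass

# All names and numbers below are illustrative assumptions, not the real config.
LANE_PRIORITY = {"release": 0, "default": 1, "bulk": 2}  # lower value is served first
OVERSIZED_DIFF_LINES = 5_000   # assumed cutoff for an "oversized" diff
MAX_ATTEMPTS_NORMAL = 3        # assumed attempt budget for ordinary diffs
MAX_ATTEMPTS_OVERSIZED = 1     # assumed: oversized diffs are tried once, never retried


@dataclass
class ReviewJob:
    lane: str
    repo: str
    diff_lines: int
    attempts: int = 0  # attempts already made


class ReviewQueue:
    """Strict-priority queue: lanes first, FIFO within a lane."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a lane

    def push(self, job: ReviewJob) -> None:
        heapq.heappush(self._heap, (LANE_PRIORITY[job.lane], next(self._seq), job))

    def pop(self) -> ReviewJob:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def should_retry(job: ReviewJob) -> bool:
    """Return True if the job still has attempt budget for its size class."""
    oversized = job.diff_lines > OVERSIZED_DIFF_LINES
    limit = MAX_ATTEMPTS_OVERSIZED if oversized else MAX_ATTEMPTS_NORMAL
    return job.attempts < limit
```

Strict priority like this can starve the lowest lane during a long spike, so a real deployment would likely pair it with per-lane worker reservations or aging.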

Key Trade-offs

  • Queueing improved reliability, but users no longer got an instant, synchronous response.
  • Shared worker pools improved utilization, but noisy repositories degraded tail latency for the others.

Notes

  • Most delays came from retries on very large diffs, not model latency.
  • Separate queues per repository priority reduced merge-day complaints.

Future Improvements

  • Add diff-size-based routing to dedicated workers.
  • Cache repeated review hints for common file patterns.
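
Diff-size-based routing could be as small as a tier lookup. This is a sketch only: the tier boundaries and pool names are hypothetical and would need tuning against real diff-size data.

```python
from bisect import bisect_left

# Assumed tiers, measured in changed lines; each bound is inclusive.
TIER_UPPER_BOUNDS = [500, 5_000]
# Hypothetical pool names, one more than the number of bounds.
TIER_POOLS = ["workers-small", "workers-medium", "workers-large"]


def route_by_diff_size(changed_lines: int) -> str:
    """Pick the worker pool for a diff based on how many lines it changes."""
    if changed_lines < 0:
        raise ValueError("changed_lines must be non-negative")
    # bisect_left keeps a diff of exactly 500 lines in the small tier.
    return TIER_POOLS[bisect_left(TIER_UPPER_BOUNDS, changed_lines)]
```

Keeping very large diffs on their own pool means their retries compete only with each other, which is where most of the delay was observed.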

Tenant Analytics Pipeline

Problem: Product and support teams needed daily reports plus near real-time usage visibility per tenant.

Impact Snapshot

  • Dashboard freshness target: < 1 hour
  • Storage strategy: low-cost raw event retention
  • Isolation requirement: tenant-level boundaries by design

Architecture Flow

Client events → Ingestion API → Stream bus → ETL jobs → Warehouse marts → Dashboards

Requirements / Constraints

  • Enforce strict tenant-level data isolation.
  • Keep long-term storage of raw events cheap.
  • Keep dashboard freshness within one hour.

Role & Decisions

  • Designed ingestion-to-mart flow for daily and near real-time reporting.
  • Set partition and processing profiles to avoid tenant cost imbalance.
  • Added schema-governance guardrails to reduce dashboard breakage risk.
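
One way to picture the partition and profile decisions, as a sketch under stated assumptions: the partition count, the events-per-day cutoff, and the profile names are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 12                     # assumed partition count for the shared topic
LARGE_TENANT_DAILY_EVENTS = 1_000_000   # assumed cutoff between processing profiles


def partition_for(tenant_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a tenant to a stable partition.

    Uses SHA-256 instead of the built-in hash(), which is salted per process
    for strings and could send the same tenant to different partitions.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def profile_for(daily_events: int) -> str:
    """Choose a processing profile so small tenants do not pay for large-tenant capacity."""
    return "large" if daily_events >= LARGE_TENANT_DAILY_EVENTS else "small"
```

Hashing keeps all of one tenant's events on one partition, but several tenants still share each partition, so downstream jobs must keep filtering on the tenant ID to preserve isolation.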

Key Trade-offs

  • Shared stream topics cut cost, but partitioning mistakes became riskier.
  • Hourly batch jobs were simple, but incident analysis lagged behind live events.

Notes

  • Schema drift broke charts more often than pipeline failures.
  • Small tenants overpaid when using the same processing profile as large tenants.

Future Improvements

  • Move top KPIs to incremental processing.
  • Add producer contract tests to catch schema drift early.
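
A producer contract test could start as a plain field-and-type check run in the producer's CI. The contract below is a hypothetical example; the field names are assumptions, not the pipeline's actual event schema.

```python
from typing import List

# Hypothetical contract: field name -> required Python type.
EVENT_CONTRACT = {
    "tenant_id": str,
    "event_name": str,
    "occurred_at": str,   # assumed to be an ISO 8601 timestamp string
    "properties": dict,
}


def contract_violations(event: dict) -> List[str]:
    """Return human-readable violations; an empty list means the event conforms."""
    problems = []
    for name, expected in EVENT_CONTRACT.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            actual = type(event[name]).__name__
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, got {actual}"
            )
    # Unknown fields are reported too, since silent additions are a common form of drift.
    for name in sorted(event.keys() - EVENT_CONTRACT.keys()):
        problems.append(f"unexpected field: {name}")
    return problems
```

Failing the producer's build on a non-empty result moves schema drift from a broken chart to a failed test.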