Systems Design

Two practical system design cases with constraints, measurable targets, trade-offs, and implementation decisions.

Pull Request Review Queue

Problem: Engineers needed fast feedback on pull requests, but synchronous checks were slowing merges.

Impact Snapshot

Feedback SLA: < 2 minutes average

Durability goal: 0 dropped jobs during spikes

Auditability: 100% traceable review comments

Architecture Flow

Git webhooks → Job queue → Worker pool → Rules + model service → Review results store

Requirements / Constraints

keep average feedback time under 2 minutes
handle release-day spikes without dropping jobs
keep a clear audit trail for every review comment
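
The audit-trail requirement can be illustrated with a record that carries its own provenance. This is a minimal sketch, not the stored schema: every field name here (`job_id`, `commit_sha`, `source_version`, and so on) is a hypothetical choice made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReviewComment:
    """One review comment plus the provenance needed to trace it (hypothetical fields)."""
    comment_id: str
    job_id: str          # queue job that produced the comment
    repo: str
    commit_sha: str      # exact revision that was reviewed
    source: str          # "rule" or "model"
    source_version: str  # rule-set or model version that emitted the comment
    body: str
    created_at: str      # ISO 8601 UTC timestamp


def to_audit_line(comment: ReviewComment) -> str:
    """Serialize a comment as one JSON line for an append-only audit log."""
    return json.dumps(asdict(comment), sort_keys=True)


def is_traceable(comment: ReviewComment) -> bool:
    """A comment counts as traceable only when every provenance field is filled in."""
    return all([comment.job_id, comment.commit_sha, comment.source, comment.source_version])
```

Rejecting any comment for which `is_traceable` returns `False` before it is written is one way to hold the 100% target by construction rather than by later inspection.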

Role & Decisions

Designed queue topology and worker concurrency boundaries.
Introduced repository-priority lanes to reduce merge-day tail latency.
Defined retry strategy for oversized diffs to protect feedback SLA.
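
The lane and retry decisions above can be sketched as follows. This is an illustration under stated assumptions, not the production design: the lane names, the `OVERSIZED_LINES` threshold, the retry limits, and the backoff constants are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

LANES = ("high", "normal", "low")  # hypothetical lane names, highest priority first
OVERSIZED_LINES = 5_000            # assumed threshold for an "oversized" diff


@dataclass
class Job:
    repo: str
    changed_lines: int
    retries: int = 0  # retries already performed


class LaneQueue:
    """In-memory stand-in for per-priority queues."""

    def __init__(self) -> None:
        self._lanes = {lane: deque() for lane in LANES}

    def enqueue(self, job: Job, lane: str) -> None:
        self._lanes[lane].append(job)

    def dequeue(self) -> Optional[Job]:
        # Strict priority: always drain the highest non-empty lane first.
        for lane in LANES:
            if self._lanes[lane]:
                return self._lanes[lane].popleft()
        return None


def retry_delay_seconds(job: Job) -> Optional[float]:
    """Return the backoff before the next retry, or None when the job should not be retried."""
    # Oversized diffs get fewer retries so they cannot consume the feedback budget.
    max_retries = 1 if job.changed_lines > OVERSIZED_LINES else 3
    if job.retries >= max_retries:
        return None
    return min(60.0, 5.0 * 2 ** job.retries)  # exponential backoff capped at 60 s
```

Strict priority is the simplest policy to reason about, but it can starve the low lane during long spikes; a weighted poll across lanes is a common alternative.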

Key Trade-offs

Queueing improved reliability, but feedback became asynchronous, so users no longer got an instant response
Shared worker pools improved utilization, but noisy repos affected tail latency
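
One common way to limit the noisy-repo effect in a shared pool is a per-repository cap on in-flight jobs. The source does not say this was used here; the class below is a hypothetical, single-threaded sketch of the idea.

```python
from collections import Counter


class RepoLimiter:
    """Caps how many jobs from one repository may run at once (not thread-safe)."""

    def __init__(self, max_in_flight_per_repo: int) -> None:
        self._max = max_in_flight_per_repo
        self._in_flight = Counter()

    def try_acquire(self, repo: str) -> bool:
        """Reserve a worker slot for the repo, or return False if it is at its cap."""
        if self._in_flight[repo] >= self._max:
            return False
        self._in_flight[repo] += 1
        return True

    def release(self, repo: str) -> None:
        """Free a slot when a job for the repo finishes."""
        if self._in_flight[repo] > 0:
            self._in_flight[repo] -= 1
```

A worker would call `try_acquire` before starting a job and put the job back on the queue when it returns `False`, which keeps utilization high while bounding any single repository's share.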

Notes

Most delays came from retries on very large diffs, not model latency.
Separate queues per repository priority reduced merge-day complaints.

Future Improvements

Add diff-size based routing to dedicated workers
Cache repeated review hints for common file patterns
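
Both proposed improvements can be sketched in a few lines. Everything here is an assumption made for illustration: the `LARGE_DIFF_LINES` cutoff, the pool names, the extension-based notion of a "file pattern", and the placeholder body of `hints_for`, which stands in for a call to the rules service.

```python
from functools import lru_cache

LARGE_DIFF_LINES = 2_000  # assumed cutoff for routing to dedicated workers


def pool_for(changed_lines: int) -> str:
    """Pick a worker pool from the diff size (pool names are hypothetical)."""
    return "large-diff-workers" if changed_lines >= LARGE_DIFF_LINES else "default-workers"


def file_pattern(path: str) -> str:
    """Reduce a path to a coarse pattern, e.g. 'src/app/main.py' -> '*.py'."""
    name = path.rsplit("/", 1)[-1]
    if "." in name:
        return "*." + name.rsplit(".", 1)[-1]
    return name


@lru_cache(maxsize=1024)
def hints_for(pattern: str, ruleset_version: str) -> tuple:
    """Return review hints for a file pattern; repeated calls are served from the cache."""
    # Placeholder result: a real implementation would query the rules service here.
    return (f"hint for {pattern} (rules {ruleset_version})",)
```

Including the rule-set version in the cache key means a rules update naturally bypasses stale entries instead of requiring an explicit flush.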

Tenant Analytics Pipeline

Problem: Product and support teams needed daily reports plus near real-time usage visibility per tenant.

Impact Snapshot

Dashboard freshness target: < 1 hour

Storage strategy: low-cost raw event retention

Isolation requirement: tenant-level boundaries by design

Architecture Flow

Client events → Ingestion API → Stream bus → ETL jobs → Warehouse marts → Dashboards

Requirements / Constraints

strict tenant-level data isolation
cheap long-term storage for raw events
dashboard freshness within one hour
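
These three constraints can be made concrete with a tenant-prefixed storage key and a freshness check. This is a sketch under assumptions: the `raw/tenant_id=.../dt=...` layout and both function names are hypothetical, and the timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def raw_event_key(tenant_id: str, event_time: datetime, event_id: str) -> str:
    """Build an object-store key that keeps each tenant's raw events under its own prefix."""
    if not tenant_id or "/" in tenant_id:
        # A slash would let one tenant's id escape into another prefix.
        raise ValueError("tenant_id must be non-empty and must not contain '/'")
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/tenant_id={tenant_id}/dt={day}/{event_id}.json"


def is_fresh(last_loaded_at: datetime, now: datetime,
             max_lag: timedelta = timedelta(hours=1)) -> bool:
    """True when the dashboard data is younger than the freshness target."""
    return now - last_loaded_at < max_lag
```

A per-tenant prefix lets access policies and retention rules be applied at the prefix level, and date partitioning keeps old raw events easy to move to cheaper storage tiers.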

Role & Decisions

Designed ingestion-to-mart flow for daily and near real-time reporting.
Set partition and processing profiles to avoid tenant cost imbalance.
Added schema-governance guardrails to reduce dashboard breakage risk.
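
The partition and processing-profile decision can be sketched as a stable tenant-to-partition mapping plus a volume-based profile lookup. The profile names and volume thresholds below are hypothetical, not the values used in the pipeline.

```python
import hashlib

# (minimum events per day, profile name), checked from largest to smallest.
PROFILES = [(1_000_000, "large"), (50_000, "medium"), (0, "small")]


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Map a tenant to a partition with a hash that is stable across processes."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def profile_for(events_per_day: int) -> str:
    """Choose a processing profile from a tenant's daily event volume."""
    for min_events, name in PROFILES:
        if events_per_day >= min_events:
            return name
    return "small"
```

A cryptographic hash is used instead of Python's built-in `hash` because the latter is salted per process for strings, which would send the same tenant to different partitions on different workers.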

Key Trade-offs

Shared stream topics cut cost, but partition mistakes became risky
Hourly batch jobs were simple, but incident analysis had to wait for the next batch

Notes

Schema drift broke charts more often than pipeline failures.
Small tenants overpaid when using the same processing profile as large tenants.

Future Improvements

Move top KPIs to incremental processing
Add producer contract tests to catch schema drift early
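
A producer contract test and an incremental KPI update might look like the sketch below. The `EXPECTED_SCHEMA` fields and both function names are hypothetical; a real setup would more likely rely on a schema registry or a validation library.

```python
from typing import Any, Dict, List

# Hypothetical contract for one event type: field name -> required Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "tenant_id": str,
    "event_name": str,
    "occurred_at": str,
    "value": float,
}


def contract_violations(event: Dict[str, Any],
                        schema: Dict[str, type] = EXPECTED_SCHEMA) -> List[str]:
    """List every way an event deviates from the contract (empty list means it conforms)."""
    problems = []
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}, "
                            f"got {type(event[field]).__name__}")
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def add_events(totals: Dict[str, int], new_events: List[Dict[str, Any]]) -> Dict[str, int]:
    """Incrementally update per-tenant event counts from new events only."""
    updated = dict(totals)
    for event in new_events:
        tenant = event["tenant_id"]
        updated[tenant] = updated.get(tenant, 0) + 1
    return updated
```

Running `contract_violations` against sample payloads in the producer's CI would surface a renamed or retyped field before it reaches the dashboards, and `add_events` shows the shape of updating a KPI without rescanning history.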