From Ad-Hoc Abuse to Enterprise Pseudo-Streaming – A STAR Story
| STAR | Narrative (Gap-Free) | Critic’s Question & Reasoning (Why the Requirement Exists) |
|---|---|---|
| SITUATION | Business context – Internal analytics team needed real-time visibility into high-volume operational data via self-service dashboards. Initial contract – “≤ 2-day window, ≤ 1 000 rows” → sub-second REST GET. Reality – Users began POST-ing custom filters (dimensions, regex patterns, time ranges up to 90 days). Pain point – Queries took 30–180 s → dashboard gateway timeout (60 s) → blank screens. | Q1: Why allow custom filters at all? A: Flexibility was a hard stakeholder requirement from Day 1: the business wanted “self-service” without ticket-based schema changes. Q2: SQL injection risk? A: We never concatenated user input. All filters were parsed into a safe AST → translated to the search engine DSL. 100 % unit-test coverage + external pentest. Q3: Why did abuse explode? A: Early wins → viral adoption → 50 % of analysts using the API within 2 weeks. Feature creep became a formal requirement – leadership mandated “no restrictions” to preserve velocity. |
| TASK | Deliver results for 30–180 s queries inside the dashboard platform’s fixed 60 s gateway window, without restricting the self-service filters that leadership had made a formal requirement, and while keeping the HTTP tier stateless. | Q4: Why not just increase the timeout to 300 s? A: The dashboard platform hard-codes a 60 s gateway, and UX degrades beyond a 5 s wait. Q5: Why not cache everything? A: 90 % of queries were unique (different regex/time range), so the cache hit rate was < 5 %. |
| ACTION | Step 1 – Input validation & job creation • POST → 202 Accepted + jobId + offset (round-robin from a fixed pool of 1 000 offsets). • Filter AST → query fingerprint (SHA-256) → deduplication cache (Redis, 5 min TTL). • Identical fingerprints → reuse the same offset (instant share). Step 2 – Backend micro-batches • Worker executes in 10-row chunks → JSON → Kafka ResultTopic at the assigned offset. • Each message carries jobId, chunkSeq, isLast. Step 3 – Consumer experience • Dashboard connector polls from the offset (or uses a reactive client). • First chunk arrives in a median of 1.1 s. Step 4 – Governance • Offset TTL = 5 min → auto-cleanup consumer → log compaction → offset returned to the pool. • Audit log: jobId, fingerprint, user, offset, row count. Step 5 – POC & rollout • 3-day POC: 100 concurrent 90 s queries → 100 % success. • Load test: 2 500 concurrent users → CPU < 70 %, no socket exhaustion. • Security sign-off: verified no injection path. (Illustrative code sketches for Steps 1–3 follow the table.) | Q6: Why a manual offset pool instead of consumer groups? A: Consumer groups require a unique group.id per dashboard instance, which the dashboard engine cannot coordinate. A fixed offset pool gives a deterministic URL (offset=42) embeddable in config. Q7: Why 10-row chunks? A: Balances Kafka throughput (≤ 1 ms per message) against UI rendering (10 rows = one screen page). Q8: Why a 5 min TTL? A: The 95th-percentile query finishes in 110 s; 5 min covers stragglers plus network retries. Q9: Why not Server-Sent Events (SSE)? A: SSE = one TCP socket per dashboard → 2 000 dashboards = 2 000 open sockets → kernel-level socket/file-descriptor exhaustion on the gateway. Kafka moves that load to the brokers. |
| RESULT | Timeouts eliminated: the first chunk arrives in a median of 1.1 s instead of a blank screen at 60 s. Load-tested to 2 500 concurrent users at < 70 % CPU with no socket exhaustion – roughly 12× the original capacity – while analyst flexibility was preserved under governance (rate limits, per-job row caps, audit log). | Q10: Any regression? A: None. We added an SLA dashboard showing the chunk-latency distribution, now used in quarterly reviews. Q11: Is user abuse still possible? A: Rate-limited to 1 job per 5 s per user + a 500 k-row cap per job, enforced in the API gateway (see the rate-limit sketch after the table). |
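Implementation Sketches (Illustrative)
The original code is not public, so the following Python sketches only illustrate the shape of each step; library choices (redis, confluent_kafka), key names, pool handling, and function names are all assumptions, not the team’s actual implementation.
First, Step 1: fingerprint the already-validated filter AST (never raw user input) and deduplicate identical in-flight queries via Redis so they share one offset.

```python
# Sketch of Step 1: fingerprint + deduplication. Key names, pool handling,
# and the audit list are illustrative assumptions.
import hashlib
import itertools
import json
import uuid

import redis

r = redis.Redis()
OFFSET_POOL = itertools.cycle(range(1000))  # fixed pool of 1 000 offsets, round-robin
DEDUP_TTL_S = 300                           # 5 min TTL, as in the article

def fingerprint(filter_ast: dict) -> str:
    """SHA-256 over a canonical JSON form of the validated filter AST.
    The AST comes from the parser; raw input is never concatenated."""
    canonical = json.dumps(filter_ast, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def create_job(filter_ast: dict, user: str) -> dict:
    """Create (or reuse) a job for this query; the result is returned
    to the client alongside HTTP 202 Accepted."""
    fp = fingerprint(filter_ast)
    cached = r.get(f"job:fp:{fp}")
    if cached:
        return json.loads(cached)           # identical query already running: share its offset

    job = {"jobId": str(uuid.uuid4()), "offset": next(OFFSET_POOL), "fingerprint": fp}
    # SET NX EX guards against two identical queries racing for a new offset.
    if not r.set(f"job:fp:{fp}", json.dumps(job), nx=True, ex=DEDUP_TTL_S):
        return json.loads(r.get(f"job:fp:{fp}"))
    r.lpush("audit", json.dumps({"user": user, **job}))  # audit trail: user, jobId, offset, fingerprint
    return job
```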
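Step 2 could look roughly like the producer below. The topic name matches the article’s ResultTopic; modeling the “assigned offset” as a Kafka partition index is my assumption, since a producer cannot pick offsets directly.

```python
# Sketch of Step 2: a worker publishes query results as 10-row micro-batches.
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
CHUNK_ROWS = 10  # one UI page per message, per the article

def stream_results(job: dict, rows: list) -> None:
    """Publish result rows as micro-batches keyed by jobId."""
    chunks = [rows[i:i + CHUNK_ROWS] for i in range(0, len(rows), CHUNK_ROWS)] or [[]]
    for seq, chunk in enumerate(chunks):
        msg = {
            "jobId": job["jobId"],
            "chunkSeq": seq,
            "isLast": seq == len(chunks) - 1,
            "rows": chunk,
        }
        producer.produce(
            "ResultTopic",
            key=job["jobId"].encode(),
            value=json.dumps(msg).encode(),
            partition=job["offset"],  # the article's "assigned offset", modeled as a partition (assumption)
        )
    producer.flush()
```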
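Step 3, the consumer side: a dashboard connector reads only its deterministic slot until it sees isLast. Again, treating the slot as a partition of ResultTopic, the group.id, and the timeout are illustrative assumptions.

```python
# Sketch of Step 3: poll chunks for one job from its assigned slot.
import json

from confluent_kafka import Consumer, TopicPartition

def poll_chunks(job_id: str, offset_slot: int, timeout_s: float = 1.0):
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": f"dashboard-{offset_slot}",  # illustrative; no coordinated consumer group is used
        "enable.auto.commit": False,
        "auto.offset.reset": "earliest",
    })
    # Deterministic assignment: read the single partition for this slot.
    consumer.assign([TopicPartition("ResultTopic", offset_slot)])
    try:
        while True:
            rec = consumer.poll(timeout_s)
            if rec is None or rec.error():
                continue
            chunk = json.loads(rec.value())
            if chunk["jobId"] != job_id:
                continue                 # another job sharing this slot
            yield chunk["rows"]
            if chunk["isLast"]:
                break
    finally:
        consumer.close()
```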
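Finally, the Q11 guardrail. The article places this in the API gateway; the Redis key scheme and return convention below are illustrative only.

```python
# Sketch of the Q11 rate limit: at most 1 job per user every 5 s,
# plus a 500 k-row cap per job.
import redis

r = redis.Redis()
MAX_ROWS_PER_JOB = 500_000

def allow_job(user: str, requested_rows: int) -> bool:
    """Return True if the job may be submitted, False if it should be rejected (e.g. HTTP 429)."""
    if requested_rows > MAX_ROWS_PER_JOB:
        return False
    # SET NX EX acts as a 5-second token: if the key already exists, the user just submitted a job.
    return bool(r.set(f"ratelimit:{user}", 1, nx=True, ex=5))
```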
One-Liner for Resume / Interview
“Converted an abused ad-hoc REST API into a Kafka-mediated pseudo-streaming platform that scaled 12×, eliminated timeouts, and turned analyst flexibility into a governed enterprise feature – all while keeping HTTP stateless and injection-free.”
Turn a blocking API into an event-driven firehose by letting Kafka do the “streaming” while the HTTP tier stays stateless.