Research · IEEE InC4 2025 · IEEE Xplore

Zero-Trust principles in distributed API infrastructure

Peer-reviewed · Published IEEE InC4 2025 · Real-time anomaly detection · Scapy + K-Means

01 Context

Most API security architectures are perimeter-based: once you're inside the network, you're trusted. Zero-Trust rejects that premise. The model — "never trust, always verify" — sounds straightforward as a policy decision. The interesting question is what it actually requires as an engineering one.

This paper came out of a gap I kept noticing in the literature: most Zero-Trust implementations described the policy layer in detail and hand-waved the enforcement layer. What does continuous verification look like when the verifier is itself a distributed service? What are its failure modes? How do you handle the latency cost of per-request verification at scale?

02 Problem

The specific problem we examined: how to implement real-time anomaly detection across service boundaries in a Zero-Trust architecture without creating a single point of failure in the verification path.

Existing approaches either centralized verification (simple but fragile) or distributed it naively (eliminated the SPOF but created consistency problems). The middle ground — where verification is distributed but coordinated — is where most of the interesting engineering tradeoffs live.

03 Approach

We used Scapy for packet-level traffic capture and K-Means clustering for anomaly detection. The detection model runs at the network layer, not the application layer — this keeps it out of the request path and means it degrades gracefully if the detection service is slow or unavailable.
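As a rough sketch of the detection step: cluster per-source traffic features from a baseline capture window, then score later windows by distance to the nearest learned centroid. This is a minimal NumPy reimplementation of the idea, not the paper's code — the feature choice (request rate, mean packet size, distinct destination ports), cluster count, and thresholds are all illustrative, and a real deployment would feed these rows from Scapy's `sniff` rather than hard-coded lists.

```python
import numpy as np

def fit_centroids(X, k=2, iters=20, seed=0):
    """Minimal Lloyd's K-Means over per-source feature rows from a baseline window."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each row to its nearest centroid, then recompute cluster means.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def anomaly_scores(X, centers):
    """Score = distance to the nearest baseline centroid (larger = more anomalous)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)

# Feature rows: [requests/sec, mean packet size, distinct destination ports]
baseline = [[10, 500, 2], [12, 480, 2], [11, 510, 3],
            [40, 200, 4], [42, 210, 4], [38, 190, 5]]
centers = fit_centroids(baseline, k=2)
scores = anomaly_scores([[11, 495, 2], [400, 60, 120]], centers)
# The probing-like second row scores far above the baseline-like first row.
```

Scoring against centroids learned from a baseline window, rather than re-clustering every window, is what keeps inference cheap enough to run alongside live capture.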

The key design decision: anomaly scores are advisory, not blocking, by default. An anomalous request still passes through; the score is logged and surfaced to the application layer which can choose to act on it. This trades some security guarantees for significant operational simplicity — you don't need to solve distributed consensus to decide whether a request should be blocked.
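The advisory hand-off can be sketched in a few lines: the verifier logs and returns a verdict, and by construction has no way to reject the request. The names here (`AnomalyVerdict`, `ANOMALY_THRESHOLD`, the header-style hand-off) are hypothetical, not from the paper.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("zt.anomaly")
ANOMALY_THRESHOLD = 3.0  # illustrative score cutoff

@dataclass
class AnomalyVerdict:
    score: float
    flagged: bool  # advisory only; the request proceeds regardless

def evaluate(request_id: str, score: float) -> AnomalyVerdict:
    """Log the score and surface it to the caller; never raise, never block."""
    verdict = AnomalyVerdict(score=score, flagged=score >= ANOMALY_THRESHOLD)
    if verdict.flagged:
        logger.warning("request %s flagged (score=%.2f)", request_id, score)
    return verdict  # the application layer decides whether to act

# The request path attaches the verdict, e.g. as a header or context field,
# and the application can choose to step up auth, rate-limit, or ignore it.
```

Because `evaluate` can only annotate, the blocking decision stays local to each service — which is exactly what removes the need for distributed consensus in the verification path.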

The verification architecture separates concerns: network-layer capture, statistical model, policy evaluation, and enforcement are independent components. Each can fail independently without taking down the others.
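The independence claim can be made concrete with a small guard: each stage runs behind a wrapper that converts its failure into an absent signal, so one component's outage never propagates to the others. The `degradable` helper and the stage lambdas below are hypothetical, shown only to illustrate the failure-isolation pattern.

```python
from typing import Any, Callable

def degradable(stage: Callable[..., Any], default: Any = None) -> Callable[..., Any]:
    """Wrap a pipeline stage so a failure yields `default` instead of an exception."""
    def run(*args: Any, **kwargs: Any) -> Any:
        try:
            return stage(*args, **kwargs)
        except Exception:
            return default  # downstream sees "no signal", not an error
    return run

# Illustrative stages: scoring degrades to None (no score); policy evaluation
# treats a missing score as not-flagged, matching the advisory-by-default design.
score_window = degradable(lambda feats: sum(feats) / len(feats), default=None)
evaluate     = degradable(lambda s: s is not None and s > 3.0, default=False)

print(evaluate(score_window([1, 2, 9])))  # → True  (normal path, score 4.0)
print(evaluate(score_window([])))         # → False (capture failed, degrades)
```

The second call shows the graceful-degradation property: an empty capture window crashes the scoring stage, but the pipeline answers "not flagged" instead of failing the request.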

04 Findings

The detection model achieved acceptable precision on the test dataset. More interesting than the accuracy numbers was where it failed: the model had trouble distinguishing between legitimate traffic spikes and coordinated probing at similar request rates. This is a known limitation of rate-based anomaly detection and points toward behavioral fingerprinting as the right next layer.

The latency cost of the detection pipeline was lower than expected — under 5ms at the 95th percentile when running alongside production traffic. That cost is dominated by the Scapy capture overhead, not the model inference.

The broader finding: Zero-Trust enforcement at the network layer is tractable with commodity tooling. The hard part isn't the detection — it's the policy evaluation and the organizational question of who owns the detection model and when it gets updated.

05 Retrospective

The paper was accepted to IEEE InC4 2025. The reviewers pushed back hardest on the evaluation methodology — specifically, the test dataset wasn't adversarial enough to stress-test the detection model in ways that would matter in production. This is a fair criticism and one I'd address differently if I were writing it now.

What I'd change: run the detection model against a red team simulation, not just traffic logs. The difference between "detects anomalies in logged traffic" and "detects adversarial probing in real time" is significant enough to require separate evaluation.

What held up: the architecture separating detection from enforcement. That decision made the system much easier to reason about and much easier to test independently. The next paper in this space would be about the enforcement coordination problem specifically.

Zero-Trust in Distributed API Infrastructure — Vishesh Rawal