Every engineer who has containerised a SIP proxy has had this moment: pods green, health checks passing, first call dead. No logs. No events. The app is fine. The network is not.
Getting SIP Kubernetes deployments and cloud-native WebRTC infrastructure to work reliably isn’t a matter of containerising existing workloads and hoping for the best. The protocols carry fundamentally different networking assumptions, and those assumptions collide with K8s defaults at four layers:
- Network
- Ingress
- Autoscaling
- Observability
This article covers each layer: what breaks, why it breaks, and what actually works for teams tackling SIP and WebRTC scaling in production. Whether you’re building this architecture in-house or working with dedicated WebRTC developers, understanding where the defaults fail is the foundation everything else is built on.
Why Standard Kubernetes Networking Doesn’t Work for SIP and RTP Out of the Box?
The same defaults that make K8s great for web services are exactly what break real-time media.
Here’s where each assumption collapses:
- K8s Is Built for TCP/HTTP: Ingress controllers, service meshes, health checks, and load balancing all assume stateless HTTP. SIP and RTP don’t speak that language.
- SIP Needs Dynamic UDP Port Ranges: Signalling runs on 5060/5061. RTP media uses UDP ports 10,000–20,000, negotiated per call. At 500 concurrent calls, that’s 1,000 active port bindings. Kubernetes Services expose individual ports only; there is no native support for port ranges (see the sketch after this list). This is the root of almost every SIP Kubernetes networking workaround.
- WebRTC Needs a Real External IP: ICE negotiation requires the media server to advertise a stable, routable address. K8s overlay hides pod IPs. A pod IP in an ICE candidate fails with every external client, every time, which is why WebRTC Kubernetes deployments need explicit WebRTC NAT traversal architecture, not just containerisation.
- Both Protocols Are Stateful: A SIP dialog and a WebRTC session must remain in the same pod for their entire duration. K8s treats pods as interchangeable. Rolling upgrades drop active calls unless you explicitly build around this.
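To make that port-range limitation concrete, here is a minimal sketch of what a plain Kubernetes Service would have to look like to expose an RTP range; the Service name, selector, and port numbers are illustrative:

```yaml
# There is no range syntax in a Service spec: every UDP port must be listed
# individually, so covering 10,000-20,000 would mean ~10,001 entries like these.
apiVersion: v1
kind: Service
metadata:
  name: media-rtp            # illustrative name
spec:
  selector:
    app: media-server        # illustrative selector
  ports:
    - name: rtp-10000
      protocol: UDP
      port: 10000
      targetPort: 10000
    - name: rtp-10001
      protocol: UDP
      port: 10001
      targetPort: 10001
    # ...one entry per port, for the entire negotiated range
```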
The takeaway is straightforward: containerising SIP or WebRTC without addressing these four mismatches doesn’t produce a cloud-native deployment; it produces a broken one with no error logs to debug.
What Is the HostNetwork Anti-Pattern and Why Is It So Common Despite Being Problematic?
Engineers don’t use hostNetwork because they don’t know better; they use it because it works immediately. Understanding why it eventually fails is what separates a staging deployment from a production one.
The three most common workarounds, and what they actually cost:
| Anti-Pattern | Appears to Solve | Production Ceiling |
| --- | --- | --- |
| hostNetwork: true | Real IPs, port binding, no overlay NAT | One pod per node max; HPA meaningless; node-level blast radius on any compromise |
| NodePort for RTP ranges | UDP exposure without hostNetwork | 10,000+ iptables rules degrade cluster networking; rejected by every security review |
| NGINX / Traefik for SIP | Reuses existing K8s ingress | Silent packet drops, broken SIP state, one-way audio, no error log to explain why |
The hostNetwork reality check: It’s documented as a workaround in Kamailio Kubernetes deployment guides, STUNner docs, and most community threads. It passes every staging test. It fails every security review and every scaling milestone. The correct fix isn’t to remove host-level networking; it’s to confine it to a single justified edge layer.
What all three anti-patterns share is the same root cause: they treat SIP and WebRTC as HTTP workloads with extra steps. The architecture patterns in the next section treat them as what they actually are.
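For reference, this is roughly what the blanket workaround looks like when it is applied to every media pod rather than confined to an edge layer; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: media-0              # placeholder name
spec:
  hostNetwork: true          # the pod shares the node's network namespace
  dnsPolicy: ClusterFirstWithHostNet
  containers:
    - name: freeswitch
      image: example/freeswitch:latest   # placeholder image
# Two such pods on the same node collide on 5060 and the RTP range, so capacity
# is capped at one pod per node, HPA can never scale past the node count, and a
# container compromise now reaches node-level networking directly.
```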
How to Handle WebRTC NAT Traversal and SIP Media in K8s Without hostNetwork?
Eliminating hostNetwork doesn’t mean accepting broken media paths; it means replacing a blanket workaround with four targeted patterns, each solving a specific layer of the stack.
Here’s what that looks like in practice:
- SIP Ingress: Kamailio or OpenSIPS as a DaemonSet with hostNetwork: true at the edge only. One pod per node, bounded scope. All backend FreeSWITCH Kubernetes pods run in standard pod namespaces; no special privileges are required.
- WebRTC Media: STUNner (l7mp/stunner) is a cloud-native TURN gateway built specifically for Kubernetes. A single ingress port, no need for thousands of open UDP ports, and no host networking on media pods. It scales like any K8s Deployment.
- SIP RTP Media: Multus CNI assigns each media server pod a secondary NIC that connects directly to the physical network. RTP bypasses the overlay, and the pod stays schedulable and unprivileged; a sketch follows this list.
- Session Stickiness: StatefulSets for media servers give stable DNS hostnames (media-0, media-1…) that survive restarts—Kamailio routes by these names, not ephemeral pod IPs.
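Here is the Multus sketch referenced above, assuming a macvlan secondary network; the attachment name, parent interface, and address range are assumptions to adapt to the actual media VLAN:

```yaml
# Secondary network definition handled by Multus CNI.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rtp-net                      # illustrative name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24",
        "rangeStart": "192.0.2.50",
        "rangeEnd": "192.0.2.200"
      }
    }
---
# A media pod requests the secondary NIC via annotation; RTP flows over that
# interface and bypasses the overlay, while the pod stays unprivileged.
apiVersion: v1
kind: Pod
metadata:
  name: rtpengine-0                  # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: rtp-net
spec:
  containers:
    - name: rtpengine
      image: example/rtpengine:latest   # placeholder image
```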
None of these patterns requires compromising Kubernetes-native scheduling or security posture. The goal is a platform where SIP and WebRTC workloads behave like first-class Kubernetes citizens (schedulable, scalable, and auditable), not infrastructure exceptions that need special handling on every node.
Why CPU-Average HPA Fails for RTP Workloads and What Metric Actually Works?
Many teams only discover this problem in production. By the time audio quality starts degrading, the autoscaling logic has already failed.
The issue isn’t with the Kubernetes Horizontal Pod Autoscaler (HPA) itself; the problem is the signal it uses. CPU averages do not accurately represent the load characteristics of real-time media workloads such as RTP.
Consider what runtime conditions actually look like in a typical deployment:
- 10 media pods are running in the cluster
- 3 pods are handling ~480 active calls each and are close to capacity
- The remaining 7 pods are completely idle
- Cluster-wide CPU average remains low, so HPA does not scale
- Meanwhile, callers connected to the overloaded pods begin experiencing audio degradation
Because HPA evaluates cluster averages, the imbalance remains invisible until the threshold is crossed. By the time scaling finally triggers, call quality has already suffered.
The solution is not complicated, but it requires using the correct workload signal.
Instead of CPU utilisation, scale based on the number of active media sessions per pod.
This metric can be retrieved directly from the media server:
- FreeSWITCH → Event Socket Layer (ESL)
- Asterisk → ARI
- Janus → HTTP API
Expose this data through a custom Prometheus exporter, publish it to custom.metrics.k8s.io, and configure HPA to scale based on session load.
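As a sketch of that publishing step, this is what a prometheus-adapter rule could look like, assuming the custom exporter already exposes a per-pod gauge named freeswitch_active_sessions (the metric and target names are assumptions):

```yaml
# prometheus-adapter rule: turns the exporter's per-pod gauge into a custom
# metric the HPA can consume as active_media_sessions.
rules:
  - seriesQuery: 'freeswitch_active_sessions{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "freeswitch_active_sessions"
      as: "active_media_sessions"
    metricsQuery: 'max(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```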
A practical policy looks like this (see the HPA manifest sketch after the list):
- Scale up when any pod approaches its tested maximum session capacity
- Scale down when sufficient capacity exists across the cluster to absorb new calls safely
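A hedged sketch of the corresponding HPA, assuming the custom metric above and a StatefulSet named media-server; the capacity numbers are placeholders that should come from load-testing a single pod:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: media-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: media-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_media_sessions
        target:
          type: AverageValue
          averageValue: "400"    # e.g. scale before pods approach a tested ~500-session ceiling
```

One caveat worth noting: the Pods metric type still averages the value across replicas, so a single hot pod can hide behind idle ones. If scaling up on the busiest pod is the hard requirement, one common approach is to publish a derived max-over-pods gauge and consume it through the external metrics API instead.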
Autoscaling alone is not enough, however. Real-time media platforms must also prevent calls from being terminated during infrastructure operations such as rolling upgrades.
A safe deployment strategy includes the following (sketched in the manifest after this list):
- Fail the readiness probe so the pod is marked NotReady and new calls are no longer routed to it.
- Use a preStop hook to poll the active session count and wait until it reaches zero.
- Set terminationGracePeriodSeconds to 3600 to protect long-running conference calls.
- Configure maxUnavailable: 0 so a pod never terminates before its replacement is healthy.
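Put together, a hedged sketch of that strategy on a Deployment (a StatefulSet’s ordered, wait-for-Ready rollout gives the equivalent guarantee); the drain and readiness scripts are placeholders for whatever queries FreeSWITCH ESL, Asterisk ARI, or the Janus API in your images:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: media-server
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0          # never take a pod down before its replacement is Ready
      maxSurge: 1
  selector:
    matchLabels: {app: media-server}
  template:
    metadata:
      labels: {app: media-server}
    spec:
      terminationGracePeriodSeconds: 3600   # protect long-running conference calls
      containers:
        - name: freeswitch
          image: example/freeswitch:latest  # placeholder image
          readinessProbe:
            exec:
              # Placeholder script: fails once the pod is draining, so the SIP
              # edge marks it NotReady and stops routing new calls to it.
              command: ["/usr/local/bin/ready-check.sh"]
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                # Placeholder script: stop accepting calls, then poll the active
                # session count and exit only when it reaches zero.
                command: ["/usr/local/bin/drain-and-wait.sh"]
```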
Getting autoscaling right for real-time media is not a small configuration change. It requires rethinking how the platform measures load.
Teams that move from CPU-based scaling to session-based scaling stop reacting to call quality incidents and start preventing them.
However, even with correct scaling in place, the platform is still operating blind without the right observability layer.
What Observability a Production SIP/WebRTC K8s Deployment Needs That Generic Monitoring Misses?
Generic Kubernetes monitoring was built to answer one question: Is the infrastructure healthy? For SIP and WebRTC, that’s the wrong question. The infrastructure can be perfectly healthy while calls are actively failing, and that’s exactly the scenario standard monitoring leaves you blind to.
kube-state-metrics and node-exporter tell you your pods are running, your nodes are stable, and your cluster is fine. None of that tells you whether a SIP registration completed, whether an RTP stream is degrading mid-call, or whether a registration storm is 30 seconds away from saturating your signalling layer. That gap is what a production SIP and WebRTC observability stack exists to close.
What that stack actually needs to cover:
- Active Concurrent Call Count Per Pod: Not in kube-state-metrics. Custom exporter on FreeSWITCH ESL / Asterisk ARI. This is your primary capacity and autoscaling signal — without it, every scaling decision is a guess.
- SIP Transaction Success Rate by Response Code: 4xx/5xx SIP failures are invisible to HTTP monitoring. Homer/HEP as a sidecar or cluster service is the production standard for SIP call tracing and the only way to catch call setup failures before users report them.
- RTP Packet Loss and Jitter Per Session: Aggregate network metrics hide per-call degradation entirely. RTPEngine exposes per-session quality metrics; wire its API into Prometheus to catch quality issues at the session level rather than the node level.
- SIP Registration Storm Rate: A spike in REGISTER requests at the edge DaemonSet is the earliest warning you have before signalling saturation. By the time CPU or memory metrics reflect the problem, the storm has already hit. The raw counters are available from the Kamailio Prometheus module; a sample alert rule follows this list.
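To illustrate how the earliest-warning signals turn into alerts, here is a hedged sketch using the Prometheus Operator’s PrometheusRule CRD; every metric name and threshold below is an assumption to be replaced with whatever the Kamailio exporter and the RTPEngine metrics bridge actually publish:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sip-webrtc-alerts
spec:
  groups:
    - name: sip-signalling
      rules:
        - alert: SipRegistrationStorm
          # REGISTER rate at the edge DaemonSet well above the normal baseline.
          expr: sum(rate(kamailio_register_requests_total[1m])) > 500
          for: 2m
          labels: {severity: warning}
        - alert: RtpPacketLossHigh
          # Per-session loss aggregated per media pod.
          expr: avg by (pod) (rtpengine_packet_loss_ratio) > 0.02
          for: 1m
          labels: {severity: critical}
```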
The tooling to build this stack exists and is production-proven: Homer for SIP tracing, RTPEngine for media quality, and custom exporters for call count. The teams that build it from day one aren’t just monitoring better; they’re operating a fundamentally different kind of platform, one where incidents get caught before callers notice them, not after.
Without this layer in place, production SIP and WebRTC infrastructure isn’t operated; it’s reacted to.
In a Nutshell
The patterns covered in this article are solved problems. Kamailio, STUNner, RTPEngine, and Homer are production-proven, actively maintained, and well-documented. The tooling is mature. None of it is a default, and that’s precisely the point. Every decision covered here requires someone who has made it before under real production pressure, not someone learning it for the first time on your infrastructure.
For teams moving fast on a UCaaS or CPaaS build, that distinction matters more than most realise. If the engineering bandwidth or protocol-specific depth isn’t available in-house, working with specialists who work in this stack day to day is often the faster path to a deployment that actually holds up. Hiring a VoIP developer who works at exactly this layer, from SIP Kubernetes architecture to cloud-native WebRTC implementation, brings production context that would otherwise take months to build internally.
