FusionFlow: System Architecture

Introduction

FusionFlow did not start as a business idea. It started as a technical challenge.

I wanted to see how far I could push a fully self-managed real-time platform: live video, chat, multilingual support, and monetization logic, all without relying on third-party real-time providers.

The goal was not to build another streaming site. The goal was to design something where I control the networking layer, the backend logic, and the infrastructure. If something breaks, it is my fault, which is slightly stressful but also more interesting.

This first article focuses on the system architecture and the thinking behind it.

What this article covers

  • The design principles behind FusionFlow
  • How media, control, and infrastructure are separated
  • Why self-hosting was a deliberate choice
  • The trade-offs this architecture introduces

Design Principles

Before writing code, I defined a few constraints for myself:

  1. No managed WebRTC providers.
  2. No SaaS translation APIs.
  3. Strongly typed backend as the control plane.
  4. Clear separation between media and business logic.
  5. Full infrastructure ownership.

This obviously increases complexity. But it also forces clarity. When you own everything, you cannot hide behind abstractions.

FusionFlow is designed in layers:

  • Media layer: peer-to-peer whenever possible.
  • Control layer: the backend owns identity, authorization, room lifecycle, token accounting, and stream entitlements.
  • Service layer: supporting infrastructure remains isolated and replaceable.

Keeping those boundaries clean turned out to be one of the most important decisions.

High-Level Architecture

```mermaid
flowchart LR
  classDef node rx:18,ry:18,stroke-width:1.5px;
  linkStyle default stroke-width:1.4px;

  Users["Users<br/>Viewers<br/>Streamers<br/>Studios"]:::node

  subgraph Platform["FusionFlow Platform"]
    direction LR

    Application["Application<br/>Web Experience"]:::node

    Core["Core Services<br/>Identity<br/>Rooms<br/>Tokens<br/>Governance"]:::node

    Interaction["Interaction Layer<br/>Live Presence"]:::node

    Application --> Core
    Application --> Interaction
  end

  subgraph Foundations[" "]
    direction TB
    Data["Data Foundation<br/>Persistence<br/>State"]:::node
    Media["Media Infrastructure<br/>Connectivity<br/>Delivery"]:::node
  end

  Users --> Application
  Core -->|State| Data
  Interaction -->|State| Data
  Interaction --> Media

  style Platform stroke-width:2px,rx:22,ry:22
  style Foundations stroke-width:0px
```

High-level view of FusionFlow architecture and service boundaries.


At a high level, FusionFlow consists of:

  • React frontend
  • Java Spring Boot backend
  • Peer-to-peer media transport (WebRTC)
  • WebSocket signaling
  • Self-hosted Coturn for TURN relay
  • Self-hosted LibreTranslate for translation
  • Nginx as the gateway and reverse proxy
  • Logging and monitoring stack

The frontend handles WebRTC session management, device access, reconnection logic, and UI state.

The backend acts as the control plane. Media infrastructure never makes business decisions; it only executes signed permissions issued by the backend.
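To make "signed permissions" concrete: one pattern this maps to is coturn's use-auth-secret mode, where the backend mints short-lived TURN credentials from a shared secret and the TURN server verifies the same HMAC locally. The sketch below is illustrative; the class and method names are mine, not FusionFlow's actual code:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// Illustrative sketch: time-limited TURN credentials in the format that
// coturn's use-auth-secret mode verifies (HMAC-SHA1 over "expiry:userId").
public final class TurnCredentialService {

    private final byte[] sharedSecret; // must match static-auth-secret in turnserver.conf

    public TurnCredentialService(String sharedSecret) {
        this.sharedSecret = sharedSecret.getBytes(StandardCharsets.UTF_8);
    }

    /** Issues a credential pair that the TURN server accepts until the expiry timestamp. */
    public TurnCredential issueFor(String userId, long ttlSeconds) throws Exception {
        long expiry = Instant.now().getEpochSecond() + ttlSeconds;
        String username = expiry + ":" + userId; // coturn parses the leading timestamp
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        String password = Base64.getEncoder()
                .encodeToString(mac.doFinal(username.getBytes(StandardCharsets.UTF_8)));
        return new TurnCredential(username, password);
    }

    public record TurnCredential(String username, String password) {}
}
```

Because the expiry is encoded in the username, the TURN server can reject stale credentials without ever calling back into the backend.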

Infrastructure services are containerized and isolated so they can evolve independently. Nothing is tightly coupled to a specific vendor or managed service.

Signaling coordinates SDP exchange, ICE negotiation, and session lifecycle, including reconnection across unstable mobile networks.
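As a rough sketch of that coordination, the signaling endpoint can stay a deliberately thin relay. The Spring WebSocket handler below is illustrative, and RoomRegistry is a hypothetical service that maps a room to its other connected peers; authorization is assumed to have happened at handshake time:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.TextWebSocketHandler;

// Hypothetical service: returns the open sessions of a room's other peers.
interface RoomRegistry {
    List<WebSocketSession> peersOf(String roomId, WebSocketSession sender);
}

// A deliberately thin relay: routes offers, answers, and ICE candidates
// between peers, but never interprets the SDP payload itself.
public class SignalingHandler extends TextWebSocketHandler {

    private final ObjectMapper mapper = new ObjectMapper();
    private final RoomRegistry rooms;

    public SignalingHandler(RoomRegistry rooms) {
        this.rooms = rooms;
    }

    @Override
    protected void handleTextMessage(WebSocketSession session, TextMessage message) throws Exception {
        JsonNode msg = mapper.readTree(message.getPayload());
        String type = msg.path("type").asText();     // "offer" | "answer" | "ice"
        String roomId = msg.path("roomId").asText();

        switch (type) {
            case "offer", "answer", "ice" ->
                rooms.peersOf(roomId, session).forEach(peer -> relay(peer, message));
            default -> session.close(); // unknown message types are a protocol violation
        }
    }

    private void relay(WebSocketSession peer, TextMessage message) {
        try {
            if (peer.isOpen()) peer.sendMessage(message);
        } catch (Exception e) {
            // Log and drop: the peer's reconnection logic re-establishes state.
        }
    }
}
```

The important property is what the handler does not do: it never parses SDP beyond routing metadata, so the control plane stays cheap even under heavy media churn.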

Media Plane vs Control Plane

One key architectural decision was separating the media plane from the control plane.

Media plane (data path)

The media plane is simple in theory:

  • video and audio streams flow directly between peers with WebRTC
  • when direct connectivity fails, traffic is relayed through TURN
  • the backend never processes raw media streams

Control plane (decision path)

The control plane is where the logic lives. The backend handles:

  • SDP exchange over WebSocket
  • ICE candidate distribution
  • authentication and stream key validation
  • role management and chat events
  • monetization rules and enforcement (see the sketch after this list)
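
As an example of that last point, monetization checks can gate room entry before any signaling happens. The sketch below is hypothetical; TokenLedger stands in for whatever persistence-backed balance store is actually used:

```java
// Hypothetical balance store; debits atomically only when funds are sufficient.
interface TokenLedger {
    boolean debitIfSufficient(String userId, long amount);
}

// Illustrative sketch: the control plane decides room entry before any
// signaling or media setup happens.
public final class EntitlementService {

    private final TokenLedger ledger;

    public EntitlementService(TokenLedger ledger) {
        this.ledger = ledger;
    }

    /** Returns true only if the viewer may join, debiting the entry price as a side effect. */
    public boolean mayJoin(String userId, Room room) {
        if (room.entryPriceTokens() == 0) {
            return true; // free rooms bypass the ledger entirely
        }
        return ledger.debitIfSufficient(userId, room.entryPriceTokens());
    }

    public record Room(String id, long entryPriceTokens) {}
}
```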

Separating media from control ensures that bandwidth-heavy transport cannot starve decision-making logic. Media servers move bits. The backend enforces rules.

It also forces discipline. Media problems stay media problems. Business logic stays business logic.

Why Self-Hosting Everything

Choosing to self-host was deliberate.

FusionFlow runs its own TURN server, translation service, logging pipeline, and reverse proxy.

This provides cost predictability. There is no per-minute WebRTC billing surprise at the end of the month.

It also gives full control over networking behavior, which becomes very important once you start dealing with mobile networks, VPN users, and NAT traversal edge cases.

Hosting LibreTranslate internally also avoids per-character API billing and keeps translation traffic under my control.
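
For illustration, a translation call against a self-hosted instance is a single HTTP round trip to LibreTranslate's /translate endpoint. The base URL below is a placeholder, and production code should build the JSON with a proper library instead of string formatting:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of a client for a self-hosted LibreTranslate instance.
public final class TranslationClient {

    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // placeholder, e.g. "http://translate.internal:5000"

    public TranslationClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    public String translate(String text, String source, String target) throws Exception {
        // NOTE: string formatting is for illustration only; real code must
        // JSON-escape `text` (via Jackson or similar) to avoid broken payloads.
        String body = """
                {"q": "%s", "source": "%s", "target": "%s", "format": "text"}
                """.formatted(text, source, target);
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/translate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON with a "translatedText" field
    }
}
```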

Of course, this comes with operational overhead:

  • certificates expire
  • TURN misconfiguration fails silently and is hard to diagnose without proper logging (see the config sketch after this list)
  • translation services consume CPU
  • logs need monitoring and retention discipline
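
On the TURN point specifically, most of those failure modes trace back to a handful of lines in turnserver.conf. The fragment below is an illustrative sketch with placeholder values, not FusionFlow's actual configuration:

```
# Illustrative turnserver.conf fragment; values are placeholders.
listening-port=3478
tls-listening-port=5349
realm=turn.example.com

# Time-limited credentials: the secret must match what the backend signs with.
use-auth-secret
static-auth-secret=change-me

# Pin the relay port range so firewall rules can stay explicit.
min-port=49152
max-port=65535

# TLS material; an expired certificate here tends to fail silently on clients.
cert=/etc/coturn/fullchain.pem
pkey=/etc/coturn/privkey.pem

# Verbose logging is what turns "silent" TURN failures into diagnosable ones.
verbose
```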

Observability is treated as a first-class concern: metrics and structured logs turn media and signaling behavior from guesswork into something measurable.
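
As one concrete example, a single counter can answer "how often does direct P2P actually fail" instead of leaving it to intuition. The Micrometer-based sketch below uses an illustrative metric name:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Illustrative metric: how often sessions end up on a TURN relay
// instead of a direct peer-to-peer path.
public final class MediaMetrics {

    private final Counter relayFallbacks;

    public MediaMetrics(MeterRegistry registry) {
        this.relayFallbacks = Counter.builder("webrtc.ice.relay.fallbacks")
                .description("Sessions that settled on a relayed candidate pair")
                .register(registry);
    }

    /** Called by signaling when the selected ICE candidate pair is of type "relay". */
    public void recordRelayFallback() {
        relayFallbacks.increment();
    }
}
```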

But running the full stack yourself teaches you where the real complexity lives.

Trade-Offs

This architecture is not the simplest path.

  • It increases operational responsibility.
  • It requires infrastructure tuning.
  • It removes the safety net of managed services.

It would have been easier to integrate a real-time SaaS provider and move on.

But the purpose of FusionFlow was to explore how real-time systems behave in production-like conditions: VPN users, mobile carriers, CGNAT, packet loss, and latency spikes.

Those problems do not show up in tutorials. They show up when you own the entire pipeline.

And that is exactly the point.

Next

In the next article, I will go deeper into the WebRTC implementation itself:

  • signaling design
  • ICE negotiation
  • TURN configuration
  • what happens when users connect from networks that do not behave nicely

That is where things become interesting.