The Delta Join in Apache Flink: Architectural Decoupling for Hyper-Scale Stream
The Paradigm Shift in Flink Stream Joins
What Delta Join Solves
Apache Flink has always been great at stateful stream processing, but here's the thing: traditional stream joins hit a wall when dealing with massive amounts of data and high-cardinality keys. The problem? You need to keep all the historical data in Flink's state forever to ensure correctness, and that's just not sustainable.
Delta Join (FLIP-486) changes everything. Instead of buffering everything internally, it turns joins into a stateless lookup mechanism that directly queries data in tables like Apache Fluss or Apache Paimon.
The Real Impact of Delta Join
Here's what Delta Join actually does: it separates compute from history. Instead of keeping everything in Flink's state, the operator just looks things up to external storage when needed. No more explosive state growth.
The numbers tell the story. Production deployments of Taobao team (Alibaba's ecommerce business) have seen:
-
50TB of join state eliminated - imagine that
-
10x cost reduction (2300 CU down to 200 CU) while maintaining the same throughput
-
80%+ CPU and memory savings
-
87% faster job recovery times
-
Second-level checkpointing instead of waiting forever
This isn't just incremental improvement - it's a complete game changer for large-scale stream processing.
The Unbounded State Crisis: Why Traditional Joins Fail at Scale
Why Regular Joins Break at Scale
Regular Joins in Flink are incredibly flexible - they handle Insert, Update, Delete operations perfectly. But here's the catch: you have to keep ALL the history from both streams in Flink's state forever.
Since streaming jobs run indefinitely, that state just keeps growing. Forever. And in high-cardinality scenarios? That's a disaster waiting to happen.

The problems pile up fast:
-
Resource Death: Your Task Managers get crushed under the weight of all that state
-
Checkpoint Nightmares: Checkpointing takes forever, jobs become unstable, timeouts everywhere
-
Recovery Hell: Restoring 100TB+ of state from storage? Plan for coffee breaks
What We Had Before Delta Join
Before Delta Join, Flink had some limited options:
- Interval Joins: Only work with time windows and append-only tables. Great if your use case fits those constraints, but most real-world scenarios don't.
- Temporal/Lookup Joins: Good for joining a stream with a dimension table, but they're one-way only. You can't use them for stream-to-stream joins where both sides need historical access.
The core issue? Regular Joins forced Flink to duplicate data that was already stored elsewhere. It was like keeping copies of your database in memory just in case - inefficient and unsustainable.
Architectural Deep Dive into Delta Join
The Delta Join Philosophy: Compute vs History
Delta Join (FLIP-486) is all about separation of concerns. The principle is simple: "probe on demand, minimal job state, eventual consistency."
When an event hits either side of the join, the operator doesn't dig through internal history. Instead, it queries an external index in real-time. No more hoarding data - just look it up when you need it.
How StreamingDeltaJoinOperator Actually Works
The StreamingDeltaJoinOperator is the engine that makes this all happen. Here's the key components:
-
LRU Caches (Both Sides): Check these first before hitting external storage. Hot data stays in memory, cold data gets evicted. Simple but effective.
-
Asynchronous Probing: No waiting around - when you miss the cache, fire off the lookup and keep processing.
-
AsyncDeltaJoinRunner: Two instances (one per side) manage the cache and external I/O operations.
Important: Delta Join isn't truly stateless - it's a hybrid model. The operator still keeps LRU caches and coordination state for consistency. The performance now depends on cache hit rates and external lookup latency.
Keeping Things Correct: Async Ordering
Here's the tricky part: when you have parallel async lookups, you can get updates for the same key arriving out of order. That's bad news for correctness.
Delta Join uses FLIP-519's KeyedAsyncWaitOperator to solve this. It ensures that all operations for a single key happen in sequence, while different keys can still run in parallel. Smart approach - you get the throughput benefits without sacrificing correctness.
External State Storage: Integration with the Real-Time Lakehouse
Why Fluss is Perfect for Delta Join
Apache Fluss (incubating) was built specifically for this use case. It's a disaggregated table storage engine designed from the ground up for Apache Flink.

Here's what makes it work:
-
Distributed Architecture: Coordinator + tablet servers with RocksDB backing for mutable tables
-
Dual Structure: KV store + log tablet = point-in-time lookups + CDC streams
-
Prefix Lookups: This is the killer feature. You can query using partial composite keys (e.g., just
customer_idinstead of the fullcustomer_id, order_id, item_id). Most systems require exact matches - Fluss is much more flexible.
The Future: Apache Paimon Integration
While Apache Fluss was the initial, purpose-built enabler for Delta Join, the Flink community recognizes the need to expand this concept to leverage open-source data lake formats. Continuous optimization of join performance remains a strategic focus. The roadmap for Flink SQL explicitly includes plans to add support for other storage systems, notably Apache Paimon, to facilitate near-real-time Delta Join capabilities.
Apache Paimon offers:
-
Primary-key tables with real-time streaming updates
-
Query access within minutes
-
Flexible merge engines (deduplication, partial updates, aggregation)
-
Integration with Spark, Hive, Trino
The goal: make Delta Join work across the entire Lakehouse ecosystem, not just Fluss.
Quantitative Impact and Operational Stability
The numbers don't lie - Delta Join delivers serious operational improvements.
The Big Wins
-
State Elimination: No more 100TB+ state files. No more checkpoint timeouts. No more job failures.
-
Resource Savings: 80%+ CPU and memory reduction. One case went from 2300 CU to 200 CU - that's a 10x cost cut while maintaining the same throughput.
Operational Stability Improvements
-
Checkpointing: No more waiting forever for checkpoints. Delta Join enables second-level checkpointing - that's how fast it is.
-
Recovery: 87% faster bootstrap times. When things go wrong, you get back up and running quickly.
Bonus: Since your join history lives in external storage, you can use it for other things too. One team reduced data reprocessing from 4 hours to 30 minutes by using Sort-Merge Joins on the externalized tables.
Implementation, Configuration, and Use Cases
Using Delta Join in Your SQL
The best part? Delta Join works with standard SQL. No special syntax needed - just write your regular JOIN queries like SELECT * FROM orders INNER JOIN Product ON orders.productId = Product.id.
The optimizer is smart enough to automatically choose Delta Join when:
-
The SQL pattern supports transforming the Regular Join into a Delta Join
-
You have the right external storage configured
In recent Flink versions, this often happens automatically.
Configuration Knobs
You can tune these parameters to optimize performance:
-
table.optimizer.delta-join.strategy='AUTO'- Let Flink decide whether a regular join can be converted to a delta join when applicable. Default is AUTO. -
table.exec.delta-join.cache-enabled='true'- Whether to enable the cache of delta join. Default is true. -
table.exec.delta-join.left.cache-size=10000andtable.exec.delta-join.right.cache-size=10000- The cache size used to cache the lookup results of the left and right table in delta join. -
table.exec.async-lookup.buffer-capacity=100- The max number of async i/o operation that the delta join can trigger to lookup
When to Use Delta Join
Delta Join shines in these scenarios:
-
High-Cardinality Stream Enrichment: Joining massive event streams (clicks, transactions) with large, frequently-updated dimension tables (customer profiles, catalogs). No state explosion.
-
Real-Time Traceability: Since all join history lives in external storage, you can debug and audit exactly what data was used for any calculation.
-
Complex Change Tracking: Handle Insert/Update/Delete operations on dynamic tables while keeping Flink's internal state minimal.
Conclusions and Outlook
Delta Join (FLIP-486) isn't just a new feature - it's a fundamental shift in how we think about stream processing at scale.
The trade-off is clear: instead of managing massive state within Flink's checkpoint domain, you pay the latency cost of external lookups. For enterprise-scale real-time applications, this is a no-brainer. You get:
-
Massive operational stability improvements
-
10x cost reduction with >80% resource efficiency
-
Lightning-fast job recovery
Right now, it works great with specialized systems like Fluss (with those awesome prefix lookups). The future roadmap includes Apache Paimon and other Lakehouse formats, which means Delta Join will become a standard capability across the entire streaming ecosystem.
This is how Flink stays ahead as the go-to engine for large-scale, stateful stream processing.
See Also
-
Alibaba Cloud. (2025). Flink 2.1 SQL: Unlocking Real-time Data & AI Integration for Scalable Stream Processing.
A deep dive into Apache Flink 2.1’s SQL capabilities, focusing on real-time data processing and integration with AI workloads.
- Apache Fluss Team. (2025). Taobao Practice with Apache Fluss.
An in-depth case study from Alibaba showcasing how Taobao leverages Apache Fluss for scalable streaming storage and real-time analytics.