The Dataflow Model | Devil King's Blog

论文地址

针对Unbounded, unordered, global-scale datasets, 考虑correctness, latency, and cost

举的例子，流式视频的广告投放问题

关于数据处理的几个重点：

exactly-once semantics
temporal primitives necessary for windowing, not provide windowing semantics that are limited totuple- or processing-time-based windows
provide event-time-based windowing either rely on ordering (SQLStream [27]),or have limited window triggering3semantics in event-timemode (Stratosphere/Flink)

a single unified model

Allows for the calculation of event-time5ordered re-sults, windowed by features of the data themselves,over an unbounded, unordered data source, with cor-rectness, latency, and cost tunable across a broad spec-trum of combinations
Decomposes pipeline implementation across four re-lated dimensions, providing clarity, composability, andflexibility:
- – Whatresults are being computed.
- – Wherein event time they are being computed.
- – Whenin processing time they are materialized.
- – Howearlier results relate to later refinements.
Separates the logical notion of data processing fromthe underlying physical implementation, allowing thechoice of batch, micro-batch, or streaming engine tobecome one of simply correctness, latency, and cost

那么总结起来：

a window model
a triggering model
a incremental processing model
scalable implements

Window

When dealing with unbounded data,windowing is required for some operations (to delineate fi-nite boundaries in most forms of grouping: aggregation,outer joins, time-bounded operations, etc.), and unneces-sary for others (filtering, mapping, inner joins, etc.).

Fixed，静态window
Sliding，滑动窗口
Sessions，windows that capture some period of activ-ity over a subset of the data，按照key以及timeout来计算

(particularly time management [28] and semantic models [9], but alsowindowing [22], out-of-order processing [23], punctuations[30], heartbeats [21], watermarks [2], frames)

DATA FLOW MODEL

核心要素：

ParDo, Each input ele-ment to be processed (which itself may be a finite col-lection) is provided to a user-defined function (calledaDoFnin Dataflow), which can yield zero or more out-put elements per input. (多输出设定？)
GroupByKey, key-grouping (key, value) pairs

可map也可以reduce

support for un-aligned windows, for which there are two key insights. Thefirst is that it is simpler to treat all windowing strategiesas unaligned from the perspective of the model, and allowunderlying implementations to apply optimizations relevantto the aligned cases where applicable.
windowing can be broken apart into two related operations: assignWindow and mergedWindow

pass (key, value, eventtime, window) 4-tuples

window assign->window merge:

DropTimestamps- Drops element timestamps, asonly the window is relevant from here on out9.
GroupByKey- Groups (value, window) tuples by key.
MergeWindows- Merges the set of currently bufferedwindows for a key. The actual merge logic is definedby the windowing strategy. In this case, the windowsforv1andv4overlap, so the sessions windowing strat-egy merges them into a single new, larger session, asindicated in bold.
GroupAlsoByWindow- For each key, groups valuesby window. After merging in the prior step,v1andv4are now in identical windows, and thus are groupedtogether at this step
ExpandToElements- Expands per-key, per-windowgroups of values into (key, value, eventtime, window)tuples, with new per-window timestamps. In this ex-ample, we set the timestamp to the end of the window,but any timestamp greater than or equal to the times-tamp of the earliest event in the window is valid withrespect to watermark correctness

有关watermarks，too fast or too slow

A useful insight in addressing the complete-ness problem is that the Lambda Architecture effectivelysidesteps the issue: it does not solve the completeness prob-lem by somehow providing correct answers faster; it simplyprovides the best low-latency estimate of a result that thestreaming pipeline can provide, with the promise of eventualconsistency and correctness once the batch pipeline runs

Windowing determines where in event time data are grouped together for processing.
Triggering determines when in processing time the results of groupings are emitted as panes

trigger同window的结合

Discarding: Upon triggering, window contents arediscarded, and later results bear no relation to previ-ous results.
Accumulating: Upon triggering, window contentsare left intact in persistent state, and later results be-come a refinement of previous results.
Accumulating & Retracting: Upon triggering, inaddition to theAccumulatingsemantics, a copy of theemitted value is also stored in persistent state. Whenthe window triggers again in the future, a retraction forthe previous value will be emitted first, followed by thenew value as a normal datum(反馈+反向更新)

design principles

Never rely on any notion of completeness.
Be flexible, to accommodate the diversity of known usecases, and those to come in the future.
Not only make sense, but also add value, in the contextof each of the envisioned execution engines.
Encourage clarity of implementation.•Support robust analysis of data in the context in whichthey occurred

One particularly large log join pipeline runs in streamingmode on MillWheel by default, but has a separate Flume-Java batch implementation used for large scale backfills.