This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism.
However, no existing system satisfies all three requirements (low latency, iteration, and consistent results):
- stream processors can produce low-latency results, but only for non-iterative algorithms,
- batch systems can iterate synchronously, but at the expense of latency, and
- trigger-based approaches support iteration, but with only weak consistency guarantees.
The model supports three features:
- structured loops allowing feedback in the dataflow,
- stateful dataflow vertices capable of consuming and producing records without global coordination, and
- notifications for vertices once they have received all records for a given round of input or loop iteration.
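To make the second and third features concrete, here is a minimal sketch of a stateful vertex that consumes records without coordination and emits a result only when it is notified that a round is complete. The names `CountPerRound`, `on_recv`, `on_notify`, and `run` are hypothetical and chosen for illustration. The toy driver delivers all records first and then notifies each round in order, whereas a real system would notify as soon as it knows a round is finished.

```python
from collections import defaultdict


class CountPerRound:
    """Hypothetical stateful vertex: counts the records seen in each round."""

    def __init__(self):
        self.counts = defaultdict(int)  # per-round state kept by the vertex
        self.output = []                # (round_id, count) pairs emitted so far

    def on_recv(self, round_id, record):
        # Records can arrive in any order; no global coordination is needed here.
        self.counts[round_id] += 1

    def on_notify(self, round_id):
        # Called once no more records for round_id can arrive,
        # so it is safe to emit the final count and drop the state.
        self.output.append((round_id, self.counts.pop(round_id, 0)))


def run(vertex, records):
    """Toy driver (an assumption): deliver everything, then notify rounds in order."""
    for round_id, record in records:
        vertex.on_recv(round_id, record)
    for round_id in sorted({r for r, _ in records}):
        vertex.on_notify(round_id)
    return vertex.output


result = run(CountPerRound(), [(0, "a"), (1, "b"), (0, "c")])
```

The point of the split is that `on_recv` never has to wait, while `on_notify` is the only place where the vertex relies on the system's knowledge that a round has ended.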
Use loop contexts to solve this problem?
A loop context in the above graph is a cycle with an ingress (I) node and an egress (E) node, along with a feedback (F) node.
Dataflow graphs use logical timestamps [3]. Each logical timestamp contains an epoch and a loop counter. Together they track which input epoch a record belongs to and which loop iteration it is in.
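A small sketch of how such a timestamp could be represented and how the three loop-context nodes might adjust it. This assumes one loop counter per enclosing loop context, with ingress appending a counter of 0, feedback incrementing the innermost counter, and egress removing it. The `Timestamp` class and the three function names are illustrative, not taken from any library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timestamp:
    epoch: int                      # which batch of input the record belongs to
    counters: Tuple[int, ...] = ()  # one loop counter per enclosing loop (assumption)


def ingress(t: Timestamp) -> Timestamp:
    # Entering a loop context: start a new innermost counter at 0.
    return Timestamp(t.epoch, t.counters + (0,))


def feedback(t: Timestamp) -> Timestamp:
    # Going around the loop once more: increment the innermost counter.
    if not t.counters:
        raise ValueError("feedback is only valid inside a loop context")
    return Timestamp(t.epoch, t.counters[:-1] + (t.counters[-1] + 1,))


def egress(t: Timestamp) -> Timestamp:
    # Leaving the loop context: drop the innermost counter.
    if not t.counters:
        raise ValueError("egress is only valid inside a loop context")
    return Timestamp(t.epoch, t.counters[:-1])


outside = Timestamp(epoch=3)
inside = ingress(outside)                   # epoch 3, iteration 0
second_pass = feedback(feedback(inside))    # epoch 3, iteration 2
after_loop = egress(second_pass)            # back to epoch 3, no counters
```

Under these rules a record that leaves the loop carries the same timestamp it had before entering, regardless of how many iterations it went through.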
Logical timestamps are used instead of actual (wall-clock) timestamps, so events from a past time cannot occur.
Fault handling: recovery is relatively slow.