DevilKing's blog



data traffic control in apache airflow

Original article link

  • Scale our data pipelines to process workloads of up to several terabytes a day efficiently
  • Adapt to the platform’s technological and organisational shift to a micro-service architecture
  • Quickly detect failures and inconsistencies in the many data processes run each day/hour
  • Respond to internal or external faults without impacting the quality, conformity, and availability of actionable information to our business users

Airflow provides:

  • Retry mechanisms to ensure that each and every anomaly can be detected, and automatically or manually healed over time (with as little human intervention as possible)
  • Priority aware work queue management, ensuring that the most important tasks are run first and complete as soon as possible
  • Resource pooling system to ensure that, in a high concurrency environment, thresholds can be set to avoid overloading input or output systems
  • Backfill capabilities to identify “missing” past runs, and automatically re-create and run them
  • Full history of metrics and statistics to view the evolution of each task performance over time, and even assess data-delivery SLAs over time
  • A horizontally scalable set of alternatives to the way tasks are dispatched and run on a distributed infrastructure
  • A centralized, secure place to store and view logs and configuration parameters for all task runs
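Two items in the list above, priority-aware queueing and resource pooling, can be sketched together in plain Python (this is not Airflow code; the task names and pool sizes are hypothetical). Airflow's scheduler combines the same two ideas: each task has a priority weight, and each pool caps how many tasks may run against a given resource at once.

```python
import heapq

class PooledPriorityQueue:
    """Pop the highest-priority task whose pool still has a free slot."""

    def __init__(self, pool_slots):
        self._slots = dict(pool_slots)   # pool name -> free slots
        self._heap = []                  # (-priority, task_id, pool)

    def push(self, task_id, priority, pool):
        # heapq is a min-heap, so negate priority to pop the largest first
        heapq.heappush(self._heap, (-priority, task_id, pool))

    def pop_runnable(self):
        """Return the next runnable (task_id, pool), or None if every
        remaining task belongs to a pool with no free slots."""
        skipped = []
        task = None
        while self._heap:
            neg_prio, task_id, pool = heapq.heappop(self._heap)
            if self._slots.get(pool, 0) > 0:
                self._slots[pool] -= 1
                task = (task_id, pool)
                break
            skipped.append((neg_prio, task_id, pool))
        for item in skipped:             # put back tasks whose pool was full
            heapq.heappush(self._heap, item)
        return task

    def release(self, pool):
        """A task finished; free one slot in its pool."""
        self._slots[pool] += 1

q = PooledPriorityQueue({"db": 1, "api": 2})
q.push("low_db", priority=1, pool="db")
q.push("high_db", priority=10, pool="db")
q.push("mid_api", priority=5, pool="api")

print(q.pop_runnable())  # highest priority overall: ('high_db', 'db')
print(q.pop_runnable())  # the db pool is now full, so: ('mid_api', 'api')
```

With the `db` pool limited to one slot, the low-priority `db` task waits until `release("db")` frees the slot, which is exactly the overload protection described above.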

A central database that stores all stateful information

Airflow offers several executors out of the box, from the simplest to the most full-featured:

  • SequentialExecutor: a very basic, single-task-at-a-time executor that is also the default one. You do NOT want to use this one for anything but unit testing
  • LocalExecutor: also very basic, it runs the tasks on the same host as the scheduler, and is quite simple to set up. It’s the best candidate for small, non-distributed deployments and development environments, but won’t scale horizontally
  • CeleryExecutor: here we are beginning to scale out over a distributed cluster of Celery workers to cope with a large task set. Still quite easy to set up and use, it’s the recommended setup for production
  • MesosExecutor: if you’re one of the cool kids and have an existing Mesos infrastructure, surely you will want to leverage it as a destination for your task executions
  • KubernetesExecutor: if you’re an even cooler kid, support for Kubernetes has been added in version 1.10.0
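In Airflow 1.x, the executor is selected with the `executor` option in the `[core]` section of `airflow.cfg` (the distributed executors additionally need their own backend settings, e.g. a Celery broker, which are omitted here):

```ini
[core]
# One of: SequentialExecutor (default), LocalExecutor,
# CeleryExecutor, MesosExecutor, KubernetesExecutor
executor = CeleryExecutor
```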

Not supported:

No dynamic execution: the graph Airflow executes is built ahead of the actual execution. Airflow has a dynamic DAG generation system, which can rely on external parameters (configuration, or even Airflow Variables) to alter the workflow’s graph. We use this pattern a lot, but it’s not possible to alter the shape of the workflow at runtime (for instance, spawn a variable number of tasks depending on the output of an upstream task)
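The parse-time "dynamic DAG" pattern can be illustrated with the Airflow API stubbed out as plain Python, so the shape-building logic is visible (the table names are hypothetical). In a real DAG file, each task id below would become an operator and each edge a `>>` dependency:

```python
def build_graph(tables):
    """Build one extract -> load chain per configured table.

    The graph is fixed the moment this function runs, i.e. when the
    scheduler parses the DAG file. Nothing that executes at runtime
    can add or remove tasks, which is exactly the limitation above.
    """
    edges = []
    for table in tables:
        edges.append((f"extract_{table}", f"load_{table}"))
        edges.append((f"load_{table}", "publish_report"))
    return edges

# The parameter list could come from a config file or an Airflow
# Variable, but it is read at parse time, not during execution:
print(build_graph(["users", "orders"]))
```

Changing the configuration changes the graph on the next parse, which is why the pattern works well for "one pipeline per table" setups, but an upstream task's output can never feed back into the graph shape of the same run.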