If you've been on the hunt for a workflow management platform, my guess is you've come across Apache Airflow already. Originally hailing from the corner of Airbnb, this widely used project is now under the Apache banner and is the tool of choice for many data teams. Airflow allows you to write complex workflows in a declarative manner and offers many out-of-the-box operators to do complex tasks.

At Flywheel Software we heavily use Airflow for complex ETL pipelines, machine learning pipelines, and statistical analysis. Moreover, we manage multiple airflow deployments and also run massive multi-tenant airflow clusters running a plethora of workloads. As we began to push airflow to its limits, we recently undertook a reworking of how we deploy to airflow clusters - and dare I say, found a better way to use airflow.

Our main goal was to move away from the declarative format of deploying airflow and move towards dynamically generated DAGs for flexibility and scalability - allowing us to quickly change what was running on airflow with as little as a feature flag modification. This led to the inception of DAG factories.

DAG Factories - using a factory pattern with python classes that generate DAGs automatically based on dynamic input to the system.

Enough with the backstory, it's time to get to the exciting part. Before we dive into the new setup, it's important to take a quick detour to see how this would generally be done in airflow. To set the stage, throughout this article we will assume that we want to execute two complex tasks in airflow - process_message & process_invoices.

One possible approach is to have a monorepo with individual folders for each of your projects. You can either duplicate DAGs in each project folder or have all your shared DAGs in a shared_dags folder. Your deployment pipeline can then pick the correct DAGs from the monorepo to push to each of the airflow clusters. In practice, the file structure might look something like the snippet below.

Now for the new setup. The first step towards this architecture was to get our airflow clusters to talk to a centralized configuration store on our platform. This allows airflow to dynamically fetch configurations from other services on the platform - like our web app, feature flags, or other business logic. On top of that, the repo is organized around three pieces (sketched after the list):

- dag_factory - folder with all our DAGs in a factory pattern with a set format of standardized methods.
- projects/&lt;project&gt;/main.py - the core file where we will call the factory methods to generate DAGs we want to run for a project.
- projects/&lt;project&gt;/config.py - a file to fetch configuration from airflow variables or from a centralized config store.
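First, a plausible version of the monorepo layout from the detour above - only the shared_dags folder is named in the text, so the project and DAG file names here are illustrative:

```
dags/
├── shared_dags/
│   └── common_maintenance.py
├── project_a/
│   ├── process_message_dag.py
│   └── process_invoices_dag.py
└── project_b/
    └── process_invoices_dag.py   # duplicated here, or pulled from shared_dags
```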
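Next, a minimal sketch of what a DAG factory along these lines could look like. The DagFactory class, its build_dag method, and the config shape are assumptions for illustration, not the exact interface from the article; the two task callables come from the running example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def process_message(**context):
    """Stand-in for the real message-processing logic."""
    print("processing messages for", context["ds"])


def process_invoices(**context):
    """Stand-in for the real invoice-processing logic."""
    print("processing invoices for", context["ds"])


# Map config-level task names to callables so the config store only
# ever holds data, never code.
TASK_CALLABLES = {
    "process_message": process_message,
    "process_invoices": process_invoices,
}


class DagFactory:
    """Builds DAGs from plain-dict configs fetched at parse time."""

    def build_dag(self, config: dict) -> DAG:
        dag = DAG(
            dag_id=config["dag_id"],
            schedule_interval=config.get("schedule", "@daily"),
            start_date=datetime(2021, 1, 1),
            catchup=False,
        )
        with dag:
            prev = None
            for task_name in config["tasks"]:
                task = PythonOperator(
                    task_id=task_name,
                    python_callable=TASK_CALLABLES[task_name],
                )
                # Chain tasks in the order the config lists them.
                if prev is not None:
                    prev >> task
                prev = task
        return dag
```

With a config like `{"dag_id": "invoicing_daily", "schedule": "@hourly", "tasks": ["process_message", "process_invoices"]}`, the factory emits one DAG whose tasks run in the listed order - so changing what runs on a cluster becomes a config change rather than a code deploy.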
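projects/&lt;project&gt;/config.py could then prefer the centralized store and fall back to airflow variables. Again a sketch under stated assumptions: the store URL and endpoint are hypothetical stand-ins for the platform's config service, while Variable.get is the standard airflow API.

```python
import requests

from airflow.models import Variable

# Hypothetical endpoint for the platform's centralized config store.
CONFIG_STORE_URL = "https://config-store.internal/api/projects"


def get_config(project: str) -> dict:
    """Return the DAG config for a project as a plain dict."""
    try:
        resp = requests.get(f"{CONFIG_STORE_URL}/{project}", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        # Fall back to an airflow Variable named "<project>_config"
        # that holds the same config as JSON.
        return Variable.get(f"{project}_config", deserialize_json=True)
```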
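Finally, projects/&lt;project&gt;/main.py ties the two together. The project name and module paths here are assumptions about the layout; the one real requirement is that generated DAGs end up in the module's globals so the airflow scheduler discovers them.

```python
from dag_factory.factory import DagFactory  # assumed module path
from projects.invoicing.config import get_config  # "invoicing" is illustrative

config = get_config("invoicing")
dag = DagFactory().build_dag(config)

# The scheduler only picks up DAG objects reachable from module globals.
globals()[dag.dag_id] = dag
```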