How to Create First DAG in Airflow?


A Directed Acyclic Graph (DAG) is the collection of all the individual tasks that we run in an ordered fashion. In other words, a DAG is a data pipeline in Airflow. In a DAG:

  • There is no loop
  • Edges are directed

Key Terminologies:

  • Operator: An operator describes a single task in your DAG. In Airflow, the nodes of the DAG are operators.
  • Dependencies: The specified relationships between your operators are known as dependencies. In Airflow, the directed edges of the DAG are dependencies.
  • Tasks: Tasks are units of work in Airflow. Each task is an instance of an operator, such as an action operator or a sensor.
  • Task Instances: A task instance is a run of a task at a point in time. These are the runnable entities. Task instances belong to a DagRun.

A DAG file is a Python file that specifies the structure as well as the code of the DAG.

Steps To Create an Airflow DAG

  1. Importing the right modules for your DAG
  2. Create default arguments for the DAG
  3. Creating a DAG Object
  4. Creating tasks
  5. Setting up dependencies for the DAG

Now, let’s discuss these steps one by one in detail and create a simple DAG.

Step 1: Importing the right modules for your DAG

To create a DAG, it is important to import all the modules that we will be using in our code to define the structure of the DAG. The first and most important import is the DAG class from the airflow package, which initiates the DAG object for us. Then we import the date and time modules used for scheduling. After that, we import the operators that we will be using in our DAG file. Here, we will just be importing the DummyOperator.

# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator
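Note: in Airflow 2.3 and later, DummyOperator is deprecated in favour of EmptyOperator. If you are on a newer release, the equivalent import (everything else in this tutorial stays the same) is:

# Airflow 2.3+ replacement for the deprecated DummyOperator
from airflow.operators.empty import EmptyOperator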

Step 2: Create default arguments for the DAG

default_args is a dictionary that we pass to the DAG object; it contains default metadata for the DAG. These arguments are applied to as many operators as we want, without repeating them on each one.

Let’s create a dictionary named default_args:

# Initiating the default_args
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2022, 11, 12)
}
  • owner is the name of the DAG’s owner
  • start_date is the date from which the DAG starts being scheduled

We can add more such parameters to our arguments, as per our requirements.
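For instance, a fuller default_args dictionary might look like the sketch below; the retry and email settings are illustrative additions, not requirements:

# A fuller default_args sketch (extra keys are illustrative)
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2022, 11, 12),
    'retries': 1,                          # retry each failed task once
    'retry_delay': timedelta(minutes=5),   # wait 5 minutes between retries
    'email_on_failure': False,             # do not send an email when a task fails
}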

Step 3: Creating DAG Object

After the default_args, we have to create a DAG object by passing a unique identifier called the “dag_id”. Here, we will name our DAG “DAG-1”.

So, let’s create a DAG Object.

# Creating DAG Object
dag = DAG(
    dag_id='DAG-1',
    default_args=default_args,
    schedule_interval='@once',
    catchup=False
)

Here,

  • dag_id is the unique identifier for the DAG.
  • schedule_interval defines how frequently our DAG will be triggered. It can be once, hourly, daily, weekly, monthly, or yearly. None means that we do not want to schedule our DAG automatically and will trigger it manually.
  • catchup – if we only want the DAG to run from the current interval onwards, we have to set catchup to False. By default, catchup is True, which means that Airflow will run the DAG for all past intervals from the start_date up to the current interval.
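Besides '@once', schedule_interval also accepts other presets, cron expressions, and timedelta objects. A brief sketch (the dag_ids below are made up for illustration):

# Alternative schedule_interval values (dag_ids are illustrative)
hourly_dag = DAG(dag_id='hourly-dag', default_args=default_args,
                 schedule_interval='@hourly', catchup=False)
cron_dag = DAG(dag_id='cron-dag', default_args=default_args,
               schedule_interval='0 6 * * *',  # every day at 06:00
               catchup=False)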

Step 4: Create tasks

A task is an instance of an operator. It has a unique identifier called task_id. There are various operators available, but here we will be using the DummyOperator. We can create various tasks using various operators. Here we will be creating two simple tasks:

# Creating first task
start = DummyOperator(task_id='start', dag=dag)

If you go to the Graph view in the UI, you can see that the task “start” has been created.

[Screenshot: Graph view showing the “start” task]

# Creating second task
end = DummyOperator(task_id='end', dag=dag)

Now, the two tasks, start and end, have been created:

[Screenshot: Graph view showing the “start” and “end” tasks]
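Besides the DummyOperator, a very common choice is the PythonOperator, which runs a Python callable as a task. A minimal sketch (the greet function is hypothetical):

# A task that runs a Python callable (illustrative example)
from airflow.operators.python_operator import PythonOperator

def greet():
    # Hypothetical callable; any Python function works here
    print('Hello from Airflow!')

hello = PythonOperator(task_id='hello', python_callable=greet, dag=dag)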

Step 5: Setting up dependencies for the DAG

Dependencies are the relationships between the operators, that is, the order in which the tasks in a DAG will be executed. We can set the order of execution using the bitshift operators: >> sets a downstream dependency, and << sets an upstream one.

  • a >> b means that a will run first, and then b will run. It can also be written as a.set_downstream(b).
  • a << b means that b will run first, followed by a. It can also be written as a.set_upstream(b).

Now, let’s set up the order of execution between the start and end tasks. Suppose that we want start to run first and end to run after it.

# Setting up dependencies
start >> end
# We can also write it as start.set_downstream(end)

Here are start and end after setting up the dependency:

[Screenshot: Graph view showing start connected to end]
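Dependencies can also fan out across several tasks at once by passing a list to the bitshift operator. A short sketch (t1 and t2 are made-up tasks for illustration and are not part of the final code below):

# Fan-out: t1 and t2 run in parallel between start and end
t1 = DummyOperator(task_id='t1', dag=dag)
t2 = DummyOperator(task_id='t2', dag=dag)
start >> [t1, t2] >> end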

Putting all our code together,

# Step 1: Importing Modules
# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator

# Step 2: Initiating the default_args
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2022, 11, 12),
}

# Step 3: Creating DAG Object
dag = DAG(
    dag_id='DAG-1',
    default_args=default_args,
    schedule_interval='@once',
    catchup=False
)

# Step 4: Creating tasks
# Creating first task
start = DummyOperator(task_id='start', dag=dag)
# Creating second task
end = DummyOperator(task_id='end', dag=dag)

# Step 5: Setting up dependencies
start >> end

Now, we have successfully created our first DAG. We can move on to the webserver to see it in the UI.

[Screenshot: DAG-1 listed on the Airflow webserver dashboard]

Now, you can click on the DAG and explore its different views in the Airflow UI.




FAQs

How do you create the first DAG in Airflow?

To create a DAG in Airflow, you'll typically follow these steps:
  1. Import necessary modules: You'll need to import airflow modules like `DAG`, `operators`, and `tasks`.
  2. Define default arguments: Set default arguments that will be shared among all the tasks in your DAG, such as start date, owner, and retries.

How do you implement a DAG in Airflow?

The following are the step-by-step instructions to write an Airflow DAG or workflow:
  1. Creating a python file.
  2. Importing the modules.
  3. Default Arguments for the DAG.
  4. Instantiate a DAG.
  5. Creating a callable function.
  6. Creating Tasks.
  7. Setting up Dependencies.
  8. Verifying the final Dag code.

How do I create a new DAG in the Airflow UI?

Creating an Airflow DAG using the Pipeline UI
  1. Go to Jobs > Create Job. Under Job details, select Airflow. ...
  2. Specify a name for the job.
  3. Under DAG File select the Editor option.
  4. Click Create. You are redirected to the job Editor tab.
  5. Build your Airflow pipeline. ...
  6. When you are done with building your pipeline, click Save.

What is a DAG in Python?

A DAG is defined in Python code and visualized in the Airflow UI. DAGs can be as simple as a single task or as complex as hundreds or thousands of tasks with complicated dependencies. The following screenshot shows a complex DAG graph in the Airflow UI.

How to create a DAG in Airflow using the REST API?

How to Trigger Airflow DAG Using REST API
  1. Enable REST API in Airflow. In the config, enable authentication and set it to basic authentication. I have version 3.8 of the docker-compose file and version 2.6. ...
  2. Build your POST request. I'm using Postman in this demo for the API requests. Define headers as follows:
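The same trigger can also be expressed in Python. A minimal sketch using the requests library, assuming a local Airflow 2.x webserver with basic authentication and the credentials shown here (host, port, and credentials are all assumptions):

# Trigger a DAG run through the stable REST API (Airflow 2.x)
import requests

response = requests.post(
    'http://localhost:8080/api/v1/dags/DAG-1/dagRuns',  # assumed host/port; dag_id from this tutorial
    auth=('airflow', 'airflow'),                        # assumed basic-auth credentials
    json={'conf': {}},                                  # optional run configuration
)
print(response.status_code, response.json())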

What is the default DAG path in Airflow?

The DAGs folder is specified in the Airflow configuration file, and by default, it is located in the ~/airflow/dags directory.
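You can also read the configured value programmatically. A small sketch using Airflow’s configuration module:

# Print the configured DAGs folder (Airflow 2.x)
from airflow.configuration import conf

print(conf.get('core', 'dags_folder'))  # e.g. /home/<user>/airflow/dags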

How do you make a DAG?

To create a DAG one must specify: 1) the causal question of interest, thus necessitating inclusion of exposure/treatment (which we call E) and outcome of interest (D); 2) variables that might influence both E (or a mediator of interest) and D; 3) discrepancies between the ideal measures of the variables and ...

What is an Airflow DAG script?

In Airflow, a DAG (Directed Acyclic Graph) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.

How do I create a DAGs folder in Airflow?

In Airflow, DAGs are defined as Python code. Airflow executes all Python code in the dags_folder and loads any DAG objects that appear in globals() . The simplest way to create a DAG is to write it as a static Python file. Sometimes, manually writing DAGs isn't practical.

How do you call another DAG in Airflow?

You can trigger a downstream DAG with the TriggerDagRunOperator from any point in the upstream DAG. If you set the operator's wait_for_completion parameter to True , the upstream DAG pauses and then resumes only after the downstream DAG has finished running.
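A minimal sketch of this pattern (the downstream dag_id is made up; the import path is for Airflow 2.x):

# Trigger another DAG from within the current one
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

trigger = TriggerDagRunOperator(
    task_id='trigger_downstream',
    trigger_dag_id='downstream-dag',  # dag_id of the DAG to trigger (assumed)
    wait_for_completion=True,         # pause until the downstream DAG finishes
    dag=dag,
)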

How do I know if my Airflow DAG is running?

DAGs that have a currently running DAG run can be shown on the UI dashboard in the “Running” tab. Similarly, DAGs whose latest DAG run is marked as failed can be found on the “Failed” tab.

How do I add tasks to a DAG?

Each DAG object has the methods add_task and add_tasks for manually adding tasks to the DAG object from different places (without using the 'dag' attribute inside the task and without defining the task in a DAG context). In this example, we could of course pass the dag attribute, but we want to re-use this test task on different DAGs later.
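A short sketch of manual registration (assuming the dag object from earlier in this article; the task name is illustrative):

# Register a task on a DAG after creating it without the dag kwarg
standalone = DummyOperator(task_id='standalone')
dag.add_task(standalone)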

How do you write an Airflow DAG in Python?

In order to create a Python DAG in Airflow, you must always import the required Python DAG class. Following the DAG class are the Operator imports. Basically, you must import the corresponding Operator for each one you want to use. To execute a Python function, for example, you must import the PythonOperator.

Where do I put DAG files in Airflow?

The default location for your DAGs is ~/airflow/dags .

What is a DAG, with an example?

A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The order of the activities is depicted by a graph, which is visually presented as a set of circles, each representing an activity, some of which are connected by lines, representing the flow from one activity to another.

What is the order of task execution in Airflow?

Ideally, a task should flow from none , to scheduled , to queued , to running , and finally to success .

How do you trigger a DAG from a Cloud Function?

Deploy a Cloud Function that triggers the DAG
  1. Trigger Type. Cloud Storage.
  2. Event Type. Finalize / Create.
  3. Bucket. Select a bucket that must trigger this function.
  4. Retry on failure. We recommend to disable this option for the purposes of this example.
