What are Providers in Apache Airflow – Airflow Provider Packages


Apache Airflow providers are plugins that allow Airflow to interface with external systems. The capabilities of Apache Airflow can be extended by installing these additional packages, called providers.

The full list of providers can be found here:

https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html
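
You can also check which providers are already installed in your environment with the airflow providers list CLI command. A minimal Python sketch of the same check, assuming Airflow 2.x where ProvidersManager exposes the provider registry, looks like this:

from airflow.providers_manager import ProvidersManager

# ProvidersManager keeps a registry of every provider package that
# Airflow discovered in the current environment.
for package_name in ProvidersManager().providers:
    print(package_name)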

First we need to install the Docker provider:

pip install 'apache-airflow-providers-docker'
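
A quick sanity check that the install worked (a simple sketch, assuming the pip command above succeeded) is to import the operator the provider ships:

# If the provider is installed, this import succeeds; an ImportError
# means the package is missing from the current environment.
from airflow.providers.docker.operators.docker import DockerOperator

print(DockerOperator)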

Now let's try to understand a task with a simple example:

from airflow import DAG
from datetime import datetime, timedelta
from airflow.providers.docker.operators.docker import DockerOperator

default_args = {
    "owner": "airflow",
    "email_on_failure": False,
    "email_on_retry": False,
    "email": "airflowadmin@airflow.com",
    "retries": 1,
    "retry_delay": timedelta(minutes=5)
}

with DAG("forex_data_pipeline", start_date=datetime(2023, 11, 14),
         schedule_interval="@daily", default_args=default_args,
         catchup=False) as dag:

    # The task must be created inside the `with DAG(...)` block so that
    # it is attached to the DAG.
    docker_task = DockerOperator(
        task_id='docker_task',
        image='python:3.7',
        api_version='auto',
        auto_remove=True,
        command='/bin/sleep 30'
    )

  • First we create an instance of DockerOperator and assign it to the docker_task variable.
  • We always have to specify task_id; the task ID must be unique across all the operators in the same DAG.
  • image='python:3.7' specifies the Docker image that the operator will use to create a container.
  • api_version specifies the Docker API version to use; 'auto' picks the version based on the installed Docker SDK.
  • auto_remove=True means the container will be removed once it is done executing.
  • command is, as the name suggests, the command run inside the container; here /bin/sleep 30 makes it sleep for 30 seconds. A sketch of a more realistic command follows this list.
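
As a hedged illustration of a more useful command (the hello_task ID and the Python one-liner below are assumptions for the example, not part of the original pipeline), the same operator can run arbitrary code inside the container:

# A sketch: same pattern as docker_task, but the container runs a small
# Python one-liner instead of sleeping. Add this inside the same
# `with DAG(...)` block.
hello_task = DockerOperator(
    task_id='hello_task',
    image='python:3.7',
    api_version='auto',
    auto_remove=True,
    # Passing the command as a list avoids shell-quoting issues.
    command=["python", "-c", "print('hello from the container')"]
)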

Now let's test whether this task runs successfully. For that, we have to run the below command:

airflow tasks test forex_data_pipeline docker_task 2023-11-01

Here forex_data_pipeline is the DAG ID, docker_task is the task ID, and 2023-11-01 is an execution date in the past.
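
If you are on Airflow 2.5 or newer, a convenient alternative (a sketch, assuming that version) is to call dag.test() at the bottom of the DAG file and run the file as a plain Python script:

# A sketch assuming Airflow 2.5+: appended at the bottom of the DAG
# file, this lets `python forex_data_pipeline.py` execute the whole
# DAG in-process for local debugging.
if __name__ == "__main__":
    dag.test()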

We at Helical have more than 10 years of experience in providing solutions and services in the domain of data and have served more than 85 clients. Please reach out to us for assistance, consulting, services, maintenance, as well as POCs, and to hear about our past experience with Airflow. Please do reach out at nikhilesh@Helicaltech.com
