# Workflow Basics

Learn the fundamentals of writing Dagu workflows.

## Your First Workflow

Create `hello.yaml`:

```yaml
steps:
  - echo "Hello from Dagu!"
```

Run it:

```bash
dagu start hello.yaml
```
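You can also preview what would execute without running anything. Recent Dagu releases ship a dry-run subcommand (check `dagu --help` on your version):

```bash
dagu dry hello.yaml
```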
## Workflow Structure

A complete workflow contains:

```yaml
# Metadata
name: data-pipeline
description: Process daily data
tags: [etl, production]

# Configuration
schedule: "0 2 * * *"
params:
  - DATE: ${DATE:-today}

# Steps
steps:
  - name: process
    command: python process.py ${DATE}

# Handlers
handlerOn:
  failure:
    command: notify-error.sh
```
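Values under `params` can be overridden from the command line. In current Dagu releases, named parameters are passed after a `--` separator (the date below is a placeholder):

```bash
dagu start data-pipeline.yaml -- DATE=2024-06-01
```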
## Steps

The basic unit of execution.

### Step Names

Step names are optional. When omitted, Dagu automatically generates names based on the step type:

```yaml
steps:
  - echo "First step"       # Auto-named: cmd_1
  - script: |               # Auto-named: script_2
      echo "Multi-line"
      echo "Script"
  - name: explicit-name     # Explicit name
    command: echo "Third step"
  - executor:               # Auto-named: http_4
      type: http
      config:
        url: https://api.example.com
  - run: child-workflow     # Auto-named: dag_5
```
Auto-generated names follow the pattern `{type}_{number}`:

- `cmd_N` - Command steps
- `script_N` - Script steps
- `http_N` - HTTP executor steps
- `dag_N` - DAG executor steps
- `container_N` - Docker/container steps
- `ssh_N` - SSH executor steps
- `mail_N` - Mail executor steps
- `jq_N` - JQ executor steps

For parallel steps (see below), the pattern is `parallel_{group}_{type}_{index}`.
### Shorthand Command Syntax

For simple commands, you can use an even more concise syntax:

```yaml
steps:
  - echo "Hello World"
  - ls -la
  - python script.py
```

This is equivalent to:

```yaml
steps:
  - name: cmd_1
    command: echo "Hello World"
  - name: cmd_2
    command: ls -la
    depends: cmd_1
  - name: cmd_3
    command: python script.py
    depends: cmd_2
```
### Multi-line Scripts

```yaml
steps:
  - script: |
      #!/bin/bash
      set -e
      echo "Processing..."
      python analyze.py data.csv
      echo "Complete"
```
### Shell Selection

```yaml
steps:
  - name: bash-task
    shell: bash
    command: echo $BASH_VERSION

  - name: python-task
    shell: python3
    script: |
      import pandas as pd
      df = pd.read_csv('data.csv')
      print(df.head())
```
## Dependencies

Steps run sequentially by default, in the order they are declared. Use `depends` when you need parallel execution or an explicit ordering.

```yaml
steps:
  - name: download
    command: wget data.csv
  - name: process
    command: python process.py
  - name: upload
    command: aws s3 cp output.csv s3://bucket/
```
### Parallel Execution

You can run steps in parallel using explicit dependencies:

```yaml
steps:
  - name: setup
    command: echo "Setup"
  - name: task1
    command: echo "Task 1"
    depends: setup
  - name: task2
    command: echo "Task 2"
    depends: setup
  - name: finish
    command: echo "All tasks complete"
    depends: [task1, task2]
```
### Shorthand Parallel Syntax

For simpler cases, you can use nested arrays to define parallel steps with automatic dependency management:

```yaml
steps:
  - echo "Step 1: Sequential"
  -
    - echo "Step 2a: Parallel"
    - echo "Step 2b: Parallel"
    - echo "Step 2c: Parallel"
  - echo "Step 3: Sequential"
```

In this syntax:

- Steps in a nested array run in parallel
- They automatically depend on the previous sequential step
- The next sequential step automatically depends on all parallel steps

This is equivalent to:

```yaml
steps:
  - name: cmd_1
    command: echo "Step 1: Sequential"
  - name: parallel_2_echo_1
    command: echo "Step 2a: Parallel"
    depends: cmd_1
  - name: parallel_2_echo_2
    command: echo "Step 2b: Parallel"
    depends: cmd_1
  - name: parallel_2_echo_3
    command: echo "Step 2c: Parallel"
    depends: cmd_1
  - name: cmd_3
    command: echo "Step 3: Sequential"
    depends: [parallel_2_echo_1, parallel_2_echo_2, parallel_2_echo_3]
```
You can have multiple parallel groups:

```yaml
steps:
  - echo "Start"
  -
    - echo "First parallel group - task 1"
    - echo "First parallel group - task 2"
  - echo "Middle sequential step"
  -
    - echo "Second parallel group - task 1"
    - echo "Second parallel group - task 2"
    - echo "Second parallel group - task 3"
  - echo "End"
```
You can also mix shorthand and standard syntax:

```yaml
steps:
  - name: setup
    command: ./setup.sh
  -
    - echo "Parallel task 1"
    - name: test
      command: npm test
      env:
        - NODE_ENV: test
  - name: cleanup
    command: ./cleanup.sh
```
## Working Directory

Set where commands execute:

```yaml
steps:
  - name: in-project
    workingDir: /home/user/project
    command: python main.py

  - name: in-data
    workingDir: /data/input
    command: ls -la
```
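Environment variables are typically expanded in paths as well, though you should verify this against your Dagu version. A sketch:

```yaml
steps:
  - name: in-home-project
    workingDir: ${HOME}/project   # assumes ${HOME} expansion is supported here
    command: python main.py
```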
## Environment Variables

### Global Environment

```yaml
env:
  - API_KEY: secret123
  - ENV: production

steps:
  - name: use-env
    command: echo "Running in $ENV"
```
### Step-Level Environment

Steps can have their own environment variables that override DAG-level ones:

```yaml
env:
  - ENV: production

steps:
  - name: dev-test
    command: echo "Running in $ENV"
    env:
      - ENV: development  # Overrides DAG-level
      - TEST_FLAG: true
    # Output: Running in development
```
### Load from .env Files

```yaml
dotenv:
  - .env
  - .env.production

steps:
  - name: use-dotenv
    command: echo $DATABASE_URL
```
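A `.env` file is plain `KEY=VALUE` pairs, one per line. For example (both values are placeholders):

```
DATABASE_URL=postgres://user:pass@localhost:5432/app
API_TOKEN=changeme
```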
## Capturing Output

Store command output in variables:

```yaml
steps:
  - name: get-version
    command: git rev-parse --short HEAD
    output: VERSION

  - name: build
    command: docker build -t app:${VERSION} .
```
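`output` stores the step's standard output, so any command or shell pipeline works. A small sketch, with `data.csv` as a placeholder:

```yaml
steps:
  - name: count-rows
    command: wc -l < data.csv
    output: ROW_COUNT

  - name: report
    command: echo "Processed ${ROW_COUNT} rows"
```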
## Basic Error Handling

### Continue on Failure

```yaml
steps:
  - name: optional-step
    command: maybe-fails.sh
    continueOn:
      failure: true

  - name: always-runs
    command: cleanup.sh
```
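Beyond `failure: true`, recent Dagu versions let `continueOn` treat specific exit codes as acceptable. A sketch:

```yaml
steps:
  - name: grep-logs
    command: grep ERROR app.log   # grep exits 1 when nothing matches
    continueOn:
      exitCode: [1]               # treat "no matches" as non-fatal
  - name: next-step
    command: echo "Continuing"
```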
### Simple Retry

```yaml
steps:
  - name: flaky-api
    command: curl https://unstable-api.com
    retryPolicy:
      limit: 3
```
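`retryPolicy` also accepts a wait between attempts via `intervalSec` (supported in current Dagu releases):

```yaml
steps:
  - name: flaky-api
    command: curl https://unstable-api.com
    retryPolicy:
      limit: 3
      intervalSec: 30   # wait 30 seconds between attempts
```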
## Timeouts

Prevent steps from running forever:

```yaml
steps:
  - name: long-task
    command: echo "Processing data"
    timeoutSec: 300  # 5 minutes
```
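A workflow-wide limit can be set with a top-level `timeoutSec` field (verify support in your Dagu version):

```yaml
timeoutSec: 3600   # cap the whole workflow at 1 hour

steps:
  - name: long-task
    command: echo "Processing data"
    timeoutSec: 300
```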
## Step Descriptions

Document your steps:

```yaml
steps:
  - name: etl-process
    description: |
      Extract data from API, transform to CSV,
      and load into data warehouse
    command: python etl.py
```
## Tags and Organization

Group related workflows:

```yaml
name: customer-report
tags:
  - reports
  - customer
  - daily
group: Analytics  # UI grouping
```
## See Also
- Control Flow - Conditionals and loops
- Data & Variables - Pass data between steps
- Error Handling - Advanced error recovery
- Parameters - Make workflows configurable