Skip to content

YAML Specification

Overview

Dagu workflows are defined using YAML files. Each file represents a DAG (Directed Acyclic Graph) that describes your workflow steps and their relationships.

Basic Structure

yaml
# Workflow metadata
name: my-workflow          # Optional: defaults to filename
description: "What this workflow does"
tags: [production, etl]    # Optional: for organization

# Scheduling
schedule: "0 * * * *"      # Optional: cron expression

# Execution control
maxActiveRuns: 1           # Max concurrent runs
maxActiveSteps: 10         # Max parallel steps
timeoutSec: 3600           # Workflow timeout (seconds)

# Parameters
params:
  - KEY: default_value
  - ANOTHER_KEY: "${ENV_VAR}"

# Environment variables
env:
  - VAR_NAME: value
  - PATH: ${PATH}:/custom/path

# Workflow steps
steps:
  - name: step-name
    command: echo "Hello"
    depends: previous-step

# Lifecycle handlers
handlerOn:
  success:
    command: notify-success.sh
  failure:
    command: cleanup-on-failure.sh

Root Fields

Metadata Fields

FieldTypeDescriptionDefault
namestringWorkflow nameFilename without extension
descriptionstringHuman-readable description-
tagsarrayTags for categorization[]
groupstringGroup name for organization-

Scheduling Fields

FieldTypeDescriptionDefault
schedulestring/arrayCron expression(s)-
skipIfSuccessfulbooleanSkip if already succeeded todayfalse
restartWaitSecintegerWait seconds before restart0

Schedule Formats

yaml
# Single schedule
schedule: "0 2 * * *"

# Multiple schedules
schedule:
  - "0 9 * * MON-FRI"   # 9 AM weekdays
  - "0 14 * * SAT,SUN"  # 2 PM weekends

# With timezone
schedule: "CRON_TZ=America/New_York 0 9 * * *"

# Start/stop schedules
schedule:
  start:
    - "0 8 * * MON-FRI"   # Start at 8 AM
  stop:
    - "0 18 * * MON-FRI"  # Stop at 6 PM
  restart:
    - "0 12 * * MON-FRI"  # Restart at noon

Execution Control Fields

FieldTypeDescriptionDefault
maxActiveRunsintegerMax concurrent workflow runs (-1 = unlimited)1
maxActiveStepsintegerMax parallel steps1
timeoutSecintegerWorkflow timeout in seconds0 (no timeout)
delaySecintegerInitial delay before start (seconds)0
maxCleanUpTimeSecintegerMax cleanup time (seconds)300
preconditionsarrayWorkflow-level preconditions-

Data Fields

FieldTypeDescriptionDefault
paramsarrayDefault parameters[]
envarrayEnvironment variables[]
dotenvarray.env files to load[]
logDirstringCustom log directorySystem default
histRetentionDaysintegerHistory retention days30
maxOutputSizeintegerMax output size per step (bytes)1048576

Queue Configuration

FieldTypeDescriptionDefault
queuestringQueue name-

Notification Fields

FieldTypeDescriptionDefault
mailOnobjectEmail notification triggers-
errorMailobjectError email configuration-
smtpobjectSMTP server configuration-
yaml
mailOn:
  success: true
  failure: true
  
errorMail:
  from: [email protected]
  to: [email protected]
  prefix: "[ALERT]"
  attachLogs: true
  
smtp:
  host: smtp.gmail.com
  port: "587"
  username: [email protected]
  password: ${SMTP_PASSWORD}

Handler Fields

FieldTypeDescriptionDefault
handlerOnobjectLifecycle event handlers-
yaml
handlerOn:
  success:
    command: echo "Workflow succeeded"
  failure:
    command: ./notify-failure.sh
  cancel:
    command: ./cleanup.sh
  exit:
    command: ./always-run.sh

Step Fields

Each step in the steps array can have these fields:

Basic Fields

FieldTypeDescriptionDefault
namestringRequired - Step name-
commandstringCommand to execute-
scriptstringInline script (alternative to command)-
runstringRun another DAG-
dependsstring/arrayStep dependencies-

Execution Fields

FieldTypeDescriptionDefault
dirstringWorking directoryCurrent directory
shellstringShell to useSystem default
stdoutstringRedirect stdout to file-
stderrstringRedirect stderr to file-
outputstringCapture output to variable-
envarrayStep-specific environment-
paramsstringParameters for sub-DAG-

Parallel Execution

FieldTypeDescriptionDefault
parallelarrayItems to process in parallel-
maxConcurrentintegerMax parallel executionsNo limit
yaml
steps:
  - name: process-files
    run: file-processor
    parallel:
      items: [file1.csv, file2.csv, file3.csv]
      maxConcurrent: 2
    params: "FILE=${ITEM}"

Conditional Execution

FieldTypeDescriptionDefault
preconditionsarrayConditions to check before execution-
continueOnobjectContinue workflow on certain conditions-
yaml
steps:
  - name: conditional-step
    command: ./deploy.sh
    preconditions:
      - condition: "${ENVIRONMENT}"
        expected: "production"
      - condition: "`git branch --show-current`"
        expected: "main"
    
  - name: optional-step
    command: ./optional.sh
    continueOn:
      failure: true
      skipped: true
      exitCode: [0, 1, 2]
      output: ["WARNING", "SKIP"]
      markSuccess: true

Error Handling

FieldTypeDescriptionDefault
retryPolicyobjectRetry configuration-
repeatPolicyobjectRepeat configuration-
mailOnErrorbooleanSend email on errorfalse
signalOnStopstringSignal to send on stopSIGTERM
yaml
steps:
  - name: retry-example
    command: curl https://api.example.com
    retryPolicy:
      limit: 3
      intervalSec: 30
      exitCodes: [1, 255]  # Retry only on specific codes
    
  - name: repeat-example
    command: check-status.sh
    repeatPolicy:
      repeat: true
      intervalSec: 60
      condition: "${STATUS}"
      expected: "ready"

Executor Configuration

FieldTypeDescriptionDefault
executorobjectExecutor configurationShell executor
yaml
steps:
  - name: docker-step
    executor:
      type: docker
      config:
        image: python:3.11
        volumes:
          - /data:/data:ro
        env:
          - API_KEY=${API_KEY}
    command: python process.py

Variable Substitution

Parameter References

yaml
params:
  - USER: john
  - DOMAIN: example.com

steps:
  - name: greet
    command: echo "Hello ${USER} from ${DOMAIN}"

Environment Variables

yaml
env:
  - API_URL: https://api.example.com
  - API_KEY: ${SECRET_API_KEY}  # From system env

steps:
  - name: call-api
    command: curl -H "X-API-Key: ${API_KEY}" ${API_URL}

Command Substitution

yaml
steps:
  - name: dynamic-date
    command: echo "Today is `date +%Y-%m-%d`"
    
  - name: git-branch
    command: deploy.sh
    preconditions:
      - condition: "`git branch --show-current`"
        expected: "main"

Output Variables

yaml
steps:
  - name: get-version
    command: cat VERSION
    output: VERSION
    
  - name: build
    command: docker build -t app:${VERSION} .
    depends: get-version

JSON Path Access

yaml
steps:
  - name: get-config
    command: cat config.json
    output: CONFIG
    
  - name: use-config
    command: echo "Port is ${CONFIG.server.port}"
    depends: get-config

Special Variables

These variables are automatically available:

VariableDescription
DAG_NAMECurrent DAG name
DAG_RUN_IDUnique run identifier
DAG_RUN_LOG_FILEPath to workflow log
DAG_RUN_STEP_NAMECurrent step name
DAG_RUN_STEP_STDOUT_FILEStep stdout file path
DAG_RUN_STEP_STDERR_FILEStep stderr file path
ITEMCurrent item in parallel execution

Execution Types

Chain (Default)

Steps execute based on dependencies:

yaml
steps:
  - name: A
    command: echo "A"
  - name: B
    command: echo "B"
    depends: A
  - name: C
    command: echo "C"
    depends: B

Parallel

All steps without dependencies run in parallel:

yaml
steps:
  - name: task1
    command: ./task1.sh
  - name: task2
    command: ./task2.sh
  - name: task3
    command: ./task3.sh

Complete Example

yaml
name: production-etl
description: Daily ETL pipeline for production data
tags: [production, etl, critical]
schedule: "0 2 * * *"

maxActiveRuns: 1
maxActiveSteps: 5
timeoutSec: 7200
histRetentionDays: 90

params:
  - DATE: "`date +%Y-%m-%d`"
  - ENVIRONMENT: production

env:
  - DATA_DIR: /data/etl
  - LOG_LEVEL: info
  
dotenv:
  - /etc/dagu/production.env

preconditions:
  - condition: "`date +%u`"
    expected: "re:[1-5]"  # Weekdays only

steps:
  - name: validate-environment
    command: ./scripts/validate.sh
    
  - name: extract-data
    command: python extract.py --date=${DATE}
    depends: validate-environment
    output: RAW_DATA_PATH
    retryPolicy:
      limit: 3
      intervalSec: 300
    
  - name: transform-data
    run: transform-module
    parallel:
      items: [customers, orders, products]
      maxConcurrent: 2
    params: "TYPE=${ITEM} INPUT=${RAW_DATA_PATH}"
    depends: extract-data
    continueOn:
      failure: false
    
  - name: load-data
    command: python load.py --date=${DATE}
    depends: transform-data
    env:
      - LOAD_TIMEOUT: 600
    
  - name: validate-results
    command: python validate_results.py --date=${DATE}
    depends: load-data
    mailOnError: true

handlerOn:
  success:
    command: |
      echo "ETL completed successfully for ${DATE}"
      ./scripts/notify-success.sh
  failure:
    executor:
      type: mail
      config:
        to: [email protected]
        subject: "ETL Failed - ${DATE}"
        body: "Check logs at ${DAG_RUN_LOG_FILE}"
        attachLogs: true
  exit:
    command: ./scripts/cleanup.sh ${DATE}

mailOn:
  failure: true
  
smtp:
  host: smtp.company.com
  port: "587"
  username: [email protected]

Released under the MIT License.