YAML Specification

Overview

Dagu workflows are defined using YAML files. Each file represents a DAG (Directed Acyclic Graph) that describes your workflow steps and their relationships.

Basic Structure

```yaml
# Workflow metadata
name: my-workflow          # Optional: defaults to filename
description: "What this workflow does"
tags: [production, etl]    # Optional: for organization

# Scheduling
schedule: "0 * * * *"      # Optional: cron expression

# Execution control
maxActiveRuns: 1           # Max concurrent runs
maxActiveSteps: 10         # Max parallel steps
timeoutSec: 3600           # Workflow timeout (seconds)

# Parameters
params:
  - KEY: default_value
  - ANOTHER_KEY: "${ENV_VAR}"

# Environment variables
env:
  - VAR_NAME: value
  - PATH: ${PATH}:/custom/path

# Workflow steps
steps:
  - name: step-name
    command: echo "Hello"
    depends: previous-step

# Lifecycle handlers
handlerOn:
  success:
    command: notify-success.sh
  failure:
    command: cleanup-on-failure.sh
```

Root Fields

Metadata Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `name` | string | Workflow name | Filename without extension |
| `description` | string | Human-readable description | - |
| `tags` | array | Tags for categorization | [] |
| `group` | string | Group name for organization | - |
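
A minimal sketch combining these metadata fields; the values are illustrative:

```yaml
name: nightly-backup
description: "Nightly database backup"
tags: [maintenance, database]
group: Infrastructure
```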

Scheduling Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `schedule` | string/array | Cron expression(s) | - |
| `skipIfSuccessful` | boolean | Skip if already succeeded today | false |
| `restartWaitSec` | integer | Wait seconds before restart | 0 |
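
A short sketch pairing a schedule with the other two fields (values are illustrative):

```yaml
schedule: "0 2 * * *"
skipIfSuccessful: true   # Don't re-run if today's scheduled run already succeeded
restartWaitSec: 60       # Wait 60 seconds before a restart
```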

Schedule Formats

```yaml
# Single schedule
schedule: "0 2 * * *"

# Multiple schedules
schedule:
  - "0 9 * * MON-FRI"   # 9 AM weekdays
  - "0 14 * * SAT,SUN"  # 2 PM weekends

# With timezone
schedule: "CRON_TZ=America/New_York 0 9 * * *"

# Start/stop schedules
schedule:
  start:
    - "0 8 * * MON-FRI"   # Start at 8 AM
  stop:
    - "0 18 * * MON-FRI"  # Stop at 6 PM
  restart:
    - "0 12 * * MON-FRI"  # Restart at noon

Execution Control Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `maxActiveRuns` | integer | Max concurrent workflow runs (-1 = unlimited) | 1 |
| `maxActiveSteps` | integer | Max parallel steps | 1 |
| `timeoutSec` | integer | Workflow timeout in seconds | 0 (no timeout) |
| `delaySec` | integer | Initial delay before start (seconds) | 0 |
| `maxCleanUpTimeSec` | integer | Max cleanup time (seconds) | 300 |
| `preconditions` | array | Workflow-level preconditions | - |
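
A sketch of these controls together, including a workflow-level precondition (the weekday check mirrors the complete example at the end of this page):

```yaml
maxActiveRuns: 1       # Never overlap runs
maxActiveSteps: 5
timeoutSec: 3600       # Abort the run after one hour
delaySec: 10
preconditions:
  - condition: "`date +%u`"
    expected: "re:[1-5]"   # Weekdays only
```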

Data Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `params` | array | Default parameters | [] |
| `env` | array | Environment variables | [] |
| `dotenv` | array | .env files to load | [] |
| `logDir` | string | Custom log directory | System default |
| `histRetentionDays` | integer | History retention days | 30 |
| `maxOutputSize` | integer | Max output size per step (bytes) | 1048576 |
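
A sketch showing these data fields side by side; the paths and values are hypothetical:

```yaml
params:
  - DATE: "`date +%Y-%m-%d`"
env:
  - LOG_LEVEL: debug
dotenv:
  - .env
  - /etc/dagu/production.env   # Hypothetical path
logDir: /var/log/dagu
histRetentionDays: 7
```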

Queue Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `queue` | string | Queue name | - |
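
A minimal sketch assigning a workflow to a named queue, assuming a queue called `batch` exists in your deployment:

```yaml
queue: batch     # Hypothetical queue name
maxActiveRuns: 1
```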

OpenTelemetry Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `otel` | object | OpenTelemetry tracing configuration | - |

```yaml
otel:
  enabled: true
  endpoint: "localhost:4317"  # OTLP gRPC endpoint
  headers:
    Authorization: "Bearer ${OTEL_TOKEN}"
  insecure: false
  timeout: 30s
  resource:
    service.name: "dagu-${DAG_NAME}"
    service.version: "1.0.0"
    deployment.environment: "${ENVIRONMENT}"
```

See OpenTelemetry Tracing for detailed configuration.

Notification Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `mailOn` | object | Email notification triggers | - |
| `errorMail` | object | Error email configuration | - |
| `infoMail` | object | Info email configuration | - |
| `smtp` | object | SMTP server configuration | - |

```yaml
mailOn:
  success: true
  failure: true
  
errorMail:
  from: alerts@example.com
  to: oncall@example.com  # Single recipient (string)
  # Or multiple recipients (array):
  # to:
  #   - oncall@example.com
  #   - manager@example.com
  prefix: "[ALERT]"
  attachLogs: true

infoMail:
  from: notifications@example.com
  to: team@example.com  # Single recipient (string)
  # Or multiple recipients (array):
  # to:
  #   - team@example.com
  #   - stakeholders@example.com
  prefix: "[INFO]"
  attachLogs: false

smtp:
  host: smtp.gmail.com
  port: "587"
  username: notifications@example.com
  password: ${SMTP_PASSWORD}
```

Handler Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `handlerOn` | object | Lifecycle event handlers | - |

```yaml
handlerOn:
  success:
    command: echo "Workflow succeeded"
  failure:
    command: ./notify-failure.sh
  cancel:
    command: ./cleanup.sh
  exit:
    command: ./always-run.sh
```

Step Fields

Each step in the `steps` array can have these fields:

Basic Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `name` | string | **Required** - Step name | - |
| `command` | string | Command to execute | - |
| `script` | string | Inline script (alternative to command) | - |
| `run` | string | Run another DAG | - |
| `depends` | string/array | Step dependencies | - |
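
A sketch contrasting `command`, `script`, and `run`; the step and DAG names are hypothetical:

```yaml
steps:
  - name: build
    command: make build

  - name: inline-script
    script: |
      set -e
      echo "Running an inline script"

  - name: run-child-dag
    run: cleanup-dag       # Executes another DAG by name (hypothetical)
    depends:
      - build
      - inline-script
```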

Execution Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `dir` | string | Working directory | Current directory |
| `shell` | string | Shell to use | System default |
| `stdout` | string | Redirect stdout to file | - |
| `stderr` | string | Redirect stderr to file | - |
| `output` | string | Capture output to variable | - |
| `env` | array/object | Step-specific environment variables (overrides DAG-level) | - |
| `params` | string | Parameters for sub-DAG | - |
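
A sketch exercising these fields in a single step; the paths and script name are hypothetical:

```yaml
steps:
  - name: export
    dir: /data/exports          # Working directory
    shell: bash                 # Shell override
    command: ./export.sh
    stdout: /tmp/export.out     # Redirect stdout to a file
    stderr: /tmp/export.err     # Redirect stderr to a file
    output: EXPORT_RESULT       # Capture stdout into a variable
    env:
      - EXPORT_FORMAT: csv      # Step-level env overrides DAG-level
```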

Parallel Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `parallel` | array | Items to process in parallel | - |
| `maxConcurrent` | integer | Max parallel executions | No limit |

```yaml
steps:
  - name: process-files
    run: file-processor
    parallel:
      items: [file1.csv, file2.csv, file3.csv]
      maxConcurrent: 2
    params: "FILE=${ITEM}"

Conditional Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `preconditions` | array | Conditions to check before execution | - |
| `continueOn` | object | Continue workflow on certain conditions | - |

ContinueOn Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `failure` | boolean | Continue execution when step fails | false |
| `skipped` | boolean | Continue when step is skipped due to preconditions | false |
| `exitCode` | array | List of exit codes that allow continuation | [] |
| `output` | array | List of stdout patterns that allow continuation (supports regex with `re:` prefix) | [] |
| `markSuccess` | boolean | Mark step as successful when continue conditions are met | false |

```yaml
steps:
  - name: conditional-step
    command: ./deploy.sh
    preconditions:
      - condition: "${ENVIRONMENT}"
        expected: "production"
      - condition: "`git branch --show-current`"
        expected: "main"
    
  - name: optional-step
    command: ./optional.sh
    continueOn:
      failure: true
      skipped: true
      exitCode: [0, 1, 2]
      output: ["WARNING", "SKIP", "re:^INFO:.*"]
      markSuccess: true
```

See the Continue On Reference for detailed documentation.

Error Handling

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `retryPolicy` | object | Retry configuration | - |
| `repeatPolicy` | object | Repeat configuration | - |
| `mailOnError` | boolean | Send email on error | false |
| `signalOnStop` | string | Signal to send on stop | SIGTERM |
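
`retryPolicy` and `repeatPolicy` are detailed below; a brief sketch of the other two fields (the script name is hypothetical):

```yaml
steps:
  - name: long-job
    command: ./run-batch.sh
    mailOnError: true      # Email on failure (requires smtp configuration)
    signalOnStop: SIGINT   # Sent instead of SIGTERM when the run is stopped
```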

Retry Policy Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `limit` | integer | Maximum retry attempts | - |
| `intervalSec` | integer | Base interval between retries (seconds) | - |
| `backoff` | any | Exponential backoff multiplier. `true` = 2.0, or specify a custom number > 1.0 | - |
| `maxIntervalSec` | integer | Maximum interval between retries (seconds) | - |
| `exitCode` | array | Exit codes that trigger retry | All non-zero |

Exponential Backoff: When `backoff` is set, intervals increase exponentially using the formula `interval * (backoff ^ attemptCount)`. For example, `intervalSec: 2` with `backoff: true` produces waits of 2, 4, 8, 16... seconds, capped by `maxIntervalSec` if set.

Repeat Policy Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `repeat` | string | Repeat mode: `"while"` or `"until"` | - |
| `intervalSec` | integer | Base interval between repetitions (seconds) | - |
| `backoff` | any | Exponential backoff multiplier. `true` = 2.0, or specify a custom number > 1.0 | - |
| `maxIntervalSec` | integer | Maximum interval between repetitions (seconds) | - |
| `limit` | integer | Maximum number of executions | - |
| `condition` | string | Condition to evaluate | - |
| `expected` | string | Expected value/pattern | - |
| `exitCode` | array | Exit codes that trigger repeat | - |

Repeat Modes:

- `while`: Repeats while the condition is true or the exit code matches
- `until`: Repeats until the condition is true or the exit code matches

Exponential Backoff: When `backoff` is set, intervals between repetitions increase exponentially using the same formula, `interval * (backoff ^ attemptCount)`, capped by `maxIntervalSec` if set.

```yaml
steps:
  - name: retry-example
    command: curl https://api.example.com
    retryPolicy:
      limit: 3
      intervalSec: 30
      exitCode: [1, 255]  # Retry only on specific codes
      
  - name: retry-with-backoff
    command: ./call-api.sh   # Hypothetical script that exits with the HTTP status code
    retryPolicy:
      limit: 5
      intervalSec: 2
      backoff: true        # Exponential backoff (2.0x multiplier)
      maxIntervalSec: 60   # Cap at 60 seconds
      exitCode: [429, 503] # Rate-limited or service unavailable
    
  - name: repeat-while-example
    command: check-process.sh
    repeatPolicy:
      repeat: while        # Repeat WHILE process is running
      exitCode: [0]        # Exit code 0 means process found
      intervalSec: 60
      limit: 30
      
  - name: repeat-until-with-backoff
    command: check-status.sh
    output: STATUS
    repeatPolicy:
      repeat: until        # Repeat UNTIL status is ready
      condition: "${STATUS}"
      expected: "ready"
      intervalSec: 5
      backoff: 1.5         # Custom backoff multiplier
      maxIntervalSec: 300  # Cap at 5 minutes
      limit: 60
```

Executor Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `executor` | object | Executor configuration | Shell executor |

```yaml
steps:
  - name: docker-step
    executor:
      type: docker
      config:
        image: python:3.11
        volumes:
          - /data:/data:ro
        env:
          - API_KEY=${API_KEY}
    command: python process.py
```

Distributed Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `workerSelector` | object | Worker label requirements for distributed execution | - |

When using distributed execution, specify `workerSelector` to route tasks to workers with matching labels:

```yaml
steps:
  - name: gpu-training
    run: gpu-training
---
# Run on a worker with gpu
name: gpu-training
workerSelector:
  gpu: "true"
  memory: "64G"
steps:
  - name: gpu-training
    command: python train_model.py
```

Worker Selection Rules:

- All labels in `workerSelector` must match exactly on the worker
- Label values are case-sensitive strings
- Steps without `workerSelector` can run on any available worker
- If no workers match the selector, the task waits until a matching worker is available

See Distributed Execution for complete documentation.

Variable Substitution

Parameter References

```yaml
params:
  - USER: john
  - DOMAIN: example.com

steps:
  - name: greet
    command: echo "Hello ${USER} from ${DOMAIN}"

Environment Variables

```yaml
env:
  - API_URL: https://api.example.com
  - API_KEY: ${SECRET_API_KEY}  # From system env

steps:
  - name: call-api
    command: curl -H "X-API-Key: ${API_KEY}" ${API_URL}
```

Command Substitution

```yaml
steps:
  - name: dynamic-date
    command: echo "Today is `date +%Y-%m-%d`"
    
  - name: git-branch
    command: deploy.sh
    preconditions:
      - condition: "`git branch --show-current`"
        expected: "main"

Output Variables

```yaml
steps:
  - name: get-version
    command: cat VERSION
    output: VERSION
    
  - name: build
    command: docker build -t app:${VERSION} .
    depends: get-version
```

JSON Path Access

```yaml
steps:
  - name: get-config
    command: cat config.json
    output: CONFIG
    
  - name: use-config
    command: echo "Port is ${CONFIG.server.port}"
    depends: get-config
```

Special Variables

These variables are automatically available:

| Variable | Description |
|----------|-------------|
| `DAG_NAME` | Current DAG name |
| `DAG_RUN_ID` | Unique run identifier |
| `DAG_RUN_LOG_FILE` | Path to workflow log |
| `DAG_RUN_STEP_NAME` | Current step name |
| `DAG_RUN_STEP_STDOUT_FILE` | Step stdout file path |
| `DAG_RUN_STEP_STDERR_FILE` | Step stderr file path |
| `ITEM` | Current item in parallel execution |
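
These can be referenced like any other variable; for example:

```yaml
steps:
  - name: show-context
    command: echo "run=${DAG_RUN_ID} step=${DAG_RUN_STEP_NAME} log=${DAG_RUN_LOG_FILE}"
```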

Execution Types

Chain (Default)

Steps execute based on dependencies:

```yaml
steps:
  - name: A
    command: echo "A"
  - name: B
    command: echo "B"
    depends: A
  - name: C
    command: echo "C"
    depends: B
```

Parallel

All steps without dependencies run in parallel:

```yaml
steps:
  - name: task1
    command: ./task1.sh
  - name: task2
    command: ./task2.sh
  - name: task3
    command: ./task3.sh
```

Complete Example

```yaml
name: production-etl
description: Daily ETL pipeline for production data
tags: [production, etl, critical]
schedule: "0 2 * * *"

maxActiveRuns: 1
maxActiveSteps: 5
timeoutSec: 7200
histRetentionDays: 90

params:
  - DATE: "`date +%Y-%m-%d`"
  - ENVIRONMENT: production

env:
  - DATA_DIR: /data/etl
  - LOG_LEVEL: info
  
dotenv:
  - /etc/dagu/production.env

preconditions:
  - condition: "`date +%u`"
    expected: "re:[1-5]"  # Weekdays only

steps:
  - name: validate-environment
    command: ./scripts/validate.sh
    
  - name: extract-data
    command: python extract.py --date=${DATE}
    depends: validate-environment
    output: RAW_DATA_PATH
    retryPolicy:
      limit: 3
      intervalSec: 300
    
  - name: transform-data
    run: transform-module
    parallel:
      items: [customers, orders, products]
      maxConcurrent: 2
    params: "TYPE=${ITEM} INPUT=${RAW_DATA_PATH}"
    depends: extract-data
    continueOn:
      failure: false
    
  - name: load-data
    command: python load.py --date=${DATE}
    depends: transform-data
    env:
      - LOAD_TIMEOUT: 600
      - DB_CONNECTION: ${PROD_DB}
    
  - name: validate-results
    command: python validate_results.py --date=${DATE}
    depends: load-data
    mailOnError: true

handlerOn:
  success:
    command: |
      echo "ETL completed successfully for ${DATE}"
      ./scripts/notify-success.sh
  failure:
    executor:
      type: mail
      config:
        to: oncall@company.com
        subject: "ETL Failed - ${DATE}"
        body: "Check logs at ${DAG_RUN_LOG_FILE}"
        attachLogs: true
  exit:
    command: ./scripts/cleanup.sh ${DATE}

mailOn:
  failure: true
  
smtp:
  host: smtp.company.com
  port: "587"
  username: dagu@company.com
```