
YAML Specification

Overview

Dagu workflows are defined using YAML files. Each file represents a DAG (Directed Acyclic Graph) that describes your workflow steps and their relationships.

Basic Structure

yaml
# Workflow metadata
description: "What this workflow does"
tags: [production, etl]    # Optional: for organization

# Scheduling
schedule: "0 * * * *"      # Optional: cron expression

# Execution control
maxActiveRuns: 1           # Max concurrent runs
maxActiveSteps: 10         # Max parallel steps
timeoutSec: 3600           # Workflow timeout (seconds)

# Parameters
params:
  - KEY: default_value
  - ANOTHER_KEY: "${ENV_VAR}"

# Environment variables
env:
  - VAR_NAME: value
  - PATH: ${PATH}:/custom/path

# Workflow steps
steps:
  - name: step-name        # Optional
    command: echo "Hello"
    depends: previous-step # Optional

# Lifecycle handlers
handlerOn:
  success:
    command: notify-success.sh
  failure:
    command: cleanup-on-failure.sh

Root Fields

Metadata Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| name | string | Workflow name | Filename without extension |
| description | string | Human-readable description | - |
| tags | array | Tags for categorization | [] |
| group | string | Group name for organization | - |
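
Put together, the metadata fields look like this (the values are illustrative):

yaml
name: nightly-report
description: "Builds the nightly sales report"
tags: [reporting, nightly]
group: analytics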

Scheduling Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| schedule | string/array | Cron expression(s) | - |
| skipIfSuccessful | boolean | Skip if already succeeded today | false |
| restartWaitSec | integer | Wait seconds before restart | 0 |
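
A sketch combining these fields (the cron expression and values are illustrative):

yaml
schedule: "0 6 * * *"
skipIfSuccessful: true   # Don't run again if today's scheduled run already succeeded
restartWaitSec: 60       # Wait 60 seconds before restarting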

Schedule Formats

yaml
# Single schedule
schedule: "0 2 * * *"

# Multiple schedules
schedule:
  - "0 9 * * MON-FRI"   # 9 AM weekdays
  - "0 14 * * SAT,SUN"  # 2 PM weekends

# With timezone
schedule: "CRON_TZ=America/New_York 0 9 * * *"

# Start/stop schedules
schedule:
  start:
    - "0 8 * * MON-FRI"   # Start at 8 AM
  stop:
    - "0 18 * * MON-FRI"  # Stop at 6 PM
  restart:
    - "0 12 * * MON-FRI"  # Restart at noon

Execution Control Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| maxActiveRuns | integer | Max concurrent workflow runs (-1 = unlimited) | 1 |
| maxActiveSteps | integer | Max parallel steps | 1 |
| timeoutSec | integer | Workflow timeout in seconds | 0 (no timeout) |
| delaySec | integer | Initial delay before start (seconds) | 0 |
| maxCleanUpTimeSec | integer | Max cleanup time (seconds) | 300 |
| preconditions | array | Workflow-level preconditions | - |
| runConfig | object | User interaction controls when starting DAG | - |
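
For illustration, a workflow that limits concurrency and enforces a one-hour timeout might combine these fields as follows (the numbers are arbitrary):

yaml
maxActiveRuns: 1          # One run of this DAG at a time
maxActiveSteps: 5         # Up to five steps in parallel
timeoutSec: 3600          # Fail the run after one hour
delaySec: 10              # Wait 10 seconds before starting
maxCleanUpTimeSec: 120    # Allow up to two minutes for cleanup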

Data Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| params | array | Default parameters | [] |
| env | array | Environment variables | [] |
| dotenv | string/array | .env files to load | [".env"] |
| workingDir | string | Working directory for the DAG | Directory of DAG file |
| logDir | string | Custom log directory | System default |
| histRetentionDays | integer | History retention days | 30 |
| maxOutputSize | integer | Max output size per step (bytes) | 1048576 |
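
A sketch showing the data-related fields together (paths and limits are illustrative):

yaml
workingDir: /opt/etl
logDir: /var/log/dagu/etl
histRetentionDays: 14
maxOutputSize: 2097152    # 2 MiB per step
dotenv:
  - .env
params:
  - TARGET: staging
env:
  - LOG_LEVEL: debug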

Container Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| container | object | Default container configuration for all steps | - |
yaml
container:
  image: python:3.11
  pullPolicy: missing      # always, missing, never
  env:
    - API_KEY=${API_KEY}
  volumes:
    - /data:/data:ro
  workingDir: /app
  platform: linux/amd64
  user: "1000:1000"
  ports:
    - "8080:8080"
  network: host
  startup: keepalive       # keepalive | entrypoint | command
  command: ["sh", "-c", "my-daemon"]   # when startup: command
  waitFor: running         # running | healthy
  logPattern: "Ready to accept connections"  # optional regex
  restartPolicy: unless-stopped              # optional: no|always|unless-stopped
  keepContainer: false     # Keep container after DAG run

Note: A DAG‑level container is started once and kept alive while the workflow runs; each step executes via docker exec inside that container. This means step commands do not pass through the image’s ENTRYPOINT/CMD. If your image’s entrypoint dispatches subcommands, invoke it explicitly in the step command (see Execution Model and Entrypoint Behavior). Readiness waiting (running/healthy and optional logPattern) times out after 120 seconds with a clear error including the last known state.

SSH Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| ssh | object | Default SSH configuration for all steps | - |
yaml
ssh:
  user: deploy
  host: production.example.com
  port: "22"           # Optional, defaults to "22"
  key: ~/.ssh/id_rsa   # Optional, defaults to standard keys
  password: "${SSH_PASSWORD}" # Optional; prefer keys for security
  strictHostKey: true  # Optional, defaults to true for security
  knownHostFile: ~/.ssh/known_hosts  # Optional, defaults to ~/.ssh/known_hosts

When configured at the DAG level, all steps using SSH executor will inherit these settings:

yaml
# DAG-level SSH configuration
ssh:
  user: deploy
  host: app.example.com
  key: ~/.ssh/deploy_key

steps:
  # These steps inherit the DAG-level SSH configuration
  - systemctl status myapp
  - systemctl restart myapp
  
  # Step-level config overrides DAG-level
  - executor:
      type: ssh
      config:
        user: backup      # Override user
        host: db.example.com  # Override host
        key: ~/.ssh/backup_key  # Override key
    command: mysqldump mydb > backup.sql

Important Notes:

  • SSH and container fields are mutually exclusive at the DAG level
  • Step-level SSH configuration completely overrides DAG-level configuration (no partial overrides)
  • Password authentication is supported but not recommended; prefer key-based auth
  • Default SSH keys are tried if no key is specified: ~/.ssh/id_rsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ed25519, ~/.ssh/id_dsa

Working Directory and Volume Resolution

When using container volumes with relative paths, the paths are resolved relative to the DAG's workingDir:

yaml
# DAG with working directory and container volumes
workingDir: /app/project
container:
  image: python:3.11
  volumes:
    - ./data:/data        # Resolves to /app/project/data:/data
    - .:/workspace        # Resolves to /app/project:/workspace
    - /abs/path:/other   # Absolute paths are unchanged

steps:
  - python process.py

Working Directory Inheritance:

  • Steps inherit workingDir from the DAG if not explicitly set
  • Step-level workingDir overrides DAG-level workingDir
  • Both dir and workingDir set the working directory (use one or the other)
yaml
# Example of workingDir inheritance
workingDir: /project          # DAG-level working directory

steps:
  - pwd                   # Outputs: /project
  - workingDir: /custom   # Override DAG workingDir
    command: pwd          # Outputs: /custom

Queue Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| queue | string | Queue name | - |
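
Assigning a workflow to a named queue is a one-line setting; for example (the queue name is illustrative):

yaml
queue: batch-jobs
maxActiveRuns: 2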

OpenTelemetry Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| otel | object | OpenTelemetry tracing configuration | - |
yaml
otel:
  enabled: true
  endpoint: "localhost:4317"  # OTLP gRPC endpoint
  headers:
    Authorization: "Bearer ${OTEL_TOKEN}"
  insecure: false
  timeout: 30s
  resource:
    service.name: "dagu-${DAG_NAME}"
    service.version: "1.0.0"
    deployment.environment: "${ENVIRONMENT}"

See OpenTelemetry Tracing for detailed configuration.

Notification Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| mailOn | object | Email notification triggers | - |
| errorMail | object | Error email configuration | - |
| infoMail | object | Info email configuration | - |
| smtp | object | SMTP server configuration | - |
yaml
mailOn:
  success: true
  failure: true
  
errorMail:
  from: alerts@example.com
  to: oncall@example.com  # Single recipient (string)
  # Or multiple recipients (array):
  # to:
  #   - oncall@example.com
  #   - dev-team@example.com
  prefix: "[ALERT]"
  attachLogs: true

infoMail:
  from: notifications@example.com
  to: team@example.com  # Single recipient (string)
  # Or multiple recipients (array):
  # to:
  #   - team@example.com
  #   - manager@example.com
  prefix: "[INFO]"
  attachLogs: false

smtp:
  host: smtp.gmail.com
  port: "587"
  username: your-account@gmail.com
  password: ${SMTP_PASSWORD}

Handler Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| handlerOn | object | Lifecycle event handlers | - |
yaml
handlerOn:
  success:
    command: echo "Workflow succeeded"
  failure:
    command: echo "Notifying failure"
  cancel:
    command: echo "Cleaning up"
  exit:
    command: echo "Always running"

RunConfig

The runConfig field allows you to control user interactions when starting DAG runs:

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| disableParamEdit | boolean | Prevent parameter editing when starting DAG | false |
| disableRunIdEdit | boolean | Prevent custom run ID input when starting DAG | false |

Example usage:

yaml
# Prevent users from modifying parameters at runtime
runConfig:
  disableParamEdit: true
  disableRunIdEdit: false

params:
  - ENVIRONMENT: production  # Users cannot change this
  - VERSION: 1.0.0           # This is fixed

This is useful when:

  • You want to enforce specific parameter values for production workflows
  • You need consistent run IDs for tracking purposes
  • You want to prevent accidental parameter changes

Step Fields

Each step in the steps array can have these fields:

Basic Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| name | string | Step name (optional; auto-generated if not provided) | Auto-generated |
| command | string | Command to execute | - |
| script | string | Inline script (alternative to command) | - |
| run | string | Run another DAG | - |
| depends | string/array | Step dependencies | - |
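
The script and run fields are alternatives to command; a brief sketch (the child DAG name is hypothetical):

yaml
steps:
  - name: inline-script
    script: |
      echo "line one"
      echo "line two"
  - name: child-workflow
    run: cleanup-dag          # Runs another DAG by name
    depends: inline-script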

Step Definition Formats

Steps can be defined in multiple formats:

Standard Format

yaml
steps:
  - name: hello
    command: echo "Hello"

Shorthand String Format

yaml
steps:
  - echo "Hello"     # Equivalent to: {command: echo "Hello"}
  - ls -la          # Equivalent to: {command: ls -la}

Nested Array Format (Parallel Steps)

yaml
steps:
  - echo "Sequential step 1"
  - 
    - echo "Parallel step 2a"
    - echo "Parallel step 2b"
  - echo "Sequential step 3"

In the nested array format:

  • Steps within a nested array run in parallel
  • They automatically depend on the previous sequential step
  • The next sequential step automatically depends on all parallel steps in the group
  • Auto-generated names follow the pattern: parallel_{group}_{command}_{index}

Execution Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| dir | string | Working directory | Current directory |
| workingDir | string | Working directory (alternative to dir, inherits from DAG) | DAG's workingDir |
| shell | string | Shell to use | System default |
| stdout | string | Redirect stdout to file | - |
| stderr | string | Redirect stderr to file | - |
| output | string | Capture output to variable | - |
| env | array/object | Step-specific environment variables (overrides DAG-level) | - |
| params | string | Parameters for sub-DAG | - |
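
A sketch that exercises several of these fields at once (paths and values are illustrative):

yaml
steps:
  - name: collect-metrics
    command: ./collect.sh
    workingDir: /opt/metrics
    shell: bash
    stdout: /tmp/collect.out   # Redirect stdout to a file
    stderr: /tmp/collect.err   # Redirect stderr to a file
    env:
      - REGION: us-east-1
  - name: read-version
    command: cat VERSION
    output: APP_VERSION        # Capture stdout into ${APP_VERSION}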

Parallel Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| parallel | array | Items to process in parallel | - |
| maxConcurrent | integer | Max parallel executions | No limit |
yaml
steps:
  - run: file-processor
    parallel:
      items: [file1.csv, file2.csv, file3.csv]
      maxConcurrent: 2
    params: "FILE=${ITEM}"

Conditional Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| preconditions | array | Conditions to check before execution | - |
| continueOn | object | Continue workflow on certain conditions | - |

ContinueOn Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| failure | boolean | Continue execution when step fails | false |
| skipped | boolean | Continue when step is skipped due to preconditions | false |
| exitCode | array | List of exit codes that allow continuation | [] |
| output | array | List of stdout patterns that allow continuation (supports regex with re: prefix) | [] |
| markSuccess | boolean | Mark step as successful when continue conditions are met | false |
yaml
steps:
  - command: echo "Deploying"
    preconditions:
      - condition: "${ENVIRONMENT}"
        expected: "production"
      - condition: "`git branch --show-current`"
        expected: "main"
    
  - command: echo "Running optional task"
    continueOn:
      failure: true
      skipped: true
      exitCode: [0, 1, 2]
      output: ["WARNING", "SKIP", "re:^INFO:.*"]
      markSuccess: true

See the Continue On Reference for detailed documentation.

Error Handling

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| retryPolicy | object | Retry configuration | - |
| repeatPolicy | object | Repeat configuration | - |
| mailOnError | boolean | Send email on error | false |
| signalOnStop | string | Signal to send on stop | SIGTERM |
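
For example, a step can opt into error email and a custom stop signal (the command is illustrative):

yaml
steps:
  - name: long-running-job
    command: ./run-batch.sh
    mailOnError: true        # Send an email if this step fails
    signalOnStop: SIGINT     # Send SIGINT instead of the default SIGTERM on stop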

Retry Policy Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| limit | integer | Maximum retry attempts | - |
| intervalSec | integer | Base interval between retries (seconds) | - |
| backoff | any | Exponential backoff multiplier; true = 2.0, or specify a custom number > 1.0 | - |
| maxIntervalSec | integer | Maximum interval between retries (seconds) | - |
| exitCode | array | Exit codes that trigger retry | All non-zero |

Exponential Backoff: When backoff is set, intervals increase exponentially using the formula:
interval * (backoff ^ attemptCount)
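
For example, with intervalSec: 2 and backoff: 2.0, retries wait roughly 2, 4, 8, 16, ... seconds, capped by maxIntervalSec when it is set.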

Repeat Policy Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| repeat | string | Repeat mode: "while" or "until" | - |
| intervalSec | integer | Base interval between repetitions (seconds) | - |
| backoff | any | Exponential backoff multiplier; true = 2.0, or specify a custom number > 1.0 | - |
| maxIntervalSec | integer | Maximum interval between repetitions (seconds) | - |
| limit | integer | Maximum number of executions | - |
| condition | string | Condition to evaluate | - |
| expected | string | Expected value/pattern | - |
| exitCode | array | Exit codes that trigger repeat | - |

Repeat Modes:

  • while: Repeats while the condition is true or exit code matches
  • until: Repeats until the condition is true or exit code matches

Exponential Backoff: When backoff is set, intervals increase exponentially using the formula:
interval * (backoff ^ attemptCount)

yaml
steps:
  - command: curl https://api.example.com
    retryPolicy:
      limit: 3
      intervalSec: 30
      exitCode: [1, 255]  # Retry only on specific codes
      
  - command: curl https://api.example.com
    retryPolicy:
      limit: 5
      intervalSec: 2
      backoff: true        # Exponential backoff (2.0x multiplier)
      maxIntervalSec: 60   # Cap at 60 seconds
      exitCode: [429, 503] # Rate limit or unavailable
    
  - command: check-process.sh
    repeatPolicy:
      repeat: while        # Repeat WHILE process is running
      exitCode: [0]        # Exit code 0 means process found
      intervalSec: 60
      limit: 30
      
  - command: echo "Checking status"
    output: STATUS
    repeatPolicy:
      repeat: until        # Repeat UNTIL status is ready
      condition: "${STATUS}"
      expected: "ready"
      intervalSec: 5
      backoff: 1.5         # Custom backoff multiplier
      maxIntervalSec: 300  # Cap at 5 minutes
      limit: 60

Executor Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| executor | object | Executor configuration | Shell executor |
yaml
steps:
  - executor:
      type: docker
      config:
        image: python:3.11
        volumes:
          - /data:/data:ro
        env:
          - API_KEY=${API_KEY}
    command: python process.py

Distributed Execution

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| workerSelector | object | Worker label requirements for distributed execution | - |

When using distributed execution, specify workerSelector to route tasks to workers with matching labels:

yaml
steps:
  - run: gpu-training
---
# Run on a worker with gpu
name: gpu-training
workerSelector:
  gpu: "true"
  memory: "64G"
steps:
  - python train_model.py

Worker Selection Rules:

  • All labels in workerSelector must match exactly on the worker
  • Label values are case-sensitive strings
  • Steps without workerSelector can run on any available worker
  • If no workers match the selector, the task waits until a matching worker is available

See Distributed Execution for complete documentation.

Variable Substitution

Parameter References

yaml
params:
  - USER: john
  - DOMAIN: example.com

steps:
  - echo "Hello ${USER} from ${DOMAIN}"

Environment Variables

yaml
env:
  - API_URL: https://api.example.com
  - API_KEY: ${SECRET_API_KEY}  # From system env

steps:
  - curl -H "X-API-Key: ${API_KEY}" ${API_URL}

Loading Environment from .env Files

The dotenv field allows loading environment variables from .env files:

yaml
# Default behavior - loads .env file if it exists
# No dotenv field needed, defaults to [".env"]

# Load specific .env file
dotenv: .env.production

# Load multiple .env files (later files override earlier ones)
dotenv:
  - .env.defaults
  - .env.local

# Disable .env loading
dotenv: []

Important Notes:

  • If dotenv is not specified, Dagu automatically tries to load .env file
  • Files are loaded relative to the DAG's workingDir
  • Later files in the array override variables from earlier files
  • System environment variables take precedence over .env file variables
  • .env files are loaded at DAG startup, before any steps execute

Example .env file:

bash
# .env file
DATABASE_URL=postgres://localhost/mydb
API_KEY=secret123
DEBUG=true
yaml
# DAG using .env variables
workingDir: /app
dotenv: .env          # Optional, this is the default

steps:
  - psql ${DATABASE_URL}
  - echo "Debug is ${DEBUG}"

Command Substitution

yaml
steps:
  - echo "Today is `date +%Y-%m-%d`"
    
  - command: deploy.sh
    preconditions:
      - condition: "`git branch --show-current`"
        expected: "main"

Output Variables

yaml
steps:
  - command: cat VERSION
    output: VERSION
  - command: docker build -t app:${VERSION} .

JSON Path Access

yaml
steps:
  - command: cat config.json
    output: CONFIG
  - command: echo "Port is ${CONFIG.server.port}"

Special Variables

These variables are automatically available:

| Variable | Description |
|----------|-------------|
| DAG_NAME | Current DAG name |
| DAG_RUN_ID | Unique run identifier |
| DAG_RUN_LOG_FILE | Path to workflow log |
| DAG_RUN_STEP_NAME | Current step name |
| DAG_RUN_STEP_STDOUT_FILE | Step stdout file path |
| DAG_RUN_STEP_STDERR_FILE | Step stderr file path |
| ITEM | Current item in parallel execution |
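
These can be referenced like any other variable, for example:

yaml
steps:
  - command: echo "Run ${DAG_RUN_ID} of ${DAG_NAME}, logs at ${DAG_RUN_LOG_FILE}"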

Complete Example

yaml
name: production-etl
description: Daily ETL pipeline for production data
tags: [production, etl, critical]
schedule: "0 2 * * *"

maxActiveRuns: 1
maxActiveSteps: 5
timeoutSec: 7200
histRetentionDays: 90

params:
  - DATE: "`date +%Y-%m-%d`"
  - ENVIRONMENT: production

env:
  - DATA_DIR: /data/etl
  - LOG_LEVEL: info
  
dotenv:
  - /etc/dagu/production.env

# Default container for all steps
container:
  image: python:3.11-slim
  pullPolicy: missing
  env:
    - PYTHONUNBUFFERED=1
  volumes:
    - ./data:/data
    - ./scripts:/scripts:ro

preconditions:
  - condition: "`date +%u`"
    expected: "re:[1-5]"  # Weekdays only

steps:
  - name: validate-environment
    command: ./scripts/validate.sh
    
  - command: python extract.py --date=${DATE}
    depends: validate-environment
    output: RAW_DATA_PATH
    retryPolicy:
      limit: 3
      intervalSec: 300
    
  - run: transform-module
    parallel:
      items: [customers, orders, products]
      maxConcurrent: 2
    params: "TYPE=${ITEM} INPUT=${RAW_DATA_PATH}"
    continueOn:
      failure: false

  # Use a different executor for this step
  - name: load-data
    executor:
      type: docker
      config:
        image: postgres:16
        env:
          - PGPASSWORD=${DB_PASSWORD}
    command: psql -h ${DB_HOST} -U ${DB_USER} -f load.sql
    
  - command: python validate_results.py --date=${DATE}
    depends: load-data
    mailOnError: true

handlerOn:
  success:
    command: |
      echo "ETL completed successfully for ${DATE}"
      ./scripts/notify-success.sh
  failure:
    executor:
      type: mail
      config:
        to: oncall@company.com
        subject: "ETL Failed - ${DATE}"
        body: "Check logs at ${DAG_RUN_LOG_FILE}"
        attachLogs: true
  exit:
    command: ./scripts/cleanup.sh ${DATE}

mailOn:
  failure: true
  
smtp:
  host: smtp.company.com
  port: "587"
  username: dagu@company.com
