Archive Executor

Work with archive files directly from a DAG step without relying on shell utilities. The executor is built on top of github.com/mholt/archives and streams data for efficiency.

Supported Formats

Archive Formats

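| Format | Extension | Read | Write | Password Support | Notes |
|--------|-----------|------|-------|------------------|-------|
| ZIP | .zip | Yes | Yes | No | Full read/write support |
| TAR | .tar | Yes | Yes | No | Full read/write support |
| RAR | .rar | Yes | No | Yes (read) | Read-only; extraction with password |
| 7-Zip | .7z | Yes | No | Yes (read) | Read-only; extraction with password |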

Compression Formats (Single File)

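| Format | Extension | Compression Level | Notes |
|--------|-----------|-------------------|-------|
| GZIP | .gz | 0-9 (default: -1) | Configurable compression |
| Bzip2 | .bz2 | 0-9 (default: -1) | Configurable compression |
| XZ | .xz | Fixed | High compression ratio |
| Zstandard | .zst, .zstd | Fixed | Fast with good compression |
| LZ4 | .lz4 | Fixed | Very fast, lower ratio |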

Combined Formats (Archive + Compression)

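| Format | Extensions | Read | Write | Compression Level |
|--------|------------|------|-------|-------------------|
| TAR+GZIP | .tar.gz, .tgz | Yes | Yes | 0-9 (default: -1) |
| TAR+Bzip2 | .tar.bz2, .tbz2, .tbz | Yes | Yes | 0-9 (default: -1) |
| TAR+XZ | .tar.xz, .txz | Yes | Yes | Fixed |
| TAR+Zstandard | .tar.zst, .tar.zstd | Yes | Yes | Fixed |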

Format Detection

The executor automatically detects archive format from:

  1. File extension - Recognizes all standard extensions (.tar.gz, .zip, etc.)
  2. Magic bytes - Examines file headers when extension is ambiguous
  3. Explicit configuration - Override with format field when needed
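When the file name gives no usable hint, the format field can be set explicitly. The following is a minimal sketch, assuming a hypothetical gzip-compressed tarball that was saved as nightly-dump.bin; the file and directory names are illustrative only.

```yaml
steps:
  - name: unpack-renamed
    executor:
      type: archive
      config:
        # Hypothetical file: a .tar.gz that was saved with a .bin extension
        source: nightly-dump.bin
        destination: ./dump
        # Override auto-detection and treat the input as TAR+GZIP
        format: tar.gz
    command: extract
```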

Supported Operations

| Command | Description |
|---------|-------------|
| extract | Unpack an archive into a directory |
| create | Create an archive from files/folders |
| list | Enumerate entries in an archive |

Quick Start

yaml
steps:
  - name: unpack
    executor:
      type: archive
      config:
        source: logs.tar.gz
        destination: ./logs
    command: extract

  - name: package
    executor:
      type: archive
      config:
        source: ./logs
        destination: logs-backup.tar.gz
    command: create

  - name: inspect
    executor:
      type: archive
      config:
        source: logs-backup.tar.gz
    command: list
    output: ARCHIVE_INDEX

extract and create emit a JSON summary (files processed, bytes, duration, etc.) on stdout. list emits JSON that includes an array of entries, so subsequent steps can filter or inspect the archive with tools like jq.

Configuration

| Field | Description | Type | Default | Notes |
|-------|-------------|------|---------|-------|
| source | Input archive or directory | string | required | Path to archive file (extract/list) or source directory (create) |
| destination | Output directory or archive path | string | . (extract) | Target directory (extract) or output archive path (create); optional for list |
| format | Archive format override | string | auto-detect | Explicit format: zip, tar, tar.gz, tar.bz2, tar.xz, tar.zst, 7z, rar, etc. |
| compressionLevel | Compression level | int | -1 | -1 = default, 0 = none, 1-9 = level; applies to gzip and bzip2 only |
| overwrite | Replace existing files | bool | false | When false, extraction fails if destination file exists |
| stripComponents | Strip leading path segments | int | 0 | Remove N leading directories from paths (like tar --strip-components=N) |
| preservePaths | Preserve full paths | bool | true | When false, only extracts the basename of each file |
| include | Include glob patterns | []string | all files | Only process files matching these patterns (e.g., **/*.csv) |
| exclude | Exclude glob patterns | []string | none | Skip files matching these patterns (applied after include) |
| followSymlinks | Follow symlinks when creating | bool | false | When true, dereferences symlinks; when false, preserves them |
| verifyIntegrity | Verify archive after operation | bool | false | Performs full read pass to validate archive integrity |
| continueOnError | Continue on individual file errors | bool | false | Logs errors but continues processing remaining files |
| dryRun | Simulate operation | bool | false | Calculate metrics without writing files to disk |
| password | Archive password | string | none | Extraction only for password-protected 7z and rar archives |

All fields support environment interpolation (${VAR}) and outputs from previous steps.
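As an illustration of how several of these fields combine, here is a sketch of a create step. LOG_DIR and BACKUP_DIR are hypothetical variables, assumed to be defined in the environment or by an earlier step; the exclude pattern is likewise only an example.

```yaml
steps:
  - name: package-logs
    executor:
      type: archive
      config:
        # LOG_DIR and BACKUP_DIR are hypothetical variables, assumed to be
        # defined in the environment or produced by an earlier step
        source: ${LOG_DIR}
        destination: ${BACKUP_DIR}/logs.tar.gz
        compressionLevel: 9      # 1-9 = level; applies to gzip and bzip2 only
        exclude:
          - "**/*.tmp"           # skip temporary files
        followSymlinks: false    # preserve symlinks (the default)
    command: create
```

With a fixed-level format such as tar.zst, compressionLevel would have no effect according to the table above.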

Additional Examples

Selective Extraction

yaml
workingDir: /data/pipeline

steps:
  - name: extract-csv
    executor:
      type: archive
      config:
        source: dataset.tar.zst
        destination: ./data
        include:
          - "**/*.csv"
        stripComponents: 1
    command: extract

Create Archive With Verification

yaml
workingDir: /deploy/release

steps:
  - name: bundle-artifacts
    executor:
      type: archive
      config:
        source: ./dist
        destination: dist.tar.gz
        format: tar.gz
        verifyIntegrity: true
    command: create

Extract Password-Protected 7z (Read-Only)

yaml
workingDir: /data/decrypted

secrets:
  - name: ARCHIVE_PASSWORD
    provider: env
    key: ARCHIVE_PASSWORD

steps:
  - name: unpack-secure
    executor:
      type: archive
      config:
        source: secure-data.7z
        destination: ./decrypted
        password: ${ARCHIVE_PASSWORD}
        include:
          - "**/*.csv"
        overwrite: true
    command: extract

Important: Password protection is read-only. You can extract password-protected 7z and rar archives, but creating encrypted archives is not supported.

Security Features

The executor implements security protections against malicious archives:

  • Path traversal prevention - Rejects archives with entries escaping the destination directory
  • Symlink validation - Blocks symlinks with absolute targets or paths escaping the destination
  • Safe path handling - Validates all extracted paths before writing files

These protections defend against "zip slip" and similar archive-based attacks.

Limitations

| Format | Limitation |
|--------|------------|
| RAR | Read-only; cannot create RAR archives |
| 7-Zip | Read-only; cannot create 7z archives |
| Password Protection | Extraction only; cannot create encrypted archives |
| Compression Levels | Only GZIP and Bzip2 support configurable levels (0-9) |
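One way to work within the read-only limitation is to extract a RAR or 7z archive and repackage its contents in a writable format. The sketch below assumes a hypothetical input file named vendor-drop.rar and assumes the steps run in the order listed, as in the Quick Start.

```yaml
steps:
  - name: unpack-rar
    executor:
      type: archive
      config:
        source: vendor-drop.rar      # hypothetical input file
        destination: ./vendor-drop
    command: extract

  - name: repack-as-tgz
    executor:
      type: archive
      config:
        source: ./vendor-drop
        destination: vendor-drop.tar.gz
        verifyIntegrity: true        # full read pass over the new archive
    command: create
```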

Output Format

Extract and Create Operations

Both extract and create commands output JSON to stdout with operation metrics:

json
{
  "operation": "extract",
  "source": "logs.tar.gz",
  "destination": "./logs",
  "filesExtracted": 1523,
  "bytesExtracted": 45829384,
  "filesSkipped": 0,
  "duration": "1.234s",
  "verifyPerformed": false,
  "errors": []
}
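Because the summary is written to stdout, it can presumably be captured the same way the Quick Start captures the list output. The sketch below pairs that with dryRun to preview an extraction; EXTRACT_SUMMARY is a hypothetical variable name, and capturing extract output with the output field is an assumption based on the list example rather than something this page states.

```yaml
steps:
  - name: preview-extract
    executor:
      type: archive
      config:
        source: logs.tar.gz
        destination: ./logs
        dryRun: true             # calculate metrics without writing files
    command: extract
    # Hypothetical variable name; assumed to receive the JSON summary from stdout
    output: EXTRACT_SUMMARY
```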

List Operation

The list command outputs a JSON object with summary fields and a files array of archive entries:

json
{
  "operation": "list",
  "source": "logs.tar.gz",
  "totalFiles": 1523,
  "totalSize": 45829384,
  "verified": false,
  "duration": "0.123s",
  "files": [
    {
      "path": "logs/app.log",
      "size": 12345,
      "mode": "-rw-r--r--",
      "modTime": "2025-11-02T12:34:56Z",
      "isDir": false
    }
  ]
}

Released under the MIT License.