
Getting Started

Install

pip install wowdata

For local development:

git clone https://github.com/sci2pro/wowdata.git
cd wowdata
pip install -e ".[dev]"

First Pipeline (Python)

from wowdata import Pipeline, Sink, Source, Transform

pipe = (
    Pipeline(Source("people.csv"))
    .then(Transform("cast", params={"types": {"age": "integer"}, "on_error": "null"}))
    .then(Transform("filter", params={"where": "age >= 18"}))
    .then(Sink("adults.csv"))
)

pipe.run()
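
To try the pipeline above you need a people.csv next to your script. The following is a minimal sketch that writes one using only the Python standard library. The rows and the name column are made up for illustration; the pipeline itself only refers to the age column. One row deliberately carries a non-integer age so the cast step's on_error setting has something to act on; how that row is treated downstream depends on wowdata itself and is not asserted here.

```python
import csv
from pathlib import Path

# Hypothetical sample rows; only the "age" column is referenced by the pipeline.
ROWS = [
    {"name": "Ada", "age": "36"},
    {"name": "Ben", "age": "17"},
    {"name": "Cleo", "age": "n/a"},  # not an integer, exercises the cast step
    {"name": "Dev", "age": "18"},
]


def write_sample(path="people.csv"):
    """Write the sample rows to a CSV file and return the file's path."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerows(ROWS)
    return target


if __name__ == "__main__":
    print(f"wrote {write_sample()}")
```

Run this once before the pipeline so that Source("people.csv") has a file to read.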

First Pipeline (YAML + CLI)

wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: filter
        params:
          where: "age >= 18"
    - sink:
        uri: adults.csv
        type: csv

Run it:

wow run pipeline.yaml

Fallback command:

wowdata run pipeline.yaml

CLI (v0)

WowData™ includes a CLI for running YAML-serialized pipelines.

After installing the package, use:

wow --help

If wow conflicts with another tool in your environment, use the fallback command:

wowdata --help

Commands

  • wow run pipeline.yaml (fallback: wowdata run pipeline.yaml)
      • Executes the pipeline end-to-end.
      • Returns non-zero on runtime failures.

  • wow validate pipeline.yaml (fallback: wowdata validate pipeline.yaml)
      • Parses YAML + IR and runs preflight checks on source/sink paths.

  • wow schema pipeline.yaml (fallback: wowdata schema pipeline.yaml)
      • Infers output schema without full pipeline execution.

  • wow lock-schema pipeline.yaml -o pipeline.locked.yaml (fallback: wowdata lock-schema ...)
      • Writes a schema-locked YAML by embedding per-transform output_schema.

Common flags

  • --base-dir PATH: resolve relative paths in the YAML from a specific directory.
  • --json: print machine-readable JSON output.
  • --sample-rows N: limit schema inference to N sample rows (used by schema and lock-schema).
  • --force: recompute schema inference even if a cached result exists.
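
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below only builds the command and does not run it. The helper name build_wow_command is hypothetical, and the order in which flags are placed after the YAML path is an assumption; check wow --help for which flags each command accepts.

```python
import shlex


def build_wow_command(subcommand, pipeline_path, *, output=None, base_dir=None,
                      json_output=False, sample_rows=None, force=False,
                      executable="wow"):
    """Return an argv list for the wow CLI (hypothetical helper, not part of wowdata)."""
    cmd = [executable, subcommand, pipeline_path]
    if output is not None:
        cmd += ["-o", output]  # shown in the docs for lock-schema
    if base_dir is not None:
        cmd += ["--base-dir", str(base_dir)]
    if json_output:
        cmd.append("--json")
    if sample_rows is not None:
        cmd += ["--sample-rows", str(sample_rows)]  # documented for schema and lock-schema
    if force:
        cmd.append("--force")
    return cmd


if __name__ == "__main__":
    argv = build_wow_command("schema", "pipeline.yaml", json_output=True, sample_rows=100)
    print(shlex.join(argv))  # prints: wow schema pipeline.yaml --json --sample-rows 100
```

Passing executable="wowdata" switches to the fallback command without changing anything else.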

CLI examples

# Run a serialized pipeline
wow run pipeline.yaml

# Validate structure and file paths before execution
wow validate pipeline.yaml

# Print inferred output schema as JSON
wow schema pipeline.yaml --json

# Save a locked pipeline snapshot
wow lock-schema pipeline.yaml -o pipeline.locked.yaml

Exit codes

  • 0: success
  • 2: CLI usage error
  • 3: pipeline parse/validation error
  • 4: pipeline runtime execution error
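
The exit codes above make it possible to tell failure classes apart from a wrapper script. As a minimal sketch, assuming wow is on your PATH, the following runs a pipeline with subprocess and translates the return code using the table above. The function names are hypothetical, and any code not listed above is reported as unrecognized rather than guessed at.

```python
import subprocess

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {
    0: "success",
    2: "CLI usage error",
    3: "pipeline parse/validation error",
    4: "pipeline runtime execution error",
}


def describe_exit_code(code):
    """Map a wow exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unrecognized exit code {code}")


def run_pipeline(pipeline_path, executable="wow"):
    """Run `wow run <pipeline_path>` and return (exit_code, meaning).

    Raises FileNotFoundError if the executable is not installed.
    """
    completed = subprocess.run([executable, "run", pipeline_path], check=False)
    return completed.returncode, describe_exit_code(completed.returncode)
```

A caller might retry or alert only on code 4, and treat code 3 as a problem with the YAML file that needs a fix before rerunning.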