
Philosophy

Why WowData™?

Most data tools assume that if you are touching data, you are already an expert.

WowData™ rejects this assumption.

We believe that:

  • data engineering is a foundational skill,

  • learning should be fast and durable,

  • and tools should teach rather than intimidate.

WowData™ is designed so that anyone can build, read, and reason about a data pipeline—even when that pipeline performs non-trivial work.

Core Concepts

WowData™ is built on four stable, universal concepts:

  • Source — where data comes from

  • Transform — how data changes

  • Sink — where data goes

  • Pipeline — how everything fits together

These concepts are deliberately persistent and reused everywhere.

We avoid hidden helpers, magical shortcuts, and proliferating abstractions.

If you understand these four ideas, you understand WowData™.

Example

from wowdata import Source, Transform, Sink, Pipeline

pipe = (
    Pipeline(Source("people.csv"))
    .then(Transform("cast", params={"types": {"age": "integer"}, "on_error": "null"}))
    .then(Transform("filter", params={"where": "age >= 30 and country == 'KE'"}))
    .then(Sink("out_filtered.csv"))
)

pipe.run()

This pipeline:

  1. reads a CSV file,

  2. explicitly casts a column,

  3. filters rows using a small, teachable expression language,

  4. and writes the result to disk.
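To make each step concrete, here is a rough plain-Python equivalent written with only the standard `csv` module. It is an illustrative sketch, not how WowData™ executes pipelines. It assumes that `people.csv` has `age` and `country` columns, that `on_error: "null"` turns unparseable ages into empty (null) values, and that a null age never satisfies `age >= 30`. The function name `run_people_pipeline` is hypothetical.

```python
import csv


def run_people_pipeline(src_path: str, dst_path: str) -> int:
    """Hypothetical plain-Python approximation of the pipeline above; returns rows written."""
    # Step 1: read the CSV file into a list of dicts.
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # Step 2: explicitly cast "age" to an integer; failures become None (on_error="null").
    for row in rows:
        try:
            row["age"] = int(row["age"])
        except (TypeError, ValueError):
            row["age"] = None

    # Step 3: keep rows where age >= 30 and country == 'KE'.
    # Assumption: a null age never passes the comparison.
    kept = [
        r for r in rows
        if r["age"] is not None and r["age"] >= 30 and r["country"] == "KE"
    ]

    # Step 4: write the surviving rows to disk.
    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

The WowData™ version expresses the same four steps as named, inspectable objects instead of loops buried in a function.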

Nothing is hidden. Nothing is inferred without consent.

Learning-First by Design

WowData™ makes deliberate trade-offs to accelerate learning:

  • A small, closed vocabulary of concepts

  • A restricted expression language that can be mastered quickly

  • Explicit transforms instead of silent automation

  • Deterministic checkpoints for inspection

  • A serializable pipeline that can be read like a recipe

If a feature makes learning harder—even if it saves time—we reject it.
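One way a restricted expression language can stay small and safe is to parse expressions but accept only a whitelist of constructs: column names, literals, comparisons, and `and`/`or`/`not`. The sketch below does this with Python's `ast` module. It is an assumption about how such a language *could* work, not WowData™'s actual grammar or evaluator; the function name `evaluate` and the rule that any comparison involving a null is false are both hypothetical.

```python
import ast
import operator

# Hypothetical whitelist: the only comparison operators the mini-language accepts.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expr: str, row: dict) -> bool:
    """Evaluate a filter expression against one row, rejecting anything off the whitelist."""
    tree = ast.parse(expr, mode="eval")

    def ev(node):
        if isinstance(node, ast.BoolOp):
            # Generators let all()/any() short-circuit like Python's and/or.
            if isinstance(node.op, ast.And):
                return all(ev(v) for v in node.values)
            return any(ev(v) for v in node.values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                fn = _COMPARE.get(type(op))
                if fn is None:
                    raise ValueError(f"Operator {type(op).__name__} is not allowed in filters.")
                right = ev(comparator)
                # Assumption: any comparison involving a null (None) is false.
                if left is None or right is None or not fn(left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.Name):
            if node.id not in row:
                raise ValueError(f"Unknown column {node.id!r}.")
            return row[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        # Function calls, attribute access, arithmetic, etc. are all rejected.
        raise ValueError(f"{type(node).__name__} is not allowed in filter expressions.")

    return bool(ev(tree.body))
```

Because the accepted constructs form a short, closed list, a learner can master all of it, and an expression such as `__import__('os')` is refused instead of executed.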

Example: Learning-First Explicitness

from wowdata import Source, Transform, Sink, Pipeline

pipe = (
    Pipeline(Source("people.csv"))
    .then(
        Transform(
            "cast",
            params={
                "types": {"age": "integer"},
                "on_error": "null"   # explicit choice, not hidden behaviour
            }
        )
    )
    .then(
        Transform(
            "filter",
            params={
                "where": "age >= 18"
            }
        )
    )
    .then(Sink("adults.csv"))
)

pipe.run()
pipe.to_yaml("pipeline.yaml")
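The `on_error` parameter turns a silent decision into a visible one. The sketch below shows what such a policy might mean for a single column. Only the `"null"` policy appears in this page; the `"raise"` alternative, the `"number"` type name, and the function `cast_column` are hypothetical and used purely for illustration.

```python
from typing import Any, Callable

# Hypothetical converters keyed by the type names used in the "types" parameter.
_CONVERTERS: dict[str, Callable[[Any], Any]] = {
    "integer": int,
    "number": float,
}


def cast_column(rows: list[dict], column: str, type_name: str, on_error: str) -> list[dict]:
    """Cast one column; on_error decides what happens to values that cannot be converted."""
    if on_error not in ("null", "raise"):
        raise ValueError(f"on_error must be 'null' or 'raise', got {on_error!r}.")
    convert = _CONVERTERS[type_name]
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # leave the caller's rows untouched
        try:
            new_row[column] = convert(row[column])
        except (TypeError, ValueError):
            if on_error == "null":
                new_row[column] = None  # the explicit, visible choice
            else:
                raise ValueError(
                    f"Row {i}: cannot cast {column}={row[column]!r} to {type_name}."
                ) from None
        out.append(new_row)
    return out
```

With `"null"`, a value like `"unknown"` becomes an empty age that later steps can see and reason about; with a stricter policy, the pipeline stops and says which row failed.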

Serialization That Humans Can Read

Every WowData™ pipeline can be serialized into a human-readable form.

Serialization is not configuration for machines—it is a cognitive artifact for people.

Our goal is that an ordinary user can:

  • read a serialized pipeline,

  • understand what it does,

  • explain it to someone else,

  • and safely modify it.

Example: Human-Readable IR Serialization

The same pipeline can be represented as a simple, inspectable Intermediate Representation (IR) saved as pipeline.yaml:

wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: cast
        params:
          types:
            age: integer
          on_error: "null"
    - transform:
        op: filter
        params:
          where: "age >= 18"
    - sink:
        uri: adults.csv
        type: csv

This IR is deliberately verbose and stable. It is designed to be read, reviewed, versioned, and edited by humans — not generated once and forgotten.

Because the IR mirrors the core concepts (Source → Transform → Sink), anyone who understands WowData™ can understand what this pipeline does.
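Because the IR is plain nested data, it can also be turned back into a step-by-step recipe mechanically. The sketch below walks the structure shown above, held as a Python dict, and prints one plain-English line per step. The function `describe_ir` and its wording are hypothetical; the sketch only assumes the `start`/`steps`/`transform`/`sink` layout shown in the YAML.

```python
def describe_ir(ir: dict) -> list[str]:
    """Render a WowData-style IR dict as numbered, plain-English steps (illustrative only)."""
    pipeline = ir["pipeline"]
    start = pipeline["start"]
    lines = [f"1. Read {start['uri']} as {start['type']}."]
    for n, step in enumerate(pipeline["steps"], start=2):
        if "transform" in step:
            t = step["transform"]
            params = ", ".join(f"{k}={v!r}" for k, v in t["params"].items())
            lines.append(f"{n}. Apply {t['op']} with {params}.")
        elif "sink" in step:
            s = step["sink"]
            lines.append(f"{n}. Write to {s['uri']} as {s['type']}.")
        else:
            raise ValueError(f"Step {n} is neither a transform nor a sink: {step!r}")
    return lines


# The pipeline.yaml content above, written as a Python dict.
ir = {
    "wowdata": 0,
    "pipeline": {
        "start": {"uri": "people.csv", "type": "csv"},
        "steps": [
            {"transform": {"op": "cast",
                           "params": {"types": {"age": "integer"}, "on_error": "null"}}},
            {"transform": {"op": "filter", "params": {"where": "age >= 18"}}},
            {"sink": {"uri": "adults.csv", "type": "csv"}},
        ],
    },
}

for line in describe_ir(ir):
    print(line)
```

The output reads as a four-step recipe (read, cast, filter, write), which is the same story a human reviewer should be able to tell from the YAML itself.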

Example: Loading a Pipeline from IR (YAML)

A serialized pipeline can be loaded back into WowData™ and executed:

from wowdata import Pipeline

pipe = Pipeline.from_yaml("pipeline.yaml")
pipe.run()

This allows pipelines to be:

  • authored or reviewed as YAML,

  • stored in version control,

  • shared between users or systems,

  • and executed without modifying Python code.

The same IR is used by the programmatic API and is intended to serve future graphical tools as well.

Errors That Teach

In WowData™, error messages are part of the interface.

Every user-facing error:

  1. explains what went wrong,

  2. explains why it happened,

  3. suggests what to do next.

Errors are designed to teach correct mental models, not expose internal stack traces.

Example: Errors That Teach

from wowdata import Source, Sink, Pipeline

pipe = Pipeline(Source("missing.csv")).then(Sink("out.csv"))
pipe.run()

Running this pipeline produces the following error:

wowdata.errors.WowDataUserError: [E_SOURCE_NOT_FOUND] Source file not found: 'missing.csv'.
Hint: Check the path, working directory, and filename. If the file is elsewhere, pass an absolute path.
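An error like this can be modelled as an exception that carries a stable code, a message saying what went wrong, and a hint saying what to do next. The sketch below reproduces the message layout shown above. The class `TeachingError` and the helper `check_source` are hypothetical illustrations, not the internals of `wowdata.errors.WowDataUserError`.

```python
import os


class TeachingError(Exception):
    """Hypothetical user-facing error: a stable code, a clear message, and a next step."""

    def __init__(self, code: str, message: str, hint: str):
        self.code = code
        self.message = message
        self.hint = hint
        super().__init__(f"[{code}] {message}\nHint: {hint}")


def check_source(path: str) -> None:
    """Fail early, with a teaching message, if a source file does not exist."""
    if not os.path.exists(path):
        raise TeachingError(
            "E_SOURCE_NOT_FOUND",
            f"Source file not found: {path!r}.",
            "Check the path, working directory, and filename. "
            "If the file is elsewhere, pass an absolute path.",
        )


try:
    check_source("missing.csv")
except TeachingError as err:
    print(err)  # prints the code, the message, and the hint; no stack trace
```

Keeping the code separate from the prose means the wording can improve over time while tools and tests still match on a stable identifier such as `E_SOURCE_NOT_FOUND`.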

Built on the Best

WowData™ does not reinvent proven tools.

Instead, it piggybacks on best-in-class ecosystems:

  • mature ETL engines,

  • established data modelling and validation frameworks,

  • battle-tested execution backends.

Our contribution is an opinionated, human-centred layer that makes these tools usable by more people.

Open Source, Unapologetically

WowData™ is open source by principle, not convenience.

We believe that tools shaping how people think about data must be:

  • transparent,

  • inspectable,

  • extensible,

  • and owned by the community.

What WowData™ Is Not

WowData™ is not:

  • a low-code gimmick,

  • a black-box automation tool,

  • a thin wrapper around someone else’s API,

  • or a system that only experts can use safely.

If forced to choose between power and clarity, we choose clarity.

WowData™ is not trying to make data engineering smaller.

It is trying to make it thinkable.