Schema

A schema defines the structure and validation rules for the rows in a DataFrame. It adds type safety, data validation, and metadata management to your data processing pipelines.

Understanding Schema Components

A schema consists of entry definitions that specify:

- Entry Name: the column/field identifier
- Type: the expected data type (class string)
- Nullable: whether NULL values are permitted
- Metadata: key-value pairs for additional context
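To make the four parts of an entry concrete, here is a minimal plain-PHP sketch. The `EntryDefinition` class and its `accepts()` method are hypothetical and exist only for illustration; they are not Flow PHP's internal definition classes, and for simplicity the type is compared against `get_debug_type()` names instead of class strings.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration only: NOT Flow PHP's internal definition class.
// It models the four parts of a schema entry: name, type, nullable, metadata.
final class EntryDefinition
{
    /**
     * @param array<string, scalar> $metadata
     */
    public function __construct(
        public readonly string $name,      // column/field identifier
        public readonly string $type,      // expected type, as a get_debug_type() name (assumption)
        public readonly bool $nullable = false,
        public readonly array $metadata = [],
    ) {
    }

    // Returns true when $value is acceptable for this entry.
    public function accepts(mixed $value): bool
    {
        if ($value === null) {
            return $this->nullable;
        }

        return get_debug_type($value) === $this->type;
    }
}

$active = new EntryDefinition('active', 'bool', false, ['key' => 'value']);

var_dump($active->accepts(true));  // bool(true)
var_dump($active->accepts(null));  // bool(false), the entry is not nullable
var_dump($active->accepts('yes')); // bool(false), a string is not a bool
```

In a real schema, the metadata travels with the entry definition but plays no part in the type or nullability check, which is why `accepts()` ignores it here.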

Schema Validation Strategies

Flow PHP provides two built-in validation strategies:

- StrictValidator: rows must exactly match the schema; extra entries cause validation failure
- SelectiveValidator: only validates entries defined in the schema; extra entries are ignored

By default, DataFrame::match() uses StrictValidator; to use a different strategy, pass a validator as the second argument to DataFrame::match().
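The difference between the two strategies can be sketched in plain PHP. The functions `matchesSelective()` and `matchesStrict()` below are hypothetical and are not Flow PHP's `StrictValidator` or `SelectiveValidator` code; the schema is simplified to a map of entry name to `get_debug_type()` name, and nullability is left out.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of the two strategies; NOT Flow PHP's validator implementations.
// A "schema" here is a map of entry name => expected get_debug_type() name.

/**
 * @param array<string, string> $schema
 * @param array<string, mixed>  $row
 */
function matchesSelective(array $schema, array $row): bool
{
    foreach ($schema as $name => $type) {
        // Every entry defined in the schema must be present with the expected type.
        if (!array_key_exists($name, $row) || get_debug_type($row[$name]) !== $type) {
            return false;
        }
    }

    // Entries that the schema does not define are ignored.
    return true;
}

/**
 * @param array<string, string> $schema
 * @param array<string, mixed>  $row
 */
function matchesStrict(array $schema, array $row): bool
{
    // Same checks as selective, plus: no entries beyond those in the schema.
    return matchesSelective($schema, $row)
        && array_diff_key($row, $schema) === [];
}

$schema = ['id' => 'int', 'name' => 'string'];
$row = ['id' => 1, 'name' => 'John', 'extra_field' => 'ignored'];

var_dump(matchesStrict($schema, $row));    // bool(false), extra_field is not in the schema
var_dump(matchesSelective($schema, $row)); // bool(true)
```

Building the strict check on top of the selective one keeps the shared rule (defined entries must match) in one place, so the only extra strict rule is the absence of undeclared entries.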

Basic Schema Matching

Use DataFrame::match() to validate data against a schema:

<?php

use function Flow\ETL\DSL\{data_frame, from_array, schema, int_schema, str_schema, bool_schema, to_output};
use Flow\ETL\Loader\StreamLoader\Output;
use Flow\ETL\Row\Schema\Metadata;

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'Product 1', 'active' => true],
        ['id' => 2, 'name' => 'Product 2', 'active' => false],
        ['id' => 3, 'name' => 'Product 3', 'active' => true],
    ]))
    // Validate every row against the schema (StrictValidator by default)
    ->match(
        schema(
            int_schema('id', $nullable = false),
            str_schema('name', $nullable = false),
            // Attach custom key-value metadata to the "active" entry
            bool_schema('active', $nullable = false, Metadata::empty()->add('key', 'value')),
        )
    )
    // Print the rows followed by their schema, without truncating values
    ->write(to_output(false, Output::rows_and_schema))
    ->run();

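Schema Validation with Selective Strategy

Pass schema_selective_validator() as the second argument to DataFrame::match() when rows may carry entries that the schema does not describe:
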
<?php

use function Flow\ETL\DSL\{data_frame, from_array, schema, int_schema, str_schema, schema_selective_validator, to_output};

// Only validate defined fields, ignore extra ones
data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John', 'extra_field' => 'ignored'],
        ['id' => 2, 'name' => 'Jane', 'another_extra' => 'also ignored'],
    ]))
    ->match(
        schema(
            int_schema('id'),
            str_schema('name'),
        ),
        schema_selective_validator() // Only validate id and name, ignore other fields
    )
    ->write(to_output())
    ->run();