Documentation

Example: Apply

Topic: Schema

Description

When iterating over a dataset from a source that does not enforce a strict schema, such as CSV, XML, or JSON, you can tell the extractor which schema to apply to the columns it reads.

Otherwise, DataFrame will try to infer the schema from the data in each column. This can be a problem when the first rows are empty or null: if the first row holds a null, the entry factory (the mechanism responsible for creating entries) will assume that the column is of type string.
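
The pitfall described above can be illustrated with a minimal plain-PHP sketch. It does not use Flow and is not how the entry factory is implemented; `guess_type_from_first_row` is a hypothetical helper that only mimics the "null in the first row becomes string" behaviour, so the guess can be compared with an explicitly declared type.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, NOT part of Flow PHP: guesses a column type from the
// first row only and falls back to "string" when that value is null.
function guess_type_from_first_row(array $rows, string $column): string
{
    $first = $rows[0][$column] ?? null;

    return match (true) {
        is_int($first) => 'integer',
        is_bool($first) => 'boolean',
        is_float($first) => 'float',
        default => 'string', // null (or any other value) falls back to string
    };
}

$rows = [
    ['id' => null, 'active' => true], // the first row has no id yet
    ['id' => 2, 'active' => false],
];

// Guessing from the data: the null in the first row hides the real type.
$guessed = guess_type_from_first_row($rows, 'id');

// Declaring the type up front: the result does not depend on row order.
$declared = ['id' => 'integer', 'active' => 'boolean'];

echo "guessed: {$guessed}, declared: {$declared['id']}\n";
```

In this sketch the guessed type of `id` is `string` even though every non-null value is an integer, which is the situation an explicit schema avoids.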

composer.json

{
    "name": "flow-php/examples",
    "description": "Flow PHP - Examples",
    "license": "MIT",
    "type": "library",
    "require": {
        "flow-php/etl": "1.x-dev"
    }
}

code.php

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{bool_schema, data_frame, from_array, int_schema, schema, str_schema, to_stream};
use Flow\ETL\Loader\StreamLoader\Output;
use Flow\ETL\Row\Schema\Metadata;

require __DIR__ . '/vendor/autoload.php';

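// Declare the expected type and nullability of each column up front.
// The `$nullable = ...` form assigns a throwaway variable to label the positional argument.
$schema = schema(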
    int_schema('id', $nullable = false),
    str_schema('name', $nullable = true),
    bool_schema('active', $nullable = false, Metadata::empty()->add('key', 'value')),
);

data_frame()
    ->read(
        from_array([
            ['id' => 1, 'name' => 'Product 1', 'active' => true],
            ['id' => 2, 'name' => 'Product 2', 'active' => false],
            ['id' => 3, 'name' => 'Product 3', 'active' => true],
        ])->withSchema($schema)
    )
    ->collect()
    // Output::schema selects the schema as the content written to output.txt.
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false, output: Output::schema))
    ->run();

Output

schema
|-- id: integer
|-- name: ?string
|-- active: boolean
