Flow PHP

Example: Apply

Topic: Schema

Description

While iterating through a dataset that comes from a source without a strict schema, such as CSV, XML, or JSON, you can tell the extractor which schema to apply to each column it reads.

Otherwise, the DataFrame will try to guess the schema from the data in each column. This can be problematic when the first rows are empty or null: if the value in the first row is null, the entry factory (the mechanism responsible for creating entries) will assume that the column is of type string.
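To make the failure mode concrete, here is a plain-PHP sketch with no Flow dependency. The helpers `guess_type_from_first_row()` and `apply_type()` are hypothetical and only approximate the idea; they are not Flow's entry factory or the real `withSchema()` implementation. The sample rows use string values to mimic raw data from a loosely typed source.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of Flow PHP): guesses a column type by looking
// only at the first row, falling back to "string" when that value is null.
function guess_type_from_first_row(array $rows, string $column): string
{
    $first = $rows[0][$column] ?? null;

    return match (true) {
        $first === null => 'string', // nothing to inspect, so assume string
        is_int($first) => 'integer',
        is_bool($first) => 'boolean',
        is_float($first) => 'float',
        default => 'string',
    };
}

// Hypothetical helper (not part of Flow PHP): casts every value of a column to
// an explicitly declared type, allowing nulls only when the column is nullable.
function apply_type(array $rows, string $column, string $type, bool $nullable): array
{
    foreach ($rows as $i => $row) {
        $value = $row[$column] ?? null;

        if ($value === null) {
            if (!$nullable) {
                throw new InvalidArgumentException("Column '{$column}' is not nullable (row {$i})");
            }
            continue; // keep the null as-is
        }

        $rows[$i][$column] = match ($type) {
            'integer' => (int) $value,
            'float' => (float) $value,
            'boolean' => filter_var($value, FILTER_VALIDATE_BOOLEAN),
            default => (string) $value,
        };
    }

    return $rows;
}

// Raw rows whose first "id" is null, as might come from a CSV or JSON file.
$rows = [
    ['id' => null],
    ['id' => '2'],
    ['id' => '3'],
];

// Inference sees only the null and settles on "string".
echo guess_type_from_first_row($rows, 'id'), PHP_EOL;

// An explicit, nullable integer type casts the remaining values instead.
$typed = apply_type($rows, 'id', 'integer', nullable: true);
var_dump($typed[1]['id']);
```

Declaring the type up front removes the dependency on whatever happens to be in the first row, which is the same motivation behind passing a schema to the extractor.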

Code

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{bool_schema, data_frame, from_array, int_schema, schema, str_schema, to_stream};
use Flow\ETL\Loader\StreamLoader\Output;
use Flow\ETL\Row\Schema\Metadata;

require __DIR__ . '/../../../autoload.php';

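// Declare the expected type of each column up front instead of letting
// the DataFrame infer it from the data.
$schema = schema(
    int_schema('id', nullable: false),
    str_schema('name', nullable: true),
    // A definition can also carry arbitrary metadata, here a single key/value pair.
    bool_schema('active', nullable: false, metadata: Metadata::empty()->add('key', 'value')),
);

data_frame()
    ->read(
        from_array([
            ['id' => 1, 'name' => 'Product 1', 'active' => true],
            ['id' => 2, 'name' => 'Product 2', 'active' => false],
            ['id' => 3, 'name' => 'Product 3', 'active' => true],
        ])->withSchema($schema) // apply the schema to every row the extractor reads
    )
    ->collect()
    // Write the resulting schema (Output::schema) rather than the rows to output.txt.
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false, output: Output::schema))
    ->run();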

Output

schema
|-- id: integer
|-- name: string
|-- active: boolean
