flow php

UNIFIED DATA PROCESSING FRAMEWORK

composer require flow-php/etl ^0.10.0

Extracts

Read from various data sources.

Transforms

Shape and optimize for your needs.

Loads

Store and secure in one of many available data sinks.

Examples:

Description

When iterating through a dataset from a source that does not support a strict schema, such as CSV, XML, or JSON, you can tell the extractor which schema to apply to each column it reads.

Otherwise, DataFrame will try to infer the schema from the data in each column. This can be problematic if the first rows are empty or null. If a column's value in the first row is null, the entry factory (the mechanism responsible for creating entries) will assume that the column is of type string.
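To make the pitfall concrete, here is a minimal sketch in plain PHP that guesses column types from the first row only. It is not Flow's entry factory: `guess_type` and `guess_schema` are hypothetical helpers written for illustration, and the only behavior borrowed from the description above is the fallback to string when the first value is null.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): guess a type name from one value.
 * Null falls back to 'string', mirroring the behavior described above.
 */
function guess_type(mixed $value): string
{
    return match (true) {
        is_int($value) => 'integer',
        is_bool($value) => 'boolean',
        is_float($value) => 'float',
        default => 'string', // null and any other value
    };
}

/**
 * Hypothetical helper: guess a schema by looking at the first row only.
 *
 * @param array<int, array<string, mixed>> $rows
 * @return array<string, string> column name => guessed type
 */
function guess_schema(array $rows): array
{
    // array_map preserves string keys when given a single array
    return array_map(guess_type(...), $rows[0] ?? []);
}

$rows = [
    ['id' => 1, 'price' => null], // first row has no price yet
    ['id' => 2, 'price' => 1999], // later rows carry integers
];

$guessed = guess_schema($rows);
// $guessed is ['id' => 'integer', 'price' => 'string']
```

In this sketch the `price` column is guessed as a string even though every non-null value is an integer. An explicit schema avoids that kind of surprise.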

Code

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{bool_schema, data_frame, from_array, int_schema, schema, str_schema, to_stream};
use Flow\ETL\Loader\StreamLoader\Output;
use Flow\ETL\Row\Schema\Metadata;

require __DIR__ . '/../../../autoload.php';

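// Explicit schema: each definition takes a column name, a nullable flag,
// and optionally metadata attached to that column.
$schema = schema(
    int_schema('id', $nullable = false),
    str_schema('name', $nullable = true),
    bool_schema('active', $nullable = false, Metadata::empty()->add('key', 'value')),
);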

data_frame()
    ->read(
        from_array([
            ['id' => 1, 'name' => 'Product 1', 'active' => true],
            ['id' => 2, 'name' => 'Product 2', 'active' => false],
            ['id' => 3, 'name' => 'Product 3', 'active' => true],
        ])->withSchema($schema)
    )
    ->collect()
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false, output: Output::schema))
    ->run();

Output

schema
|-- id: integer
|-- name: string
|-- active: boolean
