UNIFIED DATA PROCESSING FRAMEWORK

composer require flow-php/etl ~0.22.0


Extracts

Read from various data sources.

Transforms

Shape and optimize for your needs.

Loads

Store and secure in one of many available data sinks.

Examples:

Description

Batch size defines the size of a data frame; in other words, how many rows are processed at once. This is useful when you have a large dataset and want to process it in smaller chunks. A larger batch size can speed up processing, but it also requires more memory. There is no universal rule for the optimal batch size; it depends on the dataset and the types of transformations applied.
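
The chunking idea described above can be illustrated without the library. The following is a minimal plain-PHP sketch, not Flow's actual implementation: it uses the built-in array_chunk() to split five rows into groups of at most two, which mirrors how a batch size of 2 partitions the same dataset. The variable names ($rows, $batchSize, $batches) are chosen for illustration only.

```php
<?php

declare(strict_types=1);

// Illustration only: plain PHP, not Flow's internal batching logic.
$rows = [
    ['id' => 1, 'name' => 'John'],
    ['id' => 2, 'name' => 'Doe'],
    ['id' => 3, 'name' => 'Jane'],
    ['id' => 4, 'name' => 'Smith'],
    ['id' => 5, 'name' => 'Alice'],
];

$batchSize = 2;

// array_chunk() splits the rows into consecutive groups of at most $batchSize rows.
$batches = array_chunk($rows, $batchSize);

foreach ($batches as $index => $batch) {
    // Only one batch is handled per iteration, which is the point of
    // batching: the work (and its memory cost) is done chunk by chunk.
    echo 'Batch ' . ($index + 1) . ': ' . count($batch) . ' rows' . PHP_EOL;
}
```

Five rows with a batch size of 2 give three batches of 2, 2 and 1 rows. A larger $batchSize means fewer iterations but more rows held per step.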

The default batch size is 1, which means that each extractor will yield one row at a time.

To process all rows at once, you can use collect or set batchSize to -1.
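
As a rough sketch of the collect approach mentioned above, the pipeline below replaces the batchSize call with collect. It assumes collect() is called on the data frame with no arguments, in the same position where batchSize() would go, and that the Composer autoloader is available at the same path used in code.php. Treat it as an illustration under those assumptions rather than a definitive usage.

```php
<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{data_frame, from_array, to_stream};

require __DIR__ . '/vendor/autoload.php';

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Doe'],
        ['id' => 3, 'name' => 'Jane'],
        ['id' => 4, 'name' => 'Smith'],
        ['id' => 5, 'name' => 'Alice'],
    ]))
    // Assumption: collect() gathers all rows into a single batch
    // before they reach the loader.
    ->collect()
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false))
    ->run();
```

Because every row is held in memory at once, this suits small datasets better than large ones.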


composer.json

{
    "name": "flow-php/examples",
    "description": "Flow PHP - Examples",
    "license": "MIT",
    "type": "library",
    "require": {
        "flow-php/etl": "1.x-dev"
    }
}

code.php

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{data_frame, from_array, to_stream};

require __DIR__ . '/vendor/autoload.php';

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Doe'],
        ['id' => 3, 'name' => 'Jane'],
        ['id' => 4, 'name' => 'Smith'],
        ['id' => 5, 'name' => 'Alice'],
    ]))
    ->batchSize(2)
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false))
    ->run();

Output

+----+------+
| id | name |
+----+------+
|  1 | John |
|  2 |  Doe |
+----+------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  3 |  Jane |
|  4 | Smith |
+----+-------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  5 | Alice |
+----+-------+
1 rows
