flow php

UNIFIED DATA PROCESSING FRAMEWORK

composer require flow-php/etl ^0.10.0



Extracts

Read from various data sources.


Transforms

Shape and optimize for your needs.


Loads

Store and secure in one of many available data sinks.

Examples:

Description

Batch size defines the size of a single data frame, i.e. how many rows are processed at once. This is useful when you have a large dataset and want to process it in smaller chunks. A larger batch size can speed up processing, but it also requires more memory. There is no universal rule for the optimal batch size; it depends on the dataset and the types of transformations applied.

The default batch size is 1, which means that each extractor yields one row at a time.

To process all rows at once, you can use collect() or set batchSize to -1.
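As a rough mental model (not Flow's internals), batching splits the row stream into fixed-size frames, which plain PHP can sketch with array_chunk:

```php
<?php

declare(strict_types=1);

// Hypothetical illustration of how a batch size of 2 groups rows,
// using plain PHP instead of the Flow DSL.
$rows = [
    ['id' => 1, 'name' => 'John'],
    ['id' => 2, 'name' => 'Doe'],
    ['id' => 3, 'name' => 'Jane'],
    ['id' => 4, 'name' => 'Smith'],
    ['id' => 5, 'name' => 'Alice'],
];

$batchSize = 2;

// array_chunk produces frames of at most $batchSize rows;
// the last frame may be smaller (here: a single row).
foreach (array_chunk($rows, $batchSize) as $i => $frame) {
    echo sprintf("frame %d: %d rows\n", $i, count($frame));
}
```

With 5 rows and a batch size of 2, this yields three frames of 2, 2, and 1 rows, matching the output below.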

Code

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{data_frame, from_array, to_stream};

require __DIR__ . '/../../../autoload.php';

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Doe'],
        ['id' => 3, 'name' => 'Jane'],
        ['id' => 4, 'name' => 'Smith'],
        ['id' => 5, 'name' => 'Alice'],
    ]))
    ->batchSize(2) // process the dataset in frames of 2 rows
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false))
    ->run();

Output

+----+------+
| id | name |
+----+------+
|  1 | John |
|  2 |  Doe |
+----+------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  3 |  Jane |
|  4 | Smith |
+----+-------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  5 | Alice |
+----+-------+
1 rows
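For contrast, here is a sketch of the same pipeline processed as a single frame. It assumes the same flow-php/etl DSL as the example above; batchSize(-1) comes from the description's note on processing all rows at once.

```php
<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{data_frame, from_array, to_stream};

require __DIR__ . '/../../../autoload.php';

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Doe'],
        ['id' => 3, 'name' => 'Jane'],
        ['id' => 4, 'name' => 'Smith'],
        ['id' => 5, 'name' => 'Alice'],
    ]))
    ->batchSize(-1) // a single frame holding all 5 rows
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false))
    ->run();
```

With batchSize(-1) the writer should receive one frame of all 5 rows instead of the three smaller frames shown above.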

Contributors

Join us on GitHub