Flow PHP

Example: Batch size

Topic: Data frame

Description

Batch size defines the size of each data frame batch. In other words, it defines how many rows are processed at once. This is useful when you have a large dataset and want to process it in smaller chunks. A larger batch size can speed up processing, but it also requires more memory. There is no universal rule for the optimal batch size; it depends on the dataset and on the types of transformations applied.

The default batch size is 1, which means that each extractor yields one row at a time.

To process all rows at once, you can use collect or set batchSize to -1.
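
To make the batching semantics concrete, here is a minimal plain-PHP sketch of how rows can be grouped into batches. It does not use Flow and is not how Flow implements batching internally; `batch_rows` is a hypothetical helper written only for illustration. It assumes the batch size is either a positive integer or -1, where -1 is treated as "all rows in one batch", mirroring the behaviour described above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): groups rows into batches.
 * Assumes $batchSize is a positive integer, or -1 for "everything in one batch".
 *
 * @param iterable<array<string, mixed>> $rows
 */
function batch_rows(iterable $rows, int $batchSize): \Generator
{
    $batch = [];

    foreach ($rows as $row) {
        $batch[] = $row;

        // Emit the batch as soon as it is full; with -1 this branch never runs.
        if ($batchSize !== -1 && \count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the remainder, or the whole dataset when the batch size is -1.
    if ($batch !== []) {
        yield $batch;
    }
}

$rows = [
    ['id' => 1, 'name' => 'John'],
    ['id' => 2, 'name' => 'Doe'],
    ['id' => 3, 'name' => 'Jane'],
    ['id' => 4, 'name' => 'Smith'],
    ['id' => 5, 'name' => 'Alice'],
];

// Returns the number of rows in each batch produced for a given batch size.
$sizesOf = static fn (int $batchSize): array => \array_map(
    static fn (array $batch): int => \count($batch),
    \iterator_to_array(batch_rows($rows, $batchSize), false)
);

echo \json_encode($sizesOf(2)), PHP_EOL;  // [2,2,1]
echo \json_encode($sizesOf(1)), PHP_EOL;  // [1,1,1,1,1]
echo \json_encode($sizesOf(-1)), PHP_EOL; // [5]
```

With five rows and a batch size of 2, the sketch yields batches of 2, 2 and 1 rows, which is the same grouping shown in the Output section of this example. Only the rows of the current batch are held in memory at any time, which is where the trade-off between speed and memory comes from.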

Code

<?php

declare(strict_types=1);

use function Flow\ETL\DSL\{data_frame, from_array, to_stream};

require __DIR__ . '/../../../autoload.php';

data_frame()
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Doe'],
        ['id' => 3, 'name' => 'Jane'],
        ['id' => 4, 'name' => 'Smith'],
        ['id' => 5, 'name' => 'Alice'],
    ]))
    // Process two rows at a time; each batch is written as its own table.
    ->batchSize(2)
    ->write(to_stream(__DIR__ . '/output.txt', truncate: false))
    ->run();

Output

+----+------+
| id | name |
+----+------+
|  1 | John |
|  2 |  Doe |
+----+------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  3 |  Jane |
|  4 | Smith |
+----+-------+
2 rows
+----+-------+
| id |  name |
+----+-------+
|  5 | Alice |
+----+-------+
1 rows
