Partitioning

Description

Use filterPartitions() to skip entire partitions without reading their data. Unlike filter(), which reads all data and then filters it, partition pruning evaluates partition metadata first and reads only the matching partitions, which can significantly improve performance on large datasets.
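
To make the idea of pruning on metadata concrete, here is a minimal plain-PHP sketch. It is not Flow's implementation: it assumes Hive-style key=value directory names like the color=*/sku=* segments used in the example on this page, and partitions_from_path() and prune_paths() are hypothetical helpers written only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): extract key=value partitions
 * from the directory segments of a file path.
 *
 * @return array<string, string>
 */
function partitions_from_path(string $path): array
{
    $partitions = [];

    foreach (explode('/', $path) as $segment) {
        // Only segments shaped like "name=value" are treated as partitions.
        if (preg_match('/^([^=]+)=(.+)$/', $segment, $m) === 1) {
            $partitions[$m[1]] = $m[2];
        }
    }

    return $partitions;
}

/**
 * Hypothetical helper: keep only the paths whose partition values satisfy
 * the predicate. No file is opened; the decision uses the path alone.
 *
 * @param list<string> $paths
 * @param callable(array<string, string>): bool $predicate
 * @return list<string>
 */
function prune_paths(array $paths, callable $predicate): array
{
    return array_values(array_filter(
        $paths,
        static fn (string $path): bool => $predicate(partitions_from_path($path)),
    ));
}

// Assumed sample layout, mirroring the color=*/sku=* pattern.
$paths = [
    'input/color=red/sku=1/data.csv',
    'input/color=green/sku=2/data.csv',
    'input/color=blue/sku=3/data.csv',
];

// Same condition as the example on this page: color must not equal "green".
$kept = prune_paths(
    $paths,
    static fn (array $p): bool => ($p['color'] ?? null) !== 'green',
);

print_r($kept);
```

The point of the sketch is that the green partition is rejected from its path alone, so its rows never need to be read. A row-level filter would have to load those rows before discarding them.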

Code

<?php

declare(strict_types=1);

use function Flow\ETL\Adapter\CSV\from_csv;
use function Flow\ETL\DSL\{data_frame, lit, ref, to_output};

require __DIR__ . '/vendor/autoload.php';

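// Partition values come from the color=*/sku=* segments of each file path.
data_frame()
    ->read(from_csv(__DIR__ . '/input/color=*/sku=*/*.csv'))
    // Prune on partition metadata: color=green partitions are skipped without reading their data.
    ->filterPartitions(ref('color')->notEquals(lit('green')))
    ->collect()
    // Print the output without truncating values.
    ->write(to_output(truncate: false))
    ->run();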