flow php

Data Retrieval

DataFrame provides several methods for retrieving processed data. These methods are trigger operations that execute the entire pipeline.

Memory-Safe Retrieval (Recommended)

These methods use generators to maintain constant memory usage regardless of dataset size:

get() - Retrieve as Rows batches

<?php

use function Flow\ETL\DSL\{data_frame, from_array};

$dataFrame = data_frame()->read(from_array($largeDataset));

foreach ($dataFrame->get() as $rows) {
    echo "Processing batch of " . $rows->count() . " rows\n";
    // Process each batch
    foreach ($rows as $row) {
        // Process individual row
    }
}

getEach() - Retrieve individual Rows

<?php

foreach ($dataFrame->getEach() as $row) {
    echo "ID: " . $row->get('id')->value() . "\n";
    echo "Name: " . $row->get('name')->value() . "\n";
}

getAsArray() - Retrieve as array batches

<?php

foreach ($dataFrame->getAsArray() as $rowsArray) {
    // $rowsArray is an array of arrays
    foreach ($rowsArray as $rowArray) {
        echo "ID: " . $rowArray['id'] . "\n";
    }
}

getEachAsArray() - Retrieve individual arrays

<?php

foreach ($dataFrame->getEachAsArray() as $rowArray) {
    echo "ID: " . $rowArray['id'] . "\n";
    echo "Name: " . $rowArray['name'] . "\n";
}

fetch() - Load into memory

<?php

// Fetch limited results (safe)
$firstTen = $dataFrame->fetch(10);
foreach ($firstTen as $row) {
    // Process row
}

// Fetch all results (dangerous for large datasets!)
$allRows = $dataFrame->fetch(); // Can cause memory exhaustion

⚠️ Memory Warning: The fetch() method loads all requested rows into memory at once. Without a limit parameter, it will attempt to load the entire dataset into memory, which can cause memory exhaustion. Always use with a reasonable limit or prefer generator-based methods.

count() - Count total rows

<?php

$totalCount = $dataFrame->count();
echo "Total rows: $totalCount\n";

⚠️ Performance Warning: The count() method must process the entire dataset to return the total count, which can be expensive for large datasets. Consider whether you actually need the exact count or if an approximation would suffice.

Iteration with Callback

forEach() - Process with callback

<?php

$dataFrame->forEach(function (Rows $rows) {
    echo "Processing batch of " . $rows->count() . " rows\n";
    // Custom processing logic
});

Contributors

Join us on GitHub external resource
scroll back to top