flow php

Save Mode

Flow DataFrame provides four save modes that control how data is written when the destination file or path already exists:

  • ExceptionIfExists (default): Throws an exception if the destination already exists
  • Append: Writes new data as additional files next to the existing ones (re-running the same pipeline duplicates data)
  • Overwrite: Removes existing files and writes new data
  • Ignore: Skips writing if the destination already exists
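
As a rough mental model, each mode is a decision taken once the writer knows whether the destination exists. The sketch below is plain PHP and does not use Flow: the SaveModeSketch enum and the decideAction() function are hypothetical names introduced only for illustration, and the returned labels are not part of any Flow API.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Flow's SaveMode enum, used only in this sketch.
enum SaveModeSketch
{
    case ExceptionIfExists;
    case Append;
    case Overwrite;
    case Ignore;
}

/**
 * Returns a short label for what a writer would do in a given mode.
 * Illustration only; this is not Flow's implementation.
 */
function decideAction(SaveModeSketch $mode, bool $destinationExists): string
{
    if (!$destinationExists) {
        // Nothing to protect or replace yet, so every mode writes normally.
        return 'write';
    }

    return match ($mode) {
        SaveModeSketch::ExceptionIfExists => throw new RuntimeException('Destination already exists'),
        SaveModeSketch::Append => 'write-additional-file',
        SaveModeSketch::Overwrite => 'replace-existing-files',
        SaveModeSketch::Ignore => 'skip',
    };
}
```

The modes only differ when the destination is already present, which is why the existence check comes first.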

Changing Save Mode

Set the save mode with DataFrame::mode(), the same method used for ExecutionMode:

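use Flow\ETL\Filesystem\SaveMode;

use function Flow\ETL\Adapter\CSV\to_csv;
use function Flow\ETL\DSL\{data_frame, from_array};

(data_frame())
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Jane'],
    ]))
    // Replace output.csv if it already exists
    ->mode(SaveMode::Overwrite)
    ->write(to_csv(__DIR__ . '/output.csv'))
    ->run();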

Save Mode Behavior

ExceptionIfExists (Default)

Fails immediately if the destination file already exists:

(data_frame())
    ->read(from_array([['id' => 1]]))
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();

// Running again throws:
// RuntimeException: Destination path "/path/to/data.csv" already exists

Use when: You want to ensure data is never accidentally overwritten.

Append

Creates additional files in the same directory when the destination already exists:

(data_frame())
    ->read(from_array([['id' => 3]]))
    ->mode(SaveMode::Append)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();

// First run creates: data.csv
// Second run creates: data_<randomized_suffix>.csv (e.g., data_5f8a3b2c.csv)
// When reading data.csv, Flow reads all files matching the pattern: data*.csv

How it works:

  • If the destination file doesn't exist: writes normally
  • If the destination file exists: generates a new file with a randomized name in the same directory
  • Flow treats a file path like a directory that can contain multiple files with the same extension
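
The naming rule above can be sketched in plain PHP. This is only an approximation: appendTargetPath() is a hypothetical helper, and the 10-character hexadecimal suffix is an assumption based on the example file names, not Flow's documented scheme.

```php
<?php

declare(strict_types=1);

/**
 * Picks the path an append-style write would use for $destination.
 * Hypothetical helper; the suffix format is an assumption for illustration.
 */
function appendTargetPath(string $destination): string
{
    // Destination is free: write to it directly.
    if (!file_exists($destination)) {
        return $destination;
    }

    $info = pathinfo($destination);
    $suffix = bin2hex(random_bytes(5)); // 5 random bytes -> 10 hex characters
    $extension = isset($info['extension']) ? '.' . $info['extension'] : '';

    // Same directory, same base name, randomized suffix before the extension.
    return $info['dirname'] . DIRECTORY_SEPARATOR . $info['filename'] . '_' . $suffix . $extension;
}
```

Calling the helper twice for the same path, with a write in between, returns the original path first and a suffixed sibling afterwards.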

File structure after multiple runs:

output/
├── orders.csv              # First run
├── orders_5ea42a0310.csv   # Second run
├── orders_ceadbdb4d1.csv   # Third run
└── orders_2140bfc5fd.csv   # Fourth run

When you read from orders.csv, Flow automatically reads all orders*.csv files in the directory.
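
The lookup described above behaves roughly like expanding the path into a glob pattern. The sketch below uses PHP's built-in glob() to show which files such a pattern would select. matchingFiles() is a hypothetical helper and is not how Flow resolves paths internally.

```php
<?php

declare(strict_types=1);

/**
 * Lists the files a read of $destination would plausibly cover:
 * the file itself plus siblings that share its base name and extension.
 * Hypothetical helper for illustration only.
 *
 * @return list<string>
 */
function matchingFiles(string $destination): array
{
    $info = pathinfo($destination);
    $extension = isset($info['extension']) ? '.' . $info['extension'] : '';

    // e.g. /output/orders.csv -> /output/orders*.csv
    $pattern = $info['dirname'] . DIRECTORY_SEPARATOR . $info['filename'] . '*' . $extension;

    $files = glob($pattern);

    return $files === false ? [] : array_values($files);
}
```

A pattern like this also matches unrelated files that merely share the prefix (for example orders_archive.csv), so it helps to keep append destinations in a directory of their own.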

Important: Flow does not check for duplicates. If you run the same pipeline twice, data will be duplicated across multiple files.

Use when:

  • Incrementally building datasets over multiple pipeline runs
  • Writing to log directories
  • Accumulating results where each run adds new files

Overwrite

Removes all existing files at the destination and writes fresh data:

(data_frame())
    ->read(from_array([['id' => 100]]))
    ->mode(SaveMode::Overwrite)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();

// File now contains only: id 100

Implementation details:

  • Data is first written to temporary files with the ._flow_php_tmp. prefix
  • After writing completes, existing files are removed
  • Temporary files are then renamed to their final names
  • For partitioned writes, all files in the affected partition directories are removed
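
The write-then-swap sequence above can be sketched for a single file in plain PHP. overwriteFile() is a hypothetical helper that only borrows the ._flow_php_tmp. prefix from the description. It is an illustration of the pattern under those assumptions, not Flow's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Replaces $destination with $contents by way of a temporary sibling file.
 * Hypothetical helper for illustration only.
 */
function overwriteFile(string $destination, string $contents): void
{
    $tmp = dirname($destination) . DIRECTORY_SEPARATOR . '._flow_php_tmp.' . basename($destination);

    // 1. Write the new data next to the destination.
    if (file_put_contents($tmp, $contents) === false) {
        throw new RuntimeException("Cannot write temporary file: {$tmp}");
    }

    // 2. Remove the previous file only after the write has succeeded.
    if (file_exists($destination)) {
        unlink($destination);
    }

    // 3. Move the temporary file to its final name.
    if (!rename($tmp, $destination)) {
        throw new RuntimeException("Cannot move {$tmp} to {$destination}");
    }
}
```

Because the old file is removed only after the new data is fully written, a failure during the write leaves the previous contents in place.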

Use when:

  • Regenerating reports or exports
  • Running pipelines that should replace previous results
  • Development and testing

Ignore

Silently skips writing if the destination already exists:

(data_frame())
    ->read(from_array([['id' => 999]]))
    ->mode(SaveMode::Ignore)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();

// If file exists: nothing happens, no error thrown
// If file doesn't exist: data is written normally

Use when:

  • Idempotent pipelines where re-running should have no effect
  • Avoiding duplicate work in batch processing
  • Resume-like behavior for incremental processing

Partitioned Writes

Save modes work with partitioned data:

(data_frame())
    ->read(from_array([
        ['date' => '2024-01-01', 'value' => 100],
        ['date' => '2024-01-02', 'value' => 200],
    ]))
    ->mode(SaveMode::Overwrite)
    ->partitionBy('date')
    ->write(to_parquet(__DIR__ . '/data'))
    ->run();

// Structure:
// data/date=2024-01-01/file.parquet
// data/date=2024-01-02/file.parquet

When using SaveMode::Overwrite with partitions:

  • All files within each affected partition directory are removed
  • Only partitions being written to are affected
  • Unrelated partitions remain untouched
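
The partition-scoped behavior can be illustrated with a small plain-PHP sketch. overwritePartitions() is a hypothetical helper. The file.jsonl name and the JSON Lines output are assumptions chosen to keep the example dependency-free, and they do not reflect how Flow writes Parquet files.

```php
<?php

declare(strict_types=1);

/**
 * Writes rows grouped by a partition column and clears only the partition
 * directories that receive new data. Hypothetical helper for illustration.
 *
 * @param list<array<string, mixed>> $rows
 */
function overwritePartitions(string $baseDir, string $column, array $rows): void
{
    // Group rows by the value of the partition column.
    $groups = [];
    foreach ($rows as $row) {
        $groups[(string) $row[$column]][] = $row;
    }

    foreach ($groups as $value => $partitionRows) {
        $dir = $baseDir . DIRECTORY_SEPARATOR . $column . '=' . $value;
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }

        // Remove existing files in this partition only.
        foreach (glob($dir . DIRECTORY_SEPARATOR . '*') ?: [] as $file) {
            if (is_file($file)) {
                unlink($file);
            }
        }

        $lines = array_map(
            static fn (array $r): string => json_encode($r, JSON_THROW_ON_ERROR),
            $partitionRows
        );
        file_put_contents($dir . DIRECTORY_SEPARATOR . 'file.jsonl', implode("\n", $lines) . "\n");
    }
}
```

Partitions that do not appear in the incoming rows are never visited, so their files stay as they were.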
