Save Mode
Flow DataFrame provides four save modes that control how data is written when the destination file or path already exists:
- ExceptionIfExists (default): Throws an exception if the destination already exists
- Append: Appends data to existing files (may cause duplicates)
- Overwrite: Removes existing files and writes new data
- Ignore: Skips writing if destination already exists
Changing Save Mode
Save mode is set with the DataFrame::mode() method (the same method used to set ExecutionMode):
use Flow\ETL\Filesystem\SaveMode;

(data_frame())
    ->read(from_array([
        ['id' => 1, 'name' => 'John'],
        ['id' => 2, 'name' => 'Jane'],
    ]))
    ->mode(SaveMode::Overwrite)
    ->write(to_csv(__DIR__ . '/output.csv'))
    ->run();
Save Mode Behavior
ExceptionIfExists (Default)
Fails immediately if the destination file already exists:
(data_frame())
    ->read(from_array([['id' => 1]]))
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();
// Running again throws:
// RuntimeException: Destination path "/path/to/data.csv" already exists
Use when: You want to ensure data is never accidentally overwritten.
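If you need to react to an existing destination rather than let the pipeline fail, one option is to catch the exception yourself. This is a sketch, not the library's prescribed pattern; the exact exception class may vary by Flow version, but the message above suggests a RuntimeException:

```php
<?php

use Flow\ETL\Filesystem\SaveMode;

try {
    (data_frame())
        ->read(from_array([['id' => 1]]))
        ->write(to_csv(__DIR__ . '/data.csv'))
        ->run();
} catch (\RuntimeException $e) {
    // Destination already exists - log and move on, or rerun the
    // pipeline with SaveMode::Overwrite or SaveMode::Ignore instead.
    error_log($e->getMessage());
}
```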
Append
Creates additional files in the same directory when the destination already exists:
(data_frame())
    ->read(from_array([['id' => 3]]))
    ->mode(SaveMode::Append)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();
// First run creates: data.csv
// Second run creates: data_<randomized_suffix>.csv (e.g., data_5f8a3b2c.csv)
// When reading data.csv, Flow reads all files matching the pattern: data*.csv
How it works:
- If destination file doesn't exist: writes normally
- If destination file exists: generates a new file with randomized name in the same directory
- Flow treats a destination path as a logical dataset that may span multiple files with the same extension in one directory
File structure after multiple runs:
output/
├── orders.csv # First run
├── orders_5ea42a0310.csv # Second run
├── orders_ceadbdb4d1.csv # Third run
└── orders_2140bfc5fd.csv # Fourth run
When you read from orders.csv, Flow automatically reads all orders*.csv files in the directory.
Important: Flow does not check for duplicates. If you run the same pipeline twice, data will be duplicated across multiple files.
Use when:
- Incrementally building datasets over multiple pipeline runs
- Writing to log directories
- Accumulating results where each run adds new files
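If duplicates produced by repeated Append runs are a concern, one option is to deduplicate on read. A sketch, assuming dropDuplicates() is available in your Flow version and that 'id' uniquely identifies a row:

```php
<?php

use Flow\ETL\Filesystem\SaveMode;

(data_frame())
    ->read(from_csv(__DIR__ . '/data.csv'))  // reads data.csv and all data_*.csv files
    ->dropDuplicates(ref('id'))              // keep one row per id across all appended files
    ->mode(SaveMode::Overwrite)
    ->write(to_csv(__DIR__ . '/deduplicated.csv'))
    ->run();
```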
Overwrite
Removes all existing files at the destination and writes fresh data:
(data_frame())
    ->read(from_array([['id' => 100]]))
    ->mode(SaveMode::Overwrite)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();
// File now contains only: id 100
Implementation details:
- Data is first written to temporary files with the ._flow_php_tmp. prefix
- After writing completes, existing files are removed
- Temporary files are renamed to their final names
- For partitioned writes, all files in the affected partition directories are removed
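The write-to-temp-then-swap pattern described above can be sketched in plain PHP. This is an illustration of the general technique, not Flow's actual implementation; the file names and content are made up:

```php
<?php

// Sketch of the write-temp-then-swap pattern used by Overwrite mode.
$final = __DIR__ . '/data.csv';
$tmp   = __DIR__ . '/._flow_php_tmp.data.csv'; // temp-file prefix per the notes above

// 1. Write the new data to the temporary file first, so a failed write
//    never leaves the destination half-overwritten.
file_put_contents($tmp, "id\n100\n");

// 2. Only after the write succeeded, remove the old destination file.
if (file_exists($final)) {
    unlink($final);
}

// 3. Promote the temporary file to the final name.
rename($tmp, $final);
```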
Use when:
- Regenerating reports or exports
- Running pipelines that should replace previous results
- Development and testing
Ignore
Silently skips writing if the destination already exists:
(data_frame())
    ->read(from_array([['id' => 999]]))
    ->mode(SaveMode::Ignore)
    ->write(to_csv(__DIR__ . '/data.csv'))
    ->run();
// If file exists: nothing happens, no error thrown
// If file doesn't exist: data is written normally
Use when:
- Idempotent pipelines where re-running should have no effect
- Avoiding duplicate work in batch processing
- Resume-like behavior for incremental processing
Partitioned Writes
Save modes work with partitioned data:
(data_frame())
    ->read(from_array([
        ['date' => '2024-01-01', 'value' => 100],
        ['date' => '2024-01-02', 'value' => 200],
    ]))
    ->mode(SaveMode::Overwrite)
    ->partitionBy('date')
    ->write(to_parquet(__DIR__ . '/data'))
    ->run();
// Structure:
// data/date=2024-01-01/file.parquet
// data/date=2024-01-02/file.parquet
When using SaveMode::Overwrite with partitions:
- All files within each affected partition directory are removed
- Only partitions being written to are affected
- Unrelated partitions remain untouched
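To illustrate the last point: a second run that contains only one date should replace just that date's partition. A sketch, continuing the example above:

```php
<?php

use Flow\ETL\Filesystem\SaveMode;

// Second run: only the date=2024-01-02 partition is rewritten.
// data/date=2024-01-01/ is not touched, because no incoming row
// belongs to that partition.
(data_frame())
    ->read(from_array([
        ['date' => '2024-01-02', 'value' => 250],
    ]))
    ->mode(SaveMode::Overwrite)
    ->partitionBy('date')
    ->write(to_parquet(__DIR__ . '/data'))
    ->run();
```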