# Benchmarks

## Infrastructure
All benchmarks run on a dedicated self-hosted GitHub Actions runner provisioned on DigitalOcean:
| Specification | Value |
|---|---|
| Runner Name | flow-php-runner |
| vCPUs | 2 |
| Memory | 2 GB |
| OS | Ubuntu 24.04 (LTS) x64 |
Using a dedicated runner ensures benchmark results are consistent and not affected by varying loads on shared GitHub-hosted runners.
## How Benchmarks Work

### Baseline Generation
When code is merged to the `1.x` branch, the baseline workflow automatically runs all benchmarks and stores the results
as artifacts. This baseline serves as the reference point for all future comparisons.
Baseline generation is triggered by:
- Push to the `1.x` branch
- Daily schedule (3 AM UTC)
- Manual workflow dispatch (see the example below)
### Pull Request Comparison
When a pull request is opened, the benchmarks run on the dedicated runner and the results are compared against the stored `1.x`
baseline. The comparison is posted as a job summary.
The workflow uses the `pull_request_target` trigger, which means the workflow code always comes from the `1.x` branch.
This prevents attackers from modifying the workflow file in their PR to execute malicious code on the self-hosted runner.
## Benchmark Categories
Benchmarks are organized into five categories:
| Category | Description |
|---|---|
| Extractors | Performance of data extraction from various sources (CSV, JSON, Parquet, etc.) |
| Transformers | Performance of data transformation operations |
| Loaders | Performance of data loading to various destinations |
| Building Blocks | Core framework operations (rows, entries, schema, etc.) |
| Parquet Library | Low-level Parquet file operations |
## Running Benchmarks Locally
To run benchmarks locally:

```bash
# Run all benchmarks
composer test:benchmark

# Run a specific category
composer test:benchmark:extractor
composer test:benchmark:transformer
composer test:benchmark:loader
composer test:benchmark:building_blocks
composer test:benchmark:parquet-library
```
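If the suite is built on phpbench (an assumption; the composer scripts above may wrap a different tool), a single benchmark can also be targeted directly. The class name below is a placeholder:

```bash
# Minimal sketch, assuming phpbench: run one benchmark class by regex filter
# and print the aggregate report. The class name is a placeholder.
vendor/bin/phpbench run --filter=CsvExtractorBench --report=aggregate
```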
## Interpreting Results
Benchmark results show timing comparisons between your changes and the baseline. Key metrics include:
- Mean time - Average execution time
- Mode - Most frequent execution time
- Best/Worst - Range of execution times
- Memory - Peak memory usage
A significant increase in execution time or memory usage may indicate a performance regression that should be investigated before merging.
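A similar comparison can be reproduced locally before opening a pull request. The sketch below assumes phpbench and its tag/ref mechanism; the tag and branch names are placeholders:

```bash
# Minimal sketch, assuming phpbench: tag a reference run on 1.x, then compare a
# feature branch against it. Tag and branch names are placeholders.
git checkout 1.x
vendor/bin/phpbench run --tag=local_baseline

git checkout my-feature-branch
vendor/bin/phpbench run --ref=local_baseline --report=aggregate
```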