Upgrade Guide
This document provides guidelines for upgrading between versions of Flow PHP. Please follow the instructions for your specific version to ensure a smooth upgrade process.
Upgrading from 0.35.x to 0.36.x
1) flow-php/postgresql - RawCondition and RawExpression removed
The raw_cond() and raw_expr() escape hatches have been removed. All query builder operations are now covered by
type-safe DSL functions.
| Removed | Replacement |
|---|---|
| raw_cond('NOT col') | not_(is_true(col('col'))) |
| raw_cond('a = ANY(b)') | any_(col('a'), ComparisonOperator::EQ, col('b')) |
| raw_cond("x IN ('a', 'b')") | in_(col('x'), [literal('a'), literal('b')]) |
| raw_cond("x NOT LIKE 'pg_%'") | not_like(col('x'), literal('pg_%')) |
| raw_expr('NOT col') | not_(col('col')) |
| raw_expr('a \|\| b') | concat(col('a'), col('b')) or binary_expr(col('a'), '\|\|', col('b')) |
| raw_expr('CASE x WHEN ...') | case_when([when(...)], operand: col('x')) |
| raw_expr('array_agg(DISTINCT ...)') | agg('array_agg', [...], distinct: true)->withOrderBy(...) |
| RawCondition class | Use specific condition classes |
| RawExpression class | Use specific expression classes |
2) flow-php/postgresql - Condition now extends Expression
Conditions are now expressions — they can be used in SELECT lists, CASE WHEN, ORDER BY, etc.
// Conditions can now be aliased and used as expressions:
select(eq(col('a'), col('b'))->as('is_equal'));
// NOT works in both WHERE and SELECT:
not_(col('is_deleted'))->as('is_active');
// CASE WHEN accepts conditions directly:
case_when([when(eq(col('x'), literal(0)), literal('zero'))]);
3) flow-php/postgresql - DSL condition function renames
Function names have been unified following standard SQL builder conventions (jOOQ, Diesel, SQLAlchemy).
| Removed | Replacement | Reason |
|---|---|---|
| neq() | ne() | Standard short form |
| lte() | le() | Standard short form |
| gte() | ge() | Standard short form |
| is_in() | in_() | Drop is_ prefix, trailing underscore for PHP keyword |
| is_distinct_from() | distinct_from() | Drop is_ prefix |
| cond_and() | and_() | Drop cond_ prefix, trailing underscore for PHP keyword |
| cond_or() | or_() | Drop cond_ prefix, trailing underscore for PHP keyword |
| cond_not() | not_() | Drop cond_ prefix, trailing underscore for PHP keyword |
| any_sub_select() | any_() | Unified; accepts both Expression and SelectFinalStep |
| all_sub_select() | all_() | Unified; accepts both Expression and SelectFinalStep |
| cond_true() | is_true(literal(true)) | Use is_true() with literal |
| cond_false() | is_true(literal(false)) | Use is_true() with literal |
| bool_cond() | is_true() | Wraps expression as boolean condition |
| any_array() | any_() | Merged into unified any_() |
| all_array() | all_() | Merged into unified all_() |
New functions added:
| Function | Purpose |
|---|---|
| is_true(Expression) | Wrap expression as boolean condition for WHERE |
| not_like(Expression, Expression) | NOT LIKE condition |
| concat(Expression, ...) | String concatenation with \|\| operator |
Before:
use function Flow\PostgreSql\DSL\{cond_and, cond_not, cond_true, neq, lte, gte, is_in, any_sub_select};
select(col('name'))
    ->where(cond_and(
        neq(col('status'), literal('deleted')),
        lte(col('age'), literal(65)),
        gte(col('age'), literal(18)),
        is_in(col('role'), [literal('admin'), literal('user')]),
    ));
After:
use function Flow\PostgreSql\DSL\{and_, ne, le, ge, in_};
select(col('name'))
    ->where(and_(
        ne(col('status'), literal('deleted')),
        le(col('age'), literal(65)),
        ge(col('age'), literal(18)),
        in_(col('role'), [literal('admin'), literal('user')]),
    ));
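For larger codebases, the straightforward 1:1 renames from the table can be applied mechanically. The following is a rough sketch, not a tool shipped with flow-php: `rename_dsl_functions()` and `DSL_RENAMES` are hypothetical names, and the regex also rewrites matches inside string literals and comments, so review the resulting diff. Entries whose replacement changes the call shape (`cond_true()`, `cond_false()`, `bool_cond()`, `any_array()`, `all_array()`, `any_sub_select()`, `all_sub_select()`) are deliberately left for manual migration.

```php
<?php

declare(strict_types=1);

// 1:1 renames copied from the table above; anything that changes the
// argument list is intentionally not included.
const DSL_RENAMES = [
    'neq' => 'ne',
    'lte' => 'le',
    'gte' => 'ge',
    'is_in' => 'in_',
    'is_distinct_from' => 'distinct_from',
    'cond_and' => 'and_',
    'cond_or' => 'or_',
    'cond_not' => 'not_',
];

/**
 * Hypothetical helper (not part of flow-php): rewrites old DSL function
 * names in a PHP source string, in calls and in `use function` lists.
 */
function rename_dsl_functions(string $source) : string
{
    foreach (DSL_RENAMES as $old => $new) {
        // The lookarounds skip longer identifiers (is_int), variables ($lte)
        // and method or static calls ($x->neq(), Foo::neq()).
        $pattern = '/(?<![\w$>:])' . preg_quote($old, '/') . '(?!\w)/';
        $source = preg_replace($pattern, $new, $source) ?? $source;
    }

    return $source;
}

$before = "use function Flow\\PostgreSql\\DSL\\{cond_and, neq, is_in};\n"
    . "select(col('name'))->where(cond_and(neq(col('a'), literal(1)), is_in(col('b'), [literal(2)])));";

echo rename_dsl_functions($before), PHP_EOL;
```

Running the helper over a copy of the source tree first and comparing the result with version control is the safer way to use it.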
4) flow-php/postgresql - Schema builder methods accept Expression/Condition instead of strings
Methods that previously accepted raw SQL strings now require typed Expression or Condition objects.
| Method | Before (string) | After (typed) |
|---|---|---|
| ColumnDefinition::check() | ->check('age > 0') | ->check(gt(col('age'), literal(0))) |
| ColumnDefinition::defaultRaw() | ->defaultRaw('CURRENT_TIMESTAMP') | ->defaultRaw(current_timestamp()) |
| ColumnDefinition::generatedAs() | ->generatedAs("a \|\| b") | ->generatedAs(concat(col('a'), col('b'))) |
| CheckConstraint::create() | ::create('age > 0') | ::create(gt(col('age'), literal(0))) |
| ExcludeConstraint::element() | ->element('col', '=') | ->element(col('col'), '=') |
| ExcludeConstraint::where() | ->where('active = true') | ->where(eq(col('active'), literal(true))) |
| CreateDomainBuilder::check() | ->check('VALUE > 0') | ->check(gt(col('VALUE'), literal(0))) |
| CreateDomainBuilder::default() | ->default("'text'") | ->default(literal('text')) |
| AlterDomainBuilder::addConstraint() | ->addConstraint('name', 'VALUE > 0') | ->addConstraint('name', gt(col('VALUE'), literal(0))) |
| AlterDomainBuilder::setDefault() | ->setDefault('100') | ->setDefault(literal(100)) |
| AlterTableBuilder::alterColumnSetDefault() | ->alterColumnSetDefault('col', "'val'") | ->alterColumnSetDefault('col', literal('val')) |
| CreateRuleBuilder::where() | ->where("OLD.role = 'admin'") | ->where(eq(col('role', 'OLD'), literal('admin'))) |
5) flow-php/postgresql - DSL functions split into separate files
The monolithic functions.php has been split into 5 focused files (same namespace, no import changes needed):
| File | Purpose |
|---|---|
| query.php | Query building, expressions, tables, ordering, CTE, window, locking, transactions, cursors |
| condition.php | Comparisons, predicates, logic, JSON/array/regex operators |
| schema.php | DDL, constraints, indexes, maintenance, privileges, types, schema definitions |
| client.php | Connections, telemetry, mappers |
| parser.php | SQL parsing, formatting, analysis |
6) flow-php/filesystem - Protocol and Backend removed, Mount rewritten
The filesystem library has been redesigned around a single mount-protocol string. The Protocol and
Backend value types are gone; Mount now wraps just a protocol name.
Protocol class removed. Path::protocol() now returns string instead of a Protocol object.
Callsites that unpacked Protocol::$name / Protocol::scheme() / Protocol::is() are mechanical updates:
| Before | After |
|---|---|
| $path->protocol()->name | $path->protocol() |
| $path->protocol()->scheme() | $path->protocol() . '://' |
| $path->protocol()->is('file') | $path->protocol() === 'file' |
| $fs->protocol()->validateScheme($path) | $fs->mount()->supports($path) \|\| throw new InvalidSchemeException(...) |
| new Protocol('file') | new Mount('file') (if you need a Mount) or plain 'file' (FilesystemTable::for accepts strings) |
Backend enum removed. There's no closed set of backends anymore — any filesystem can mount
under any protocol. The Symfony bundle schema now uses a plain string type: field (see below). If
you branched on Backend cases in application code, replace with string comparisons against the
factory type() or the mount protocol, whichever fits.
Mount rewritten. The shape is now:
final readonly class Mount
{
    public string $protocol;

    public function __construct(string $protocol); // validates against PROTOCOL_REGEX

    public function supports(Path|string $path) : bool;
}
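To make the shape above more concrete, here is a minimal stand-in that mimics the described behaviour. It is a sketch under assumptions, not the library class: `MountSketch` is a hypothetical name, the regex is modelled on the RFC 3986 scheme grammar because the real `PROTOCOL_REGEX` is not shown in this guide, and only string paths are handled (the real `supports()` also accepts `Path`).

```php
<?php

declare(strict_types=1);

/**
 * Illustrative stand-in for Mount, NOT the flow-php class.
 * Requires PHP 8.2+ because of the readonly class modifier.
 */
final readonly class MountSketch
{
    // Assumption: a URI-scheme-like pattern; the real PROTOCOL_REGEX may differ.
    private const PROTOCOL_REGEX = '/\A[a-zA-Z][a-zA-Z0-9+.\-]*\z/';

    public function __construct(public string $protocol)
    {
        if (preg_match(self::PROTOCOL_REGEX, $protocol) !== 1) {
            throw new InvalidArgumentException("Invalid protocol: {$protocol}");
        }
    }

    // True when the path URI starts with "<protocol>://".
    public function supports(string $path) : bool
    {
        return str_starts_with($path, $this->protocol . '://');
    }
}

$mount = new MountSketch('warehouse');

var_dump($mount->supports('warehouse://reports/2024.csv'));
var_dump($mount->supports('aws-s3://reports/2024.csv'));
```

The point of the sketch is that a mount answers only for the exact protocol it was created with, which matches the removal of auto-aliases described in this section.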
Filesystem::protocol() renamed to Filesystem::mount(). The return type changed from
Protocol to Mount. Every Filesystem implementation must rename the method.
Filesystem ctors take Mount directly. NativeLocalFilesystem, MemoryFilesystem,
StdOutFilesystem, AsyncAWSS3Filesystem, AzureBlobFilesystem now accept Mount as the first
constructor argument (local filesystems have a sensible default). DSL factory functions
(native_local_filesystem, memory_filesystem, stdout_filesystem, aws_s3_filesystem,
azure_filesystem) accept string $protocol as the last argument with a sensible default
('file', 'memory', 'stdout', 'aws-s3', 'azure-blob') and build the Mount internally — no
caller change needed unless you instantiate the filesystem class directly or mount two filesystems of
the same backend under distinct protocols.
Auto-alias dropped. Previously, mounting a single filesystem of a given backend would auto-register
its canonical scheme as an additional alias (e.g. mounting S3 as warehouse also made aws-s3
available). That behavior is gone — every mount is registered under exactly the protocol you pick. If
you need two protocols for the same filesystem, mount it twice explicitly.
FilesystemTable::for(Path|Protocol) → for(Path|string). Pass a Path or a plain protocol string.
7) flow-php/filesystem - NativeLocalFilesystem::list() no longer sorts results
Glob::glob() was replaced with lazy Webmozart\Glob\Iterator\GlobIterator to avoid materializing
the entire matching set up front (this gives a ~30× speedup on large trees when the caller only needs
the first N entries).
Side effect: NativeLocalFilesystem::list() no longer returns results in alphabetical order.
Output now follows filesystem traversal order. If your code depends on sort order, sort client-side
after consuming the generator:
$statuses = iterator_to_array($fs->list(path('/some/dir/**/*.txt')));
usort($statuses, static fn (FileStatus $a, FileStatus $b) => $a->path->uri() <=> $b->path->uri());
8) flow-php/symfony-filesystem-bundle - YAML schema now uses type: + protocol-as-key
The configuration schema changed significantly. The YAML key under filesystems: is now the mount
protocol (any valid URI scheme), and a separate type: field picks the factory.
Before:
flow_filesystem:
  fstabs:
    default:
      filesystems:
        file: ~
        memory: ~
        aws-s3:
          bucket: '%env(S3_BUCKET)%'
After:
flow_filesystem:
  fstabs:
    default:
      filesystems:
        file:
          type: file
        memory:
          type: memory
        aws-s3:                        # mount protocol, can be any valid URI scheme
          type: aws_s3                 # factory lookup key
          bucket: '%env(S3_BUCKET)%'
Benefits of the new shape:
- Mount the same backend twice under different protocols (e.g. warehouse + archive, both type: aws_s3 with different buckets).
- Protocol names are no longer tied to factory names; pick whatever reads well in your application.
Built-in type values: file, memory, stdout, aws_s3, azure_blob.
9) flow-php/symfony-filesystem-bundle - FilesystemFactory interface and attribute changed
// Before
interface FilesystemFactory
{
    public function protocol() : Protocol;

    public function create(string $mountName, array $config) : Filesystem;
}

#[AsFilesystemFactory(protocol: 'my-fs')]

// After
interface FilesystemFactory
{
    public function type() : string;

    public function create(string $protocol, array $config) : Filesystem;
}

#[AsFilesystemFactory(type: 'my_backend')]
The DI tag attribute is renamed from protocol to type. FilesystemFactoryRegistry::get() takes a
string $type instead of a Backend.
10) flow-php/symfony-filesystem-bundle - flow:filesystem:ls CLI flags reshuffled
| Before | After |
|---|---|
| --long (default: off) | 4-column output is now the default; use --short to drop it |
| --no-limit | Removed; default is unlimited now. Use --limit=N to cap. |
| (no --page-size) | New --page-size=N (default 10) controls table page size |
| (no --offset) | New --offset=N skips the first N entries |
| --format=json → JSON array | --format=json now emits NDJSON (one JSON object per line) |
Default behavior: list all entries, paginated in tables of 10 rows; interactive terminals prompt between pages (Enter continues, "no" stops), piped output flows continuously. Size is formatted with binary units, Modified as ISO-8601 — both read from the backend listing response, no per-file HEAD.
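Scripts that previously passed the whole `--format=json` output to a single `json_decode()` call need to decode it line by line now. A minimal sketch of such a consumer follows; `decode_ndjson()` is a hypothetical helper, and the field names in the sample input are made up for illustration, so check the actual command output for the real keys.

```php
<?php

declare(strict_types=1);

/**
 * Decodes NDJSON (one JSON document per line) into a list of arrays.
 *
 * @return list<array<string, mixed>>
 */
function decode_ndjson(string $output) : array
{
    $entries = [];

    foreach (preg_split('/\R/', $output) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines such as the trailing newline
        }

        $entries[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    }

    return $entries;
}

// Sample input only; the keys "path" and "size" are assumptions.
$output = <<<'NDJSON'
{"path":"memory://a.txt","size":12}
{"path":"memory://b.txt","size":34}

NDJSON;

$entries = decode_ndjson($output);

echo count($entries), PHP_EOL;
```

Because each line is a complete document, a consumer can also process entries as they arrive instead of buffering the full listing.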
11) flow-php/symfony-filesystem-bundle - flow:filesystem:stat rejects pattern paths
stat now returns Command::FAILURE with a clear error when given a pattern path (memory://*.txt,
**/*.parquet, …). Previously it returned metadata for the first match — confusing semantics. Use
flow:filesystem:ls for pattern inspection.
12) flow-php/filesystem-async-aws-bridge, flow-php/filesystem-azure-bridge - DSL protocol is the last argument with a default
The DSL factories expose the mount protocol as an optional last argument, defaulted to the conventional scheme. Common cases work without passing it:
aws_s3_filesystem($bucket, $client); // mounts under 'aws-s3'
azure_filesystem($blobService); // mounts under 'azure-blob'
// Pick a different protocol — e.g. mount the same bucket twice
aws_s3_filesystem($bucket, $client, protocol: 'warehouse');
azure_filesystem($blobService, protocol: 'archive');
13) flow-php/filesystem - path_memory() and path_stdout() DSL helpers removed
Build Path directly instead:
// Before
$mem = path_memory();
$out = path_stdout(['stream' => 'output']);
// After
$mem = path('memory://' . bin2hex(random_bytes(16)) . '.memory');
$out = path('stdout://' . bin2hex(random_bytes(16)) . '.stdout', ['stream' => 'output']);
14) flow-php/etl - ConfigBuilder::cacheFilesystem() and externalSortFilesystem() added
Point cache and external-sort mechanisms at any mounted protocol; defaults remain 'file'. The
CacheConfig and SortConfig value objects expose the chosen protocol as ->filesystemProtocol.
$config = config_builder()
    ->mount(aws_s3_filesystem($bucket, $client, protocol: 'sort-scratch'))
    ->externalSortFilesystem('sort-scratch')
    ->build();
15) flow-php/symfony-http-foundation-bridge - Output interface collapsed to a single loader(Path)
Output::memoryLoader(string $id) and Output::stdoutLoader() were replaced by
Output::loader(Path $path). FlowBufferedResponse gained a string $filesystem = 'memory'
constructor argument (buffer protocol); FlowStreamedResponse gained
string $stdoutFilesystemProtocol = 'stdout'. Each response builds the path with its configured
protocol and passes it to the Output.
// Before
new FlowBufferedResponse($extractor, new CsvOutput(), $transformations);
// After — same defaults, new constructor param available
new FlowBufferedResponse($extractor, new CsvOutput(), $transformations, filesystem: 'memory');
16) flow-php/filesystem - StdOutFilesystem tracks open streams per php:// target
Previously the "only one stdout stream" guard lived in FilesystemStreams (ETL core) and fired when
two writing streams used the stdout:// protocol. The check now lives in StdOutFilesystem itself
and is precise per underlying php:// target (stdout / stderr / output): two streams with
['stream' => 'stdout'] conflict; one stdout stream + one stderr stream do not. Error message
changed from "Only one stdout filesystem stream can be open at the same time" to
"Only one stream can be open at the same time for php://{target}".
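The per-target rule can be pictured with a small registry keyed by the php:// target. This is an illustration of the described semantics only, not the `StdOutFilesystem` code: `OpenStreamGuard` is a hypothetical class, and the use of `RuntimeException` is an assumption (the guide does not name the exception type).

```php
<?php

declare(strict_types=1);

/**
 * Illustrative guard, NOT the StdOutFilesystem implementation:
 * allows at most one open write stream per php:// target.
 */
final class OpenStreamGuard
{
    /** @var array<string, true> */
    private array $open = [];

    public function open(string $target) : void
    {
        if (isset($this->open[$target])) {
            throw new RuntimeException(
                "Only one stream can be open at the same time for php://{$target}"
            );
        }

        $this->open[$target] = true;
    }

    public function close(string $target) : void
    {
        unset($this->open[$target]);
    }
}

$guard = new OpenStreamGuard();
$guard->open('stdout');
$guard->open('stderr'); // different target, no conflict

try {
    $guard->open('stdout'); // same target while the first stream is still open
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

If your tests assert on the old error message, update them to match the new per-target wording quoted above.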
Upgrading from 0.34.x to 0.35.x
1) flow-php/postgresql - DataType renamed to ColumnType
The DataType class used for schema/DDL definitions has been renamed to ColumnType to better communicate its purpose.
All related DSL functions have been renamed from data_type_* to column_type_*.
| Removed | Replacement |
|---|---|
| Flow\PostgreSql\QueryBuilder\Schema\DataType | Flow\PostgreSql\QueryBuilder\Schema\ColumnType |
| Flow\PostgreSql\Parser\DataTypeParser | Flow\PostgreSql\Parser\ColumnTypeParser |
| data_type_integer() | column_type_integer() |
| data_type_smallint() | column_type_smallint() |
| data_type_bigint() | column_type_bigint() |
| data_type_boolean() | column_type_boolean() |
| data_type_text() | column_type_text() |
| data_type_varchar() | column_type_varchar() |
| data_type_char() | column_type_char() |
| data_type_numeric() | column_type_numeric() |
| data_type_decimal() | column_type_decimal() |
| data_type_real() | column_type_real() |
| data_type_double_precision() | column_type_double_precision() |
| data_type_date() | column_type_date() |
| data_type_time() | column_type_time() |
| data_type_timestamp() | column_type_timestamp() |
| data_type_timestamptz() | column_type_timestamptz() |
| data_type_interval() | column_type_interval() |
| data_type_uuid() | column_type_uuid() |
| data_type_json() | column_type_json() |
| data_type_jsonb() | column_type_jsonb() |
| data_type_bytea() | column_type_bytea() |
| data_type_inet() | column_type_inet() |
| data_type_cidr() | column_type_cidr() |
| data_type_macaddr() | column_type_macaddr() |
| data_type_serial() | column_type_serial() |
| data_type_smallserial() | column_type_smallserial() |
| data_type_bigserial() | column_type_bigserial() |
| data_type_array() | column_type_array() |
| data_type_custom() | column_type_custom() |
| data_type_from_string() | column_type_from_string() |
Before:
use Flow\PostgreSql\QueryBuilder\Schema\DataType;
use function Flow\PostgreSql\DSL\data_type_integer;
use function Flow\PostgreSql\DSL\data_type_varchar;
column('age', data_type_integer());
column('name', data_type_varchar(255));
cast(ref('id'), DataType::bigint());
After:
use Flow\PostgreSql\QueryBuilder\Schema\ColumnType;
use function Flow\PostgreSql\DSL\column_type_integer;
use function Flow\PostgreSql\DSL\column_type_varchar;
column('age', column_type_integer());
column('name', column_type_varchar(255));
cast(ref('id'), ColumnType::bigint());
2) flow-php/postgresql - PostgreSqlType renamed to ValueType
The PostgreSqlType enum used for value binding/casting has been renamed to ValueType to better communicate its
purpose.
All related DSL functions have been renamed from pgsql_type_* to value_type_*.
| Removed | Replacement |
|---|---|
| Flow\PostgreSql\Client\Types\PostgreSqlType | Flow\PostgreSql\Client\Types\ValueType |
| pgsql_type_text() | value_type_text() |
| pgsql_type_varchar() | value_type_varchar() |
| pgsql_type_integer() | value_type_integer() |
| pgsql_type_bigint() | value_type_bigint() |
| pgsql_type_smallint() | value_type_smallint() |
| pgsql_type_boolean() | value_type_boolean() |
| pgsql_type_float4() | value_type_float4() |
| pgsql_type_float8() | value_type_float8() |
| pgsql_type_numeric() | value_type_numeric() |
| pgsql_type_date() | value_type_date() |
| pgsql_type_timestamp() | value_type_timestamp() |
| pgsql_type_timestamptz() | value_type_timestamptz() |
| pgsql_type_json() | value_type_json() |
| pgsql_type_jsonb() | value_type_jsonb() |
| pgsql_type_uuid() | value_type_uuid() |
| pgsql_type_bytea() | value_type_bytea() |
| pgsql_type_inet() | value_type_inet() |
| pgsql_type_cidr() | value_type_cidr() |
| All other pgsql_type_*() functions | Corresponding value_type_*() functions |
Before:
use Flow\PostgreSql\Client\Types\PostgreSqlType;
use function Flow\PostgreSql\DSL\pgsql_type_uuid;
use function Flow\PostgreSql\DSL\pgsql_type_text_array;
typed('550e8400-e29b-41d4-a716-446655440000', pgsql_type_uuid());
typed(['tag1', 'tag2'], pgsql_type_text_array());
typed(42, PostgreSqlType::INT4);
After:
use Flow\PostgreSql\Client\Types\ValueType;
use function Flow\PostgreSql\DSL\value_type_uuid;
use function Flow\PostgreSql\DSL\value_type_text_array;
typed('550e8400-e29b-41d4-a716-446655440000', value_type_uuid());
typed(['tag1', 'tag2'], value_type_text_array());
typed(42, ValueType::INT4);
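Since both renames in this release are pure prefix changes, they lend themselves to a scripted rewrite. The sketch below is one possible approach, not an official migration tool: `rename_type_prefixes()` is a hypothetical helper, it only covers the DSL function prefixes, and the class renames (DataType, DataTypeParser, PostgreSqlType) still have to be handled separately, for example with your IDE's refactoring support.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of flow-php): rewrites the data_type_*
 * and pgsql_type_* DSL prefixes in a PHP source string.
 */
function rename_type_prefixes(string $source) : string
{
    return preg_replace(
        [
            // The lookbehind skips variables, methods and longer identifiers.
            '/(?<![\w$>:])data_type_(?=\w)/',
            '/(?<![\w$>:])pgsql_type_(?=\w)/',
        ],
        ['column_type_', 'value_type_'],
        $source
    ) ?? $source;
}

echo rename_type_prefixes(
    "column('age', data_type_integer()); typed(42, pgsql_type_integer());"
), PHP_EOL;
```

As with any regex-based rewrite, matches inside strings and comments are changed too, so inspect the diff before committing.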
Upgrading from 0.31.x to 0.32.x
1) Removal of Meilisearch Adapter
The Meilisearch adapter has been removed from Flow PHP. If you were using it, please migrate to the Elasticsearch adapter.
2) Removed deprecated DSL functions
All type-related DSL functions have been moved from Flow\ETL\DSL to Flow\Types\DSL. Update your imports accordingly.
| Removed Function | Replacement |
|---|---|
| chunks_from() | batches() |
| type_structure() | \Flow\Types\DSL\type_structure() |
| type_union() | \Flow\Types\DSL\type_union() |
| type_optional() | \Flow\Types\DSL\type_optional() |
| type_from_array() | \Flow\Types\DSL\type_from_array() |
| is_nullable() | \Flow\Types\DSL\type_is_nullable() |
| type_equals() | \Flow\Types\DSL\type_equals() |
| types() | \Flow\Types\DSL\types() |
| type_list() | \Flow\Types\DSL\type_list() |
| type_map() | \Flow\Types\DSL\type_map() |
| type_json() | \Flow\Types\DSL\type_json() |
| type_datetime() | \Flow\Types\DSL\type_datetime() |
| type_date() | \Flow\Types\DSL\type_date() |
| type_time() | \Flow\Types\DSL\type_time() |
| type_xml() | \Flow\Types\DSL\type_xml() |
| type_xml_element() | \Flow\Types\DSL\type_xml_element() |
| type_uuid() | \Flow\Types\DSL\type_uuid() |
| type_int() | \Flow\Types\DSL\type_integer() |
| type_integer() | \Flow\Types\DSL\type_integer() |
| type_string() | \Flow\Types\DSL\type_string() |
| type_float() | \Flow\Types\DSL\type_float() |
| type_boolean() | \Flow\Types\DSL\type_boolean() |
| type_instance_of() | \Flow\Types\DSL\type_instance_of() |
| type_resource() | \Flow\Types\DSL\type_resource() |
| type_array() | \Flow\Types\DSL\type_array() |
| type_callable() | \Flow\Types\DSL\type_callable() |
| type_null() | \Flow\Types\DSL\type_null() |
| type_enum() | \Flow\Types\DSL\type_enum() |
| struct_schema() | structure_schema() |
| get_type() | \Flow\Types\DSL\get_type() |
| print_schema() | schema_to_ascii() |
| type_is() | \Flow\Types\DSL\type_is() |
| type_is_any() | \Flow\Types\DSL\type_is_any() |
| dom_element_to_string() | \Flow\Types\DSL\dom_element_to_string() |
3) Removed deprecated DataFrame methods
| Removed Method | Replacement |
|---|---|
| DataFrame::validate() | DataFrame::match() |
| DataFrame::renameAll() | DataFrame::renameEach(rename_replace(...)) |
| DataFrame::renameAllLowerCase() | DataFrame::renameEach(rename_style(StringStyles::LOWER)) |
| DataFrame::renameAllUpperCase() | DataFrame::renameEach(rename_style(StringStyles::UPPER)) |
| DataFrame::renameAllUpperCaseFirst() | DataFrame::renameEach(rename_style(StringStyles::UCFIRST)) |
| DataFrame::renameAllUpperCaseWord() | DataFrame::renameEach(rename_style(StringStyles::UCWORDS)) |
| DataFrame::renameAllStyle() | DataFrame::renameEach(rename_style(...)) |
4) Removed deprecated Schema methods
| Removed Method | Replacement |
|---|---|
| Schema::entries() | Schema::references()->all() |
| Schema::getDefinition() | Schema::get() |
| Schema::nullable() | Schema::makeNullable() |
5) Removed deprecated Definition methods
| Removed Method | Replacement |
|---|---|
| Definition::nullable() | Definition::makeNullable() |
This applies to all Definition implementations: BooleanDefinition, DateDefinition, DateTimeDefinition,
EnumDefinition, FloatDefinition, HTMLDefinition, HTMLElementDefinition, IntegerDefinition, JsonDefinition,
ListDefinition, MapDefinition, StringDefinition, StructureDefinition, TimeDefinition, UuidDefinition,
XMLDefinition, XMLElementDefinition.
6) Removed deprecated FileExtractor and PathFiltering methods
| Removed Method | Replacement |
|---|---|
| FileExtractor::addFilter() | FileExtractor::withPathFilter() |
| PathFiltering::addFilter() | PathFiltering::withPathFilter() |
7) Removed deprecated ScalarFunctionChain methods
| Removed Method | Replacement |
|---|---|
| ScalarFunctionChain::domElementAttribute() | ScalarFunctionChain::domElementAttributeValue() |
8) Removed deprecated Config constants
| Removed Constant | Replacement |
|---|---|
| Config::CACHE_DIR_ENV | CacheConfig::CACHE_DIR_ENV |
| Config::SORT_MAX_MEMORY_ENV | SortConfig::SORT_MAX_MEMORY_ENV |
9) Removed deprecated Transformers
| Removed Transformer | Replacement |
|---|---|
| EntryNameStyleConverterTransformer | Use DataFrame::renameEach(rename_style(...)) |
| RenameAllCaseTransformer | Use DataFrame::renameEach(rename_style(...)) |
| RenameStrReplaceAllEntriesTransformer | Use DataFrame::renameEach(rename_replace(...)) |
10) Removed deprecated classes
| Removed Class | Replacement |
|---|---|
| Flow\ETL\Function\StyleConverter\StringStyles | Flow\ETL\String\StringStyles |
Upgrading from 0.28.x to 0.29.x
1) JsonType now uses Json value object instead of string
The JsonType has been refactored to use a dedicated Json value object (similar to Uuid/UuidType pattern).
This allows static analysis tools to distinguish between regular strings and JSON strings.
Breaking Changes:
- JsonType::assert() now returns Json instance instead of string
- JsonType::cast() now returns Json instance instead of string
- JsonType::isValid() now checks for Json instance (plain strings are no longer valid)
- Cast::cast('json', $value) function now returns Json object instead of string
- type_json() return type annotation changed from Type<string> to Type<Json>
- JsonEntry::value() now returns ?Json instead of ?array (consistent with UuidEntry::value() returning ?Uuid)
- JsonEntry::json() method removed (use value() instead)
Migration:
If you were using type_json()->cast($value) and expected a string, use ->toString():
Before:
$jsonString = type_json()->cast($array); // was string
After:
$json = type_json()->cast($array); // now Json object
$jsonString = $json->toString(); // get the string
$jsonArray = $json->toArray(); // get as array
If you were using JsonEntry::value() and expected an array:
Before:
$entry = json_entry('data', ['key' => 'value']);
$array = $entry->value(); // was array
After:
$entry = json_entry('data', ['key' => 'value']);
$json = $entry->value(); // now Json object
$array = $json?->toArray(); // get as array
$string = $json?->toString(); // get as string
If you were using JsonEntry::json():
Before:
$json = $entry->json();
After:
$json = $entry->value(); // json() method removed, use value() instead
New Json value object features:
use Flow\Types\Value\Json;
// Create from string
$json = new Json('{"key": "value"}');
// Create from array
$json = Json::fromArray(['key' => 'value']);
// Check if valid JSON
Json::isValid('{"key": "value"}'); // true
// Convert to string/array
$json->toString(); // '{"key":"value"}'
$json->toArray(); // ['key' => 'value']
// Json implements Stringable
(string) $json; // '{"key":"value"}'
// Json implements JsonSerializable
json_encode($json); // '{"key":"value"}'
Note: JsonEntry::value() now returns ?Json for consistency with UuidEntry::value() returning ?Uuid. Use
->toArray() or ->toString() on the Json object to get the underlying data.
Row methods behavior:
// Row::toArray() converts Json to array automatically (for convenient serialization)
$row->toArray(); // Returns ['data' => ['key' => 'value']] not ['data' => Json(...)]
// Row::valueOf() returns the raw value (Json object for json entries)
$row->valueOf('data'); // Returns Json object (use ->toArray() if you need array)
// Entry value() returns the typed value
$row->get('data')->value(); // Returns Json object (use ->toArray() if you need array)
Upgrading from 0.26.x to 0.27.x
1) Force EntryFactory $entryFactory to be required on array_to_row & array_to_row(s)
Before:
to_entry('name', 'data');
array_to_row([]);
array_to_rows([]);
After:
to_entry('name', 'data', flow_context(config())->entryFactory());
array_to_row([], flow_context(config())->entryFactory());
array_to_rows([], flow_context(config())->entryFactory());
Upgrading from 0.16.x to 0.17.x
1) Removed $nullable property from all types
Before:
type_string(nullable:true)->toString() // ?string
After:
type_optional(string())->toString() // ?string
2) Removed precision from float_type()
Previously, float_type() had a default precision of 6, which meant that every operation on floats had to round values
to that precision. The problem with this approach is that every such operation then needs a dedicated rounding
option.
Instead, handle the precision of float columns explicitly with the round() scalar function.
3) Moved all Types to Flow\Types\Type namespace
Before
\Flow\ETL\DSL\type_string(); // now deprecated, alias for \Flow\Types\DSL\type_string();
After
\Flow\Types\DSL\type_string();
Upgrading from 0.15.x to 0.16.x
1) Deprecated Flow\ETL\DataFrame::renameAll* methods
The following methods were deprecated:
- Flow\ETL\DataFrame::renameAll()
- Flow\ETL\DataFrame::renameAllLowerCase()
- Flow\ETL\DataFrame::renameAllUpperCase()
- Flow\ETL\DataFrame::renameAllUpperCaseFirst()
- Flow\ETL\DataFrame::renameAllUpperCaseWord()
Use the new DataFrame::renameEach() method with an appropriate RenameEntryStrategy object instead.
2) Deprecated RenameAllCaseTransformer & RenameStrReplaceAllEntriesTransformer
These transformers were deprecated in favor of DataFrame::renameEach() with the related RenameEntryStrategy:
- RenameAllCaseTransformer -> RenameCaseTransformer
- RenameStrReplaceAllEntriesTransformer -> RenameReplaceStrategy
Upgrading from 0.14.x to 0.15.x
1) Removed Flow\ETL\Row\Schema\Matcher and implementations
Schema Matcher was an initial attempt to implement schema evolution alongside schema validation. Over time it was replaced by a different Schema Validator implementation.
2) Renamed Flow\ETL\Row\Schema namespace into Flow\ETL\Schema.
This means all classes related to Schema now live under Flow\ETL\Schema namespace.
Upgrading from 0.11.x to 0.14.x
1) Replaced Flow\ETL\DataFrame::validate() with Flow\ETL\DataFrame::match()
The old method is now deprecated and will be removed in the next release.
2) Replaced Flow\ETL\Function\ScalarFunction\TypedScalarFunction with Flow\ETL\Function\ScalarFunction\ScalarResult
The old interface let ScalarFunctions define their return type. It was replaced with the ScalarResult value object, which is more flexible than the interface because it allows returning any type dynamically without making the scalar function stateful.
Upgrading from 0.10.x to 0.11.x
1) Removed StructureElement/struct_element/structure_element from StructureType Definition
Before:
type_structure([
    struct_element('name', string()),
    struct_element('age', integer()),
]);
After:
type_structure([
    'name' => string(),
    'age' => integer(),
]);
2) Doctrine DBAL Adapter
From now on, options for:
- to_dbal_table_insert()
- to_dbal_table_update()
are passed as objects (instances of the UpdateOptions|InsertOptions interfaces). They are platform specific, so use the class that matches your platform:
- PostgreSQL
  - PostgreSQLInsertOptions
  - PostgreSQLUpdateOptions
- MySQL
  - MySQLInsertOptions
  - MySQLUpdateOptions
- SQLite
  - SQLiteInsertOptions
  - SQLiteUpdateOptions
Upgrading from 0.8.x to 0.10.x
1) Providing multiple paths to a single extractor
From now on, to read from multiple locations, use the from_all(Extractor ...$extractors) : Extractor extractor.
Before:
<?php
from_parquet([
    path(__DIR__ . '/data/1.parquet'),
    path(__DIR__ . '/data/2.parquet'),
]);
After:
<?php
from_all(
    from_parquet(path(__DIR__ . '/data/1.parquet')),
    from_parquet(path(__DIR__ . '/data/2.parquet')),
);
2) Passing optional arguments to extractors/loaders
From now on, all extractors/loaders accept only mandatory arguments.
All optional arguments should be passed through with* methods using the fluent interface.
Before:
<?php
from_parquet(path(__DIR__ . '/data/1.parquet'), schema: $schema);
After:
<?php
from_parquet(path(__DIR__ . '/data/1.parquet'))->withSchema($schema);
Upgrading from 0.7.x to 0.8.x
1) Joins
To support joining bigger datasets, we had to move from the initial NestedLoop join algorithm to a Hash Join algorithm.
- the only supported join expression is = (equals), which can be grouped with AND and OR operators
- joinPrefix is now always required, and by default is set to 'joined_'
- join will always return all columns from both datasets; columns used in the join condition will be prefixed with joinPrefix
Other than that, API stays the same.
The above changes were introduced in all 3 types of joins:
- DataFrame::join()
- DataFrame::joinEach()
- DataFrame::crossJoin()
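To illustrate why a hash join only works with equality conditions, here is a minimal plain-PHP sketch of the technique. It is not Flow's implementation: hashJoin() is a hypothetical helper, it handles a single equality condition and inner-join semantics only, and it assumes that prefixing the right-side join column is enough to avoid name collisions.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative inner hash join on one equality condition.
 * hashJoin() is a hypothetical helper, not part of Flow.
 *
 * @param array<int, array<string, mixed>> $left
 * @param array<int, array<string, mixed>> $right
 *
 * @return array<int, array<string, mixed>>
 */
function hashJoin(array $left, array $right, string $leftKey, string $rightKey, string $prefix = 'joined_') : array
{
    // Build phase: index the right side by the value of its join column.
    $buckets = [];
    foreach ($right as $row) {
        $buckets[\serialize($row[$rightKey])][] = $row;
    }

    // Probe phase: one hash lookup per left row instead of scanning the right side.
    $result = [];
    foreach ($left as $row) {
        foreach ($buckets[\serialize($row[$leftKey])] ?? [] as $match) {
            $joined = $row;
            foreach ($match as $name => $value) {
                // Assumption: only the right-side join column gets the prefix.
                $joined[$name === $rightKey ? $prefix . $name : $name] = $value;
            }
            $result[] = $joined;
        }
    }

    return $result;
}

$users = [['id' => 1, 'name' => 'A'], ['id' => 2, 'name' => 'B']];
$orders = [['id' => 1, 'total' => 10], ['id' => 3, 'total' => 7]];

$joinedRows = hashJoin($users, $orders, 'id', 'id');
```

Because matching rows are found through a hash lookup on the join value, a condition such as "greater than" cannot be served by the same index, which is why only = is supported.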
2) GroupBy
From now on, the DataFrame::groupBy() method returns a GroupedDataFrame object, which is nothing more than a GroupBy statement builder. To get the results, you first need to define the aggregation functions or, optionally, pivot the data.
Upgrading from 0.6.x to 0.7.x
1) DataFrame::appendSafe() method was removed
DataFrame::appendSafe() aka DataFrame::threadSafe() method was removed as it was introducing additional complexity
and was not used in any of the adapters.
Upgrading from 0.5.x to 0.6.x
1) Rows::merge() accepts single instance of Rows
Before:
Rows::merge(Rows ...$rows) : Rows
After:
Rows::merge(Rows $rows) : Rows
Upgrading from 0.4.x to 0.5.x
1) Entry factory moved from extractors to FlowContext
To improve code quality and reduce code coupling EntryFactory was removed from all constructors of extractors, in
favor of passing it into FlowContext & re-using same entry factory in a whole pipeline.
2) Invalid schema has no fallback in NativeEntryFactory
Previously, NativeEntryFactory::create() had a fallback for entries that were not found in the passed Schema.
Now the schema has higher priority and the fallback is no longer available. When a definition is missing from
the passed schema, an InvalidArgumentException is thrown.
3) BufferLoader was removed
BufferLoader was removed in favor of the DataFrame::collect(int $batchSize = null) method, which now accepts an additional
argument, $batchSize, that keeps collecting Rows from the Extractor until the given batch size is reached.
This does exactly the same thing as BufferLoader did, but in a more generic way.
4) Pipeline Closure
Pipeline Closure was reduced to be only Loader Closure and it was moved to \Flow\ETL\Loader namespace. Additionally, \Closure::close method no longer requires Rows to be passed as an argument.
5) Parallelize
The DataFrame::parallelize() method is deprecated and will be removed. Use the DataFrame::batchSize(int $size) method instead.
6) Rows in batch - Extractors
From now on, file-based Extractors always throw one Row at a time. To merge rows into bigger groups,
use DataFrame::batchSize(int $size) right after the extractor method.
Before:
<?php
(new Flow())
    ->read(CSV::from(__DIR__ . '/1_mln_rows.csv', rows_in_batch: 100))
    ->write(To::output())
    ->count();
After:
<?php
(new Flow())
    ->read(CSV::from(__DIR__ . '/1_mln_rows.csv'))
    ->batchSize(100)
    ->write(To::output())
    ->count();
Affected extractors:
- CSV
- Parquet
- JSON
- Text
- XML
- Avro
- DoctrineDBAL - rows_in_batch wasn't removed, but results are now thrown row by row instead of as a whole page.
- GoogleSheet
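The relationship between row-by-row extraction and a later batching step can be pictured with plain PHP generators. This is a conceptual sketch only, not Flow's implementation; batchRows() is a hypothetical helper standing in for what DataFrame::batchSize() is described as doing.

```php
<?php

declare(strict_types=1);

/**
 * Groups single rows coming from any iterable into batches of $size.
 * batchRows() is a hypothetical helper used only for illustration.
 *
 * @param iterable<array<string, mixed>> $rows
 *
 * @return \Generator<int, array<int, array<string, mixed>>>
 */
function batchRows(iterable $rows, int $size) : \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;

        // Emit a batch as soon as it is full, then start a new one.
        if (\count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

// A stand-in "extractor" that yields one row at a time.
$extractor = (static function () : \Generator {
    foreach (\range(1, 5) as $id) {
        yield ['id' => $id];
    }
})();

$batches = \iterator_to_array(batchRows($extractor, 2), false);
```

Keeping the extractor unaware of batch sizes means the grouping is decided in one place, after extraction, regardless of the file format being read.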
7) GoogleSheetExtractor
Argument $rows_in_batch was renamed to $rows_per_page which no longer determines the size of the batch, but the size
of the page that will be fetched from Google API.
Rows are yielded one by one.
8) DataFrame::threadSafe() method was replaced by DataFrame::appendSafe()
DataFrame::appendSafe() does exactly the same thing as the old method; the name is just more
descriptive and self-explanatory.
It's no longer mandatory to set this flag to true when using SaveMode::APPEND; it's now set automatically.
9) Loaders - chunk size
Loaders no longer accept the chunk_size parameter. To control
the number of rows saved at once, use the DataFrame::batchSize(int $size) method.
10) Removed DSL functions: datetime_string(), json_string()
Those functions were removed in favor of accepting string values in related DSL functions:
- datetime_string() => datetime()
- json_string() => json() & json_object()
11) Removed Asynchronous Processing
More details can be found in this issue.
- Removed etl-adapter-amphp
- Removed etl-adapter-reactphp
- Removed LocalSocketPipeline
- Removed DataFrame::pipeline()
12) CollectionEntry removal
After adding native & logical types to Flow, CollectionEntry was removed as obsolete. The new types that cover it
better are ListType, MapType & StructureType, along with the related new entry types.
13) Removed from*() methods from scalar entries
Removed BooleanEntry::from(), FloatEntry::from(), IntegerEntry::from(), StringEntry::fromDateTime() methods in
favor of using DSL functions.
14) Removed deprecated Sha1IdFactory
Class Sha1IdFactory was removed, use HashIdFactory class:
(new HashIdFactory('entry_name'))->withAlgorithm('sha1');
15) Deprecate DSL Static classes
DSL static classes were deprecated in favor of using functions defined in src/core/etl/src/Flow/ETL/DSL/functions.php
file.
Deprecated classes:
- src/core/etl/src/Flow/ETL/DSL/From.php
- src/core/etl/src/Flow/ETL/DSL/Handler.php
- src/core/etl/src/Flow/ETL/DSL/To.php
- src/core/etl/src/Flow/ETL/DSL/Transform.php
- src/core/etl/src/Flow/ETL/DSL/Partitions.php
- src/adapter/etl-adapter-avro/src/Flow/ETL/DSL/Avro.php
- src/adapter/etl-adapter-chartjs/src/Flow/ETL/DSL/ChartJS.php
- src/adapter/etl-adapter-csv/src/Flow/ETL/DSL/CSV.php
- src/adapter/etl-adapter-doctrine/src/Flow/ETL/DSL/Dbal.php
- src/adapter/etl-adapter-elasticsearch/src/Flow/ETL/DSL/Elasticsearch.php
- src/adapter/etl-adapter-google-sheet/src/Flow/ETL/DSL/GoogleSheet.php
- src/adapter/etl-adapter-json/src/Flow/ETL/DSL/Json.php
- src/adapter/etl-adapter-meilisearch/src/Flow/ETL/DSL/Meilisearch.php
- src/adapter/etl-adapter-parquet/src/Flow/ETL/DSL/Parquet.php
- src/adapter/etl-adapter-text/src/Flow/ETL/DSL/Text.php
- src/adapter/etl-adapter-xml/src/Flow/ETL/DSL/XML.php
Upgrading from 0.3.x to 0.4.x
1) Transformers replaced with scalar functions
Transformers have been a powerful tool in Flow since the beginning, but they were too powerful for the simple cases they were needed for, and handwritten transformers introduced additional complexity and maintenance issues.
We reworked most of the internal transformers into new scalar functions and entry scalar functions (based on the built-in functions). Flow still uses transformers internally, but they are no longer exposed to end users. Instead, we provide easy-to-use functions that cover all user needs.
All available functions can be found in the ETL\Row\Function folder or in the ETL\DSL\functions file; entry scalar functions are defined in EntryScalarFunction.
Before:
<?php
use Flow\ETL\DSL\Transform;
use Flow\ETL\Extractor\MemoryExtractor;
use Flow\ETL\Flow;

(new Flow())
    ->read(new MemoryExtractor())
    ->rows(Transform::string_concat(['name', 'last name'], ' ', 'name'))
After:
<?php
use function Flow\ETL\DSL\concat;
use function Flow\ETL\DSL\lit;
use function Flow\ETL\DSL\ref; // required by the ref() calls below
use Flow\ETL\Extractor\MemoryExtractor;
use Flow\ETL\Flow;

(new Flow())
    ->read(new MemoryExtractor())
    ->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
2) ref function nullability
ref("entry_name") is no longer returning null when the entry is not found. Instead, it throws an exception.
The same behavior can be achieved through using a newly introduced optional function:
Before:
<?php
use function Flow\ETL\DSL\ref;
ref('non_existing_column')->cast('string');
After:
<?php
use function Flow\ETL\DSL\optional;
use function Flow\ETL\DSL\ref;
optional(ref('non_existing_column'))->cast('string');
// or
optional(ref('non_existing_column')->cast('string'));
3) Extractors output
Affected extractors:
- CSV
- JSON
- Avro
- DBAL
- GoogleSheet
- Parquet
- Text
- XML
Extractors no longer return data under an array entry called row, so unpacking row has become
redundant.
Because of that, DSL functions no longer expect the $entry_row_name parameter. If it was used anywhere,
please remove it.
Before:
<?php
(new Flow())
    ->read(From::array([['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]]]))
    ->withEntry('row', ref('row')->unpack())
    ->renameAll('row.', '')
    ->drop('row')
    ->withEntry('array', ref('array')->arrayMerge(lit(['d' => 4])))
    ->write(To::memory($memory = new ArrayMemory()))
    ->run();
After:
<?php
(new Flow())
    ->read(From::array([['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]]]))
    ->withEntry('array', ref('array')->arrayMerge(lit(['d' => 4])))
    ->write(To::memory($memory = new ArrayMemory()))
    ->run();
4) ConfigBuilder::putInputIntoRows() output is now prefixed with _ (underscore)
To avoid collisions with dataset columns, additional columns created by putInputIntoRows()
are now prefixed with the _ (underscore) symbol.
Before:
<?php
$rows = (new Flow(Config::builder()->putInputIntoRows()))
    ->read(Json::from(__DIR__ . '/../Fixtures/timezones.json', 5))
    ->fetch();
foreach ($rows as $row) {
    $this->assertSame(
        [
            ...
            'input_file_uri',
        ],
        \array_keys($row->toArray())
    );
}
After:
<?php
$rows = (new Flow(Config::builder()->putInputIntoRows()))
    ->read(Json::from(__DIR__ . '/../Fixtures/timezones.json', 5))
    ->fetch();
foreach ($rows as $row) {
    $this->assertSame(
        [
            ...
            '_input_file_uri',
        ],
        \array_keys($row->toArray())
    );
}