Flow Command Line Interface

Installation

composer require flow-php/cli

In some cases, it might make sense to install the CLI globally:

composer global require flow-php/cli

Now you can run the CLI using the flow command.

Commands

Config

All Flow CLI commands can be configured with the --config option. The option accepts a path to a PHP configuration file that returns a Config or ConfigBuilder instance.

.flow.php

<?php

use function Flow\ETL\DSL\config_builder;

return config_builder()
    ->id('execution-id');

flow read --config .flow.php orders.csv

One of the most common use cases is mounting a custom filesystem into the Flow fstab so that remote files can be accessed through the CLI.

$ flow
Flow PHP - Data processing framework

Usage:
  command [options] [arguments]

Options:
  -h, --help            Display help for the given command. When no command is given display help for the list command
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi|--no-ansi  Force (or disable --no-ansi) ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Available commands:
  completion             Dump the shell completion script
  help                   Display help for a command
  list                   List commands
 file
  file:convert           [convert] Read data from a file.
  file:read              [read] Read data from a file.
  file:rows:count        [count] Read data schema from a file.
  file:schema            [schema] Read data schema from a file.
 parquet
  parquet:read           [parquet:read:data] Read data from parquet file
  parquet:read:metadata  Read metadata from parquet file
 pipeline
  pipeline:run           [run] Execute ETL pipeline from a php/json file.

file:convert alias convert

$ flow convert --help
Description:
  Read data from a file.

Usage:
  file:convert [options] [--] <input-file> <output-file>
  convert

Arguments:
  input-file                                                         Path to a file that should be converted to another format.
  output-file                                                        Path where converted file should be saved.

Options:
      --input-file-format=INPUT-FILE-FORMAT                          File format. When not set file format is guessed from input file path extension
      --input-file-batch-size=INPUT-FILE-BATCH-SIZE                  Number of rows that are going to be read and displayed in one batch, when set to -1 whole dataset will be displayed at once [default: 100]
      --input-file-limit=INPUT-FILE-LIMIT                            Limit number of rows that are going to be used to infer file schema, when not set whole file is analyzed
      --output-file-format=OUTPUT-FILE-FORMAT                        File format. When not set file format is guessed from output file path extension
      --output-overwrite[=OUTPUT-OVERWRITE]                          When set output file will be overwritten if exists
      --schema-auto-cast[=SCHEMA-AUTO-CAST]                          When set Flow will try to automatically cast values to more precise data types, for example datetime strings will be casted to datetime type [default: false]
      --analyze[=ANALYZE]                                            Collect processing statistics and print them. [default: false]
      --config=CONFIG                                                Path to a local php file that MUST return instance of: Flow\ETL\Config
      --input-json-pointer=INPUT-JSON-POINTER                        JSON Pointer to a subtree from which schema should be extracted
      --input-json-pointer-entry-name                                When set, JSON Pointer will be used as an entry name in the schema
      --input-csv-header[=INPUT-CSV-HEADER]                          When set, CSV header will be used as a schema
      --input-csv-empty-to-null[=INPUT-CSV-EMPTY-TO-NULL]            When set, empty CSV values will be treated as NULL values
      --input-csv-separator=INPUT-CSV-SEPARATOR                      CSV separator character
      --input-csv-enclosure=INPUT-CSV-ENCLOSURE                      CSV enclosure character
      --input-csv-escape=INPUT-CSV-ESCAPE                            CSV escape character
      --output-csv-header[=OUTPUT-CSV-HEADER]                        When set, CSV header will be used as a schema
      --output-csv-new-line-separator=OUTPUT-CSV-NEW-LINE-SEPARATOR  When set, empty CSV values will be treated as NULL values
      --output-csv-separator=OUTPUT-CSV-SEPARATOR                    CSV separator character
      --output-csv-enclosure=OUTPUT-CSV-ENCLOSURE                    CSV enclosure character
      --output-csv-escape=OUTPUT-CSV-ESCAPE                          CSV escape character
      --output-csv-date-time-format=OUTPUT-CSV-DATE-TIME-FORMAT      DateTime format for CSV output
      --input-xml-node-path=INPUT-XML-NODE-PATH                      XML node path to a subtree from which schema should be extracted, for example /root/element This is not xpath, just a node names separated by slash
      --input-xml-buffer-size=INPUT-XML-BUFFER-SIZE                  XML buffer size in bytes
      --input-parquet-columns=INPUT-PARQUET-COLUMNS                  Columns to read from parquet file (multiple values allowed)
      --input-parquet-offset=INPUT-PARQUET-OFFSET                    Offset to start reading from
  -h, --help                                                         Display help for the given command. When no command is given display help for the list command
  -q, --quiet                                                        Do not output any message
  -V, --version                                                      Display this application version
      --ansi|--no-ansi                                               Force (or disable --no-ansi) ANSI output
  -n, --no-interaction                                               Do not ask any interactive question
  -v|vv|vvv, --verbose                                               Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
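As a minimal sketch of the options above, the script below rewrites a comma-separated file with a semicolon separator. The file names and sample data are hypothetical, and the guard around the flow call is only there so the script still runs on a machine where the CLI is not installed. Only options listed in the help output are used, and both file formats are left to be guessed from the .csv extension.

```shell
#!/bin/sh
# Create a small comma-separated sample file (hypothetical data).
printf 'order_id,discount\n1,0.50\n2,\n' > orders.csv

if command -v flow >/dev/null 2>&1; then
    # Input and output formats are guessed from the .csv extensions.
    # --output-overwrite replaces orders_semicolon.csv if it already exists.
    flow file:convert orders.csv orders_semicolon.csv \
        --output-csv-separator=';' \
        --output-overwrite
else
    echo "flow command not found; install flow-php/cli first" >&2
fi
```

When the input and output extensions differ, the same call shape applies and --input-file-format / --output-file-format can be left unset.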

file:schema alias schema

$ flow file:schema --help
Description:
  Read data schema from a file.

Usage:
  file:schema [options] [--] <input-file>
  schema

Arguments:
  input-file                                               Path to a file from which schema should be extracted.

Options:
      --input-file-format=INPUT-FILE-FORMAT                Source file format. When not set file format is guessed from source file path extension
      --input-file-limit=INPUT-FILE-LIMIT                  Limit number of rows that are going to be used to infer file schema, when not set whole file is analyzed
      --output-pretty                                      Pretty print schema
      --output-table                                       Pretty schema as ascii table
      --schema-auto-cast[=SCHEMA-AUTO-CAST]                When set Flow will try to automatically cast values to more precise data types, for example datetime strings will be casted to datetime type [default: false]
      --config=CONFIG                                      Path to a local php file that MUST return instance of: Flow\ETL\Config
      --input-json-pointer=INPUT-JSON-POINTER              JSON Pointer to a subtree from which schema should be extracted
      --input-json-pointer-entry-name                      When set, JSON Pointer will be used as an entry name in the schema
      --input-csv-header[=INPUT-CSV-HEADER]                When set, CSV header will be used as a schema
      --input-csv-empty-to-null[=INPUT-CSV-EMPTY-TO-NULL]  When set, empty CSV values will be treated as NULL values
      --input-csv-separator=INPUT-CSV-SEPARATOR            CSV separator character
      --input-csv-enclosure=INPUT-CSV-ENCLOSURE            CSV enclosure character
      --input-csv-escape=INPUT-CSV-ESCAPE                  CSV escape character
      --input-xml-node-path=INPUT-XML-NODE-PATH            XML node path to a subtree from which schema should be extracted, for example /root/element This is not xpath, just a node names separated by slash
      --input-xml-buffer-size=INPUT-XML-BUFFER-SIZE        XML buffer size in bytes
      --input-parquet-columns=INPUT-PARQUET-COLUMNS        Columns to read from parquet file (multiple values allowed)
      --input-parquet-offset=INPUT-PARQUET-OFFSET          Offset to start reading from
  -h, --help                                               Display help for the given command. When no command is given display help for the list command
  -q, --quiet                                              Do not output any message
  -V, --version                                            Display this application version
      --ansi|--no-ansi                                     Force (or disable --no-ansi) ANSI output
  -n, --no-interaction                                     Do not ask any interactive question
  -v|vv|vvv, --verbose                                     Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Example:

$ flow schema orders.csv --output-table --schema-auto-cast
+------------+----------+----------+-------------+----------+
|       name |     type | nullable | scalar_type | metadata |
+------------+----------+----------+-------------+----------+
|   order_id |     uuid |    false |             |       [] |
| created_at | datetime |    false |             |       [] |
| updated_at | datetime |    false |             |       [] |
|   discount |   scalar |     true |      string |       [] |
|    address |     json |    false |             |       [] |
|      notes |     json |    false |             |       [] |
|      items |     json |    false |             |       [] |
+------------+----------+----------+-------------+----------+
7 rows

file:read alias read

$ flow read --help
Description:
  Read data from a file.

Usage:
  file:read [options] [--] <input-file>
  read

Arguments:
  input-file                                               Path to a file from which schema should be extracted.

Options:
      --input-file-format=INPUT-FILE-FORMAT                File format. When not set file format is guessed from source file path extension
      --input-file-batch-size=INPUT-FILE-BATCH-SIZE        Number of rows that are going to be read and displayed in one batch, when set to -1 whole dataset will be displayed at once [default: 100]
      --input-file-limit=INPUT-FILE-LIMIT                  Limit number of rows that are going to be used to infer file schema, when not set whole file is analyzed
      --output-truncate=OUTPUT-TRUNCATE                    Truncate output to given number of characters, when set to -1 output is not truncated at all [default: 20]
      --schema-auto-cast[=SCHEMA-AUTO-CAST]                When set Flow will try to automatically cast values to more precise data types, for example datetime strings will be casted to datetime type [default: false]
      --config=CONFIG                                      Path to a local php file that MUST return instance of: Flow\ETL\Config
      --input-json-pointer=INPUT-JSON-POINTER              JSON Pointer to a subtree from which schema should be extracted
      --input-json-pointer-entry-name                      When set, JSON Pointer will be used as an entry name in the schema
      --input-csv-header[=INPUT-CSV-HEADER]                When set, CSV header will be used as a schema
      --input-csv-empty-to-null[=INPUT-CSV-EMPTY-TO-NULL]  When set, empty CSV values will be treated as NULL values
      --input-csv-separator=INPUT-CSV-SEPARATOR            CSV separator character
      --input-csv-enclosure=INPUT-CSV-ENCLOSURE            CSV enclosure character
      --input-csv-escape=INPUT-CSV-ESCAPE                  CSV escape character
      --input-xml-node-path=INPUT-XML-NODE-PATH            XML node path to a subtree from which schema should be extracted, for example /root/element This is not xpath, just a node names separated by slash
      --input-xml-buffer-size=INPUT-XML-BUFFER-SIZE        XML buffer size in bytes
      --input-parquet-columns=INPUT-PARQUET-COLUMNS        Columns to read from parquet file (multiple values allowed)
      --input-parquet-offset=INPUT-PARQUET-OFFSET          Offset to start reading from
  -h, --help                                               Display help for the given command. When no command is given display help for the list command
  -q, --quiet                                              Do not output any message
  -V, --version                                            Display this application version
      --ansi|--no-ansi                                     Force (or disable --no-ansi) ANSI output
  -n, --no-interaction                                     Do not ask any interactive question
  -v|vv|vvv, --verbose                                     Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
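A short sketch of how the batch size and truncation options might be combined when a file contains long values. The sample file and its contents are hypothetical; the options are taken from the help output above.

```shell
#!/bin/sh
# Sample file with one long text value (hypothetical data).
printf 'order_id,notes\n1,"a fairly long free-text note that would otherwise be cut off"\n' > orders.csv

if command -v flow >/dev/null 2>&1; then
    # Read and display 10 rows per batch and disable truncation (-1),
    # so long values are printed in full instead of being cut
    # at the default 20 characters.
    flow file:read orders.csv \
        --input-file-batch-size=10 \
        --output-truncate=-1
else
    echo "flow command not found; install flow-php/cli first" >&2
fi
```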

file:rows:count alias count

$ flow count --help
Description:
  Read data schema from a file.

Usage:
  file:rows:count [options] [--] <input-file>
  count

Arguments:
  input-file                                               Path to a file from which schema should be extracted.

Options:
      --input-file-format=INPUT-FILE-FORMAT                Source file format. When not set file format is guessed from source file path extension
      --input-file-limit=INPUT-FILE-LIMIT                  Limit number of rows that are going to be used to infer file schema, when not set whole file is analyzed
      --config=CONFIG                                      Path to a local php file that MUST return instance of: Flow\ETL\Config
      --input-json-pointer=INPUT-JSON-POINTER              JSON Pointer to a subtree from which schema should be extracted
      --input-json-pointer-entry-name                      When set, JSON Pointer will be used as an entry name in the schema
      --input-csv-header[=INPUT-CSV-HEADER]                When set, CSV header will be used as a schema
      --input-csv-empty-to-null[=INPUT-CSV-EMPTY-TO-NULL]  When set, empty CSV values will be treated as NULL values
      --input-csv-separator=INPUT-CSV-SEPARATOR            CSV separator character
      --input-csv-enclosure=INPUT-CSV-ENCLOSURE            CSV enclosure character
      --input-csv-escape=INPUT-CSV-ESCAPE                  CSV escape character
      --input-xml-node-path=INPUT-XML-NODE-PATH            XML node path to a subtree from which schema should be extracted, for example /root/element This is not xpath, just a node names separated by slash
      --input-xml-buffer-size=INPUT-XML-BUFFER-SIZE        XML buffer size in bytes
      --input-parquet-columns=INPUT-PARQUET-COLUMNS        Columns to read from parquet file (multiple values allowed)
      --input-parquet-offset=INPUT-PARQUET-OFFSET          Offset to start reading from
  -h, --help                                               Display help for the given command. When no command is given display help for the list command
  -q, --quiet                                              Do not output any message
  -V, --version                                            Display this application version
      --ansi|--no-ansi                                     Force (or disable --no-ansi) ANSI output
  -n, --no-interaction                                     Do not ask any interactive question
  -v|vv|vvv, --verbose                                     Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
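Because the file format is otherwise guessed from the path extension, a file with a non-standard extension needs --input-file-format. The sketch below assumes that `csv` is an accepted format name, which the help output does not confirm; the file name and data are hypothetical.

```shell
#!/bin/sh
# CSV content stored under an extension the CLI cannot guess from (hypothetical data).
printf 'order_id\n1\n2\n3\n' > orders.dat

if command -v flow >/dev/null 2>&1; then
    # Assumption: "csv" is the value expected by --input-file-format.
    flow file:rows:count orders.dat --input-file-format=csv
else
    echo "flow command not found; install flow-php/cli first" >&2
fi
```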

parquet:read:metadata

$ flow parquet:read:metadata --help
Description:
  Read metadata from parquet file

Usage:
  parquet:read:metadata [options] [--] <file>

Arguments:
  file                  path to parquet file

Options:
      --columns         Display column details
      --row-groups      Display row group details
      --column-chunks   Display column chunks details
      --statistics      Display column chunks statistics details
      --page-headers    Display page headers details
  -h, --help            Display help for the given command. When no command is given display help for the list command
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi|--no-ansi  Force (or disable --no-ansi) ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

pipeline:run alias run

$ flow pipeline:run --help
Description:
  Execute ETL pipeline from a php/json file.

Usage:
  pipeline:run [options] [--] <pipeline-file>
  run

Arguments:
  pipeline-file          Path to a php/json with DataFrame definition.

Options:
      --analyze=ANALYZE  Collect processing statistics and print them. [default: false]
      --config=CONFIG    Path to a local php file that MUST return instance of: Flow\ETL\Config
  -h, --help             Display help for the given command. When no command is given display help for the list command
  -q, --quiet            Do not output any message
  -V, --version          Display this application version
      --ansi|--no-ansi   Force (or disable --no-ansi) ANSI output
  -n, --no-interaction   Do not ask any interactive question
  -v|vv|vvv, --verbose   Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Help:
  pipeline-file argument must point to a valid php file that returns DataFrame instance.
  Make sure to not execute run() or any other trigger function.
  
  Example of pipeline.php:
  <?php
  return df()
      ->read(from_array([
          ['id' => 1, 'name' => 'User 01', 'active' => true],
          ['id' => 2, 'name' => 'User 02', 'active' => false],
          ['id' => 3, 'name' => 'User 03', 'active' => true],
      ]))
      ->collect()
      ->write(to_output());
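The example in the help text omits imports, so a standalone file would likely need them. The sketch below writes a complete pipeline.php and passes it to the CLI. It assumes that df(), from_array() and to_output() live in the same Flow\ETL\DSL namespace as the config_builder() function shown earlier; check the import line against your installed version.

```shell
#!/bin/sh
# The quoted 'PHP' delimiter stops the shell from expanding anything inside the file body.
cat > pipeline.php <<'PHP'
<?php

// Assumption: these DSL functions share the Flow\ETL\DSL namespace with config_builder().
use function Flow\ETL\DSL\{df, from_array, to_output};

// Return the DataFrame without calling run(); the CLI triggers execution.
return df()
    ->read(from_array([
        ['id' => 1, 'name' => 'User 01', 'active' => true],
        ['id' => 2, 'name' => 'User 02', 'active' => false],
    ]))
    ->collect()
    ->write(to_output());
PHP

if command -v flow >/dev/null 2>&1; then
    flow pipeline:run pipeline.php
else
    echo "flow command not found; install flow-php/cli first" >&2
fi
```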
