Flow PHP

Option

Cases

BROTLI_COMPRESSION_LEVEL
Compression level for Brotli codec. This option is going to be passed to brotli_compress function when Compression is set to Brotli.
BYTE_ARRAY_TO_STRING
Some Parquet writers might not properly use LogicalType for storing strings or JSON.
DICTIONARY_PAGE_MIN_CARDINALITY_RATION
Whenever the cardinality ratio of the dictionary goes below this value, PagesBuilder is going to fall back to PLAIN encoding.
DICTIONARY_PAGE_SIZE
Whenever the size of the dictionary goes above this value, PagesBuilder is going to fall back to PLAIN encoding.
GZIP_COMPRESSION_LEVEL
Compression level for GZIP codec. This option is going to be passed to gzcompress function when Compression is set to GZIP.
INT_96_AS_DATETIME
When this option is set to true, the reader will try to convert INT96 physical type values to DateTimeImmutable objects.
LZ4_COMPRESSION_LEVEL
Compression level for LZ4 codec. This option is going to be passed to lz4_compress function when Compression is set to LZ4.
PAGE_SIZE_BYTES
PageBuilder is going to use this value to determine how many rows should be stored in one page.
ROUND_NANOSECONDS
Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.
ROW_GROUP_SIZE_BYTES
RowGroupBuilder is going to use this value to determine for how long it should keep adding rows to the buffer before flushing it to disk.
ROW_GROUP_SIZE_CHECK_INTERVAL
RowGroupBuilder is going to use this value to determine how often it should check whether the RowGroup size has been exceeded.
VALIDATE_DATA
This option is going to tell the writer to validate data against schema.
WRITER_VERSION
There are only two available versions of Parquet format: V1 and V2.
ZSTD_COMPRESSION_LEVEL
Compression level for ZSTD codec. This option is going to be passed to zstd_compress function when Compression is set to ZSTD.

Cases

BROTLI_COMPRESSION_LEVEL

Compression level for Brotli codec. This option is going to be passed to brotli_compress function when Compression is set to Brotli.

The higher the quality, the slower the compression.

Default value is 11 (BROTLI_COMPRESS_LEVEL_DEFAULT)

BYTE_ARRAY_TO_STRING

Some Parquet writers might not properly use LogicalType for storing strings or JSON.

This option tells the reader to treat all BYTE_ARRAY values as UTF-8 strings.

Default value is true

DICTIONARY_PAGE_MIN_CARDINALITY_RATION

Whenever the cardinality ratio of the dictionary goes below this value, PagesBuilder is going to fall back to PLAIN encoding.

The cardinality ratio is calculated as distinct values / total values. Note that even when the cardinality ratio is above this value, PageBuilder will still fall back to PLAIN encoding when the dictionary size gets above DICTIONARY_PAGE_SIZE.

Default value is 0.4 (40% of the total values are distinct)
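The two fallback rules described above can be sketched with plain PHP. This is only an illustration under stated assumptions, not Flow PHP's implementation: `shouldUseDictionary()` is a hypothetical helper, and the dictionary size is approximated as the summed byte length of the distinct values.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow PHP): decides whether a column could
 * stay dictionary-encoded under the two thresholds described above.
 *
 * @param list<string> $values
 */
function shouldUseDictionary(array $values, float $minCardinalityRatio, int $maxDictionaryBytes): bool
{
    if ($values === []) {
        return false;
    }

    $distinct = \array_unique($values);

    // Cardinality ratio = distinct values / total values.
    $ratio = \count($distinct) / \count($values);

    // Assumption: dictionary size approximated by the bytes of distinct values.
    $dictionaryBytes = \array_sum(\array_map('strlen', $distinct));

    // A ratio below the minimum, or an oversized dictionary, means PLAIN.
    return $ratio >= $minCardinalityRatio && $dictionaryBytes <= $maxDictionaryBytes;
}
```

With a minimum ratio of 0.4, the input `['a', 'b', 'a', 'b']` has a ratio of 0.5 and keeps the dictionary, while five copies of `'a'` have a ratio of 0.2 and fall back to PLAIN.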

DICTIONARY_PAGE_SIZE

Whenever the size of the dictionary goes above this value, PagesBuilder is going to fall back to PLAIN encoding.

Default value is 1 MB

GZIP_COMPRESSION_LEVEL

Compression level for GZIP codec. This option is going to be passed to gzcompress function when Compression is set to GZIP.

Lower level means faster compression, but bigger file size.

Default value is 9
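To get a feel for the trade-off between levels, the sketch below calls `gzcompress` directly at levels 1 and 9. It assumes the zlib extension is loaded; `gzipSizeAtLevel()` and the sample payload are hypothetical, and real column data will compress differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow PHP): size in bytes of $payload
 * after zlib compression at the given level (1 = fastest, 9 = smallest).
 */
function gzipSizeAtLevel(string $payload, int $level): int
{
    $compressed = \gzcompress($payload, $level);

    if ($compressed === false) {
        throw new \RuntimeException('gzcompress failed');
    }

    // Every level must round-trip to the original bytes.
    if (\gzuncompress($compressed) !== $payload) {
        throw new \RuntimeException('round trip mismatch');
    }

    return \strlen($compressed);
}

// Repetitive sample payload, chosen only for illustration.
$payload = \str_repeat('flow-php parquet page; ', 2000);

foreach ([1, 9] as $level) {
    \printf("level %d: %d bytes\n", $level, gzipSizeAtLevel($payload, $level));
}
```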

INT_96_AS_DATETIME

When this option is set to true, the reader will try to convert INT96 physical type values to DateTimeImmutable objects.

For historical reasons, some Parquet writers still use INT96 to store timestamps with nanosecond precision instead of using the TIMESTAMP logical type. Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.

INT96 is deprecated in the Parquet format, so this option is enabled by default; when it is disabled, the reader returns the raw 12-byte array that represents the INT96 value.

Default value is true
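For readers curious what such a conversion involves, here is a minimal sketch. It assumes the conventional INT96 timestamp layout used by legacy writers (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and a 64-bit PHP build. `int96ToDateTime()` is a hypothetical function, not Flow PHP's reader, and it truncates rather than rounds the sub-microsecond part.

```php
<?php

declare(strict_types=1);

// Julian day number of 1970-01-01, the Unix epoch.
const JULIAN_DAY_OF_UNIX_EPOCH = 2440588;

/**
 * Hypothetical converter (not part of Flow PHP) for a 12-byte INT96 timestamp.
 */
function int96ToDateTime(string $bytes): \DateTimeImmutable
{
    if (\strlen($bytes) !== 12) {
        throw new \InvalidArgumentException('INT96 must be exactly 12 bytes');
    }

    // 'P' = unsigned 64-bit little-endian, 'V' = unsigned 32-bit little-endian.
    $parts = \unpack('Pnanos/Vjulian', $bytes);

    $days = $parts['julian'] - JULIAN_DAY_OF_UNIX_EPOCH;
    $secondsOfDay = \intdiv($parts['nanos'], 1000000000);

    // PHP keeps microseconds at most, so the last three digits are dropped.
    $micros = \intdiv($parts['nanos'] % 1000000000, 1000);

    // '@<timestamp>' always yields a UTC (+00:00) instance at midnight here.
    $midnight = new \DateTimeImmutable('@' . ($days * 86400));

    return $midnight->setTime(
        \intdiv($secondsOfDay, 3600),
        \intdiv($secondsOfDay % 3600, 60),
        $secondsOfDay % 60,
        $micros
    );
}

// 01:02:03.123456789 on Julian day 2440589 (1970-01-02).
$raw = \pack('PV', 3723123456789, 2440589);
echo int96ToDateTime($raw)->format('Y-m-d H:i:s.u'), "\n"; // prints 1970-01-02 01:02:03.123456
```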

LZ4_COMPRESSION_LEVEL

Compression level for LZ4 codec. This option is going to be passed to lz4_compress function when Compression is set to LZ4.

The level of compression ranges from 1 to 12; recommended values are between 4 and 9.

Default value is 4

PAGE_SIZE_BYTES

PageBuilder is going to use this value to determine how many rows should be stored in one page.

PageBuilder is not going to make it precisely equal to this value, but it will try to make it as close as possible. This should be considered as a threshold rather than a strict value.

Default value is 8 KB

https://parquet.apache.org/docs/file-format/configurations/#data-page--size

ROUND_NANOSECONDS

Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.

Default value is false

ROW_GROUP_SIZE_BYTES

RowGroupBuilder is going to use this value to determine for how long it should keep adding rows to the buffer before flushing it to disk.

Default value is 8 MB

To be more aligned with Apache Spark and Hadoop, this value should be set between 128 MB and 512 MB. https://parquet.apache.org/docs/file-format/configurations/#row-group-size

ROW_GROUP_SIZE_CHECK_INTERVAL

RowGroupBuilder is going to use this value to determine how often it should check whether the RowGroup size has been exceeded.

This is a performance optimization, since checking RowGroup size is a costly operation. If the value is set to 1000, RowGroupBuilder is going to check the size only after adding 1000 rows to the buffer.

Default value is 1000
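The interval-based check can be illustrated with a small buffer class. This is a sketch assuming PHP 8.1 or later; `IntervalCheckedBuffer` is hypothetical and only mimics the idea described above, with a byte count standing in for the costly size calculation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical buffer (not Flow PHP's RowGroupBuilder): measures its size
 * only once every $checkInterval rows instead of after every row.
 */
final class IntervalCheckedBuffer
{
    /** @var list<string> */
    private array $rows = [];

    public int $sizeChecks = 0;

    public int $flushes = 0;

    public function __construct(
        private readonly int $maxBytes,
        private readonly int $checkInterval,
    ) {
    }

    public function add(string $row): void
    {
        $this->rows[] = $row;

        // Skip the expensive size computation until the interval is reached.
        if (\count($this->rows) % $this->checkInterval !== 0) {
            return;
        }

        $this->sizeChecks++;

        if ($this->bufferedBytes() >= $this->maxBytes) {
            $this->flush();
        }
    }

    private function bufferedBytes(): int
    {
        // Stand-in for the costly size calculation.
        return \array_sum(\array_map('strlen', $this->rows));
    }

    private function flush(): void
    {
        // A real writer would serialize the buffered rows here.
        $this->rows = [];
        $this->flushes++;
    }
}
```

One consequence of this design is that the buffer can overshoot the configured size by up to one interval's worth of rows, which is the price paid for fewer size checks.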

VALIDATE_DATA

This option is going to tell the writer to validate data against schema.

In most cases that should be enabled, however if performance is critical, it can be disabled.

WRITER_VERSION

There are only two available versions of Parquet format: V1 and V2.

This option is going to tell the writer which version should be used to create DataPages.

  • 1 will use DataPage
  • 2 will use DataPageV2

Default value is 1

ZSTD_COMPRESSION_LEVEL

Compression level for ZSTD codec. This option is going to be passed to zstd_compress function when Compression is set to ZSTD.

A value smaller than 0 selects a faster compression level (requires Zstandard library 1.3.4 or later).

Default value is 3


        