Option
Cases
- BROTLI_COMPRESSION_LEVEL
- Compression level for the Brotli codec. This option is passed to the brotli_compress function when Compression is set to Brotli.
- BYTE_ARRAY_TO_STRING
- Some Parquet writers might not properly use LogicalType for storing strings or JSON.
- DICTIONARY_PAGE_MIN_CARDINALITY_RATION
- Whenever the cardinality ratio of the dictionary goes below this value, PagesBuilders is going to fall back to PLAIN encoding.
- DICTIONARY_PAGE_SIZE
- Whenever the size of the dictionary goes above this value, PagesBuilders is going to fall back to PLAIN encoding.
- GZIP_COMPRESSION_LEVEL
- Compression level for the GZIP codec. This option is passed to the gzcompress function when Compression is set to GZIP.
- INT_96_AS_DATETIME
- When this option is set to true, the reader will try to convert INT96 values to DateTimeImmutable objects.
- LZ4_COMPRESSION_LEVEL
- Compression level for the LZ4 codec. This option is passed to the lz4_compress function when Compression is set to LZ4.
- PAGE_SIZE_BYTES
- PageBuilder is going to use this value to determine how many rows should be stored in one page.
- ROUND_NANOSECONDS
- Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.
- ROW_GROUP_SIZE_BYTES
- RowGroupBuilder is going to use this value to determine how long it should keep adding rows to the buffer before flushing it to disk.
- ROW_GROUP_SIZE_CHECK_INTERVAL
- RowGroupBuilder is going to use this value to determine how often it should check whether the RowGroup size has been exceeded.
- VALIDATE_DATA
- This option is going to tell the writer to validate data against schema.
- WRITER_VERSION
- There are only two available versions of Parquet format: V1 and V2.
- ZSTD_COMPRESSION_LEVEL
- Compression level for the ZSTD codec. This option is passed to the zstd_compress function when Compression is set to ZSTD.
Cases
BROTLI_COMPRESSION_LEVEL
Compression level for the Brotli codec. This option is passed to the brotli_compress function when Compression is set to Brotli.
The higher the quality, the slower the compression.
Default value is 11 (BROTLI_COMPRESS_LEVEL_DEFAULT).
BYTE_ARRAY_TO_STRING
Some Parquet writers might not properly use LogicalType for storing strings or JSON.
This option tells the reader to treat all BYTE_ARRAY values as UTF-8 strings.
Default value is true.
DICTIONARY_PAGE_MIN_CARDINALITY_RATION
Whenever the cardinality ratio of the dictionary goes below this value, PagesBuilders is going to fall back to PLAIN encoding.
The cardinality ratio is calculated as distinct values / total values. Please note that even when the cardinality ratio is above this value, PageBuilder will still fall back to PLAIN encoding when the dictionary size gets above DICTIONARY_PAGE_SIZE.
Default value is 0.4 (40% of the total values are distinct).
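The two fallback rules described above can be sketched as follows. This is an illustration in Python, not the library's code: the function names, the 1024 * 1024 byte reading of the dictionary size limit, and the way the dictionary size is passed in are all assumptions made for the example.

```python
def cardinality_ratio(values):
    """Return distinct values / total values, as defined in the text."""
    if not values:
        raise ValueError("cannot compute a ratio for an empty column")
    return len(set(values)) / len(values)


def falls_back_to_plain(values, min_ratio=0.4, dictionary_size_bytes=0,
                        dictionary_page_size=1024 * 1024):
    """Hypothetical helper applying both rules from the description."""
    # Rule 1: a ratio below the configured minimum triggers the fallback.
    if cardinality_ratio(values) < min_ratio:
        return True
    # Rule 2: a dictionary larger than the size limit also triggers it,
    # regardless of the ratio.
    return dictionary_size_bytes > dictionary_page_size


column = ["a", "b", "a", "a"]
print(cardinality_ratio(column))    # 2 distinct values out of 4
print(falls_back_to_plain(column))
```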
DICTIONARY_PAGE_SIZE
Whenever the size of the dictionary goes above this value, PagesBuilders is going to fall back to PLAIN encoding.
Default value is 1 MB.
GZIP_COMPRESSION_LEVEL
Compression level for the GZIP codec. This option is passed to the gzcompress function when Compression is set to GZIP.
A lower level means faster compression but a bigger file size.
Default value is 9.
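To get a feel for the trade-off, the sketch below compresses the same payload at several levels with Python's zlib module, which writes the same zlib format that PHP's gzcompress produces. The payload is made up for the example, and the exact sizes depend on the data.

```python
import zlib

# Hypothetical, highly repetitive payload standing in for a column chunk.
payload = b"parquet column chunk " * 2000

# Level 0 stores the data uncompressed, 1 is fastest, 9 compresses hardest.
sizes = {level: len(zlib.compress(payload, level)) for level in (0, 1, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")

# Compression is lossless at every level.
assert zlib.decompress(zlib.compress(payload, 9)) == payload
```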
INT_96_AS_DATETIME
When this option is set to true, the reader will try to convert INT96 values to DateTimeImmutable objects.
For historical reasons, some Parquet writers still use INT96 to store timestamps with nanosecond precision instead of using the TIMESTAMP logical type. Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.
INT96 is generally no longer supported. Keep this option set to true (the default); otherwise the reader returns an array of 12 bytes that represents the INT96 value.
Default value is true.
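For readers who end up with the raw 12 bytes, the sketch below shows how an INT96 timestamp is commonly laid out: 8 little-endian bytes of nanoseconds within the day followed by a 4-byte Julian day number. It is written in Python for illustration only; the function name is hypothetical, and both the round-half-up step and the UTC interpretation are assumptions rather than a statement of what the reader does internally.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def int96_to_datetime(raw: bytes) -> datetime:
    """Hypothetical decoder for a 12-byte INT96 timestamp."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes long")
    # Little-endian: nanoseconds within the day (8 bytes), Julian day (4 bytes).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Assumption: round half up to microseconds, the finest unit datetime keeps.
    micros_of_day = (nanos_of_day + 500) // 1000
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=days_since_epoch, microseconds=micros_of_day)


# One day after the epoch, 1500 ns into the day.
raw_value = struct.pack("<qi", 1_500, JULIAN_DAY_OF_UNIX_EPOCH + 1)
print(int96_to_datetime(raw_value).isoformat())
```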
LZ4_COMPRESSION_LEVEL
Compression level for the LZ4 codec. This option is passed to the lz4_compress function when Compression is set to LZ4.
The compression level ranges from 1 to 12; recommended values are between 4 and 9.
Default value is 4.
PAGE_SIZE_BYTES
PageBuilder is going to use this value to determine how many rows should be stored in one page.
PageBuilder is not going to make it precisely equal to this value, but it will try to make it as close as possible. This should be considered as a threshold rather than a strict value.
Default value is 8 KB.
https://parquet.apache.org/docs/file-format/configurations/#data-page--size
ROUND_NANOSECONDS
Since PHP does not support nanosecond precision for DateTime objects, when this option is set to true, the reader will round nanoseconds to microseconds.
Default value is false.
ROW_GROUP_SIZE_BYTES
RowGroupBuilder is going to use this value to determine how long it should keep adding rows to the buffer before flushing it to disk.
Default value is 8 MB.
To be more aligned with Apache Spark and Hadoop, this value should be set between 128 MB and 512 MB. https://parquet.apache.org/docs/file-format/configurations/#row-group-size
ROW_GROUP_SIZE_CHECK_INTERVAL
RowGroupBuilder is going to use this value to determine how often it should check whether the RowGroup size has been exceeded.
This is a performance optimization, since checking the RowGroup size is a costly operation. If the value is set to 1000, RowGroupBuilder is going to check the size only after adding 1000 rows to the buffer.
Default value is 1000.
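The interplay between the size limit and the check interval can be sketched like this. The class below is a hypothetical stand-in written in Python, not the library's RowGroupBuilder; the size estimate based on repr() and the in-memory list of flushed groups are assumptions chosen to keep the example self-contained.

```python
class RowGroupBuffer:
    """Hypothetical buffer that measures its size only every N rows."""

    def __init__(self, size_limit_bytes, check_interval):
        self.size_limit_bytes = size_limit_bytes
        self.check_interval = check_interval
        self.rows = []
        self.flushed_groups = []  # stands in for row groups written to disk
        self.size_checks = 0

    def _estimated_size(self):
        # Stand-in for the costly size computation.
        self.size_checks += 1
        return sum(len(repr(row)) for row in self.rows)

    def add_row(self, row):
        self.rows.append(row)
        # Skip the size check unless a full interval of rows has been added.
        if len(self.rows) % self.check_interval == 0:
            if self._estimated_size() >= self.size_limit_bytes:
                self.flush()

    def flush(self):
        if self.rows:
            self.flushed_groups.append(self.rows)
            self.rows = []


buffer = RowGroupBuffer(size_limit_bytes=100, check_interval=10)
for i in range(35):
    buffer.add_row({"id": i})
print(len(buffer.flushed_groups), "groups flushed,", buffer.size_checks, "size checks")
```

Because the size is only measured at interval boundaries, a row group can overshoot the limit by up to one interval's worth of rows, which is the price paid for fewer size checks.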
VALIDATE_DATA
This option is going to tell the writer to validate data against schema.
In most cases this should be enabled; however, if performance is critical, it can be disabled.
WRITER_VERSION
There are only two available versions of Parquet format: V1 and V2.
This option is going to tell the writer which version should be used to create DataPages.
- 1 will use DataPage.
- 2 will use DataPageV2.
Default value is 1.
ZSTD_COMPRESSION_LEVEL
Compression level for the ZSTD codec. This option is passed to the zstd_compress function when Compression is set to ZSTD.
A value smaller than 0 selects a faster compression level (requires Zstandard library 1.3.4 or later).
Default value is 3.