Flow PHP

ColumnIndex

Optional statistics for each data page in a ColumnChunk.

Forms part of the page index, along with OffsetIndex.

If this structure is present, OffsetIndex must also be present.

For each field in this structure, [i] refers to the page at OffsetIndex.page_locations[i].
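The positional pairing can be sketched as below. This is a minimal illustration, assuming the page locations are available as plain PHP arrays carrying the PageLocation fields from parquet.thrift (offset, compressed_page_size, first_row_index). The helper pairPages() and the array shapes are hypothetical and not part of the Flow PHP API.

```php
<?php
declare(strict_types=1);

/**
 * Pair each page's ColumnIndex entry with its OffsetIndex location: entry [i]
 * of every ColumnIndex list describes the page at page_locations[i].
 * Hypothetical helper; page locations are plain arrays shaped like PageLocation.
 *
 * @param list<bool> $nullPages ColumnIndex::$null_pages
 * @param list<array{offset: int, compressed_page_size: int, first_row_index: int}> $pageLocations
 * @return list<array{page: int, offset: int, size: int, first_row: int, all_null: bool}>
 */
function pairPages(array $nullPages, array $pageLocations): array
{
    // Both indexes must describe the same pages, one entry per page.
    if (count($nullPages) !== count($pageLocations)) {
        throw new InvalidArgumentException('ColumnIndex and OffsetIndex must cover the same pages');
    }

    $pages = [];
    foreach ($pageLocations as $i => $location) {
        $pages[] = [
            'page' => $i,
            'offset' => $location['offset'],
            'size' => $location['compressed_page_size'],
            'first_row' => $location['first_row_index'],
            'all_null' => $nullPages[$i],
        ];
    }
    return $pages;
}

// Two pages: the second one holds only nulls.
$pages = pairPages(
    [false, true],
    [
        ['offset' => 4, 'compressed_page_size' => 100, 'first_row_index' => 0],
        ['offset' => 104, 'compressed_page_size' => 80, 'first_row_index' => 50],
    ],
);
```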

Properties

$_TSPEC  : mixed
$boundary_order  : int
Stores whether both min_values and max_values are ordered and if so, in which direction. This allows readers to perform binary searches in both lists. Readers cannot assume that max_values[i] <= min_values[i+1], even if the lists are ordered.
$definition_level_histograms  : array<string|int, int>
Same as repetition_level_histograms, except for definition levels.
$isValidate  : mixed
$max_values  : array<string|int, string>
$min_values  : array<string|int, string>
Two lists containing lower and upper bounds for the values of each page, as determined by the ColumnOrder of the column. These may be the actual minimum and maximum values found on a page, but can also be (more compact) values that do not exist on the page. For example, instead of storing "Blart Versenwald III", a writer may set min_values[i]="B", max_values[i]="C".
$null_counts  : array<string|int, int>
A list containing the number of null values for each page.
$null_pages  : array<string|int, bool>
A list of Boolean values that determines whether the corresponding min and max values are valid. If true, the page contains only null values, and writers must set the corresponding entries in min_values and max_values to byte[0] (an empty byte array) so that all lists have the same length. If false, the corresponding entries in min_values and max_values must be valid.
$repetition_level_histograms  : array<string|int, int>
Contains repetition level histograms for each page concatenated together. The repetition_level_histogram field on SizeStatistics contains more details.

Methods

__construct()  : mixed
getName()  : mixed
read()  : mixed
write()  : mixed

Properties

$_TSPEC

public static mixed $_TSPEC = [1 => ['var' => 'null_pages', 'isRequired' => true, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::BOOL, 'elem' => ['type' => \Thrift\Type\TType::BOOL]], 2 => ['var' => 'min_values', 'isRequired' => true, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::STRING, 'elem' => ['type' => \Thrift\Type\TType::STRING]], 3 => ['var' => 'max_values', 'isRequired' => true, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::STRING, 'elem' => ['type' => \Thrift\Type\TType::STRING]], 4 => ['var' => 'boundary_order', 'isRequired' => true, 'type' => \Thrift\Type\TType::I32, 'class' => '\Flow\Parquet\Thrift\BoundaryOrder'], 5 => ['var' => 'null_counts', 'isRequired' => false, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::I64, 'elem' => ['type' => \Thrift\Type\TType::I64]], 6 => ['var' => 'repetition_level_histograms', 'isRequired' => false, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::I64, 'elem' => ['type' => \Thrift\Type\TType::I64]], 7 => ['var' => 'definition_level_histograms', 'isRequired' => false, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::I64, 'elem' => ['type' => \Thrift\Type\TType::I64]]]

$boundary_order

Stores whether both min_values and max_values are ordered and if so, in which direction. This allows readers to perform binary searches in both lists. Readers cannot assume that max_values[i] <= min_values[i+1], even if the lists are ordered.

public int $boundary_order
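A rough sketch of how a reader might use boundary_order to narrow the candidate pages for an equality lookup. It assumes the BoundaryOrder numbering from parquet.thrift (UNORDERED = 0, ASCENDING = 1, DESCENDING = 2), byte-wise string comparison with strcmp(), and that no page is all-null. The helper candidatePages() and the constant name are hypothetical. For ascending lists, two binary searches bound the matching range; otherwise the sketch falls back to a linear scan.

```php
<?php
declare(strict_types=1);

// BoundaryOrder values as defined in parquet.thrift (assumed to match
// \Flow\Parquet\Thrift\BoundaryOrder): UNORDERED = 0, ASCENDING = 1, DESCENDING = 2.
const BOUNDARY_ORDER_ASCENDING = 1;

/**
 * Indices of pages whose [min, max] range may contain $value.
 * Assumes byte-wise (strcmp) ordering and that no page is all-null.
 *
 * @param list<string> $min ColumnIndex::$min_values
 * @param list<string> $max ColumnIndex::$max_values
 * @return list<int>
 */
function candidatePages(int $boundaryOrder, array $min, array $max, string $value): array
{
    if ($boundaryOrder !== BOUNDARY_ORDER_ASCENDING) {
        // UNORDERED (and DESCENDING, not specialised here): check every page.
        $pages = [];
        foreach ($min as $i => $lower) {
            if (strcmp($lower, $value) <= 0 && strcmp($value, $max[$i]) <= 0) {
                $pages[] = $i;
            }
        }
        return $pages;
    }

    $n = count($min);

    // First page with max >= value; every later page also has max >= value.
    [$lo, $hi] = [0, $n];
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($max[$mid], $value) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    $first = $lo;

    // First page with min > value; every earlier page has min <= value.
    [$lo, $hi] = [0, $n];
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($min[$mid], $value) <= 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }

    // Pages in [first, lo) satisfy min <= value <= max.
    return $first < $lo ? range($first, $lo - 1) : [];
}

// Ascending lists whose page ranges overlap (max[1] = "f" > min[2] = "d").
$min = ['a', 'c', 'd', 'k'];
$max = ['c', 'f', 'h', 'p'];
$pages = candidatePages(BOUNDARY_ORDER_ASCENDING, $min, $max, 'e');
```

Because max_values[i] <= min_values[i+1] is not guaranteed, neighbouring page ranges can overlap, so the search yields a range of pages (here pages 1 and 2) rather than a single page.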

$definition_level_histograms

Same as repetition_level_histograms, except for definition levels.

public array<string|int, int> $definition_level_histograms

$max_values

Upper bounds for the values of each page; described together with min_values.

public array<string|int, string> $max_values

$min_values

Two lists containing lower and upper bounds for the values of each page, as determined by the ColumnOrder of the column. These may be the actual minimum and maximum values found on a page, but can also be (more compact) values that do not exist on the page. For example, instead of storing "Blart Versenwald III", a writer may set min_values[i]="B", max_values[i]="C".

public array<string|int, string> $min_values

Such more compact values must still be valid values within the column's logical type. Readers must make sure that list entries are populated before using them by inspecting null_pages.
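A short sketch of the check readers are expected to make before trusting a bound, assuming byte-wise (strcmp) ordering as for UTF8 string columns. The helper pageMayContain() is hypothetical; the example reuses the compact bounds "B" and "C" mentioned in the description.

```php
<?php
declare(strict_types=1);

/**
 * Whether page $i may contain $value. null_pages is consulted first because an
 * all-null page stores empty placeholder bounds, not real ones.
 * Assumes byte-wise (strcmp) ordering, as for UTF8 string columns.
 *
 * @param list<bool>   $nullPages ColumnIndex::$null_pages
 * @param list<string> $min       ColumnIndex::$min_values
 * @param list<string> $max       ColumnIndex::$max_values
 */
function pageMayContain(array $nullPages, array $min, array $max, int $i, string $value): bool
{
    if ($nullPages[$i]) {
        return false; // only nulls here; the bounds carry no information
    }
    // The bounds may be compact values that never occur on the page, so a
    // match only means the page *may* contain $value.
    return strcmp($min[$i], $value) <= 0 && strcmp($value, $max[$i]) <= 0;
}

// Page 0 stores the compact bounds "B" and "C"; page 1 holds only nulls.
$nullPages = [false, true];
$min = ['B', ''];
$max = ['C', ''];
```

Without the null_pages test, a lookup for the empty string would wrongly match page 1, since both of its placeholder bounds are empty.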

$null_counts

A list containing the number of null values for each page.

public array<string|int, int> $null_counts

Writers SHOULD always write this field, even if no null values are present or the column is not nullable. Readers MUST distinguish between null_counts not being present and a null count being 0. If null_counts is not present, readers MUST NOT assume that all null counts are 0.
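One way to keep the "absent" and "zero" cases apart is to surface a missing list as null instead of defaulting to zeros. This sketch assumes that an unset optional field reads as null, as is usual for Thrift-generated PHP classes; pageNullCount() and canSkipForIsNull() are hypothetical names.

```php
<?php
declare(strict_types=1);

/**
 * Null count of page $i, or null when the writer omitted null_counts.
 * A null result means "unknown" and must not be read as 0.
 *
 * @param list<int>|null $nullCounts ColumnIndex::$null_counts, null if absent (assumed)
 */
function pageNullCount(?array $nullCounts, int $i): ?int
{
    if ($nullCounts === null) {
        return null; // field not written: nothing is known about nulls
    }
    return $nullCounts[$i];
}

/**
 * A page can be skipped for an "IS NULL" filter only when its null count is
 * known to be 0; an absent list never justifies skipping.
 */
function canSkipForIsNull(?array $nullCounts, int $i): bool
{
    return pageNullCount($nullCounts, $i) === 0;
}
```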

$null_pages

A list of Boolean values that determines whether the corresponding min and max values are valid. If true, the page contains only null values, and writers must set the corresponding entries in min_values and max_values to byte[0] (an empty byte array) so that all lists have the same length. If false, the corresponding entries in min_values and max_values must be valid.

public array<string|int, bool> $null_pages
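A minimal consistency check for the null_pages rules, assuming binary values are held as PHP strings so that byte[0] corresponds to ''. The helper checkNullPages() is hypothetical and not part of Flow PHP.

```php
<?php
declare(strict_types=1);

/**
 * Check the invariants described for null_pages: every list has one entry per
 * page, and pages holding only nulls carry empty (byte[0]) min/max values.
 * Returns a list of problems; an empty list means the index looks consistent.
 *
 * @param list<bool>   $nullPages ColumnIndex::$null_pages
 * @param list<string> $min       ColumnIndex::$min_values
 * @param list<string> $max       ColumnIndex::$max_values
 * @return list<string>
 */
function checkNullPages(array $nullPages, array $min, array $max): array
{
    $pageCount = count($nullPages);
    if (count($min) !== $pageCount || count($max) !== $pageCount) {
        return ['null_pages, min_values and max_values differ in length'];
    }

    $problems = [];
    foreach ($nullPages as $i => $allNull) {
        if ($allNull && ($min[$i] !== '' || $max[$i] !== '')) {
            $problems[] = "page $i is all-null but has non-empty bounds";
        }
    }
    return $problems;
}
```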

$repetition_level_histograms

Contains repetition level histograms for each page concatenated together. The repetition_level_histogram field on SizeStatistics contains more details.

public array<string|int, int> $repetition_level_histograms

When present, the length should always be (number of pages * (max_repetition_level + 1)) elements.

Element 0 is the first element of the histogram for the first page. Element (max_repetition_level + 1) is the first element of the histogram for the second page.
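The offset arithmetic can be sketched as follows. Since definition_level_histograms is described as the same structure for definition levels, the same layout should apply there with max_definition_level in place of max_repetition_level. The helper pageHistogram() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Extract one page's level histogram from the concatenated list.
 * Page $page occupies elements [$page * ($maxLevel + 1), ($page + 1) * ($maxLevel + 1)).
 *
 * @param list<int> $histograms repetition_level_histograms or definition_level_histograms
 * @param int       $maxLevel   max_repetition_level or max_definition_level
 * @return list<int> counts indexed by level 0..$maxLevel
 */
function pageHistogram(array $histograms, int $maxLevel, int $page): array
{
    $width = $maxLevel + 1;
    if (count($histograms) % $width !== 0) {
        throw new InvalidArgumentException('length is not a multiple of max level + 1');
    }
    // array_slice() re-indexes numeric keys, so the result starts at level 0.
    return array_slice($histograms, $page * $width, $width);
}

// Two pages with max_repetition_level = 1, so each page contributes 2 counts.
$repetition = [5, 7, 3, 9];
$second = pageHistogram($repetition, 1, 1);
```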

Methods

__construct()

public __construct([mixed $vals = null ]) : mixed
Parameters
$vals : mixed = null

read()

public read(mixed $input) : mixed
Parameters
$input : mixed

write()

public write(mixed $output) : mixed
Parameters
$output : mixed
