SizeStatistics
A structure for capturing metadata for estimating the unencoded, uncompressed size of data written. This is useful for readers to estimate how much memory is needed to reconstruct data in their memory model and for fine-grained filter pushdown on nested structures (the histograms contained in this structure can help determine the number of nulls at a particular nesting level and the maximum length of lists).
Properties
- $_TSPEC : mixed
- $definition_level_histogram : array<string|int, int>
- Same as repetition_level_histogram except for definition levels.
- $isValidate : mixed
- $repetition_level_histogram : array<string|int, int>
- When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1), where each element represents the number of times that repetition level was observed in the data.
- $unencoded_byte_array_data_bytes : int
- The number of physical bytes stored for BYTE_ARRAY data values, assuming no encoding. This is exclusive of the bytes needed to store the length of each byte array. In other words, this field is equivalent to `(size of PLAIN-ENCODING the byte array values) - (4 bytes * number of values written)`. To determine unencoded sizes of other types, readers can use schema information multiplied by the number of non-null and null values.
Methods
- __construct() : mixed
- getName() : mixed
- read() : mixed
- write() : mixed
Properties
$_TSPEC
public
static mixed
$_TSPEC
= [1 => ['var' => 'unencoded_byte_array_data_bytes', 'isRequired' => false, 'type' => \Thrift\Type\TType::I64], 2 => ['var' => 'repetition_level_histogram', 'isRequired' => false, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::I64, 'elem' => ['type' => \Thrift\Type\TType::I64]], 3 => ['var' => 'definition_level_histogram', 'isRequired' => false, 'type' => \Thrift\Type\TType::LST, 'etype' => \Thrift\Type\TType::I64, 'elem' => ['type' => \Thrift\Type\TType::I64]]]
$definition_level_histogram
Same as repetition_level_histogram except for definition levels.
public
array<string|int, int>
$definition_level_histogram
This field may be omitted if max_definition_level is 0 or 1 without loss of information.
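As an illustration of how a reader might use this histogram, the sketch below splits the entries of a column into non-null leaf values and everything else. It relies only on the Parquet convention that a value is a non-null leaf when its definition level equals the maximum definition level; the function name `countLeafValues` and the sample histogram are hypothetical and not part of this class.

```php
<?php
// Hypothetical helper, not part of the generated SizeStatistics class.
// Assumes the histogram has max_definition_level + 1 elements, so the
// last element counts the non-null leaf values.
function countLeafValues(array $definitionLevelHistogram): array
{
    if (count($definitionLevelHistogram) === 0) {
        throw new InvalidArgumentException('Histogram must not be empty.');
    }

    $maxLevel = count($definitionLevelHistogram) - 1;
    $nonNull = $definitionLevelHistogram[$maxLevel];
    $total = array_sum($definitionLevelHistogram);

    return [
        'non_null' => $nonNull,
        // Entries below the maximum level are null or empty at some nesting level.
        'null_or_empty' => $total - $nonNull,
    ];
}

// Made-up histogram for a column with max_definition_level = 2.
$counts = countLeafValues([1, 2, 4]);
echo $counts['non_null'], "\n";      // prints 4
echo $counts['null_or_empty'], "\n"; // prints 3
```

The individual lower levels can be inspected the same way when a reader needs the count of nulls at one specific nesting level rather than the combined total.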
$isValidate
public
static mixed
$isValidate
= false
$repetition_level_histogram
When present, there is expected to be one element corresponding to each repetition level (i.e. size = max_repetition_level + 1), where each element represents the number of times that repetition level was observed in the data.
public
array<string|int, int>
$repetition_level_histogram
This field may be omitted if max_repetition_level is 0 without loss of information.
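A minimal sketch of how such a histogram could be built from a sequence of levels follows. The helper `buildLevelHistogram` and the sample data are hypothetical; the sample assumes a list column with max_repetition_level = 1 holding the rows [1, 2, 3], [4] and [5], where repetition level 0 marks the first value of each row.

```php
<?php
// Hypothetical helper, not part of the generated SizeStatistics class.
// Returns an array of size $maxLevel + 1 whose element i counts how
// often level i occurs in $levels.
function buildLevelHistogram(array $levels, int $maxLevel): array
{
    $histogram = array_fill(0, $maxLevel + 1, 0);
    foreach ($levels as $level) {
        $histogram[$level]++;
    }
    return $histogram;
}

// Rows [1, 2, 3], [4], [5] produce these repetition levels.
$repetitionLevels = [0, 1, 1, 0, 0];
$histogram = buildLevelHistogram($repetitionLevels, 1);

echo implode(',', $histogram), "\n"; // prints 3,2
```

Here element 0 equals the number of rows (3) and the sum of all elements equals the number of level entries (5). The same helper works for definition levels, since both histograms share this layout.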
$unencoded_byte_array_data_bytes
The number of physical bytes stored for BYTE_ARRAY data values, assuming no encoding. This is exclusive of the bytes needed to store the length of each byte array. In other words, this field is equivalent to `(size of PLAIN-ENCODING the byte array values) - (4 bytes * number of values written)`. To determine unencoded sizes of other types, readers can use schema information multiplied by the number of non-null and null values.
public
int
$unencoded_byte_array_data_bytes
The number of null/non-null values can be inferred from the repetition and definition level histograms.
For example, if a column chunk is dictionary-encoded with dictionary ["a", "bc", "cde"], and a data page contains the indices [0, 0, 1, 2], then this value for that data page should be 7 (1 + 1 + 2 + 3).
This field should only be set for types that use BYTE_ARRAY as their physical type.
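The dictionary example above can be checked with a short sketch. Both helper functions are hypothetical and only mirror the arithmetic described for this field; they do not call any method of this class.

```php
<?php
// Hypothetical helper, not part of the generated SizeStatistics class.
// Sums the raw byte lengths of the values a data page refers to,
// without the 4-byte length prefix that PLAIN encoding would add.
function unencodedByteArrayDataBytes(array $dictionary, array $indices): int
{
    $total = 0;
    foreach ($indices as $index) {
        $total += strlen($dictionary[$index]); // strlen() counts bytes, not characters
    }
    return $total;
}

// Hypothetical helper: size of the same values under PLAIN encoding,
// i.e. a 4-byte length prefix followed by the bytes of each value.
function plainEncodedSize(array $dictionary, array $indices): int
{
    return 4 * count($indices) + unencodedByteArrayDataBytes($dictionary, $indices);
}

$dictionary = ['a', 'bc', 'cde'];
$indices = [0, 0, 1, 2];

$unencoded = unencodedByteArrayDataBytes($dictionary, $indices); // 1 + 1 + 2 + 3
$plain = plainEncodedSize($dictionary, $indices);                // 4 * 4 + 7

echo $unencoded, "\n";                   // prints 7
echo $plain - 4 * count($indices), "\n"; // prints 7
```

Both lines print the same number, which matches the stated equivalence between the unencoded size and the PLAIN-encoded size minus 4 bytes per value.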
Methods
__construct()
public
__construct([mixed $vals = null ]) : mixed
Parameters
- $vals : mixed = null
getName()
public
getName() : mixed
read()
public
read(mixed $input) : mixed
Parameters
- $input : mixed
write()
public
write(mixed $output) : mixed
Parameters
- $output : mixed