DataFrame
Methods
- __construct() : mixed
- aggregate() : self
- autoCast() : self
- batchSize() : self
- Merge/Split Rows yielded by Extractor into batches of given size.
- cache() : self
- Start processing rows up to this moment and put each instance of Rows into previously defined cache.
- collect() : self
- Before transforming rows, collect them and merge into single Rows instance.
- collectRefs() : self
- This method allows to collect references to all entries used in this pipeline.
- constrain() : self
- count() : int
- crossJoin() : self
- display() : string
- drop() : self
- Drop given entries.
- dropDuplicates() : $this
- dropPartitions() : self
- Drop all partitions from Rows, additionally when $dropPartitionColumns is set to true, partition columns are also removed.
- duplicateRow() : self
- fetch() : Rows
- Be aware that fetch is not memory safe and will load all rows into memory.
- filter() : self
- filterPartitions() : self
- filters() : self
- forEach() : void
- get() : Generator<string|int, Rows>
- Yields each row as an instance of Rows.
- getAsArray() : Generator<string|int, array<string|int, array<string|int, mixed>>>
- Yields each row as an array.
- getEach() : Generator<string|int, Row>
- Yield each row as an instance of Row.
- getEachAsArray() : Generator<string|int, array<string|int, mixed>>
- Yield each row as an array.
- groupBy() : GroupedDataFrame
- join() : self
- joinEach() : self
- limit() : self
- load() : self
- map() : self
- match() : self
- mode() : $this
- SaveMode defines how Flow should behave when writing to a file/files that already exists.
- onError() : self
- partitionBy() : self
- pivot() : self
- printRows() : void
- printSchema() : void
- rename() : self
- renameAll() : self
- renameAllLowerCase() : self
- renameAllStyle() : self
- renameAllUpperCase() : self
- renameAllUpperCaseFirst() : self
- renameAllUpperCaseWord() : self
- reorderEntries() : self
- rows() : self
- run() : mixed
- saveMode() : self
- Alias for DataFrame::mode.
- schema() : Schema
- select() : self
- sortBy() : self
- transform() : self
- Alias for DataFrame::with().
- until() : self
- The difference between filter and until is that filter will keep filtering rows until extractors finish yielding rows.
- validate() : self
- void() : self
- with() : self
- withEntries() : self
- withEntry() : self
- write() : self
Methods
__construct()
public
__construct(Pipeline $pipeline, Config|FlowContext $context) : mixed
Parameters
- $pipeline : Pipeline
- $context : Config|FlowContext
aggregate()
public
aggregate(AggregatingFunction ...$aggregations) : self
Parameters
- $aggregations : AggregatingFunction
Tags
Return values
selfautoCast()
public
autoCast() : self
Return values
selfbatchSize()
Merge/Split Rows yielded by Extractor into batches of given size.
public
batchSize(int<1, max> $size) : self
For example, when Extractor is yielding one row at time, this method will merge them into batches of given size before passing them to the next pipeline element. Similarly when Extractor is yielding batches of rows, this method will split them into smaller batches of given size.
In order to merge all Rows into a single batch use DataFrame::collect() method or set size to -1 or 0.
Parameters
- $size : int<1, max>
Tags
Return values
selfcache()
Start processing rows up to this moment and put each instance of Rows into previously defined cache.
public
cache([null|string $id = null ][, int|null $cacheBatchSize = null ]) : self
Cache type can be set through ConfigBuilder. By default everything is cached in system tmp dir.
Important: cache batch size might significantly improve performance when processing large amount of rows. Larger batch size will increase memory consumption but will reduce number of IO operations. When not set, the batch size is taken from the last DataFrame::batchSize() call.
Parameters
- $id : null|string = null
- $cacheBatchSize : int|null = null
Tags
Return values
selfcollect()
Before transforming rows, collect them and merge into single Rows instance.
public
collect() : self
This might lead to memory issues when processing large amount of rows, use with caution.
Tags
Return values
selfcollectRefs()
This method allows to collect references to all entries used in this pipeline.
public
collectRefs(References $references) : self
(new Flow())
->read(From::chain())
->collectRefs($refs = refs())
->run();
Parameters
- $references : References
Tags
Return values
selfconstrain()
public
constrain(Constraint $constraint, Constraint ...$constraints) : self
Parameters
- $constraint : Constraint
- $constraints : Constraint
Return values
selfcount()
public
count() : int
Tags
Return values
intcrossJoin()
public
crossJoin(self $dataFrame[, string $prefix = '' ]) : self
Parameters
- $dataFrame : self
- $prefix : string = ''
Tags
Return values
selfdisplay()
public
display([int $limit = 20 ][, bool|int $truncate = 20 ][, Formatter $formatter = new AsciiTableFormatter() ]) : string
Parameters
- $limit : int = 20
-
maximum numbers of rows to display
- $truncate : bool|int = 20
-
false or if set to 0 columns are not truncated, otherwise default truncate to 20 characters
- $formatter : Formatter = new AsciiTableFormatter()
Tags
Return values
stringdrop()
Drop given entries.
public
drop(string|Reference ...$entries) : self
Parameters
- $entries : string|Reference
Tags
Return values
selfdropDuplicates()
public
dropDuplicates(Reference|string ...$entries) : $this
Parameters
- $entries : Reference|string
Tags
Return values
$thisdropPartitions()
Drop all partitions from Rows, additionally when $dropPartitionColumns is set to true, partition columns are also removed.
public
dropPartitions([bool $dropPartitionColumns = false ]) : self
Parameters
- $dropPartitionColumns : bool = false
Tags
Return values
selfduplicateRow()
public
duplicateRow(mixed $condition, WithEntry ...$entries) : self
Parameters
- $condition : mixed
- $entries : WithEntry
Return values
selffetch()
Be aware that fetch is not memory safe and will load all rows into memory.
public
fetch([int|null $limit = null ]) : Rows
If you want to safely iterate over Rows use oe of the following methods:.
DataFrame::get() : \Generator DataFrame::getAsArray() : \Generator DataFrame::getEach() : \Generator DataFrame::getEachAsArray() : \Generator
Parameters
- $limit : int|null = null
Tags
Return values
Rowsfilter()
public
filter(ScalarFunction $function) : self
Parameters
- $function : ScalarFunction
Tags
Return values
selffilterPartitions()
public
filterPartitions(Filter|ScalarFunction $filter) : self
Parameters
- $filter : Filter|ScalarFunction
Tags
Return values
selffilters()
public
filters(array<string|int, ScalarFunction> $functions) : self
Parameters
- $functions : array<string|int, ScalarFunction>
Tags
Return values
selfforEach()
public
forEach([null|callable(Rows $rows): void $callback = null ]) : void
Parameters
- $callback : null|callable(Rows $rows): void = null
Tags
get()
Yields each row as an instance of Rows.
public
get() : Generator<string|int, Rows>
Tags
Return values
Generator<string|int, Rows>getAsArray()
Yields each row as an array.
public
getAsArray() : Generator<string|int, array<string|int, array<string|int, mixed>>>
Tags
Return values
Generator<string|int, array<string|int, array<string|int, mixed>>>getEach()
Yield each row as an instance of Row.
public
getEach() : Generator<string|int, Row>
Tags
Return values
Generator<string|int, Row>getEachAsArray()
Yield each row as an array.
public
getEachAsArray() : Generator<string|int, array<string|int, mixed>>
Tags
Return values
Generator<string|int, array<string|int, mixed>>groupBy()
public
groupBy(string|Reference ...$entries) : GroupedDataFrame
Parameters
- $entries : string|Reference
Tags
Return values
GroupedDataFramejoin()
public
join(self $dataFrame, Expression $on[, string|Join $type = Join::left ]) : self
Parameters
- $dataFrame : self
- $on : Expression
- $type : string|Join = Join::left
Tags
Return values
selfjoinEach()
public
joinEach(DataFrameFactory $factory, Expression $on[, string|Join $type = Join::left ]) : self
Parameters
- $factory : DataFrameFactory
- $on : Expression
- $type : string|Join = Join::left
Tags
Return values
selflimit()
public
limit(int|null $limit) : self
Parameters
- $limit : int|null
Tags
Return values
selfload()
public
load(Loader $loader) : self
Parameters
- $loader : Loader
Tags
Return values
selfmap()
public
map(callable(Row $row): Row $callback) : self
Parameters
Tags
Return values
selfmatch()
public
match(Schema $schema[, null|SchemaValidator $validator = null ]) : self
Parameters
- $schema : Schema
- $validator : null|SchemaValidator = null
-
- when null, StrictValidator gets initialized
Tags
Return values
selfmode()
SaveMode defines how Flow should behave when writing to a file/files that already exists.
public
mode(SaveMode $mode) : $this
For more details please see SaveMode enum.
Parameters
- $mode : SaveMode
Tags
Return values
$thisonError()
public
onError(ErrorHandler $handler) : self
Parameters
- $handler : ErrorHandler
Tags
Return values
selfpartitionBy()
public
partitionBy(string|Reference $entry, string|Reference ...$entries) : self
Parameters
Tags
Return values
selfpivot()
public
pivot(Reference $ref) : self
Parameters
- $ref : Reference
Return values
selfprintRows()
public
printRows([int|null $limit = 20 ][, int|bool $truncate = 20 ][, Formatter $formatter = new AsciiTableFormatter() ]) : void
Parameters
- $limit : int|null = 20
- $truncate : int|bool = 20
- $formatter : Formatter = new AsciiTableFormatter()
Tags
printSchema()
public
printSchema([int|null $limit = 20 ][, SchemaFormatter $formatter = new ASCIISchemaFormatter() ]) : void
Parameters
- $limit : int|null = 20
- $formatter : SchemaFormatter = new ASCIISchemaFormatter()
Tags
rename()
public
rename(string $from, string $to) : self
Parameters
- $from : string
- $to : string
Tags
Return values
selfrenameAll()
public
renameAll(string $search, string $replace) : self
Parameters
- $search : string
- $replace : string
Tags
Return values
selfrenameAllLowerCase()
public
renameAllLowerCase() : self
Tags
Return values
selfrenameAllStyle()
public
renameAllStyle(StringStyles|string $style) : self
Parameters
- $style : StringStyles|string
Tags
Return values
selfrenameAllUpperCase()
public
renameAllUpperCase() : self
Tags
Return values
selfrenameAllUpperCaseFirst()
public
renameAllUpperCaseFirst() : self
Tags
Return values
selfrenameAllUpperCaseWord()
public
renameAllUpperCaseWord() : self
Tags
Return values
selfreorderEntries()
public
reorderEntries([Comparator $comparator = new TypeComparator() ]) : self
Parameters
- $comparator : Comparator = new TypeComparator()
Return values
selfrows()
public
rows(Transformer|Transformation $transformer) : self
Parameters
- $transformer : Transformer|Transformation
Tags
Return values
selfrun()
public
run([null|callable(Rows $rows, FlowContext $context): void $callback = null ][, Analyze|bool $analyze = false ]) : mixed
Parameters
- $callback : null|callable(Rows $rows, FlowContext $context): void = null
- $analyze : Analyze|bool = false
-
- when set run will return Report
Tags
saveMode()
Alias for DataFrame::mode.
public
saveMode(SaveMode $mode) : self
Parameters
- $mode : SaveMode
Tags
Return values
selfschema()
public
schema() : Schema
Tags
Return values
Schemaselect()
public
select(string|Reference ...$entries) : self
Parameters
- $entries : string|Reference
Tags
Return values
selfsortBy()
public
sortBy(Reference ...$entries) : self
Parameters
- $entries : Reference
Tags
Return values
selftransform()
Alias for DataFrame::with().
public
transform(Transformer|Transformation|Transformations|WithEntry $transformer) : self
Parameters
- $transformer : Transformer|Transformation|Transformations|WithEntry
Tags
Return values
selfuntil()
The difference between filter and until is that filter will keep filtering rows until extractors finish yielding rows.
public
until(ScalarFunction $function) : self
Until will send a STOP signal to the Extractor when the condition is not met.
Parameters
- $function : ScalarFunction
Tags
Return values
selfvalidate()
public
validate(Schema $schema[, null|SchemaValidator $validator = null ]) : self
Please use DataFrame::match instead
Parameters
- $schema : Schema
- $validator : null|SchemaValidator = null
-
- when null, StrictValidator gets initialized
Tags
Return values
selfvoid()
public
void() : self
Tags
Return values
selfwith()
public
with(Transformer|Transformation|Transformations|WithEntry $transformer) : self
Parameters
- $transformer : Transformer|Transformation|Transformations|WithEntry
Tags
Return values
selfwithEntries()
public
withEntries(array<string, ScalarFunction|WindowFunction|WithEntry> $references) : self
Parameters
- $references : array<string, ScalarFunction|WindowFunction|WithEntry>
Tags
Return values
selfwithEntry()
public
withEntry(string|Definition $entry, ScalarFunction|WindowFunction $reference) : self
Parameters
- $entry : string|Definition
- $reference : ScalarFunction|WindowFunction
Tags
Return values
selfwrite()
public
write(Loader $loader) : self
Parameters
- $loader : Loader