flow php

UNIFIED DATA PROCESSING FRAMEWORK

composer require flow-php/etl ^0.9.2

Changelog

elephant
extract

Extracts

Read from various data sources.

arrow
transform

Transforms

Shape and optimize for your needs.

arrow
load

Loads

Store and secure in one of many available data sinks.

Scalar Functions


Scalar Functions are the most basic building blocks of Flow. These functions are used in all operations that occur at the level of a single cell.

List of all available scalar functions can be found here.

If we wanted to imagine how Flow works, the easiest way is to think of a spreadsheet. After reading a portion of data from the source, Flow transforms it into a structure of rows, columns, and cells.

Once the read batch of data is converted into a tabular form, Flow proceeds to process the data row by row, column by column, cell by cell.

Scalar functions are called for each cell in a given row. It's worth noticing that the execution context of scalar functions is the entire row, even though the operation usually takes place on a single cell. This relationship allows using values from other cells as function arguments, for example.

Each scalar function implements a very simple interface.

<?php

declare(strict_types=1);

namespace Flow\ETL\Function;

use Flow\ETL\Row;

interface ScalarFunction
{
    public function eval(Row $row) : mixed;
}

As you can see, there's not much room for overly complicated operations here. The best and most efficient scalar functions are those that have only one, well-defined responsibility. The fewer things a function does, the easier it is to test, understand, and maintain.

Let's see what an example of scalar functions in Flow looks like:

<?php

declare(strict_types=1);

namespace Flow\ETL\Function;

use Flow\ETL\Row;

final class Power extends ScalarFunctionChain
{
    public function __construct(
        private readonly ScalarFunction|int|float $left,
        private readonly ScalarFunction|int|float $right
    ) {
    }

    public function eval(Row $row) : float|int|null
    {
        $left = (new Parameter($this->left))->asNumber($row);
        $right = (new Parameter($this->right))->asNumber($row);

        if ($left === null || $right === null) {
            return null;
        }

        if ($right === 0) {
            return null;
        }

        return $left ** $right;
    }
}

Let's start by analyzing the function constructor. We can notice that the arguments passed to the constructor are two other can also be scalar functions. This simple trick allows us to refer to values from different cells using the ref function.

The ref(string $entry) : EntryReference function returns an instance of the EntryReference object, which implements the ScalarFunction interface. This implementation involves returning the value from the cell that the reference points to.

It's worth mentioning another very frequently used scalar function, lit(mixed $value) : Literal. The lit function has a very similar operation to ref, with the difference that instead of extracting a value from a cell, it returns the value passed as an argument.

Now that we know the construction of a scalar function, let's look at the implementation of the eval(Row $row) : mixed method.

public function eval(Row $row) : float|int|null
{
    // Retrieving parameters
    $left = (new Parameter($this->left))->asNumber($row);
    $right = (new Parameter($this->right))->asNumber($row);

    // Validating parameters
    if ($left === null || $right === null) {
        return null;
    }

    if ($right === 0) {
        return null;
    }

    // Execution
    return $left ** $right;
}

We can assume that the eval method consists of 3 parts:

Let's now analyze each of these parts.

Retrieving parameters - the purpose of this action is to reduce all scalar functions to specific values by calling them on the passed row.

Parameters validation - when we have all parameters reduced to simple values, we need to make sure they are correct. We check if something is not null or if the values are of the appropriate type. In case the values are incorrect, instead of throwing an exception, we should return null.

Execution - this part of the method contains the actual logic of the scalar function. In our case, it's raising a value to a power. A significant portion of scalar functions are nothing more than functions provided by PHP.

This construction allows for very easy addition of new scalar functions, and for their very simple testing. Let's look at the tests for the power function:

<?php

declare(strict_types=1);

namespace Flow\ETL\Tests\Unit\Function;

use function Flow\ETL\DSL\{int_entry, lit, ref, str_entry};
use Flow\ETL\Row;
use PHPUnit\Framework\TestCase;

final class PowerTest extends TestCase
{
    public function test_power_non_numeric_values() : void
    {
        self::assertNull(
            ref('int')->power(lit('non numeric'))->eval(Row::create(int_entry('int', 10)))
        );
        self::assertNull(
            ref('str')->power(lit(2))->eval(Row::create(str_entry('str', 'abc')))
        );
    }

    public function test_power_two_numeric_values() : void
    {
        self::assertSame(
            100,
            ref('int')->power(lit(2))->eval(Row::create(int_entry('int', 10)))
        );
    }
}

To improve developer experience, EntryReference inherits from the abstract class ScalarFunctionChain, which aggregates all functions in one place so that they can be referred to directly from ref(). Let's look at some examples:

ref('integer_value')->power(lit(2));
ref('integer_value')->power(ref('other_column'));

The practical use of a scalar function will therefore look like this:

df()->withEntry('power_of_integer_value', ref('integer_value')->power(ref('other_column')));

Some scalar functions are also available in the form of regular functions registered in the DSL. An example of such a function is when which takes 3 arguments:

If we look at the implementation of this function, we'll notice that all its arguments are mixed (so we can provide literal values or scalar functions). This allows us to build very complicated nested conditions.

function when(mixed $condition, mixed $then, mixed $else = null) : When
{
    return new When($condition, $then, $else);
}

References

Contributors

Join us on GitHub external resource
scroll back to top