Project Documentation

The project includes several levels of documentation, including:

  • Markdown files in the documentation directory
  • API documentation generated by PHPDoc, built in the /web/landing/build/documentation/ directory
  • Usage examples in the examples directory, divided into categories and topics

Project structure

This project is developed as a monorepo, which means that it contains multiple packages in a single repository. The project is structured as follows:

├── bin
├── build
├── documentation
├── examples
├── phpdoc
├── src
│   ├── adapter
│   ├── bridge
│   ├── cli
│   ├── core
│   ├── extension
│   ├── lib
│   └── tools
├── tools
├── var
├── vendor
└── web
    └── landing
  • bin contains the executable scripts used during development
  • build contains the build artifact, currently only the PHAR file. To build the PHAR file, you can run just phar.
  • documentation contains the documentation files for the entire project and for each package separately. This folder is used to generate the documentation website by converting Markdown into HTML.
  • examples contains example code that demonstrates how to use the project. Those examples are also displayed on the documentation website.
  • phpdoc contains the API documentation generated by PHPDoc. It can be built by running just docs-api.
  • src contains the source code of the project, which is divided into several packages:
    • adapter contains adapters, packages that provide implementations of the DataFrame building blocks such as Extractor, Loader and Transformer.
    • bridge contains bridges that connect Flow libraries with other libraries and frameworks.
    • cli contains the command line interface application.
    • core contains the core functionality of the project; it holds the entire DataFrame.
    • extension contains PHP extensions written in other languages (Rust, C) that provide performance-critical functionality.
    • lib contains standalone libraries that can be used independently of the project, like doctrine-dbal-bulk and parquet.
    • tools contains tools used during development.
  • tools contains tools used during development, such as PHPStan and PHPUnit. They are kept here so that they do not pollute the project autoloader and stay outside of the project dependencies.
  • var contains temporary files, like cache and logs.
  • vendor contains the dependencies of the project, managed by Composer.
  • web
    • landing contains the landing page of the project, a Symfony application that is automatically dumped to static HTML files and served by GitHub Pages. It has its own composer.json that defines the dependencies for the landing page and its commands.

Monorepo packages

All packages follow a very similar layout:

src/core
└── etl
    ├── CONTRIBUTING.md
    ├── LICENSE
    ├── README.md
    ├── composer.json
    ├── src
    │   └── Flow
    └── tests
        └── Flow

Tests are usually divided into:

  • Unit - src/core/etl/tests/Flow/ETL/Tests/Unit
  • Integration - src/core/etl/tests/Flow/ETL/Tests/Integration

In some cases there are also test helpers at that level, such as:

  • Doubles - mocks / stubs / fakes used across tests

Package Dependencies

The rule is to keep as few dependencies as possible. When we do add a dependency, its version constraint should be as wide as possible, so that our constraints don't block other projects.

Packages from this monorepo can depend on each other, but there are strict rules about that:

  • lib - libraries can depend only on other lib packages, never on anything else.
  • adapter - adapters can depend on lib / bridge and they always depend on core. Adapters should not depend on cli.
  • bridge - bridges can depend only on lib
  • cli - CLI can depend on core, lib, adapter and bridge; cli always depends on core
  • core - core can depend on lib or bridge, but should never depend on adapter or cli

The above rules also apply to namespaces. For example, the CSV adapter can't depend on the Parquet adapter.
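
The rules above can be read as a small lookup table. The sketch below is only an illustration of that reading, not part of the project: the constant ALLOWED_DEPENDENCIES and the function is_dependency_allowed() are hypothetical names, and the actual enforcement is assumed to happen elsewhere (for example in the monorepo validation run by just lint).

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of the project.
// Keys are the package type that declares a dependency,
// values are the package types it is allowed to depend on.
const ALLOWED_DEPENDENCIES = [
    'lib' => ['lib'],
    'adapter' => ['lib', 'bridge', 'core'],
    'bridge' => ['lib'],
    'cli' => ['core', 'lib', 'adapter', 'bridge'],
    'core' => ['lib', 'bridge'],
];

function is_dependency_allowed(string $from, string $to) : bool
{
    if (!\array_key_exists($from, ALLOWED_DEPENDENCIES)) {
        throw new \InvalidArgumentException("Unknown package type: {$from}");
    }

    return \in_array($to, ALLOWED_DEPENDENCIES[$from], true);
}

var_dump(is_dependency_allowed('adapter', 'core'));    // bool(true)
var_dump(is_dependency_allowed('core', 'adapter'));    // bool(false)
var_dump(is_dependency_allowed('adapter', 'adapter')); // bool(false), e.g. CSV adapter -> Parquet adapter
```

Note that the table only says what is allowed. The requirement that adapter and cli always depend on core is a separate rule and is not modelled here.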

PHP Versions

This project supports only the latest three PHP versions, for example 8.2, 8.3, and 8.4 (assuming that 8.4 is the latest version). Development is done using the lowest supported version, which is currently PHP 8.2. The project is tested against all three versions, so you can use any of them for development.

Tools

Several tools help with developing, testing, and building the project. They are exposed as just recipes; run just --list to see every available task.

  • just lint runs all linters (Mago format check, Mago lint, monorepo validation).
  • just analyze runs static analysis (PHPStan).
  • just fix runs Mago to automatically fix coding standards and lint issues in the code.
  • just test runs all tests in the project. Forward arguments to phpunit to scope the run, e.g. just test --testsuite=lib-parquet-unit or just test --filter=my_test_method. Use tools/phpunit/vendor/bin/phpunit --list-suites to discover available testsuites.
  • just test-website runs the tests for the website.
  • just test-mutation runs mutation tests to check the quality of the tests.
  • just build runs the full pipeline: lint + analyze + tests + mutation tests.

Coding Standards

The whole project is written as object-oriented code, with a focus on clean code principles. The coding standards are defined and automatically enforced by Mago.

Each package comes with a DSL (Domain Specific Language) that provides an easy, functional API. All functions defined in the DSL (usually in functions.php files) follow the same coding standards as the rest of the PHP code, with one difference: snake_case is used for function names and arguments in the DSL. The only other place where snake_case is used is in the tests, for test method names.
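
As a rough illustration of the naming convention, the sketch below pairs a class with a DSL function that wraps it. Both ColumnRename and rename_column() are hypothetical names invented for this example and do not exist in the project; the camelCase method and property names are an assumption based on common PHP practice.

```php
<?php

declare(strict_types=1);

// Hypothetical class, used only to illustrate the naming conventions.
final class ColumnRename
{
    public function __construct(
        private readonly string $currentName,
        private readonly string $newName,
    ) {
    }

    /**
     * @param array<string, mixed> $row
     *
     * @return array<string, mixed>
     */
    public function applyTo(array $row) : array
    {
        // Nothing to do when the column is missing or the name does not change.
        if (!\array_key_exists($this->currentName, $row) || $this->currentName === $this->newName) {
            return $row;
        }

        $row[$this->newName] = $row[$this->currentName];
        unset($row[$this->currentName]);

        return $row;
    }
}

// DSL function, as it might appear in a functions.php file:
// snake_case for both the function name and its arguments.
function rename_column(string $current_name, string $new_name) : ColumnRename
{
    return new ColumnRename($current_name, $new_name);
}

$row = rename_column(current_name: 'id', new_name: 'user_id')
    ->applyTo(['id' => 1, 'name' => 'Ann']);
// $row is now ['name' => 'Ann', 'user_id' => 1]
```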

To make sure that the whole project is aligned with the coding standards, run the following commands:

just fix
just lint
just analyze

Commit your changes only when all three commands pass.

Testing

  • Tests in the project are divided into:
    • Unit - tests a single behavior in isolation, without any dependencies.
    • Integration - tests a single behavior with dependencies, like database or external services.
  • Test cases of all packages should extend the \Flow\ETL\Tests\FlowTestCase class. The only exceptions are lib and bridge packages, which can use their own test cases.
  • Each test method should test only one scenario. When one behavior needs to be tested against multiple inputs, use the PHPUnit\Framework\Attributes\TestWith or PHPUnit\Framework\Attributes\DataProvider attributes.
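
A minimal sketch of what such a test might look like is shown below. The class name, namespace and test data are made up for illustration, and the built-in strtoupper() stands in for real project code. It assumes that FlowTestCase extends PHPUnit's TestCase, and it needs PHPUnit and the project autoloader to run.

```php
<?php

declare(strict_types=1);

namespace Flow\ETL\Tests\Unit;

use Flow\ETL\Tests\FlowTestCase;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\Attributes\TestWith;

// Hypothetical test class; strtoupper() stands in for the code under test.
final class UppercaseTest extends FlowTestCase
{
    /**
     * Data providers must be public and static.
     *
     * @return \Generator<string, array{string, string}>
     */
    public static function provide_mixed_case_input() : \Generator
    {
        yield 'lowercase' => ['flow', 'FLOW'];
        yield 'mixed case with space' => ['Flow php', 'FLOW PHP'];
    }

    // One behavior, several inputs: each TestWith attribute is one data set.
    #[TestWith(['flow', 'FLOW'])]
    #[TestWith(['Flow', 'FLOW'])]
    public function test_uppercasing_with_inline_data(string $input, string $expected) : void
    {
        self::assertSame($expected, \strtoupper($input));
    }

    // The same idea with a named provider, useful for larger data sets.
    #[DataProvider('provide_mixed_case_input')]
    public function test_uppercasing_with_data_provider(string $input, string $expected) : void
    {
        self::assertSame($expected, \strtoupper($input));
    }
}
```

TestWith keeps small data sets next to the test method, while DataProvider is easier to read once the inputs grow or need to be built programmatically.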
