flow php

Caching

Cache

The goal of the cache is to serialize an already transformed dataset and save it on disk (or in another location defined by the Cache implementation).

Cache runs the pipeline, catching each batch of Rows and saving it into the cache, from where those rows can later be extracted.
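The idea of catching batches and storing them for later extraction can be illustrated in plain PHP. This is only a minimal sketch and not Flow's implementation: the helper names cache_batches and read_cached_batches, the file naming scheme, and the use of plain arrays in place of Rows are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): serializes every batch into its own
 * file under $dir and returns the written file paths in order.
 *
 * @param iterable<array<int, array<string, mixed>>> $batches
 * @return list<string>
 */
function cache_batches(iterable $batches, string $dir): array
{
    $files = [];
    $index = 0;

    foreach ($batches as $batch) {
        $file = $dir . DIRECTORY_SEPARATOR . sprintf('batch_%06d.cache', $index++);
        file_put_contents($file, serialize($batch));
        $files[] = $file;
    }

    return $files;
}

/**
 * Hypothetical helper (not part of Flow): yields cached batches one at a time,
 * so only a single batch is held in memory at any moment.
 *
 * @param list<string> $files
 */
function read_cached_batches(array $files): Generator
{
    foreach ($files as $file) {
        yield unserialize(file_get_contents($file), ['allowed_classes' => false]);
    }
}

// Stand-in for an already transformed dataset, delivered in two batches.
$transformed = (function (): Generator {
    yield [['id' => 1, 'name' => 'a'], ['id' => 2, 'name' => 'b']];
    yield [['id' => 3, 'name' => 'c']];
})();

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'cache_sketch_' . uniqid();
mkdir($dir);

$files = cache_batches($transformed, $dir);
$restored = iterator_to_array(read_cached_batches($files), false);
```

Writing one file per batch keeps both the write and the read side streaming, which is what makes this approach usable for datasets that do not fit in memory.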

This is useful for operations that require the whole dataset to be transformed before moving forward, such as sorting.

Another interesting use case for caching is sharing a dataset between multiple data processing pipelines. Instead of each pipeline going to the data source and repeating all transformations, a single ETL does the whole job, and the others read the final form of the dataset from the cache in a memory-safe way.
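The "transform once, reuse many times" pattern described above can be sketched in plain PHP. This is an illustration under stated assumptions, not Flow's API: cached_dataset is a hypothetical helper, and for brevity it loads the whole dataset at once, whereas a real cache would hand rows back in batches to stay memory-safe.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): builds the dataset only when no
 * cache file exists yet, then always serves the data from the cache file.
 *
 * @param callable(): array<int, array<string, int>> $build
 * @return array<int, array<string, int>>
 */
function cached_dataset(string $file, callable $build): array
{
    if (!is_file($file)) {
        file_put_contents($file, serialize($build()));
    }

    return unserialize(file_get_contents($file), ['allowed_classes' => false]);
}

$file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'shared_dataset_' . uniqid() . '.cache';

// Counts how often the (simulated) data source is actually read.
$sourceReads = 0;
$build = function () use (&$sourceReads): array {
    $sourceReads++;

    return array_map(
        fn (int $id): array => ['id' => $id, 'double' => $id * 2],
        [1, 2, 3]
    );
};

// Two independent consumers share the same transformed dataset.
$reportA = array_sum(array_column(cached_dataset($file, $build), 'double'));
$reportB = count(cached_dataset($file, $build));

unlink($file);
```

Only the first consumer triggers the read and transformation; the second one is served entirely from the cache file.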

<?php

data_frame()
    ->read(from_())
    ->withEntry('...', ref('...')->doSomething())
    ->cache() // rows transformed by the steps above are saved to the cache here
    ->write(to_())
    ->run();

By default, Flow uses the Filesystem Cache. The location of the cache storage can be adjusted through the environment variable CACHE_DIR_ENV.

To use a different cache implementation, use ConfigBuilder:


Config::builder()
  ->cache(
    new PSRSimpleCache(
        new Psr16Cache(
            new ArrayAdapter()
        ),
        new NativePHPSerializer()
    )
  );

The following implementations are available out of the box:

PSRSimpleCache makes it possible to use any psr/simple-cache-implementation, but it does not come with one out of the box.

