windisch/luisy (public repository, forked from boschglobal/luisy)

A Python framework to build reproducible, robust, and scalable data pipelines


luisy


This tool is an extension for the Python framework luigi that helps to build reproducible and complex data pipelines for batch jobs. Visit our docs to learn more!


This is what an end-to-end luisy pipeline may look like:

    import luisy
    import pandas as pd
    
    @luisy.raw
    @luisy.csv_output(delimiter=',')
    class InputFile(luisy.ExternalTask):
        label = luisy.Parameter()
    
        def get_file_name(self): 
            return f"file_{self.label}"
    
    @luisy.interim
    @luisy.requires(InputFile)
    class ProcessedFile(luisy.Task):
        def run(self):
            df = self.input().read()
            # Some more preprocessing
            # ...
            # Write to disk
            self.write(df)
    
    @luisy.final
    class MergedFile(luisy.ConcatenationTask):
        def requires(self):
            for label in ['a', 'b', 'c', 'd']:
                yield ProcessedFile(label=label)
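To make the data flow of the pipeline above easier to follow, here is a minimal sketch that mimics the three stages without luisy, using only the Python standard library. It is an illustration, not luisy's implementation: the in-memory `RAW_FILES` contents, the helper names `read_input`, `process`, and `merge`, and the placeholder preprocessing step are all hypothetical, and it assumes that the concatenation step simply appends the processed outputs of all labels in order.

```python
import csv
import io

# Hypothetical stand-ins for the raw CSV files "file_a" ... "file_d".
RAW_FILES = {
    label: f"id,value\n1,{label}1\n2,{label}2\n"
    for label in ["a", "b", "c", "d"]
}


def read_input(label):
    """Mimic InputFile: parse the comma-delimited raw file for one label."""
    return list(csv.DictReader(io.StringIO(RAW_FILES[label]), delimiter=","))


def process(label):
    """Mimic ProcessedFile: read the input, preprocess it, return the rows."""
    rows = read_input(label)
    # Placeholder preprocessing (an assumption): tag each row with its label.
    return [{**row, "label": label} for row in rows]


def merge(labels):
    """Mimic MergedFile: concatenate the processed rows of every label."""
    merged = []
    for label in labels:
        merged.extend(process(label))
    return merged


merged_rows = merge(["a", "b", "c", "d"])
print(len(merged_rows))  # 4 labels x 2 rows each
```

In the luisy version, each stage is a task whose output is persisted to disk, so reruns can reuse existing results; the sketch recomputes everything in memory on each call.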

Stable Branch: main

Minimum Python version: 3.8

Install luisy with

    pip install luisy

To run all unit tests in the tests directory, use the following command:

    pytest

Please have a look at our contribution guide.

Runtime dependencies

| Name                | License              | Type       |
|---------------------|----------------------|------------|
| numpy               | BSD-3-Clause License | Dependency |
| pandas              | BSD 3-Clause License | Dependency |
| networkx            | BSD-3-Clause License | Dependency |
| luigi               | Apache License 2.0   | Dependency |
| distlib             | Python license       | Dependency |
| matplotlib          | Other                | Dependency |
| azure-storage-blob  | MIT License          | Dependency |
| tables              | BSD license          | Dependency |
| pipdeptree          | MIT License          | Dependency |
| requirements-parser | Apache License 2.0   | Dependency |
| pyarrow             | Apache License 2.0   | Dependency |
| spark               | Apache License 2.0   | Dependency |

Development dependencies

| Name             | License              | Type       |
|------------------|----------------------|------------|
| sphinx           | BSD-2-Clause         | Dependency |
| sphinx_rtd_theme | MIT License          | Dependency |
| flake8           | MIT License          | Dependency |
| pytest           | MIT License          | Dependency |
| pytest-flake8    | BSD License          | Dependency |
| pytest-cov       | MIT License          | Dependency |
| pip-tools        | BSD 3-Clause License | Dependency |
