Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Updated Jul 1, 2024 - Python
The leading data integration platform for ETL/ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available self-hosted or cloud-hosted.
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
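Orchestrators in this family model a pipeline as a DAG of tasks and run each task once its upstream dependencies have finished. A minimal stdlib-only sketch of that idea (this is an illustration of the DAG-execution concept, not Airflow's actual API; `run_dag` and the task names are hypothetical):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


def run_dag(tasks: dict, deps: dict) -> dict:
    """Run callables in dependency order.

    tasks: task name -> zero-argument callable
    deps:  task name -> set of upstream task names that must run first
    Returns a dict of results, in execution order (dicts preserve
    insertion order, so list(results) is the order tasks ran in).
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name]()
    return results


# A toy extract -> transform -> load chain.
results = run_dag(
    tasks={
        "extract": lambda: "raw",
        "transform": lambda: "clean",
        "load": lambda: "done",
    },
    deps={"transform": {"extract"}, "load": {"transform"}},
)
```

Real schedulers add retries, backfills, and time-based triggering on top of this core ordering step, but the topological sort is the common kernel.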
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration.
CloudQuery Go SDK for source and destination plugins
Privacy and Security focused Segment-alternative, in Golang and React
🧙 Build, run, and manage data pipelines for integrating and transforming data.
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
Flink CDC is a streaming data integration tool
Framework for developing extractors in Python
An orchestration platform for the development, production, and observation of data assets.
Turns Data and AI algorithms into production-ready web applications in no time.
The open source high performance ELT framework powered by Apache Arrow
Hop Orchestration Platform
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Upserts, deletes, and incremental processing on big data.
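The upsert-plus-incremental model behind tools like this can be sketched in plain Python, assuming a simple dict-backed table keyed by record id (real systems apply the same logic over columnar files on distributed storage; `apply_changes` and the change-record shape are illustrative):

```python
def apply_changes(table: dict, changes: list) -> dict:
    """Apply one batch of change records to a keyed table.

    Each change is {'id': ..., 'op': 'upsert' | 'delete', 'data': {...}}.
    Incremental processing means only this batch is applied; the rest
    of the table is left untouched rather than fully rewritten.
    """
    for change in changes:
        if change["op"] == "delete":
            table.pop(change["id"], None)   # delete: drop row if present
        else:
            table[change["id"]] = change["data"]  # upsert: insert or overwrite
    return table


table = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
batch = [
    {"id": 2, "op": "upsert", "data": {"name": "Grace Hopper"}},  # update
    {"id": 3, "op": "upsert", "data": {"name": "Edsger"}},        # insert
    {"id": 1, "op": "delete"},                                    # delete
]
table = apply_changes(table, batch)
```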
A service that provides clustered metadata storage for data integration purposes.