Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated May 21, 2026 - Python
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Incremental engine for long-horizon agents 🌟 Star if you like it!
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Unified querying, transformation, and modification of JSON, TOML, YAML, XML, INI, HCL, KDL and CSV.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A lightweight data processing framework built on DuckDB and 3FS.
A lightweight, flexible, and expressive statistical data testing library
Easy data preparation with the latest LLM-based operators and pipelines.
High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale LLM workflows with 13+ model providers, 8+ vector databases, and agent orchestration, all from your IDE. Includes VS Code extension, TypeScript/Python SDKs, and Docker deployment.
The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure
Concurrent and multi-stage data ingestion and data processing with Elixir
Kubernetes-native platform to run massively parallel data/streaming jobs
Large-scale pretraining for dialogue
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Python Stream Processing
Extract Transform Load for Python 3.5+
Scalable data preprocessing and curation toolkit for LLMs
Concurrent Python made simple