A collection of thoughtfully crafted tools, designed with care and purpose to enhance your digital experience and solve real problems with elegance.
Modern ETL for LLM Dataset Preparation
📦 GOETL
GOETL is a modern, extensible ETL (Extract, Transform, Load) utility designed for preparing datasets for LLM (Large Language Model) training and analytics. It supports both CLI and REST API modes, and comes with a sleek React-based web UI for interactive dataset preparation.
Effortlessly extract textual data from PDF and TXT files.
Clean, tokenize, and chunk text for LLM-friendly datasets.
Output to JSONL, CSV, or directly to databases (Postgres, MySQL, SQLite, MongoDB, Redis)
Generate semantic graphs from code directories
Run as a web service for programmatic or UI-driven ETL
Intuitive React frontend for easy job configuration and monitoring
Production-grade deployment with Caddy reverse proxy
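The clean, tokenize, chunk, and JSONL output steps listed above can be sketched roughly as follows. This is an illustrative sketch only, not GOETL's actual code or API: the function names (clean_text, tokenize, chunk_tokens, write_jsonl), the whitespace tokenizer, the overlap parameter, and the record fields are all assumptions made for the example.

```python
import io
import json
import re


def clean_text(raw: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer (assumption: a real pipeline would
    probably use the target model's own tokenizer instead)."""
    return text.split()


def chunk_tokens(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of at most `size` tokens, where each
    window repeats the last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def write_jsonl(chunks: list[list[str]], source: str, out) -> int:
    """Write one JSON object per line (hypothetical record layout) and
    return the number of records written."""
    count = 0
    for index, chunk in enumerate(chunks):
        record = {"source": source, "chunk": index, "text": " ".join(chunk)}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    raw = "  Extract,   transform \n and load text\tfor LLM datasets.  "
    pieces = chunk_tokens(tokenize(clean_text(raw)), size=4, overlap=1)
    buffer = io.StringIO()
    write_jsonl(pieces, "example.txt", buffer)
    print(buffer.getvalue(), end="")
```

Overlapping windows are a common choice for LLM datasets because they keep some context across chunk boundaries; setting overlap to 0 gives plain fixed-size chunks.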
An award-winning open data transformation and processing toolkit and pipeline endpoint
Open-T-DATA is an open-source initiative focused on streamlining the extraction, transformation, and utilization of open datasets. It provides tools to simplify working with large-scale public datasets by offering efficient data processing pipelines and user-friendly APIs for developers and data enthusiasts.
Automated ingestion from public APIs and open data sources.
Process and structure raw data into easy-to-use formats (JSON, CSV, etc.).
Access transformed datasets through an intuitive API.
Build custom data workflows tailored to your needs.
Designed for collaboration, with contributions and feedback encouraged.
Tech Stack