Data API & Backend · 2023 · Bouygues Telecom

SQL Streaming Module – Documentation & Evolutions

Contributions to an internal SQL-on-Flink streaming module at the core of Bouygues Telecom's ETL. Main deliverable: an automated documentation system and CI/CD integration to make the module production-ready.

Java · Apache Flink · Docusaurus · Maven · SQL

Context

Bouygues Telecom's internal ETL includes a SQL streaming module built on top of Apache Flink. It exposes a SQL-derived language letting data engineers define streaming jobs (source → sink) without writing Flink code directly. The module was functional but not production-ready: connectors lacked documentation and the available options were invisible to users.

Approach

  • Deep-dived into the internal ETL architecture and the streaming module internals to understand the full connector system
  • Built a Maven Java plugin that parses the module's source code, extracts annotations, and auto-generates Markdown documentation
  • Deployed a Docusaurus static site to expose the generated docs in a navigable, user-friendly format
  • Integrated the documentation pipeline into the existing CI/CD chain
  • Worked on streaming evolutions: error handling improvements and robustness to malformed data
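The robustness work in the last item can be illustrated with a minimal sketch of one common pattern: a malformed record is set aside together with the reason it was rejected, instead of failing the whole job. This is plain Java rather than Flink code, and every name in it (`MalformedRecordRouting`, `Event`, `Rejected`, the `<id>,<value>` line format) is hypothetical; it is not the module's actual implementation. In a Flink job, the same idea is typically expressed with side outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MalformedRecordRouting {

    /** Hypothetical parsed record: an identifier and a numeric value. */
    static final class Event {
        final String id;
        final long value;
        Event(String id, long value) { this.id = id; this.value = value; }
    }

    /** A rejected input line, kept with the reason it could not be parsed. */
    static final class Rejected {
        final String rawLine;
        final String reason;
        Rejected(String rawLine, String reason) { this.rawLine = rawLine; this.reason = reason; }
    }

    /** Outcome of a run: valid events plus the rejected lines. */
    static final class Result {
        final List<Event> valid = new ArrayList<>();
        final List<Rejected> rejected = new ArrayList<>();
    }

    /** Parses a line of the assumed form "<id>,<value>" or throws IllegalArgumentException. */
    static Event parse(String line) {
        if (line == null) {
            throw new IllegalArgumentException("line is null");
        }
        String[] parts = line.split(",", -1);
        if (parts.length != 2 || parts[0].trim().isEmpty()) {
            throw new IllegalArgumentException("expected '<id>,<value>'");
        }
        // Long.parseLong throws NumberFormatException, a subclass of IllegalArgumentException.
        return new Event(parts[0].trim(), Long.parseLong(parts[1].trim()));
    }

    /** Routes each line to the valid list or the rejected list; one bad line never aborts the run. */
    static Result process(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            try {
                result.valid.add(parse(line));
            } catch (IllegalArgumentException e) {
                result.rejected.add(new Rejected(line, e.getMessage()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = process(Arrays.asList("a,1", "b,notanumber", "", "c,3"));
        System.out.println("valid=" + r.valid.size() + " rejected=" + r.rejected.size());
    }
}
```

Keeping the raw line next to the rejection reason makes it possible to inspect or replay bad input later without stopping the stream.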

Solution

An automated documentation system that keeps the streaming module's connector reference in sync with the source code and is deployed via CI/CD, giving data engineers a reliable, up-to-date reference for writing streaming jobs autonomously.
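As a rough illustration of the annotation-to-Markdown step, the sketch below reads annotations from a connector class and renders a Markdown reference page. It makes several assumptions: the `@Connector` and `@Option` annotations and the `KafkaSourceExample` class are hypothetical stand-ins, and it uses runtime reflection on compiled classes, whereas the actual plugin is described as parsing source code. The Maven plugin wiring (a Mojo that scans the module and writes files for Docusaurus) is left out to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.Comparator;

public class ConnectorDocGenerator {

    /** Hypothetical annotation describing a connector. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Connector {
        String name();
        String description();
    }

    /** Hypothetical annotation describing one connector option. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Option {
        String key();
        String description();
        boolean required() default false;
    }

    /** Illustrative connector; not taken from the real module. */
    @Connector(name = "kafka-source", description = "Reads records from a Kafka topic.")
    static class KafkaSourceExample {
        @Option(key = "topic", description = "Topic to read from.", required = true)
        String topic;

        @Option(key = "group.id", description = "Consumer group identifier.")
        String groupId;
    }

    /** Renders one Markdown page: a title, a description, and a table of options. */
    static String toMarkdown(Class<?> connectorClass) {
        Connector connector = connectorClass.getAnnotation(Connector.class);
        if (connector == null) {
            throw new IllegalArgumentException(connectorClass.getName() + " is not annotated with @Connector");
        }
        StringBuilder md = new StringBuilder();
        md.append("# ").append(connector.name()).append("\n\n");
        md.append(connector.description()).append("\n\n");
        md.append("| Option | Required | Description |\n");
        md.append("| --- | --- | --- |\n");
        // getDeclaredFields() guarantees no order, so sort by key for stable output.
        Arrays.stream(connectorClass.getDeclaredFields())
                .filter(field -> field.isAnnotationPresent(Option.class))
                .map(field -> field.getAnnotation(Option.class))
                .sorted(Comparator.comparing(Option::key))
                .forEach(option -> md.append("| `").append(option.key()).append("` | ")
                        .append(option.required() ? "yes" : "no").append(" | ")
                        .append(option.description()).append(" |\n"));
        return md.toString();
    }

    public static void main(String[] args) {
        System.out.print(toMarkdown(KafkaSourceExample.class));
    }
}
```

Because the page is derived from the annotations themselves, regenerating it on every CI build is what keeps the published reference from drifting away from the code.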

Key outcome

The streaming module went from entirely undocumented to fully referenced, so data engineers no longer have to inspect the source code to understand the available connectors and options.

Project details

Type: Data Engineering
Date: 2023
Role: Developer

Technologies

Java · Apache Flink · Docusaurus · Maven · SQL