Databricks Open-Sources Apache Spark Declarative ETL Framework, Boosts Pipeline Builds by 90%
Databricks announced at its annual Data + AI Summit in San Francisco on June 11, 2025, that it is open-sourcing its core declarative ETL (Extract, Transform, Load) framework under the name Apache Spark Declarative Pipelines. The company says the framework makes pipeline builds 90% faster, letting engineers concentrate on the results a pipeline should produce rather than on the details of how each step is coded.
The newly open-sourced framework lets data engineers define pipelines in familiar languages such as SQL and Python. Instead of manually coding each step, users describe what the pipeline should produce, and Apache Spark works out how to execute it. The same declarative approach covers both batch and streaming ETL, which makes pipeline development accessible to a broader range of professionals.
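To make the declarative idea concrete, here is a minimal sketch in plain Python. It is a toy illustration only and does not use the Spark Declarative Pipelines API: the `Pipeline` class, the `dataset` decorator, and the dataset names are all hypothetical. Each dataset states what it depends on, and a small engine decides the execution order.

```python
from graphlib import TopologicalSorter


class Pipeline:
    """Toy registry (hypothetical, not the Spark API) for declared datasets."""

    def __init__(self):
        self._datasets = {}  # dataset name -> (function, names it depends on)

    def dataset(self, *depends_on):
        """Register a function as a dataset built from the named upstream datasets."""
        def register(fn):
            self._datasets[fn.__name__] = (fn, depends_on)
            return fn
        return register

    def run(self):
        """Resolve the dependency graph, then build every dataset in a valid order."""
        graph = {name: set(deps) for name, (_, deps) in self._datasets.items()}
        results = {}
        for name in TopologicalSorter(graph).static_order():
            fn, deps = self._datasets[name]
            results[name] = fn(*(results[d] for d in deps))
        return results


pipeline = Pipeline()


# Datasets are declared out of order on purpose: the engine, not the author,
# works out that raw_orders must be built first.
@pipeline.dataset("cleaned_orders")
def revenue_by_country(cleaned_orders):
    totals = {}
    for row in cleaned_orders:
        totals[row["country"]] = totals.get(row["country"], 0) + row["amount"]
    return totals


@pipeline.dataset("raw_orders")
def cleaned_orders(raw_orders):
    # Drop rows with a missing amount.
    return [row for row in raw_orders if row["amount"] is not None]


@pipeline.dataset()
def raw_orders():
    # Made-up sample data for the illustration.
    return [
        {"country": "US", "amount": 10},
        {"country": "US", "amount": None},
        {"country": "DE", "amount": 5},
        {"country": "US", "amount": 7},
    ]


results = pipeline.run()
print(results["revenue_by_country"])  # {'US': 17, 'DE': 5}
```

The author never writes "run this, then that"; the ordering falls out of the declared dependencies. A production engine would presumably add much more on top of this idea, such as incremental processing and retries, which this sketch does not attempt.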
The technology was previously part of Databricks' proprietary offerings under the name Delta Live Tables (DLT), and the company has now contributed it to the Apache Spark open-source project. The move gives the global Spark community access to workflow automation tooling that Databricks has already run in production for its own customers, with the aim of more reliable and efficient data processing.
According to Databricks, the framework not only accelerates pipeline development but also enhances automation in areas like data quality, change data capture (CDC), ingestion, and transformation. This aligns with the company’s mission to simplify data engineering on its Data Intelligence Platform, as highlighted during the summit announcements.
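For readers unfamiliar with change data capture, the sketch below shows the kind of bookkeeping that such automation takes off an engineer's plate. It is plain Python written for illustration, not the framework's API: the function name `apply_change_feed`, the `op` and `seq` field names, and the sample events are all assumptions.

```python
def apply_change_feed(target, changes, key, sequence_by):
    """Apply change events to `target`, a dict keyed by primary key.

    Hypothetical helper: events are replayed in `sequence_by` order, so the
    event with the highest sequence number for a key determines its final state.
    """
    for event in sorted(changes, key=lambda e: e[sequence_by]):
        primary_key = event[key]
        if event["op"] == "delete":
            target.pop(primary_key, None)
        else:
            # Store the row without the bookkeeping fields.
            target[primary_key] = {
                field: value
                for field, value in event.items()
                if field not in ("op", sequence_by)
            }
    return target


# Made-up events that arrive out of order, as change feeds often do.
changes = [
    {"id": 1, "email": "a@new.example", "op": "upsert", "seq": 3},
    {"id": 1, "email": "a@old.example", "op": "upsert", "seq": 1},
    {"id": 2, "email": "b@example.com", "op": "upsert", "seq": 2},
    {"id": 2, "op": "delete", "seq": 4},
]

customers = apply_change_feed({}, changes, key="id", sequence_by="seq")
print(customers)  # {1: {'id': 1, 'email': 'a@new.example'}}
```

Even this small case has to handle out-of-order arrival and deletes. Hand-written pipelines repeat that logic for every table, which is the sort of work a declarative framework aims to absorb.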
Databricks expects the open-sourcing of Apache Spark Declarative Pipelines to encourage collaboration and further development by community contributors across the data and AI ecosystem. Tutorials and other resources, such as those Microsoft Learn provides for Azure Databricks, are already appearing to help users get started with building ETL pipelines using this framework.
For Databricks, the release is a step toward making advanced data tooling available beyond its own paying customers. Businesses and individual developers alike can now use the framework to build pipelines faster and with less hand-written orchestration code.