StarTree Inc., which sells a real-time analytics platform and cloud service based on the Apache Pinot open-source online analytical processing database, today becomes the latest data analytics provider to announce full support for Apache Iceberg.
StarTree Cloud, the company's managed service, will act as the analytic and serving layer on top of Iceberg-based data lakehouses, effective today. The company said the move creates new use cases for Iceberg in real-time applications that require high concurrency across thousands of simultaneous users. In particular, it makes Iceberg easier to apply to customer-facing scenarios in which organizations want to expose data externally without relying on complex, multi-step pipelines.
Iceberg is a management layer that sits atop data files in cloud storage to improve consistency, manageability and query performance. It has been rapidly gaining acceptance as a de facto table standard, replacing an assortment of proprietary alternatives.
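One way that a table-format metadata layer improves query performance is by recording per-file column statistics, which lets a query planner skip files that cannot match a filter. The Python sketch below is a heavily simplified illustration of that idea. The `DataFile` and `Snapshot` classes, the `plan_scan` function and the `s3://bucket/...` paths are hypothetical and do not reflect Iceberg's actual API or metadata layout.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    record_count: int
    # Per-column (min, max) bounds, similar in spirit to the column
    # statistics a table format keeps in its manifest entries.
    bounds: dict[str, tuple[float, float]]


@dataclass
class Snapshot:
    snapshot_id: int
    # Each manifest is simply a tuple of data files in this sketch.
    manifests: tuple[tuple[DataFile, ...], ...]


def plan_scan(snapshot: Snapshot, column: str, lo: float, hi: float) -> list[str]:
    """Return paths of data files whose [min, max] range for `column` may overlap [lo, hi]."""
    selected = []
    for manifest in snapshot.manifests:
        for f in manifest:
            # Files without statistics for the column must be kept (conservative).
            col_min, col_max = f.bounds.get(column, (float("-inf"), float("inf")))
            if col_max >= lo and col_min <= hi:
                selected.append(f.path)
    return selected


# Hypothetical example: three files partitioned by a "day" column.
files = (
    DataFile("s3://bucket/t/a.parquet", 1000, {"day": (1, 10)}),
    DataFile("s3://bucket/t/b.parquet", 1000, {"day": (11, 20)}),
    DataFile("s3://bucket/t/c.parquet", 1000, {"day": (21, 30)}),
)
snap = Snapshot(snapshot_id=1, manifests=(files,))
print(plan_scan(snap, "day", 12, 22))  # ['s3://bucket/t/b.parquet', 's3://bucket/t/c.parquet']
```

Because readers plan against a single snapshot, a query sees a consistent set of files even while writers commit new snapshots. That is the consistency half of the description above.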
Iceberg provides transactional access to structured files in formats such as Parquet, a columnar storage file format optimized for efficient read/write access to large analytical datasets. However, Iceberg lacks native capabilities to process low-latency, high-concurrency queries.
For this reason, organizations have typically extracted Iceberg data into separate systems, such as key-value stores or proprietary formats, to achieve subsecond response times. Those systems require engineering-intensive pipelines and duplicated data while limiting flexibility.
Query complexity
“Not only are you duplicating data, you’re amplifying the data itself because you have to materialize all combinations of your dimensions and metrics to make it easy to query in a key-value store-like fashion,” said Chinmay Soman, StarTree’s head of product.
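Soman's point about "amplifying the data" can be made concrete with a little arithmetic. If every subset of dimensions is pre-aggregated, each grouping can produce up to the product of its dimensions' cardinalities in rows, and summing over all subsets gives the closed form $\prod_d (c_d + 1)$. The Python sketch below computes that upper bound. The dimension names and cardinalities are hypothetical, and real data is often sparser than the worst case.

```python
from itertools import combinations
from math import prod


def full_cube_rows(cardinalities: dict[str, int]) -> int:
    """Upper bound on pre-aggregated rows when every subset of dimensions is materialized."""
    dims = list(cardinalities)
    total = 0
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            # A grouping over `subset` yields at most the product of its cardinalities;
            # the empty subset (grand total) contributes prod(()) == 1 row.
            total += prod(cardinalities[d] for d in subset)
    return total


# Hypothetical dimensions: 50 countries, 3 device types, 24 hours.
dims = {"country": 50, "device": 3, "hour": 24}
print(full_cube_rows(dims))  # 5100, i.e. (50 + 1) * (3 + 1) * (24 + 1)
```

Each of those rows also stores every metric, and adding one more dimension multiplies the total again, which is why key-value-style serving copies grow far larger than the source table.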
StarTree said it enables direct querying of Iceberg tables without the need to move or transform the underlying data. The integration supports open formats and leverages performance-enhancing features, including Pinot indexing and materialization, local caching and intelligent prefetching.
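To illustrate what "local caching and intelligent prefetching" generally means for queries against object storage, here is a minimal Python sketch of a least-recently-used block cache with sequential read-ahead. It is not StarTree's or Pinot's implementation. `fetch` is a hypothetical stand-in for a ranged read from object storage, and the prefetch runs synchronously here, whereas a production engine would issue it in the background.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """LRU cache of remote blocks with simple sequential read-ahead (illustrative only)."""

    def __init__(self, fetch: Callable[[str, int], bytes], capacity: int, prefetch: int = 1):
        self.fetch = fetch          # hypothetical ranged-read function: (path, block) -> bytes
        self.capacity = capacity
        self.prefetch = prefetch
        self.blocks = OrderedDict() # (path, block_index) -> bytes, oldest first
        self.remote_reads = 0

    def _load(self, key: tuple[str, int]) -> bytes:
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        data = self.fetch(*key)
        self.remote_reads += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, path: str, block_index: int, total_blocks: int) -> bytes:
        data = self._load((path, block_index))
        # Naive synchronous prefetch: assume a sequential scan and warm the next blocks.
        for i in range(1, self.prefetch + 1):
            if block_index + i < total_blocks:
                self._load((path, block_index + i))
        return data


def fake_fetch(path: str, idx: int) -> bytes:
    return f"{path}#{idx}".encode()


cache = BlockCache(fake_fetch, capacity=4)
cache.read("seg.parquet", 0, total_blocks=3)  # fetches blocks 0 and 1
cache.read("seg.parquet", 1, total_blocks=3)  # block 1 is a hit; prefetches block 2
print(cache.remote_reads)  # 3
```

The design trades local memory or disk for fewer round trips to object storage, which is where most of the latency of querying lakehouse files comes from.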
“Data products today increasingly rely on historical data from lakehouses, but the serving layer has been missing,” said Chief Marketing Officer Chad Meley. “By querying Iceberg directly with subsecond latency, we’re eliminating the need for intermediate pipelines, duplicate storage and external databases.”
Executives said Iceberg support expands StarTree’s addressable market beyond its original focus on streaming and low-latency analytics. “This is certainly a new use case for us,” Meley said. “The primary challenge we’re solving is no longer just about data freshness. It’s about helping customers build scalable data products without all the bloat and complexity.”
StarTree enables various indexes and pre-aggregated materializations to be defined directly on Iceberg tables. Indexes for numerical data, text, JavaScript Object Notation, geospatial data and other types can be distributed locally on compute nodes or stored in object storage.
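The two techniques named above, indexes and pre-aggregated materializations, can be illustrated with a toy example. The Python sketch below builds an inverted index (value to row IDs) for equality filters and a simple pre-aggregated sum per dimension value. The column names and data are hypothetical. Python sets stand in for the compressed bitmaps that real engines typically use, and nothing here reflects Pinot's actual index formats.

```python
from collections import defaultdict


def build_inverted_index(column: list[str]) -> dict[str, set[int]]:
    """Map each distinct value to the set of row IDs where it appears."""
    index = defaultdict(set)
    for row_id, value in enumerate(column):
        index[value].add(row_id)
    return dict(index)


def filter_and(indexes: dict[str, dict[str, set[int]]], predicates: dict[str, str]) -> list[int]:
    """Row IDs satisfying column == value for every (column, value) pair, via set intersection."""
    result = None
    for col, value in predicates.items():
        rows = indexes[col].get(value, set())
        result = rows if result is None else result & rows
    return sorted(result or set())


def preaggregate(dimension: list[str], metric: list[int]) -> dict[str, int]:
    """Pre-computed SUM(metric) GROUP BY dimension, so queries read one value per group."""
    totals = defaultdict(int)
    for key, value in zip(dimension, metric):
        totals[key] += value
    return dict(totals)


# Hypothetical table with two dimensions and one metric.
region = ["us", "eu", "us", "apac", "us"]
status = ["ok", "ok", "err", "ok", "ok"]
revenue = [10, 20, 30, 40, 50]

indexes = {"region": build_inverted_index(region), "status": build_inverted_index(status)}
print(filter_and(indexes, {"region": "us", "status": "ok"}))  # [0, 4]
print(preaggregate(region, revenue))  # {'us': 90, 'eu': 20, 'apac': 40}
```

An index narrows which rows a query touches, while a materialization answers a known aggregate without touching rows at all. Either structure can live on compute nodes or in object storage, as the paragraph above describes.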
Soman said the integration is based on work StarTree had already done to query Parquet files and S3-based object storage. “Parquet is not designed for random read access, but we’ve adapted Pinot to use it as a forward index,” he said. “Combining that with our understanding of Iceberg manifests and metadata gave us the building blocks we needed.”
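A forward index maps a row (document) ID to its value. Parquet stores each column in chunks grouped into row groups, so answering "what is the value at row N?" first requires finding which row group holds row N. The Python sketch below shows that lookup using cumulative row counts and binary search. It is an assumption-laden illustration, not how Pinot adapts Parquet. `read_row_group` is a hypothetical loader for one decoded column chunk, and in a real file the row counts would come from the Parquet footer metadata.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, Sequence


class RowGroupForwardIndex:
    """Map a global row ID to (row_group, offset) and fetch its value (illustrative only)."""

    def __init__(self, row_group_sizes: Sequence[int], read_row_group: Callable[[int], list]):
        # starts[i] is the first global row ID stored in row group i.
        self.starts = [0] + list(accumulate(row_group_sizes))[:-1]
        self.total = sum(row_group_sizes)
        self.read_row_group = read_row_group
        self._cached_group = None
        self._cached_values = None

    def locate(self, row_id: int) -> tuple[int, int]:
        if not 0 <= row_id < self.total:
            raise IndexError(row_id)
        group = bisect_right(self.starts, row_id) - 1
        return group, row_id - self.starts[group]

    def get(self, row_id: int):
        group, offset = self.locate(row_id)
        # Keep the last decoded chunk so nearby lookups avoid re-reading it.
        if group != self._cached_group:
            self._cached_values = self.read_row_group(group)
            self._cached_group = group
        return self._cached_values[offset]


# Hypothetical column of 10 values split into row groups of 4, 4 and 2 rows.
values = list(range(100, 110))
sizes = [4, 4, 2]


def read_group(g: int) -> list[int]:
    start = sum(sizes[:g])
    return values[start:start + sizes[g]]


fwd = RowGroupForwardIndex(sizes, read_group)
print(fwd.locate(5), fwd.get(5))  # (1, 1) 105
```

Random access costs a metadata lookup plus decoding a whole chunk, which is why Soman notes that Parquet "is not designed for random read access" and why caching decoded chunks matters.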
Data stays in place
The company emphasized that its query engine still uses proprietary indexing strategies to achieve performance, but that the data itself remains in open formats. “We’re not moving data from Iceberg into StarTree’s proprietary format,” Meley said. “The only thing proprietary in this case would be the index.”
Support for Iceberg enables customers like financial technology firms to use StarTree to power merchant-facing dashboards that report historical cash flow or cohort revenue metrics. Transportation and logistics organizations are building interactive dashboards to review delivery performance, error rates and route efficiency over time. In both cases, the data doesn't need to be real-time, but it must still be served to large user bases under strict service level agreements.
Paul Nashawaty, principal analyst at theCUBE Research, SiliconANGLE’s sister market research firm, said the approach addresses a growing gap in modern data architecture. “Iceberg adoption is accelerating, but most query engines can’t meet the performance SLAs of customer-facing applications,” he said. “StarTree’s ability to serve Iceberg data at high concurrency without duplication is a timely advancement.”
Soman said there are minor performance tradeoffs using Iceberg instead of Pinot’s proprietary native format, but that Pinot is still capable of handling hundreds of queries per second with subsecond latencies.
Meley said that the decision to support Iceberg reflects both market momentum and practical customer needs. “All of our customers are asking about Iceberg,” he said. “It’s becoming the standard for lakehouse storage, and this allows us to support that natively while simplifying the architecture for serving data products.”
Photo: Pixabay