Apache Doris serves as an advanced data warehouse tailored for real-time analytics, providing exceptionally rapid insights into large-scale real-time data.
It features both push-based micro-batch and pull-based streaming data ingestion, achieving this within a second, along with a storage engine capable of real-time updates, appends, and pre-aggregations.
The platform is optimized for handling high-concurrency and high-throughput queries thanks to its columnar storage engine, MPP architecture, cost-based query optimizer, and vectorized execution engine.
Moreover, it supports federated querying across various data lakes like Hive, Iceberg, and Hudi, as well as traditional databases such as MySQL and PostgreSQL.
Doris also accommodates complex data types, including Array, Map, and JSON, and features a variant data type that allows for automatic inference of JSON data types.
Additionally, it employs advanced indexing techniques like NGram bloomfilter and inverted index to enhance text search capabilities.
With its distributed architecture, Doris enables linear scalability, incorporates workload isolation, and implements tiered storage to optimize resource management effectively.
Furthermore, it is designed to support both shared-nothing clusters and the separation of storage and compute resources, making it a versatile solution for diverse analytical needs.