Data Lake Insight (DLI)

Ease of Use

Queries over TB-level data return results in seconds, development is simple with standard SQL, and maintenance-free operations keep labor costs low.

All-in-One Analysis

Spark, Flink, and Trino are combined to provide a seamless interactive analysis experience for processing both batch and streaming data.

Superior Cost-Effectiveness

A decoupled storage and compute architecture lowers costs and enables elastic resources, time-based resource reuse, and flexible job priorities.

Open Source

Cross-source analysis capabilities are provided by supporting popular open-source data formats and integrating with mainstream BI products.

Why Huawei Cloud DLI?

All in SQL

  • With multi-model engines, DLI makes big data analysis accessible to users without a big data background: all you need is knowledge of SQL. It is fully compatible with the Apache Spark, Apache Flink, and Trino ecosystems and interfaces, so offline applications can be migrated to the cloud with little effort. One set of resources can handle multiple types of computation, including stream processing, batch processing, and interactive analysis.

Elastic Resource Pool

  • The resource pool is flexible and can quickly adjust to fluctuations in demand for offline, real-time, and interactive resources. It also supports job-level priority and integrates with DataArts Studio for streamlined operations, ensuring the timely completion of critical tasks.



  • DLI incorporates the Hudi data lake format for analysis and supports a unified solution for multiple engines, such as Spark and Flink. It also allows for real-time data ingestion and lakehouse analysis using SQL.

Cross-Source Analysis

  • DLI supports multiple data formats and can analyze data from various sources, such as cloud databases, on-premises databases, and offline databases, without requiring data migration. This creates a unified view of enterprise data, helping businesses innovate rapidly and unlock the value of their data.

Valuable Data Insights for Any Scenario

Database Analytics

Analyze data stored in a relational database, such as registration information for an application.

Familiar SQL Experience

DLI's SQL syntax is fully compatible with the ANSI SQL 2003 standard for relational databases, requires no additional learning, and allows you to use SQL the way you always have.

Superior Performance

DLI uses a distributed in-memory computing model to easily handle massive amounts of data.

Bottlenecks Resolved
  • Relational databases cannot handle complex queries as data volume grows.
  • Sharding can hinder thorough analysis.
  • Analyzing business data can affect online operations.
Precision Marketing

In the e-commerce industry, it is crucial to correlate information from multiple sources to optimize precision marketing and improve conversion rates. For example, you can correlate "page ad click event data" with "user registration data" to identify the types of ads preferred by different age groups, and then deliver more accurate ads to users according to their age.

Cross-Source Analysis

"Page ad click event data" stored in OBS can be correlated and analyzed with "user registration data" stored in RDS without the need for data migration.

Pure SQL Operation

DLI integrates with multiple data sources, and a data source can be mapped directly by creating a table in SQL.
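The correlation described in this scenario can be sketched in a few lines. This is a minimal illustration only: an in-memory SQLite database stands in for the query engine, and the table names (`ad_clicks`, `user_registrations`), column names, sample rows, and the age-30 split are all hypothetical. On DLI the two tables would instead be mapped to data in OBS and RDS, which is not shown here.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ad_clicks (user_id INTEGER, ad_type TEXT);
        CREATE TABLE user_registrations (user_id INTEGER, age INTEGER);
        INSERT INTO ad_clicks VALUES
            (1, 'games'), (1, 'games'), (2, 'travel'),
            (3, 'games'), (3, 'travel'), (4, 'travel');
        INSERT INTO user_registrations VALUES
            (1, 19), (2, 34), (3, 22), (4, 41);
        """
    )
    return conn


def clicks_by_age_group(conn: sqlite3.Connection) -> list:
    """Join click events to registrations and count clicks per age group and ad type."""
    query = """
        SELECT CASE WHEN r.age < 30 THEN 'under_30' ELSE '30_plus' END AS age_group,
               c.ad_type,
               COUNT(*) AS clicks
        FROM ad_clicks AS c
        JOIN user_registrations AS r ON r.user_id = c.user_id
        GROUP BY age_group, c.ad_type
        ORDER BY age_group, clicks DESC, c.ad_type
    """
    return conn.execute(query).fetchall()


rows = clicks_by_age_group(build_demo_db())
for row in rows:
    print(row)
```

With the sample rows above, the under-30 group clicks mostly on the hypothetical "games" ads, which is the kind of signal that would drive age-based ad delivery.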

Log Analysis

Game companies rely on data analysis platforms to overcome industry challenges by harnessing the power of data. Examples include finding high-quality advertising channels, improving retention of new players, optimizing operational activities to increase player activity, and driving product iteration through data.


DLI is billed only during usage, reducing costs by more than 50% compared to exclusive clusters.

Convergent Analysis

DLI's three engines share metadata. Data is cleaned in real time before being stored for offline ETL processing, and the processing results can be used directly for interactive analysis and data exploration.

Bottleneck Resolved
Log analysis is usually scheduled by period, resulting in a lot of idle time between each scheduling.
Large Enterprise

Large enterprises often have multiple departments using cloud services, so they must manage permissions for the employees in each department, including the creation, deletion, use, and isolation of compute resources. They also need to manage each department's data and ensure that it is properly isolated and shared.

Fine-Grained Permission Control

DLI supports column-level permission control, separate permission control for INSERT INTO and INSERT OVERWRITE, and read-only permission control for table metadata.

Unified Management

Users are managed through IAM, so there is no need to create separate DLI users, and IAM also supports fine-grained authorization.

Gene Data Management

In the field of genetics, there are many third-party analysis libraries based on the Spark distributed framework, such as ADAM and Hail.

Custom Images Supported

You can package third-party analysis libraries such as ADAM and Hail into custom images built on the base images and upload them directly to SWR. When jobs run on DLI, the custom images are automatically pulled from SWR.

Built-in Base Images

DLI provides built-in Huawei-enhanced versions of Spark and Flink, as well as open-source AI images for TensorFlow, Keras, and PyTorch.

Real-Time Risk Control

To increase the likelihood of eliminating or reducing the occurrence of risk events, a risk control system is needed for typical scenarios such as registration, login, and transaction control.

High Throughput and Low Latency

DLI uses Apache Flink's Dataflow model to provide a fully real-time computing framework. With high-performance compute resources, it can process 1,000 to 20,000 messages per second per CPU.
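As a rough illustration of what the per-CPU figure implies for sizing, the sketch below converts a target message rate into a best-case and worst-case CPU count. The function name `cpu_range` and the 50,000 messages-per-second target are hypothetical; only the 1,000 and 20,000 bounds come from the text, and real throughput depends on the job logic and message size.

```python
import math
from typing import Tuple

MIN_MSGS_PER_SEC_PER_CPU = 1_000   # lower bound quoted in the text
MAX_MSGS_PER_SEC_PER_CPU = 20_000  # upper bound quoted in the text


def cpu_range(target_msgs_per_sec: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) CPU counts for a target message rate."""
    if target_msgs_per_sec <= 0:
        raise ValueError("target_msgs_per_sec must be positive")
    # Best case: every CPU sustains the upper bound.
    best_case = math.ceil(target_msgs_per_sec / MAX_MSGS_PER_SEC_PER_CPU)
    # Worst case: every CPU sustains only the lower bound.
    worst_case = math.ceil(target_msgs_per_sec / MIN_MSGS_PER_SEC_PER_CPU)
    return best_case, worst_case


# Hypothetical target: 50,000 messages per second.
best, worst = cpu_range(50_000)
print(f"CPUs needed: between {best} and {worst}")
```

The wide gap between the two bounds suggests measuring a representative job before committing to a resource size.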

Rich Cloud Ecosystem

Using SQL, processed data streams can be written to multiple cloud services such as CloudTable and SMN.

Real-Time Large Screen

To better manage the COVID-19 pandemic, governments needed to use real-time dashboards to monitor key data, such as current confirmed cases, cumulative confirmed cases, and imported cases, providing data support for the next stage of pandemic control.

Millisecond-Level Query Performance

The built-in openLooKeng engine uses many query optimization techniques to meet high-performance, millisecond-level interactive analysis needs on top of an in-memory computing framework.

Easy to Use

Development is done purely in SQL, with full compatibility with standard ANSI SQL 2003 syntax.

Continuous Service Innovation with Tens of Thousands of Customers
