Why Huawei Cloud DLI?
All in SQL
With its multi-engine architecture, DLI makes big data analysis accessible to users without a big data background — all you need is SQL. It is fully compatible with the Apache Spark, Apache Flink, and Trino ecosystems and interfaces, so existing offline applications can be migrated to the cloud with minimal effort. A single pool of resources can handle multiple types of computation, including stream processing, batch processing, and interactive analysis.
Elastic Resource Pool
The elastic resource pool quickly adjusts to fluctuations in demand across offline, real-time, and interactive workloads. It also supports job-level priorities and integrates with DataArts Studio for streamlined operations, ensuring that critical tasks finish on time.
DLI incorporates the Hudi data lake format for analysis and supports a unified solution for multiple engines, such as Spark and Flink. It also allows for real-time data ingestion and lakehouse analysis using SQL.
DLI supports multiple data formats and can analyze data from various sources — cloud services, on-premises databases, and offline data stores — without requiring data migration. This allows for the creation of a unified view of enterprise data, empowering businesses to rapidly innovate and unlock the value of their data.
Valuable Data Insights for Any Scenario
Analyze data stored in a relational database, such as registration information for an application.
DLI's SQL syntax is fully compatible with the ANSI SQL 2003 standard for relational databases, requires no additional learning, and allows you to use SQL the way you always have.
DLI uses a distributed in-memory computing model to easily handle massive amounts of data.
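For instance, a routine aggregation written in standard SQL runs on DLI as-is. The table and column names below are purely illustrative:

```sql
-- Hypothetical table of application registration records
SELECT channel,
       COUNT(*) AS registrations
FROM   user_register
WHERE  reg_date >= DATE '2024-01-01'
GROUP  BY channel
ORDER  BY registrations DESC;
```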
In the e-commerce industry, it is crucial to correlate information from multiple sources to optimize precision marketing and improve conversion rates — for example, correlating "page ad click event data" with "user registration data" to identify the ad types preferred by different age groups, and then delivering better-targeted ads to each group.
"Page ad click event data" stored in OBS can be correlated and analyzed with "user registration data" stored in RDS, with no data migration required.
DLI integrates with multiple data sources; mapping a source is as simple as creating a table in SQL.
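As a sketch of how such a correlation looks in practice — the bucket path, JDBC URL, option names, and schemas below are illustrative assumptions, not exact DLI syntax; consult the DLI documentation for the precise connector options:

```sql
-- Map the ad click log stored in OBS to a table (path is illustrative)
CREATE TABLE ad_click_events (
  user_id    STRING,
  ad_type    STRING,
  click_time TIMESTAMP
)
USING parquet
LOCATION 'obs://example-bucket/ad-clicks/';

-- Map the registration table in RDS over JDBC (options are illustrative)
CREATE TABLE user_register
USING JDBC
OPTIONS (
  'url'     = 'jdbc:mysql://example-rds-host:3306/appdb',
  'dbtable' = 'user_register'
);

-- Correlate the two sources directly; no data migration needed
SELECT r.age_group,
       c.ad_type,
       COUNT(*) AS clicks
FROM   ad_click_events c
JOIN   user_register   r ON c.user_id = r.user_id
GROUP  BY r.age_group, c.ad_type;
```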
Game companies rely on data analysis platforms to overcome industry challenges by harnessing the power of data. For example, finding high-quality advertising channels, improving retention of new players, optimizing operational activities to increase player activity, and driving product iteration through data.
DLI bills only for actual usage, reducing costs by more than 50% compared with exclusive clusters.
DLI's three engines share metadata, and data is cleaned in real-time before being stored for offline ETL processing. The processing results can be directly used for interactive analysis and data exploration.
Large enterprises often have multiple departments utilizing cloud services, which requires the management of permissions for various employees within each department. This includes overseeing the creation, deletion, use, and isolation of compute resources. At the same time, they also need to manage the data of each department, ensuring that proper isolation and sharing protocols are in place.
DLI provides column-level permission control, separate permission control for INSERT INTO and INSERT OVERWRITE, and read-only permission control for table metadata.
DLI uses IAM to manage users (no separate DLI users need to be created) and supports fine-grained authorization through IAM.
In the field of genetics, there are many third-party analysis libraries based on the Spark distributed framework, such as ADAM and Hail.
You can package third-party analysis libraries such as ADAM and Hail into custom images built on the provided base images and upload them directly to SWR. When a job runs on DLI, the custom image is automatically pulled from SWR.
Built-in Huawei-enhanced versions of Spark/Flink and open-source AI images for TensorFlow, Keras, and PyTorch.
To reduce or eliminate the occurrence of risk events, a risk control system is needed for typical scenarios such as registration, login, and transaction control.
Built on Apache Flink's Dataflow model, DLI provides a fully real-time computing framework. With high-performance compute resources, it can process 1,000 to 20,000 messages per second per CPU.
Using SQL, processed data streams can be written to multiple cloud services, such as CloudTable and SMN.
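A minimal Flink SQL sketch of such a risk control pipeline — the connector names, options, and threshold below are illustrative assumptions, not verified DLI configuration:

```sql
-- Source stream of transaction events (connector options are illustrative)
CREATE TABLE transactions (
  user_id STRING,
  amount  DOUBLE,
  ts      TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic'     = 'transactions',
  'format'    = 'json'
);

-- Sink table backed by a notification service such as SMN (illustrative)
CREATE TABLE risk_alerts (
  user_id STRING,
  amount  DOUBLE
) WITH (
  'connector' = 'smn'
);

-- Flag unusually large transactions in real time
INSERT INTO risk_alerts
SELECT user_id, amount
FROM   transactions
WHERE  amount > 10000;
```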
To better manage the COVID-19 pandemic, governments needed to use real-time dashboards to monitor key data, such as current confirmed cases, cumulative confirmed cases, and imported cases, providing data support for the next stage of pandemic control.
The built-in openLooKeng engine applies numerous query optimization techniques on top of an in-memory computing framework to meet high-performance, millisecond-level interactive analysis needs.
A pure SQL development model, fully compatible with standard ANSI SQL 2003 syntax.