HPL/SQL has been included in Apache Hive since version 2.0.

====== Why HPL/SQL ======

Hadoop plays a major role in data warehousing. But to implement comprehensive ETL, reporting, analytics and data mining processes, you need more than distributed processing engines such as MapReduce, Spark or Tez. You also need a way to express comprehensive business rules.

===== 1. Business Logic Driver and Advanced Error Handling =====

HPL/SQL allows you to implement business logic using variables, expressions, flow-of-control statements and iterations. It supports error handling using exceptions and condition handlers. You can develop programs that manage and control distributed processes without becoming a bottleneck of the system.

===== 2. Make SQL-on-Hadoop More Dynamic =====

One of the key features of HPL/SQL is that it allows you to make SQL much more dynamic. You can use advanced expressions, various built-in functions and conditions to generate SQL on the fly based on the user configuration, the results of previous queries, data from files or non-Hadoop data sources, and so on.

===== 3. Leverage Existing Procedural SQL Skills =====

Traditionally, database management systems offer procedural SQL languages that are widely used to implement advanced data manipulation scenarios and workflows. This approach is simple and familiar to database developers and data analysts. Compared with Python, Java or Linux shell scripting, HPL/SQL opens Hadoop to a wider audience of BI analysts and developers.

===== 4. Readability and Maintainability =====

For BI/SQL developers, HPL/SQL is much more concise, readable and maintainable, especially compared with Bash scripts or Java, Python or Scala programs.

===== 5. Integration and Polyglot Persistence =====

Hadoop extends a traditional data warehouse built using an RDBMS product. This means you have to integrate multiple systems, including Hadoop, RDBMS, NoSQL and others. HPL/SQL allows you to work with multiple systems in a single script, so you can take the best of all worlds for different types of workloads and easily integrate them.

===== 6. Compatibility and Migration =====

HPL/SQL tries to support the syntax of all widely used procedural SQL languages as far as possible. You do not need to learn a new procedural language from scratch. This facilitates the development of new code as well as the migration of an existing code base to Hadoop.

===== 7. Hadoop Quick Start =====

HPL/SQL offers a fast way to start working with Hadoop. Later you can re-design and implement advanced data processing workflows using Spark, Tez, Storm, Flink and other frameworks, but right now you can use your current skills and existing code to run your business logic on Hadoop.

~~NOTOC~~