HPL/SQL has been included in Apache Hive since version 2.0.
Hadoop plays a major role in data warehousing. But to implement comprehensive ETL, reporting, analytics and data mining processes, you need more than distributed processing engines such as MapReduce, Spark or Tez; you also need a way to express comprehensive business rules.
HPL/SQL allows you to implement business logic using variables, expressions, flow-of-control statements and iterations. HPL/SQL supports error handling using exceptions and condition handlers. You can develop programs that manage and control distributed processes without becoming a bottleneck of the system.
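As a minimal sketch of these building blocks, the script below combines a variable, a condition handler, a conditional and a loop. The table name sales.orders is hypothetical and used only for illustration; the heavy lifting (the COUNT query) runs on the cluster, while the script itself only coordinates.

```sql
-- Hypothetical table: sales.orders
DECLARE cnt INT DEFAULT 0;

-- Condition handler: runs if any SQL statement below fails
DECLARE EXIT HANDLER FOR SQLEXCEPTION
  PRINT 'Query failed, stopping the script';

-- The query is executed by Hive; only the single result value comes back
SELECT COUNT(*) INTO cnt FROM sales.orders;

-- Flow of control based on the query result
IF cnt = 0 THEN
  PRINT 'No rows to process';
ELSE
  PRINT 'Rows to process: ' || cnt;
END IF;

-- Simple iteration over a numeric range
FOR i IN 1..3 LOOP
  PRINT 'Pass ' || i;
END LOOP;
```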
One of the key features of HPL/SQL is that it allows you to make SQL much more dynamic. You can use advanced expressions, various built-in functions and conditions to generate SQL on the fly based on the user configuration, the results of previous queries, data from files or non-Hadoop data sources, and so on.
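A short sketch of SQL generated on the fly, assuming a hypothetical table sales.orders with an order_dt column. The table name and the date filter are held in variables here; in practice they could come from configuration or from the result of an earlier query.

```sql
-- Hypothetical names: sales.orders, order_dt
DECLARE tabname STRING DEFAULT 'sales.orders';
DECLARE min_dt  STRING DEFAULT '2016-01-01';
DECLARE cnt     INT;

-- Build the statement text at run time and fetch its single result into cnt
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || tabname ||
                  ' WHERE order_dt >= ''' || min_dt || '''' INTO cnt;

PRINT 'Rows found: ' || cnt;
```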
Traditionally, database management systems offer procedural SQL languages that are widely used to implement advanced data manipulation scenarios and workflows. This approach is simple and familiar to database developers and data analysts.
Compared with Python, Java or Linux shell scripting, HPL/SQL opens Hadoop to a wider audience of BI analysts and developers.
HPL/SQL offers functions and statements to make your typical ETL development much more productive.
HPL/SQL is concise, readable and maintainable for BI/SQL developers, especially compared with Bash scripts or Java, Python or Scala programs.
Hadoop extends a traditional data warehouse built using an RDBMS product. This means you have to integrate multiple systems including Hadoop, RDBMS, NoSQL and others.
HPL/SQL allows you to work with multiple systems in a single script, so you can take the best of all worlds for different types of workloads and easily integrate them.
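For illustration, the sketch below aggregates data in Hive and copies the result to a table in another database with the COPY statement. It assumes a connection profile named mysqlconn has been configured for HPL/SQL; that profile name and both table names are hypothetical.

```sql
-- Assumption: a connection profile named mysqlconn is defined in the HPL/SQL configuration
-- Hypothetical tables: sales.orders (Hive), reports.region_totals (target database)
COPY (SELECT region, SUM(amount) AS total
      FROM sales.orders
      GROUP BY region)
  TO reports.region_totals
  AT mysqlconn;
```

The aggregation runs in Hive, and only the summarized rows are transferred to the target system.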
HPL/SQL tries to support the syntax of all widely used procedural languages as much as possible. You do not need to learn a new procedural language from scratch. This facilitates the development of new code as well as the migration of an existing code base to Hadoop.
HPL/SQL offers the fastest way to start working with Hadoop. Later you can re-design and implement advanced data processing workflows using Spark, Tez, Storm, Flink and other frameworks, but right now you can use your current skills and existing code to run your business logic on Hadoop.