HPL/SQL has been included in Apache Hive since version 2.0.
A quick guide on how to start using HPL/SQL.
You can install HPL/SQL by downloading a .tar.gz or .zip file, or you can build it from the source code.
1. Download
Download an HPL/SQL release and uncompress it to your preferred location, for example, the ~/hplsql/ directory. The HPL/SQL program directory includes the following files:
On Linux/UNIX, make sure hplsql is an executable file (this is needed if you uncompressed the tool from a .zip file):
chmod +x <hplsql_dir>/hplsql
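If you prefer to script this step, the sketch below unpacks a .tar.gz release and sets the executable bit. It is a minimal example, assuming the archive contains a single top-level directory (hence --strip-components=1); the function name install_hplsql and the archive name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack an HPL/SQL .tar.gz release into a target
# directory and make the hplsql launcher executable.
# Assumption: the archive has one top-level directory (e.g. hplsql-x.x.x/).
install_hplsql() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # Drop the top-level directory so the files land directly in $dest
  tar -xzf "$archive" -C "$dest" --strip-components=1 || return 1
  # Harmless for .tar.gz (tar usually preserves permissions), required for .zip
  chmod +x "$dest/hplsql"
}

# Usage (archive name is illustrative):
# install_hplsql hplsql-x.x.x.tar.gz ~/hplsql
```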
2. Configure CLASSPATH (Optional)
For Cloudera distributions, you can edit the hplsql file, remove all lines containing
export "HADOOP_CLASSPATH=..."
and add the following line:
export "HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*"
For Hortonworks distributions, check whether the Hadoop jars are located in the /usr/hdp/x.x.x.x-x/ directory and change all paths in the hplsql file accordingly.
For other distributions, check whether the Hadoop jars are located in /usr/lib/, and make the necessary changes in the hplsql file.
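The Cloudera edit described above can also be done with a small script. This is a minimal sketch, assuming the hplsql launcher contains lines starting with export "HADOOP_CLASSPATH=. It replaces the first such line with the CDH line, so the new export keeps the position of the original ones (which presumably precede the Java invocation), drops the remaining ones, and keeps a .bak copy. The function name use_cdh_classpath is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the HADOOP_CLASSPATH exports in the hplsql
# launcher for a Cloudera (CDH parcels) layout. A .bak copy is kept.
use_cdh_classpath() {
  local launcher="$1"
  local cdh_line='export "HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*"'
  cp "$launcher" "$launcher.bak" || return 1
  # Replace the first HADOOP_CLASSPATH export with the CDH line,
  # skip all later ones, and copy every other line unchanged.
  awk -v line="$cdh_line" '
    /export "HADOOP_CLASSPATH=/ { if (!done) { print line; done = 1 }; next }
    { print }
  ' "$launcher.bak" > "$launcher" || return 1
}

# Usage:
# use_cdh_classpath <hplsql_dir>/hplsql
```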
3. Test installation
Run the following command to test HPL/SQL installation:
<hplsql_dir>/hplsql --version
HPL/SQL x.x.x
Or, when run from within the HPL/SQL program directory:
./hplsql --version
HPL/SQL x.x.x
If the version number is printed, the tool is installed correctly.
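To make this check repeatable, for example in a provisioning script, the version test can be wrapped in a function. A sketch, assuming the version line printed to stdout starts with "HPL/SQL " as shown above; check_hplsql is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given launcher (default: hplsql
# on PATH) prints a line starting with "HPL/SQL " for --version.
check_hplsql() {
  local launcher="${1:-hplsql}" out
  # stderr is discarded; only stdout is inspected
  out=$("$launcher" --version 2>/dev/null)
  if printf '%s\n' "$out" | grep -q '^HPL/SQL '; then
    echo "HPL/SQL is installed: $(printf '%s\n' "$out" | grep '^HPL/SQL ' | head -n 1)"
  else
    echo "HPL/SQL not found or unexpected --version output" >&2
    return 1
  fi
}

# Usage:
# check_hplsql <hplsql_dir>/hplsql
```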
4. Add to PATH variable (Optional)
You may add the HPL/SQL directory to the PATH variable:
export PATH=$PATH:<hplsql_dir>
Then you can invoke HPL/SQL by running:
hplsql <options>
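If you put the export line in a shell profile such as ~/.bashrc, every re-sourcing of the profile appends the directory again. A minimal sketch of an idempotent alternative; add_to_path is a hypothetical helper, not part of HPL/SQL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to PATH unless it is already there,
# so re-sourcing a profile does not fill PATH with duplicates.
add_to_path() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;                  # already present, nothing to do
    *) export PATH="$PATH:$dir" ;;
  esac
}

# Usage, e.g. in ~/.bashrc (directory is illustrative):
# add_to_path "$HOME/hplsql"
```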
HPL/SQL uses the hplsql-site.xml configuration file located in the HPL/SQL program directory (the directory where hplsql.jar is located).
To run Hive queries from HPL/SQL, you may need to specify the YARN job queue and other session settings, for example:
<property>
  <name>hplsql.conn.init.hive2conn</name>
  <value>
     set mapred.job.queue.name=dev;
     set hive.execution.engine=mr;
     use sales_db;
  </value>
</property>
Note that an hplsql-site.xml file located in the current directory takes precedence over the configuration file in the HPL/SQL program directory.
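When several copies of hplsql-site.xml exist, it can help to check which one takes precedence for the current directory. The sketch below only encodes the precedence rule stated above; it does not show whether properties from both files are merged, and effective_hplsql_site is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hplsql-site.xml that takes precedence when
# hplsql is started from the current directory. A file in the current
# directory wins over the one in the HPL/SQL program directory.
effective_hplsql_site() {
  local hplsql_dir="$1"
  if [ -f "./hplsql-site.xml" ]; then
    echo "$PWD/hplsql-site.xml"
  elif [ -f "$hplsql_dir/hplsql-site.xml" ]; then
    echo "$hplsql_dir/hplsql-site.xml"
  else
    echo "no hplsql-site.xml found" >&2
    return 1
  fi
}

# Usage:
# effective_hplsql_site <hplsql_dir>
```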
Now you can specify options and run HPL/SQL, for example:
hplsql -e "CURRENT_DATE+1"
hplsql -e "SELECT * FROM src LIMIT 1"
or
hplsql -f script.sql
You can also capture a value returned by HPL/SQL in a shell variable:
MDATE=$(hplsql -e "NVL(MIN_PARTITION_DATE(sales, local_dt, code='A'), '1970-01-01')")
START=$(hplsql -e 'CURRENT_DATE - 1')
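In batch scripts it is usually safer to stop when a value cannot be computed. A minimal sketch, assuming hplsql exits with a non-zero status on failure (not confirmed here, so the wrapper also rejects empty output); hplsql_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: evaluate one HPL/SQL expression and print the result,
# failing if hplsql exits non-zero or prints nothing.
hplsql_value() {
  local result
  result=$(hplsql -e "$1") || return 1
  [ -n "$result" ] || return 1
  printf '%s\n' "$result"
}

# Usage: abort the job if the start date cannot be computed.
# START=$(hplsql_value 'CURRENT_DATE - 1') || { echo "hplsql failed" >&2; exit 1; }
```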
Read the HPL/SQL Reference for more information on how to use the tool.