HPL/SQL has been included in Apache Hive since version 2.0.
The SUMMARY statement outputs the summary statistics for a table or result set.
For each column, it reports the data type, the total and non-NULL row counts, the number of distinct values, the mean, the minimum and maximum values, the standard deviation, and the 5th, 25th, 50th (median), 75th and 95th percentiles.
The statement helps you perform quick and easy exploratory data analysis.
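As a rough illustration of what these statistics mean, the sketch below computes comparable figures by hand in plain HiveQL for a single numeric column. It assumes a table named sample_07 with an integer salary column. It is only a guess at equivalent aggregates, not a description of how SUMMARY is implemented: whether SUMMARY uses the sample or the population standard deviation, and exact or approximate percentiles, is not stated here.

```sql
-- Hand-written approximation of one SUMMARY row for an integer column.
-- The choice of STDDEV_SAMP and PERCENTILE is an assumption.
SELECT
  COUNT(*)               AS row_count,       -- all rows, including NULLs
  COUNT(salary)          AS non_null_count,  -- COUNT(col) skips NULL values
  COUNT(DISTINCT salary) AS distinct_count,
  AVG(salary)            AS avg_value,
  MIN(salary)            AS min_value,
  MAX(salary)            AS max_value,
  STDDEV_SAMP(salary)    AS std_dev,
  -- PERCENTILE accepts integral columns and returns an array of doubles
  PERCENTILE(salary, array(0.05, 0.25, 0.5, 0.75, 0.95)) AS percentiles
FROM sample_07;
```

A single SUMMARY statement replaces one such query per column, which is what makes it convenient for a first look at unfamiliar data.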
Syntax:
SUMMARY [TOP num] FOR table_name [WHERE condition] [LIMIT num] | select_statement;
Examples
Summary for a table:
summary for src;
Column  Type    Rows  NonNull  Unique  Avg     Min    Max     StdDev  p05    p25     p50     p75     p95
KEY     string  500   500      309     260.18  0      98      143.07  26.00  146.00  255.50  395.00  479.00
VALUE   string  500   500      309     null    val_0  val_98  null    null   null    null    null    null
Summary for a query result:
summary for select code, total_emp, salary from sample_07;
Column     Type    Rows  NonNull  Unique  Avg        Min      Max        StdDev      p05       p25       p50       p75        p95
code       string  823   823      823     null       00-0000  53-7199    null        null      null      null      null       null
total_emp  int     823   823      806     489748.24  340      134354250  4858790.94  4054.50   17270.00  49335.00  162662.50  1238941.00
salary     int     823   819      759     47963.63   16700    192780     25706.09    21860.00  30547.50  40700.00  58747.50   92025.50
Top 3 values for each column in a table:
summary top 3 for sample_07;
CODE        DESCRIPTION                                    TOTAL_EMP  SALARY
53-7199  1  Aircraft mechanics and service technicians  1  25500   2  null   4
00-0000  1  Aircraft cargo handling supervisors         1  9910    2  34220  3
11-0000  1  Agricultural workers, all other             1  112300  2  35470  3
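Summary for a filtered table, using the optional WHERE and LIMIT clauses from the syntax line. This is a sketch built only from that syntax: the salary threshold and the limit value are arbitrary, the precise effect of LIMIT is not described in this section, and no output is shown because the statement was not run against real data.

```sql
-- Clause placement follows: SUMMARY FOR table_name [WHERE condition] [LIMIT num]
-- The threshold 50000 and the limit 100 are hypothetical values.
summary for sample_07 where salary > 50000 limit 100;
```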
Version: HPL/SQL 0.3.31