HPL/SQL has been included in Apache Hive since version 2.0

SUMMARY Statement

The SUMMARY statement outputs the summary statistics for a table or result set.

For each column, it reports the data type, the total number of rows, the number of non-NULL rows, the number of distinct values, the mean, the minimum and maximum values, the standard deviation, and the 5th, 25th, 50th (median), 75th and 95th percentiles.
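
To make the meaning of each statistic concrete, here is a minimal Python sketch that computes the same set of values for a single column. It is an illustration only, not how HPL/SQL implements SUMMARY: the function names `summarize` and `percentile` are hypothetical, and the use of the population standard deviation and linearly interpolated percentiles are assumptions that may differ from what HPL/SQL does. The sketch also skips numeric statistics for non-numeric values, whereas the examples on this page show SUMMARY reporting them for a string column whose values look numeric.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in 0..100) of a sorted, non-empty list.

    Assumption: the interpolation method used by HPL/SQL may differ.
    """
    pos = (len(sorted_vals) - 1) * p / 100
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(values):
    """Summary statistics for one column; None stands for NULL."""
    non_null = [v for v in values if v is not None]
    result = {
        "rows": len(values),            # all rows, including NULLs
        "non_null": len(non_null),      # rows with a value
        "unique": len(set(non_null)),   # distinct non-NULL values
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": None, "stddev": None,
        "p05": None, "p25": None, "p50": None, "p75": None, "p95": None,
    }
    # Numeric statistics only apply when every non-NULL value is a number.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) for v in non_null
    )
    if is_numeric:
        ordered = sorted(non_null)
        result["avg"] = statistics.fmean(ordered)
        # Assumption: population standard deviation.
        result["stddev"] = statistics.pstdev(ordered)
        for p in (5, 25, 50, 75, 95):
            result[f"p{p:02d}"] = percentile(ordered, p)
    return result


if __name__ == "__main__":
    # Made-up sample column with one NULL.
    for name, value in summarize([1, 2, 3, 4, None, 4]).items():
        print(f"{name}\t{value}")
```

A column of strings passed to `summarize` still gets row counts, distinct counts, and min/max values, while the numeric fields stay `None`.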

The statement helps you perform quick and easy exploratory data analysis.

Syntax:

SUMMARY [TOP num] FOR table_name [WHERE condition] [LIMIT num] | select_statement;

Examples

Summary for a table:

summary for src;
Column      	Type       	Rows       	NonNull    	Unique     	Avg        	Min        	Max        	StdDev     	p05        	p25        	p50        	p75        	p95        
KEY         	string     	500        	500        	309        	260.18     	0          	98         	143.07     	26.00      	146.00     	255.50     	395.00     	479.00     
VALUE       	string     	500        	500        	309        	null       	val_0      	val_98     	null       	null       	null       	null       	null       	null       
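
In the output above, the KEY column has a Max of 98 even though its average is 260.18. One plausible reading, which the source does not confirm, is that Min and Max for a string column are compared as text, while the numeric statistics are computed on the values converted to numbers. The Python sketch below shows how the two orderings diverge; the key values in it are made up for illustration.

```python
# Made-up string keys, stored as text the way a string column would hold them.
keys = ["0", "98", "498", "255", "26"]

# Text comparison goes character by character, so "98" sorts after "498".
text_min, text_max = min(keys), max(keys)

# Numeric comparison after converting each key to an integer.
numbers = [int(k) for k in keys]
num_min, num_max = min(numbers), max(numbers)

print(text_min, text_max)  # 0 98
print(num_min, num_max)    # 0 498
```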

Summary for a query result:

summary for select code, total_emp, salary from sample_07;
Column      	Type       	Rows       	NonNull    	Unique     	Avg        	Min        	Max        	StdDev     	p05        	p25        	p50        	p75        	p95        
code        	string     	823        	823        	823        	null       	00-0000    	53-7199    	null       	null       	null       	null       	null       	null       
total_emp   	int        	823        	823        	806        	489748.24  	340        	134354250  	4858790.94 	4054.50    	17270.00   	49335.00   	162662.50  	1238941.00 
salary      	int        	823        	819        	759        	47963.63   	16700      	192780     	25706.09   	21860.00   	30547.50   	40700.00   	58747.50   	92025.50   

Version: HPL/SQL 0.3.31