1. 程式人生 > >Hive(12):Hive分析函式

Hive(12):Hive分析函式

一、實現功能

對於分組之後的資料進行處理。

官網:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

二、例項

1.測試表

emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10

2.例項

(1)查詢部門編號10的所有員工的資訊,按照薪資進行降序排列

select * from emp where deptno='10' order by sal desc;
結果:
emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10

(2)查詢所有部門的員工的資訊,按照薪資進行降序排列,多加一個欄位:顯示該部門的最高薪資, 或者顯示該部門的最低薪資。備註:求最大值,要desc;求最小值,要asc。否則會出錯!

  求最大值

select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as max_sal from emp;
結果:
empno   ename   deptno  sal     max_sal	
7839    KING    10      5000.0  5000.0	
7782    CLARK   10      2450.0  5000.0	
7934    MILLER  10      1300.0  5000.0	

7788    SCOTT   20      3000.0  3000.0	
7902    FORD    20      3000.0  3000.0	
7566    JONES   20      2975.0  3000.0	
7876    ADAMS   20      1100.0  3000.0	
7369    SMITH   20      800.0   3000.0	

7698    BLAKE   30      2850.0  2850.0	
7499    ALLEN   30      1600.0  2850.0
7844    TURNER  30      1500.0  2850.0
7654    MARTIN  30      1250.0  2850.0
7521    WARD    30      1250.0  2850.0
7900    JAMES   30      950.0   2850.0

  求最小值

select empno,ename,deptno,sal,min(sal) over (partition by deptno order by sal asc) as min_sal from emp;
結果:
empno   ename   deptno  sal     min_sal
7934    MILLER  10      1300.0  1300.0
7782    CLARK   10      2450.0  1300.0
7839    KING    10      5000.0  1300.0
7369    SMITH   20      800.0   800.0
7876    ADAMS   20      1100.0  800.0
7566    JONES   20      2975.0  800.0
7788    SCOTT   20      3000.0  800.0
7902    FORD    20      3000.0  800.0
7900    JAMES   30      950.0   950.0
7654    MARTIN  30      1250.0  950.0
7521    WARD    30      1250.0  950.0
7844    TURNER  30      1500.0  950.0
7499    ALLEN   30      1600.0  950.0
7698    BLAKE   30      2850.0  950.0

(3)查詢所有部門的員工的資訊,按照薪資進行降序排列,最後一列顯示編號

select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rn from emp;
結果:
empno   ename   deptno  sal     rn
7839    KING    10      5000.0  1
7782    CLARK   10      2450.0  2
7934    MILLER  10      1300.0  3

7788    SCOTT   20      3000.0  1
7902    FORD    20      3000.0  2
7566    JONES   20      2975.0  3
7876    ADAMS   20      1100.0  4
7369    SMITH   20      800.0   5

7698    BLAKE   30      2850.0  1
7499    ALLEN   30      1600.0  2
7844    TURNER  30      1500.0  3
7654    MARTIN  30      1250.0  4
7521    WARD    30      1250.0  5
7900    JAMES   30      950.0   6

(4)去重薪水一樣的(總共有14個人,有4個人兩兩薪水是一樣的,hql結果應該是12人)

select deptno,count(DISTINCT sal) over (partition by deptno ) as countNum from emp group by deptno;
結果:
deptno  countnum
10      3
20      4
30      5

(5)統計每個部門的人數

select deptno,count(*) as count from emp group by deptno;
結果:
deptno  count
10      3
20      5
30      6

或者

select deptno,count(empno) over (partition by deptno) as count from emp group by deptno,empno;
結果:
deptno  count
10      3
10      3
10      3
20      5
20      5
20      5
20      5
20      5
30      6
30      6
30      6
30      6
30      6
30      6