Hive (12): Hive Analytic Functions
阿新 • Published: 2018-11-12
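I. Purpose
Analytic (window) functions process data after it has been grouped into partitions: they compute a value for each row over that row's partition, without collapsing the rows the way GROUP BY does.
Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
II. Examples
1. Test table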
emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
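To reproduce the examples, the test table could be created roughly as follows. This is only a sketch under assumptions: the column types, the tab delimiter, and the file path /tmp/emp.txt are not given in the original and are chosen purely for illustration. hiredate is kept as a string because values such as 1981-2-20 are not zero-padded.

```sql
-- Hypothetical DDL: column types and the field delimiter are assumptions.
create table if not exists emp (
  empno    int,
  ename    string,
  job      string,
  mgr      int,
  hiredate string,  -- kept as string: values like 1981-2-20 are not zero-padded
  sal      double,
  comm     double,
  deptno   int
)
row format delimited fields terminated by '\t';

-- Hypothetical local path; replace with the real location of the data file.
load data local inpath '/tmp/emp.txt' into table emp;
```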
2. Examples
(1) Query all employees in department 10, sorted by salary in descending order
select * from emp where deptno='10' order by sal desc;
Result:
emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
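(2) Query the employees of all departments, sorted by salary in descending order, with one extra column showing the department's highest salary, or alternatively its lowest salary. Note: for the maximum, order by sal desc; for the minimum, order by sal asc. Otherwise the result is wrong! When the OVER clause contains ORDER BY but no explicit window frame, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the aggregate is a running value. With the wrong sort direction, each row gets the maximum (or minimum) of the rows up to and including itself rather than the department-wide value.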
Maximum:
select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as max_sal from emp;
Result:
empno ename deptno sal max_sal
7839 KING 10 5000.0 5000.0
7782 CLARK 10 2450.0 5000.0
7934 MILLER 10 1300.0 5000.0
7788 SCOTT 20 3000.0 3000.0
7902 FORD 20 3000.0 3000.0
7566 JONES 20 2975.0 3000.0
7876 ADAMS 20 1100.0 3000.0
7369 SMITH 20 800.0 3000.0
7698 BLAKE 30 2850.0 2850.0
7499 ALLEN 30 1600.0 2850.0
7844 TURNER 30 1500.0 2850.0
7654 MARTIN 30 1250.0 2850.0
7521 WARD 30 1250.0 2850.0
7900 JAMES 30 950.0 2850.0
Minimum:
select empno,ename,deptno,sal,min(sal) over (partition by deptno order by sal asc) as min_sal from emp;
Result:
empno ename deptno sal min_sal
7934 MILLER 10 1300.0 1300.0
7782 CLARK 10 2450.0 1300.0
7839 KING 10 5000.0 1300.0
7369 SMITH 20 800.0 800.0
7876 ADAMS 20 1100.0 800.0
7566 JONES 20 2975.0 800.0
7788 SCOTT 20 3000.0 800.0
7902 FORD 20 3000.0 800.0
7900 JAMES 30 950.0 950.0
7654 MARTIN 30 1250.0 950.0
7521 WARD 30 1250.0 950.0
7844 TURNER 30 1500.0 950.0
7499 ALLEN 30 1600.0 950.0
7698 BLAKE 30 2850.0 950.0
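The desc/asc requirement in the note for example (2) comes from the default window frame that applies when OVER contains ORDER BY. A minimal sketch of two ways to make the sort direction irrelevant, assuming the same emp table: leave ORDER BY out of the OVER clause, or spell out a frame that covers the whole partition. The aliases max_sal and min_sal follow the queries above.

```sql
-- No ORDER BY inside OVER: the frame is the whole partition,
-- so both values are department-wide regardless of sort direction.
select empno, ename, deptno, sal,
       max(sal) over (partition by deptno) as max_sal,
       min(sal) over (partition by deptno) as min_sal
from emp
order by deptno, sal desc;

-- Same idea with ORDER BY kept inside OVER and an explicit full-partition frame.
select empno, ename, deptno, sal,
       max(sal) over (partition by deptno order by sal asc
                      rows between unbounded preceding and unbounded following) as max_sal
from emp;
```

By contrast, max(sal) over (partition by deptno order by sal asc) with the default frame produces a running maximum, which on this data is simply each row's own sal.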
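(3) Query the employees of all departments, sorted by salary in descending order, with a final column that numbers the rows within each department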
select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rn from emp;
Result:
empno ename deptno sal rn
7839 KING 10 5000.0 1
7782 CLARK 10 2450.0 2
7934 MILLER 10 1300.0 3
7788 SCOTT 20 3000.0 1
7902 FORD 20 3000.0 2
7566 JONES 20 2975.0 3
7876 ADAMS 20 1100.0 4
7369 SMITH 20 800.0 5
7698 BLAKE 30 2850.0 1
7499 ALLEN 30 1600.0 2
7844 TURNER 30 1500.0 3
7654 MARTIN 30 1250.0 4
7521 WARD 30 1250.0 5
7900 JAMES 30 950.0 6
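Note that row_number() hands out distinct numbers even to tied salaries (SCOTT and FORD above get 1 and 2, and the order among ties is not guaranteed). As a sketch, the standard Hive functions rank() and dense_rank() can be placed next to it to see the difference, and the row number can be filtered in a subquery to keep the top earners per department. The aliases rk, drk, and t and the cutoff of 2 are arbitrary choices for illustration.

```sql
-- Compare the three numbering functions over the same window.
select empno, ename, deptno, sal,
       row_number() over (partition by deptno order by sal desc) as rn,
       rank()       over (partition by deptno order by sal desc) as rk,
       dense_rank() over (partition by deptno order by sal desc) as drk
from emp;

-- Top two earners per department, filtering on rn from a subquery.
select empno, ename, deptno, sal
from (
  select empno, ename, deptno, sal,
         row_number() over (partition by deptno order by sal desc) as rn
  from emp
) t
where rn <= 2;
```

In department 20, rank() gives SCOTT and FORD 1 and JONES 3, while dense_rank() gives JONES 2.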
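(4) Count distinct salaries (there are 14 employees in total, and 4 of them form two pairs with equal salaries, so the HQL results should add up to 12)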
select deptno,count(DISTINCT sal) over (partition by deptno ) as countNum from emp group by deptno;
Result:
deptno countnum
10 3
20 4
30 5
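The same per-department counts can also be obtained with a plain aggregate, which avoids mixing a window function with GROUP BY. The sketch below assumes the same emp table; the alias total and the subquery alias t are made up for illustration. As far as I know, DISTINCT inside a window function requires Hive 2.1.0 or later, so the plain form may be the safer choice on older versions.

```sql
-- Plain aggregate: one row per department, no window needed.
select deptno, count(distinct sal) as countNum
from emp
group by deptno;

-- Sum of the per-department counts, to check against the expected 12.
select sum(countNum) as total
from (
  select deptno, count(distinct sal) as countNum
  from emp
  group by deptno
) t;
```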
(5) Count the number of employees in each department
select deptno,count(*) as count from emp group by deptno;
Result:
deptno count
10 3
20 5
30 6
Or, with a window function (this returns one row per employee rather than one row per department):
select deptno,count(empno) over (partition by deptno) as count from emp group by deptno,empno;
Result:
deptno count
10 3
10 3
10 3
20 5
20 5
20 5
20 5
20 5
30 6
30 6
30 6
30 6
30 6
30 6
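The windowed form is most useful when the detail rows should be kept. A minimal sketch, assuming the same emp table, that lists every employee together with the headcount of that employee's department; the alias dept_count is an illustrative choice that also avoids reusing the function name count as a column name.

```sql
-- Every employee row, plus the headcount of that employee's department.
select empno, ename, deptno,
       count(*) over (partition by deptno) as dept_count
from emp;
```

When only one row per department is needed, the GROUP BY query in (5) is the simpler choice.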