Hive: Conditional Queries, Join Queries, Grouped Aggregation, and Subqueries
阿新 • Published: 2019-02-09
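Hive query syntax
Tip: when testing queries on small data sets, you can have Hive hand the MapReduce job to the local runner instead of submitting it to the cluster. Set the following parameter in the Hive session: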
hive> set hive.exec.mode.local.auto=true;
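If small queries still go to the cluster after enabling this switch, the input may be exceeding Hive's local-mode thresholds. The sketch below lists the related session properties. The values shown are believed to be the usual defaults, so treat them as assumptions and check them against your Hive version.

```sql
-- Print the current value of the switch (set <name>; without a value shows it).
set hive.exec.mode.local.auto;

-- Local mode is only chosen when the job's input is small enough.
-- Assumed default: total input size limit in bytes (128 MB).
set hive.exec.mode.local.auto.inputbytes.max=134217728;
-- Assumed default: limit on the number of input files.
set hive.exec.mode.local.auto.input.files.max=4;
```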
Basic query examples
select * from t_access;
select count(*) from t_access;
select max(ip) from t_access;
Conditional queries
select * from t_access where access_time<'2017-08-06 15:30:20';
select * from t_access where access_time<'2017-08-06 16:30:20' and ip>'192.168.33.3';
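The queries on t_access reference the columns ip, url, access_time, and dt, but the table definition is not shown. A minimal sketch of what it might look like follows; the column types, the delimiter, and the choice of dt as a partition column are all assumptions made for illustration.

```sql
-- Hypothetical definition of t_access. Column names come from the queries;
-- the types, delimiter, and partitioning are assumptions.
create table if not exists t_access (
  ip          string,
  url         string,
  access_time string
)
partitioned by (dt string)
row format delimited fields terminated by ',';
```

If the columns are strings as assumed here, both comparisons above are lexicographic. That works for access_time as long as it is stored in the fixed `yyyy-MM-dd HH:mm:ss` layout, since string order then matches time order, but for ip it does not match numeric ordering of the address parts.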
Join query examples
Suppose there is a file a.txt:
a,1
b,2
c,3
d,4
Suppose there is a file b.txt:
a,xx
b,yy
d,zz
e,pp
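The join queries below read from tables t_a and t_b, so the two files need to be loaded first. A minimal sketch, assuming comma-delimited text tables; the column names are taken from the queries, while the types and the file paths are hypothetical.

```sql
-- Column names are taken from the join queries; types are assumptions.
create table if not exists t_a (name string, numb int)
row format delimited fields terminated by ',';

create table if not exists t_b (name string, nick string)
row format delimited fields terminated by ',';

-- Hypothetical local paths; replace with wherever the files actually live.
load data local inpath '/tmp/a.txt' into table t_a;
load data local inpath '/tmp/b.txt' into table t_b;
```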
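With a.txt loaded into table t_a (columns name, numb) and b.txt into table t_b (columns name, nick), run the various join queries: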
inner join(join)
select
a.name as aname,
a.numb as anumb,
b.name as bname,
b.nick as bnick
from t_a a
join t_b b
on a.name=b.name;
Result:
+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| d      | 4      | d      | zz     |
+--------+--------+--------+--------+--+
left outer join(left join)
select
a.name as aname,
a.numb as anumb,
b.name as bname,
b.nick as bnick
from t_a a
left outer join t_b b
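on a.name=b.name;
Result (rows follow from the sample data above; row order may vary):
+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| c      | 3      | NULL   | NULL   |
| d      | 4      | d      | zz     |
+--------+--------+--------+--------+--+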
right outer join(right join)
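select
a.name as aname,
a.numb as anumb,
b.name as bname,
b.nick as bnick
from t_a a
right outer join t_b b
on a.name=b.name;
Result (rows follow from the sample data above; row order may vary):
+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| d      | 4      | d      | zz     |
| NULL   | NULL   | e      | pp     |
+--------+--------+--------+--------+--+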
full outer join(full join)
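The query for this case is missing from the text. Following the pattern of the other joins, it would presumably look like the sketch below, which keeps unmatched rows from both tables and fills the missing side with NULL.

```sql
-- Assumed query, reconstructed from the pattern of the other join examples.
select
  a.name as aname,
  a.numb as anumb,
  b.name as bname,
  b.nick as bnick
from t_a a
full outer join t_b b
on a.name=b.name;
```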
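Result (rows follow from the sample data above; row order may vary):
+--------+--------+--------+--------+--+
| aname  | anumb  | bname  | bnick  |
+--------+--------+--------+--------+--+
| a      | 1      | a      | xx     |
| b      | 2      | b      | yy     |
| c      | 3      | NULL   | NULL   |
| d      | 4      | d      | zz     |
| NULL   | NULL   | e      | pp     |
+--------+--------+--------+--------+--+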
left semi join
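Older Hive versions (before 0.13, according to the Hive documentation) do not support EXISTS/IN subqueries in the where clause; left semi join achieves the same effect: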
select
a.name as aname,
a.numb as anumb
from t_a a
left semi join t_b b
on a.name=b.name;
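Result (rows follow from the sample data above; row order may vary):
+--------+--------+--+
| aname  | anumb  |
+--------+--------+--+
| a      | 1      |
| b      | 2      |
| d      | 4      |
+--------+--------+--+
Note: in a left semi join, the select clause cannot contain columns from the right table.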
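For comparison, on Hive versions that do accept subqueries in the where clause (0.13 and later, as far as the Hive documentation describes), the same filter can be written with IN or EXISTS. This is a sketch against the t_a and t_b tables used in the join examples, not something taken from the original text.

```sql
-- IN form: keeps rows of t_a whose name appears in t_b.
select a.name as aname, a.numb as anumb
from t_a a
where a.name in (select b.name from t_b b);

-- Correlated EXISTS form of the same filter.
select a.name as aname, a.numb as anumb
from t_a a
where exists (select 1 from t_b b where b.name = a.name);
```

Like the left semi join, both forms return only columns of t_a, and each matching row of t_a appears once even if t_b contains several matching rows.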
group by: grouped aggregation
select dt,count(*) as cnt,max(ip) as max_ip from t_access group by dt;
select dt,count(*) as cnt,max(ip) as max_ip from t_access group by dt having dt>'20170804';
select
dt,count(*) as cnt,max(ip) as max_ip
from t_access
where url='http://www.edu360.cn/job'
group by dt having dt>'20170804';
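## Why must where come before group by, and why can conditions after group by only use having?
Because where filters rows before the actual query logic runs,
whereas having filters the results again after group by has aggregated them.
Execution logic of the statement above:
- where filters out the rows that do not satisfy the condition
- the aggregate functions and group by compute the aggregated result
- having filters out the aggregated rows that do not satisfy its condition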
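The three steps are easier to see when the having condition tests an aggregate, which is something where cannot do. A minimal sketch on the same table; the threshold of 100 is an arbitrary value chosen for illustration.

```sql
-- Step 1: where keeps only rows for the given url.
-- Step 2: group by dt computes count(*) per day.
-- Step 3: having keeps only days whose count exceeds the (arbitrary) threshold.
select dt, count(*) as cnt
from t_access
where url='http://www.edu360.cn/job'
group by dt
having count(*) > 100;
```

A condition on the grouping column itself, such as dt>'20170804', does not depend on any aggregate, so it could equally be placed in the where clause, where it removes rows before aggregation.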
Subqueries
select id,name,father
from
(select id,name,family_members['brother'] as father from t_person) tmp
where father is not null;
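The subquery in the from clause is given the alias tmp, which Hive requires, and the outer query then filters on the derived column. The sketch below shows a table definition that would make the query runnable, followed by the same query written as a common table expression. The t_person column types and delimiters are assumptions; the map key 'brother' and the alias father are kept exactly as written in the original.

```sql
-- Hypothetical table: family_members maps a relation name to a person's name.
-- The delimiters are assumptions for illustration.
create table if not exists t_person (
  id             int,
  name           string,
  family_members map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

-- Same query as above, written with a common table expression.
with tmp as (
  select id, name, family_members['brother'] as father from t_person
)
select id, name, father
from tmp
where father is not null;
```

Looking up a key that is absent from the map yields NULL, which is why the outer is not null filter drops people without that entry.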