Project Practice from 0 to 1: Hive (45), Big Data Project: E-commerce Data Warehouse (User Behavior, Part 3)
阿新 • Published: 2021-01-21
The result looks like this:
User    Date          Subtotal    Total
mid1    2019-12-14    10          10
mid1    2019-02-11    12          22
mid2    2019-12-14    15          15
mid2    2019-02-11    12          27
20.1 DWS Layer
20.1.1 Table Creation Statement
drop table if exists dws_user_total_count_day;
create external table dws_user_total_count_day(
`mid_id` string COMMENT 'device id',
`subtotal` bigint COMMENT 'daily login subtotal'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/dws/dws_user_total_count_day';
20.1.2 Loading Data
1) Load the data
insert overwrite table dws_user_total_count_day
partition(dt='2019-12-14')
select
mid_id,
count(mid_id) cm
from
dwd_start_log
where
dt='2019-12-14'
group by
mid_id;
2) Query the result
select * from dws_user_total_count_day;
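20.1.3 Data Loading Script
1) Create the script dws_user_total_count_day.sh
[kgg@hadoop102 bin]$ vim dws_user_total_count_day.sh
Write the load statement from 20.1.2 into the script, with the date passed in as a parameter
2) Grant execute permission to the script
chmod 777 dws_user_total_count_day.sh
3) Run the script
dws_user_total_count_day.sh 2019-02-20
4) Query the result
select * from dws_user_total_count_day;
5) Script execution time
In production, the script is usually scheduled to run between 00:30 and 01:00 each day.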
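Step 1 does not show what goes into the script. The following is only a minimal sketch of what dws_user_total_count_day.sh could look like, not the author's script. It assumes the database is called gmall (guessed from the /warehouse/gmall/ path), that hive is on the PATH, and that the load date defaults to yesterday when no argument is given. The names APP, hive_bin and build_sql, and the dry-run fallback, are illustrative.

```shell
#!/bin/bash
# Sketch of dws_user_total_count_day.sh; the body is assumed, not the author's.

APP=gmall       # assumed database name, inferred from /warehouse/gmall/...
hive_bin=hive   # assumed to be on PATH; point this at your hive binary

# Print the 20.1.2 load statement for the date given as $1.
build_sql() {
    cat <<EOF
insert overwrite table ${APP}.dws_user_total_count_day
partition(dt='$1')
select
    mid_id,
    count(mid_id) cm
from ${APP}.dwd_start_log
where dt='$1'
group by mid_id;
EOF
}

# Load date: first argument, else yesterday (GNU date first, then BSD date).
do_date=${1:-}
if [ -z "$do_date" ]; then
    do_date=$(date -d "-1 day" +%F 2>/dev/null || date -v-1d +%F)
fi

sql=$(build_sql "$do_date")

# Run the statement when hive is installed; otherwise print it as a dry run.
if command -v "$hive_bin" >/dev/null 2>&1; then
    "$hive_bin" -e "$sql"
else
    printf '%s\n' "$sql"
fi
```

Because the statement uses insert overwrite on a single partition, rerunning the script for the same date replaces that day's data instead of duplicating it.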
20.2 ADS Layer
20.2.1 Table Creation Statement
drop table if exists ads_user_total_count;
create external table ads_user_total_count(
`mid_id` string COMMENT 'device id',
`subtotal` bigint COMMENT 'daily login subtotal',
`total` bigint COMMENT 'cumulative login count'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_total_count';
20.2.2 Loading Data
insert overwrite table ads_user_total_count partition(dt='2019-10-03')
select
if(today.mid_id is null, yesterday.mid_id, today.mid_id) mid_id,
today.subtotal,
if(today.subtotal is null, 0, today.subtotal) + if(yesterday.total is null, 0, yesterday.total) total
from (
select
*
from dws_user_total_count_day
where dt='2019-10-03'
) today
full join (
select
*
from ads_user_total_count
where dt=date_add('2019-10-03', -1)
) yesterday
on today.mid_id=yesterday.mid_id;
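20.2.3 Data Loading Script
1) Create the script
[kgg@hadoop102 bin]$ vim ads_user_total_count.sh
Write the load statement from 20.2.2 into the script, with the date passed in as a parameter
2) Grant execute permission to the script
chmod 777 ads_user_total_count.sh
3) Run the script
ads_user_total_count.sh 2019-02-20
4) Query the result
select * from ads_user_total_count;
5) Script execution time
In production, the script is usually scheduled to run between 00:30 and 01:00 each day.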
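The body of ads_user_total_count.sh is not shown in step 1 either. As a hedged sketch only: the script below wraps the full join from 20.2.2, which adds today's subtotal to the total stored in yesterday's partition. The database name gmall, hive being on the PATH, the default of yesterday, and the helper names APP, hive_bin and build_sql are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of ads_user_total_count.sh; the body is assumed, not the author's.

APP=gmall       # assumed database name, inferred from /warehouse/gmall/...
hive_bin=hive   # assumed to be on PATH; point this at your hive binary

# Print the 20.2.2 load statement for the date given as $1.
# "today" is the DWS partition for that date; "yesterday" is the ADS
# partition of the previous day, which carries the running total so far.
build_sql() {
    cat <<EOF
insert overwrite table ${APP}.ads_user_total_count partition(dt='$1')
select
    if(today.mid_id is null, yesterday.mid_id, today.mid_id) mid_id,
    today.subtotal,
    if(today.subtotal is null, 0, today.subtotal) + if(yesterday.total is null, 0, yesterday.total) total
from (
    select * from ${APP}.dws_user_total_count_day where dt='$1'
) today
full join (
    select * from ${APP}.ads_user_total_count where dt=date_add('$1', -1)
) yesterday
on today.mid_id=yesterday.mid_id;
EOF
}

# Load date: first argument, else yesterday (GNU date first, then BSD date).
do_date=${1:-}
if [ -z "$do_date" ]; then
    do_date=$(date -d "-1 day" +%F 2>/dev/null || date -v-1d +%F)
fi

sql=$(build_sql "$do_date")

# Run the statement when hive is installed; otherwise print it as a dry run.
if command -v "$hive_bin" >/dev/null 2>&1; then
    "$hive_bin" -e "$sql"
else
    printf '%s\n' "$sql"
fi
```

Since each day's total is built from the previous day's partition, the script has to be run for consecutive dates in order. A skipped day would restart the running total from that day's subtotal alone.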
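Chapter 21 Requirement 10: Number of New Favoriting Users
New favoriting users: users who add a favorite for the first time on a given day.
21.1 DWS Layer: Building the User Log Behavior Wide Table
Several later requirements draw on data from multiple tables at once. Joining those tables in every query would hurt query performance, so a wide table is built in advance to speed up the other queries.
The wide table records, for each user and each product, the number of clicks, likes, and favorites.
21.1.1 Table Creation Statement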
drop table if exists dws_user_action_wide_log;
CREATE EXTERNAL TABLE dws_user_action_wide_log(
`mid_id` string COMMENT 'device id',
`goodsid` string COMMENT 'product id',
`display_count` string COMMENT 'click count',
`praise_count` string COMMENT 'like count',
`favorite_count` string COMMENT 'favorite count')
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action_wide_log/'
TBLPROPERTIES('parquet.compression'='lzo');
21.1.2 Loading Data
insert overwrite table dws_user_action_wide_log partition(dt='2019-12-14')
select
mid_id,
goodsid,
sum(display_count) display_count,
sum(praise_count) praise_count,
sum(favorite_count) favorite_count
from
( select
mid_id,
goodsid,
count(*) display_count,
0 praise_count,
0 favorite_count
from
dwd_display_log
where
dt='2019-12-14' and action=2
group by
mid_id,goodsid
union all
select
mid_id,
target_id goodsid,
0,
count(*) praise_count,
0
from
dwd_praise_log
where
dt='2019-12-14'
group by
mid_id,target_id
union all
select
mid_id,
course_id goodsid,
0,
0,
count(*) favorite_count
from
dwd_favorites_log
where
dt='2019-12-14'
group by
mid_id,course_id
)user_action
group by
mid_id,goodsid;
21.1.3 Data Loading Script
[kgg@hadoop102 bin]$ vi dws_user_action_wide_log.sh
[kgg@hadoop102 bin]$ chmod 777 dws_user_action_wide_log.sh
21.2 DWS Layer
The user behavior wide table built from the log data serves as the DWS-layer table.
21.3 ADS Layer
21.3.1 Table Creation Statement
drop table if exists ads_new_favorites_mid_day;
create external table ads_new_favorites_mid_day(
`dt` string COMMENT 'date',
`favorites_users` bigint COMMENT 'number of new favoriting users'
)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_new_favorites_mid_day';
21.3.2 Loading Data
insert into table ads_new_favorites_mid_day
select
'2019-12-14' dt,
count(*) favorites_users
from
(
select
mid_id
from
dws_user_action_wide_log
where
favorite_count>0
group by
mid_id
having
min(dt)='2019-12-14'
)user_favorite;
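21.3.3 Data Loading Script
1) Create the script ads_new_favorites_mid_day.sh
[kgg@hadoop102 bin]$ vim ads_new_favorites_mid_day.sh
Write the load statement from 21.3.2 into the script, with the date passed in as a parameter
2) Grant execute permission to the script
chmod 777 ads_new_favorites_mid_day.sh
3) Run the script
ads_new_favorites_mid_day.sh 2019-02-20
4) Query the result
select * from ads_new_favorites_mid_day;
5) Script execution time
In production, the script is usually scheduled to run between 00:30 and 01:00 each day.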
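The content of ads_new_favorites_mid_day.sh is likewise left out in step 1. One possible shape for it, offered as a sketch under assumptions rather than the original script: it wraps the statement from 21.3.2, where a user counts as new on the given date when the earliest partition containing a favorite for that mid_id is that date. The database name gmall, hive on the PATH, the default of yesterday, and the names APP, hive_bin and build_sql are assumed.

```shell
#!/bin/bash
# Sketch of ads_new_favorites_mid_day.sh; the body is assumed, not the author's.

APP=gmall       # assumed database name, inferred from /warehouse/gmall/...
hive_bin=hive   # assumed to be on PATH; point this at your hive binary

# Print the 21.3.2 load statement for the date given as $1.
# min(dt) is the first day on which a device has any favorite, so the
# having clause keeps only devices whose first favorite falls on $1.
build_sql() {
    cat <<EOF
insert into table ${APP}.ads_new_favorites_mid_day
select
    '$1' dt,
    count(*) favorites_users
from (
    select mid_id
    from ${APP}.dws_user_action_wide_log
    where favorite_count>0
    group by mid_id
    having min(dt)='$1'
) user_favorite;
EOF
}

# Load date: first argument, else yesterday (GNU date first, then BSD date).
do_date=${1:-}
if [ -z "$do_date" ]; then
    do_date=$(date -d "-1 day" +%F 2>/dev/null || date -v-1d +%F)
fi

sql=$(build_sql "$do_date")

# Run the statement when hive is installed; otherwise print it as a dry run.
if command -v "$hive_bin" >/dev/null 2>&1; then
    "$hive_bin" -e "$sql"
else
    printf '%s\n' "$sql"
fi
```

The statement uses insert into on an unpartitioned table, so running the script twice for the same date appends a second row for that date.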
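Chapter 22 Requirement 11: Top 3 Users by Click Count for Each Product
22.1 DWS Layer
The user behavior wide table built from the log data serves as the DWS-layer table.
22.2 ADS Layer
22.2.1 Table Creation Statement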
drop table if exists ads_goods_count;
create external table ads_goods_count(
`dt` string COMMENT 'statistics date',
`goodsid` string COMMENT 'product',
`user_id` string COMMENT 'user',
`goodsid_user_count` bigint COMMENT 'number of clicks on the product by the user'
)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_goods_count';
22.2.2 Loading Data
insert into table ads_goods_count
select
'2019-10-03',
goodsid,
mid_id,
sum_display_count
from(
select
goodsid,
mid_id,
sum_display_count,
row_number() over(partition by goodsid order by sum_display_count desc) rk
from(
select
goodsid,
mid_id,
sum(display_count) sum_display_count
from dws_user_action_wide_log
where display_count>0
group by goodsid, mid_id
) t1
) t2
where rk <= 3;
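The statement above hard-codes the statistics date. A wrapper script such as the ads_goods_count.sh created in 22.2.3 would presumably take the date as an argument instead. The following is a sketch of that idea, not the author's script. It assumes the database is called gmall, that hive is on the PATH, and that the date defaults to yesterday; APP, hive_bin and build_sql are illustrative names.

```shell
#!/bin/bash
# Sketch of ads_goods_count.sh; the body is assumed, not the author's.

APP=gmall       # assumed database name, inferred from /warehouse/gmall/...
hive_bin=hive   # assumed to be on PATH; point this at your hive binary

# Print the 22.2.2 load statement, labelling the rows with the date in $1.
# As in the original statement, clicks are summed over all partitions of
# the wide table; row_number() then ranks users within each product.
build_sql() {
    cat <<EOF
insert into table ${APP}.ads_goods_count
select
    '$1',
    goodsid,
    mid_id,
    sum_display_count
from (
    select
        goodsid,
        mid_id,
        sum_display_count,
        row_number() over(partition by goodsid order by sum_display_count desc) rk
    from (
        select
            goodsid,
            mid_id,
            sum(display_count) sum_display_count
        from ${APP}.dws_user_action_wide_log
        where display_count>0
        group by goodsid, mid_id
    ) t1
) t2
where rk <= 3;
EOF
}

# Statistics date: first argument, else yesterday (GNU date, then BSD date).
do_date=${1:-}
if [ -z "$do_date" ]; then
    do_date=$(date -d "-1 day" +%F 2>/dev/null || date -v-1d +%F)
fi

sql=$(build_sql "$do_date")

# Run the statement when hive is installed; otherwise print it as a dry run.
if command -v "$hive_bin" >/dev/null 2>&1; then
    "$hive_bin" -e "$sql"
else
    printf '%s\n' "$sql"
fi
```

row_number() gives tied users distinct ranks, so each product yields at most three rows even when several users share the same click count.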
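22.2.3 Data Loading Script
1) Create the script ads_goods_count.sh
[kgg@hadoop102 bin]$ vim ads_goods_count.sh
Write the load statement from 22.2.2 into the script, with the date passed in as a parameter
2) Grant execute permission to the script
chmod 777 ads_goods_count.sh
3) Run the script
ads_goods_count.sh 2019-02-20
4) Query the result
select * from ads_goods_count;
5) Script execution time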