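Project in Practice from 0 to 1: Hive (41) Big Data Project: E-commerce Data Warehouse (User Behavior Data) (Part 9)

Chapter 11 Requirement 1: User Activity Topic

11.1 DWS Layer

Goal: build a detail record for each device that was active on the current day (DAU), in the current week, and in the current month.

11.1.1 Daily Active Device Detail

1) Table creation statement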

drop table if exists dws_uv_detail_day;
create external table dws_uv_detail_day
(
`mid_id` string COMMENT 'Unique device ID',
`user_id` string COMMENT 'User ID',
`version_code` string COMMENT 'App version code',
`version_name` string COMMENT 'App version name',
`lang` string COMMENT 'System language',
`source` string COMMENT 'Channel ID',
`os` string COMMENT 'Android OS version',
`area` string COMMENT 'Region',
`model` string COMMENT 'Phone model',
`brand` string COMMENT 'Phone brand',
`sdk_version` string COMMENT 'sdkVersion',
`gmail` string COMMENT 'gmail',
`height_width` string COMMENT 'Screen dimensions',
`app_time` string COMMENT 'Time the client log was generated',
`network` string COMMENT 'Network mode',
`lng` string COMMENT 'Longitude',
`lat` string COMMENT 'Latitude'
)
partitioned by(dt string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_day';

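2) Data import

Aggregate by device (mid_id) within a single day's partition, so that each device produces one row per day. If a device used two operating systems, two system versions, or several regions during the day, or logged in with different accounts, the distinct values of each field are gathered with collect_set and joined with '|' into one string, so the device still appears only once.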

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_day
partition(dt='2020-10-14')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat
from dwd_start_log
where dt='2020-10-14'
group by mid_id;
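
To make the effect of concat_ws('|', collect_set(col)) concrete, the following shell sketch mimics it outside Hive on tab-separated (mid_id, os) rows. The function name collapse_by_mid and the sample rows are invented for illustration. Unlike this sketch, Hive's collect_set does not promise any particular order for the collected values.

```shell
#!/bin/bash
# Illustration only: mimic "group by mid_id" + concat_ws('|', collect_set(os))
# on tab-separated (mid_id, os) rows read from stdin.
collapse_by_mid() {
  awk -F'\t' '
    {
      pair = $1 SUBSEP $2
      if (!(pair in seen)) {            # keep each (device, value) pair once
        seen[pair] = 1
        if ($1 in joined) {
          joined[$1] = joined[$1] "|" $2
        } else {
          joined[$1] = $2
          order[++n] = $1               # remember first-seen device order
        }
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] "\t" joined[order[i]]
    }
  '
}

# Hypothetical sample: device m1 reported two OS versions on the same day.
printf 'm1\t8.1\nm1\t9.0\nm1\t8.1\nm2\t9.0\n' | collapse_by_mid
```

With the sample rows, device m1 comes out as a single line whose second field is 8.1|9.0, and m2 as a single line with 9.0.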

3) Query the import result

select * from dws_uv_detail_day limit 1;
select count(*) from dws_uv_detail_day;

4) Question to consider

How would you compute the daily active count for each channel (source)?
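
One possible answer, sketched in the shell-wraps-SQL style used by the loading scripts in this chapter. Because the source column of dws_uv_detail_day may hold several channels joined with '|', the query splits it before grouping, so a device that used two channels in one day is counted once under each. The function name build_channel_dau_sql is hypothetical, and the sketch only prints the HiveQL rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: build a per-channel daily-active query for one day.
# $1 = database name, $2 = date in YYYY-MM-DD form.
build_channel_dau_sql() {
  local app=$1 do_date=$2
  cat <<EOF
select channel, count(*) ct
from ${app}.dws_uv_detail_day
lateral view explode(split(source, '[|]')) tmp as channel
where dt='${do_date}'
group by channel;
EOF
}

build_channel_dau_sql gmall 2020-10-14
# To execute it (assuming the Hive path used elsewhere in this article):
#   /opt/module/hive/bin/hive -e "$(build_channel_dau_sql gmall 2020-10-14)"
```

If devices with several channels should be counted only once, grouping directly by source is the simpler alternative, at the cost of combined values such as a|b forming their own groups.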

11.1.2 Weekly Active Device Detail

Derive the weekly device visit detail from the daily device visit detail.

1) Table creation statement

drop table if exists dws_uv_detail_wk;
create external table dws_uv_detail_wk(
`mid_id` string COMMENT 'Unique device ID',
`user_id` string COMMENT 'User ID',
`version_code` string COMMENT 'App version code',
`version_name` string COMMENT 'App version name',
`lang` string COMMENT 'System language',
`source` string COMMENT 'Channel ID',
`os` string COMMENT 'Android OS version',
`area` string COMMENT 'Region',
`model` string COMMENT 'Phone model',
`brand` string COMMENT 'Phone brand',
`sdk_version` string COMMENT 'sdkVersion',
`gmail` string COMMENT 'gmail',
`height_width` string COMMENT 'Screen dimensions',
`app_time` string COMMENT 'Time the client log was generated',
`network` string COMMENT 'Network mode',
`lng` string COMMENT 'Longitude',
`lat` string COMMENT 'Latitude',
`monday_date` string COMMENT 'Date of the Monday of the week',
`sunday_date` string COMMENT 'Date of the Sunday of the week'
) COMMENT 'Active users, weekly detail'
PARTITIONED BY (`wk_dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_wk/';

2) Data import

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_wk partition(wk_dt)
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
date_add(next_day('2020-10-14','MO'),-7) monday_date,
date_add(next_day('2020-10-14','MO'),-1) sunday_date,
concat(date_add(next_day('2020-10-14','MO'),-7), '_', date_add(next_day('2020-10-14','MO'),-1)) wk_dt
from dws_uv_detail_day
where dt>=date_add(next_day('2020-10-14','MO'),-7) and dt<=date_add(next_day('2020-10-14','MO'),-1)
group by mid_id;
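
The week boundaries come from next_day: next_day(d,'MO') is the first Monday after d, so subtracting 7 days gives the Monday of the week containing d and subtracting 1 day gives its Sunday. The sketch below reproduces the same arithmetic in shell so the partition key can be checked by hand. It assumes GNU date (the -d option), and the function name week_bounds is made up for this example.

```shell
#!/bin/bash
# Illustration only: compute the Monday_Sunday key of the week containing a date.
# Assumes GNU date; TZ=UTC keeps the day arithmetic free of DST effects.
week_bounds() {
  local d=$1 dow monday sunday
  dow=$(TZ=UTC date -d "$d" +%u)                        # 1 = Monday ... 7 = Sunday
  monday=$(TZ=UTC date -d "$d -$((dow - 1)) days" +%F)  # step back to Monday
  sunday=$(TZ=UTC date -d "$d +$((7 - dow)) days" +%F)  # step forward to Sunday
  echo "${monday}_${sunday}"
}

week_bounds 2020-10-14   # prints 2020-10-12_2020-10-18
```

2020-10-14 is a Wednesday, so every day from 2020-10-12 through 2020-10-18 maps to the same wk_dt value, and each daily run overwrites that one weekly partition with the data accumulated so far.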

3) Query the import result

select * from dws_uv_detail_wk limit 1;
select count(*) from dws_uv_detail_wk;

11.1.3 Monthly Active Device Detail

1) Table creation statement

drop table if exists dws_uv_detail_mn;

create external table dws_uv_detail_mn(
`mid_id` string COMMENT 'Unique device ID',
`user_id` string COMMENT 'User ID',
`version_code` string COMMENT 'App version code',
`version_name` string COMMENT 'App version name',
`lang` string COMMENT 'System language',
`source` string COMMENT 'Channel ID',
`os` string COMMENT 'Android OS version',
`area` string COMMENT 'Region',
`model` string COMMENT 'Phone model',
`brand` string COMMENT 'Phone brand',
`sdk_version` string COMMENT 'sdkVersion',
`gmail` string COMMENT 'gmail',
`height_width` string COMMENT 'Screen dimensions',
`app_time` string COMMENT 'Time the client log was generated',
`network` string COMMENT 'Network mode',
`lng` string COMMENT 'Longitude',
`lat` string COMMENT 'Latitude'
) COMMENT 'Active users, monthly detail'
PARTITIONED BY (`mn` string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_mn/';

2) Data import

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_mn partition(mn)
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
date_format('2020-10-14','yyyy-MM')
from dws_uv_detail_day
where date_format(dt,'yyyy-MM') = date_format('2020-10-14','yyyy-MM')
group by mid_id;

3) Query the import result

select * from dws_uv_detail_mn limit 1;
select count(*) from dws_uv_detail_mn;

11.1.4 DWS Layer Data Loading Script

1) Create the script

[kgg@hadoop102 bin]$ vim dws_uv_log.sh
Write the following content in the script:
#!/bin/bash

# Define variables here so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed as the first argument; if none is given, use yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi
echo "=== Log date: $do_date ==="
sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table ${APP}.dws_uv_detail_day partition(dt='$do_date')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat
from ${APP}.dwd_start_log
where dt='$do_date'
group by mid_id;


insert overwrite table ${APP}.dws_uv_detail_wk partition(wk_dt)
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
date_add(next_day('$do_date','MO'),-7) monday_date,
date_add(next_day('$do_date','MO'),-1) sunday_date,
concat(date_add(next_day('$do_date','MO'),-7), '_', date_add(next_day('$do_date','MO'),-1)) wk_dt
from ${APP}.dws_uv_detail_day
where dt>=date_add(next_day('$do_date','MO'),-7) and dt<=date_add(next_day('$do_date','MO'),-1)
group by mid_id;


insert overwrite table ${APP}.dws_uv_detail_mn partition(mn)
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
date_format('$do_date','yyyy-MM')
from ${APP}.dws_uv_detail_day
where date_format(dt,'yyyy-MM') = date_format('$do_date','yyyy-MM')
group by mid_id;
"

$hive -e "$sql"
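
Because $do_date is spliced directly into the SQL text, a mistyped argument such as 2019-02-31 would quietly produce an empty or wrong partition. A minimal sketch of a guard that could sit in front of the script's date handling is shown below. The helper name resolve_do_date is hypothetical and not part of the original script, and it relies on GNU date rejecting impossible dates.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): resolve and validate
# the processing date before it is spliced into the SQL text.
resolve_do_date() {
  local input=${1:-} normalised
  if [ -z "$input" ]; then
    date -d "-1 day" +%F            # default: yesterday, as in the script
    return
  fi
  # GNU date rejects impossible dates; the equality check rejects other formats.
  normalised=$(date -d "$input" +%F 2>/dev/null) || return 1
  [ "$normalised" = "$input" ] || return 1
  echo "$input"
}

if do_date=$(resolve_do_date "2019-02-11"); then
  echo "=== Log date: $do_date ==="
else
  echo "invalid date, expected YYYY-MM-DD" >&2
fi
```

The equality check also turns away inputs that GNU date would accept but that are not in YYYY-MM-DD form, which keeps stray quotes or extra text out of the generated SQL.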

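2) Add execute permission to the script

chmod 777 dws_uv_log.sh

3) Run the script

dws_uv_log.sh 2019-02-11

4) Query the results

select count(*) from dws_uv_detail_day where dt='2019-02-11';
select count(*) from dws_uv_detail_wk;
select count(*) from dws_uv_detail_mn;

5) Script execution time

In production, the script is usually scheduled to run between 00:30 and 01:00 each day.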

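11.2 ADS Layer

Goal: count the devices active on the current day, in the current week, and in the current month.

11.2.1 Active Device Count

1) Table creation statement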

drop table if exists ads_uv_count;
create external table ads_uv_count(
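`dt` string COMMENT 'Statistics date',
`day_count` bigint COMMENT 'Number of users for the day',
`wk_count` bigint COMMENT 'Number of users for the week',
`mn_count` bigint COMMENT 'Number of users for the month',
`is_weekend` string COMMENT 'Y/N: whether the date is the end of the week (Sunday), used to get the final result for the week',
`is_monthend` string COMMENT 'Y/N: whether the date is the end of the month, used to get the final result for the month'
) COMMENT 'Active device count'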
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_uv_count/';

2) Import data

insert into table ads_uv_count
select
'2020-10-14' dt,
daycount.ct,
wkcount.ct,
mncount.ct,
if(date_add(next_day('2020-10-14','MO'),-1)='2020-10-14','Y','N') ,
if(last_day('2020-10-14')='2020-10-14','Y','N')
from
(
select
'2020-10-14' dt,
count(*) ct
from dws_uv_detail_day
where dt='2020-10-14'
)daycount join
(
select
'2020-10-14' dt,
count(*) ct
from dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('2020-10-14','MO'),-7),'_' ,date_add(next_day('2020-10-14','MO'),-1) )
) wkcount on daycount.dt=wkcount.dt
join
(
select
'2020-10-14' dt,
count(*) ct
from dws_uv_detail_mn
where mn=date_format('2020-10-14','yyyy-MM')
)mncount on daycount.dt=mncount.dt;
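
The two flag columns mark the rows that hold a finished week or month: is_weekend is Y only when the statistics date is the Sunday that closes the Monday-to-Sunday week, and is_monthend is Y only when it equals last_day of its month. A rough shell equivalent, assuming GNU date and using made-up function names that mirror the column names, is:

```shell
#!/bin/bash
# Illustration only: shell equivalents of the two Y/N flags, assuming GNU date.
is_weekend() {    # Y when the date is a Sunday, the last day of the Monday-Sunday week
  if [ "$(TZ=UTC date -d "$1" +%u)" = "7" ]; then echo Y; else echo N; fi
}
is_monthend() {   # Y when the following day is the 1st of a month
  if [ "$(TZ=UTC date -d "$1 +1 day" +%d)" = "01" ]; then echo Y; else echo N; fi
}

echo "2020-10-14 $(is_weekend 2020-10-14) $(is_monthend 2020-10-14)"   # prints 2020-10-14 N N
```

Since the weekly and monthly counts are recomputed every day, a report that needs one final figure per week or month can filter ads_uv_count on these flags.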

3) Query the import result

select * from ads_uv_count;

11.2.2 ADS Layer Data Loading Script

1) Create the script

[kgg@hadoop102 bin]$ vim ads_uv_log.sh
Write the following content in the script:
#!/bin/bash

# Define variables here so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed as the first argument; if none is given, use yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi
echo "=== Log date: $do_date ==="
sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table ${APP}.ads_uv_count
select
'$do_date' dt,
daycount.ct,
wkcount.ct,
mncount.ct,
if(date_add(next_day('$do_date','MO'),-1)='$do_date','Y','N') ,
if(last_day('$do_date')='$do_date','Y','N')
from
(
select
'$do_date' dt,
count(*) ct
from ${APP}.dws_uv_detail_day
where dt='$do_date'
)daycount join
(
select
'$do_date' dt,
count(*) ct
from ${APP}.dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('$do_date','MO'),-7),'_' ,date_add(next_day('$do_date','MO'),-1) )
) wkcount on daycount.dt=wkcount.dt
join
(
select
'$do_date' dt,
count(*) ct
from ${APP}.dws_uv_detail_mn
where mn=date_format('$do_date','yyyy-MM')
)mncount on daycount.dt=mncount.dt;
"

$hive -e "$sql"

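2) Add execute permission to the script

chmod 777 ads_uv_log.sh

3) Run the script

ads_uv_log.sh 2019-02-11

4) Script execution time

In production, the script is usually scheduled to run between 00:30 and 01:00 each day.

5) Query the import result

select * from ads_uv_count;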