Project Practice from 0 to 1: Hive (45) Big Data Project: E-commerce Data Warehouse (User, Part 3)

阿新 • Published: 2021-01-21

Chapter 20 Requirement 9: Cumulative Visit Count per User

The expected result looks like this:

User  Date        Subtotal  Total
mid1  2019-12-14        10     10
mid1  2019-12-15        12     22
mid2  2019-12-14        15     15
mid2  2019-12-15        12     27

20.1 DWS Layer

20.1.1 Table Creation Statement

drop table if exists dws_user_total_count_day;
create external table dws_user_total_count_day( 
   `mid_id` string COMMENT 'device id',
   `subtotal` bigint COMMENT 'daily login subtotal'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/dws/dws_user_total_count_day';

20.1.2 Importing Data

1) Import the data

insert overwrite table dws_user_total_count_day 
partition(dt='2019-12-14')
select
   mid_id,
   count(mid_id) cm
from
   dwd_start_log
where
   dt='2019-12-14'
group by
   mid_id;

2) Query the result

select * from dws_user_total_count_day;

20.1.3 Data Import Script

1) Create the script dws_user_total_count_day.sh

[kgg@hadoop102 bin]$ vim dws_user_total_count_day.sh
Fill in the script with the following content:
#!/bin/bash

# Define variables so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed as the first argument if there is one; otherwise default to yesterday
if [ -n "$1" ] ;then
  do_date=$1
else 
  do_date=`date -d "-1 day" +%F`
fi

echo "===Log date: $do_date==="
sql="
insert overwrite table "$APP".dws_user_total_count_day partition(dt='$do_date')
select
   mid_id,
   count(mid_id) cm
from
  "$APP".dwd_start_log
where
  dt='$do_date'
group by
   mid_id,dt;
"

$hive -e "$sql"

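2) Make the script executable

chmod 777 dws_user_total_count_day.sh

3) Run the script

dws_user_total_count_day.sh 2019-02-20

4) Query the result

select * from dws_user_total_count_day;

5) When the script runs

In production, the script is typically scheduled to run daily between 00:30 and 01:00.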
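
The date handling in the import script (use the first argument if one is given, otherwise fall back to yesterday) can be isolated in a small function. The following is a minimal sketch, assuming GNU `date` for the `-d` option. The function name `resolve_do_date` and the format check are illustrative additions and are not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): print the date to process.
# Uses the first argument when given, otherwise yesterday (GNU date assumed).
resolve_do_date() {
  local input=$1
  if [ -z "$input" ]; then
    date -d "-1 day" +%F
    return 0
  fi
  # Accept only YYYY-MM-DD strings that GNU date parses back unchanged,
  # which rejects both malformed input and impossible dates.
  if [[ $input =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] &&
     [ "$(date -d "$input" +%F 2>/dev/null)" = "$input" ]; then
    echo "$input"
  else
    echo "invalid date: $input (expected YYYY-MM-DD)" >&2
    return 1
  fi
}
```

A script could then call it as `do_date=$(resolve_do_date "$1") || exit 1`, so that a mistyped date stops the job before any partition is overwritten.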

20.2 ADS Layer

20.2.1 Table Creation Statement

drop table if exists ads_user_total_count;
create external table ads_user_total_count( 
   `mid_id` string COMMENT 'device id',
   `subtotal` bigint COMMENT 'daily login subtotal',
   `total` bigint COMMENT 'cumulative login count'
)
partitioned by(`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_total_count';

20.2.2 Importing Data

insert overwrite table ads_user_total_count partition(dt='2019-10-03')
select
  if(today.mid_id is null, yesterday.mid_id, today.mid_id) mid_id,
  today.subtotal,
  if(today.subtotal is null, 0, today.subtotal) + if(yesterday.total is null, 0, yesterday.total) total
from (
 select
   *
 from dws_user_total_count_day
 where dt='2019-10-03'
) today
full join (
 select
   *
 from ads_user_total_count
 where dt=date_add('2019-10-03', -1)
) yesterday
on today.mid_id=yesterday.mid_id;
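
To make the accumulation logic of the full join easier to follow, here is a minimal sketch that emulates it on tab-separated text files with `awk`. It is an illustration only, not how the warehouse computes the result. The function name `accumulate_totals` and the two input files are hypothetical, and the literal `NULL` stands in for the null subtotal Hive would return for a user with no rows today.

```shell
#!/bin/bash

# Hypothetical helper: emulate the full join on tab-separated text files.
#   $1: rows for today      -> mid_id <TAB> subtotal
#   $2: rows for yesterday  -> mid_id <TAB> subtotal <TAB> total
# Prints mid_id, the subtotal for today (NULL if absent), and the new total.
accumulate_totals() {
  local today_file=$1 yesterday_file=$2
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { prev_total[$1] = $3; next }   # yesterday side
    { today_sub[$1] = $2 }                               # today side
    END {
      # Users active today: add the subtotal for today to the previous total (0 if none).
      for (id in today_sub) {
        old = (id in prev_total) ? prev_total[id] : 0
        print id, today_sub[id], today_sub[id] + old
      }
      # Users seen before but inactive today: subtotal is NULL, the total carries over.
      for (id in prev_total)
        if (!(id in today_sub)) print id, "NULL", prev_total[id] + 0
    }
  ' "$yesterday_file" "$today_file" | LC_ALL=C sort
}
```

Called as `accumulate_totals today.tsv yesterday.tsv`, a user with a previous total of 10 and a subtotal of 12 today comes out with a total of 22, which matches the sample result at the start of the chapter.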

20.2.3 Data Import Script

1) Create the script

[kgg@hadoop102 bin]$ vim ads_user_total_count.sh
Write the following content in the script:
#!/bin/bash

db=gmall
hive=/opt/module/hive-1.2.1/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

if [[ -n $1 ]]; then
  do_date=$1
else
  do_date=`date -d '-1 day' +%F`
fi

sql="
use $db;
insert overwrite table ads_user_total_count partition(dt='$do_date')
select
 if(today.mid_id is null, yesterday.mid_id, today.mid_id) mid_id,
  today.subtotal,
 if(today.subtotal is null, 0, today.subtotal) + if(yesterday.total is null, 0, yesterday.total) total
from (
  select
   *
  from dws_user_total_count_day
  where dt='$do_date'
) today
full join (
  select
   *
  from ads_user_total_count
  where dt=date_add('$do_date', -1)
) yesterday
on today.mid_id=yesterday.mid_id
"

$hive -e "$sql"

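2) Make the script executable

chmod 777 ads_user_total_count.sh

3) Run the script

ads_user_total_count.sh 2019-02-20

4) Query the result

select * from ads_user_total_count;

5) When the script runs

In production, the script is typically scheduled to run daily between 00:30 and 01:00.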

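Chapter 21 Requirement 10: Number of New Favoriting Users

New favoriting user: a user who adds a favorite for the first time on a given day.

21.1 Building the User Action Wide Table in the DWS Layer

Several later requirements draw on data from multiple tables at once. Joining those tables in every query would hurt query performance, so we build a wide table up front to speed up the downstream queries.

The wide table records, for each user and each product, the click count, like count, and favorite count.

21.1.1 Table Creation Statement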

drop table if exists dws_user_action_wide_log;
CREATE EXTERNAL TABLE dws_user_action_wide_log(
   `mid_id` string COMMENT 'device id',
   `goodsid` string COMMENT 'product id',
   `display_count` string COMMENT 'click count',
   `praise_count` string COMMENT 'like count',
   `favorite_count` string COMMENT 'favorite count')
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action_wide_log/'
TBLPROPERTIES('parquet.compression'='lzo');

21.1.2 Importing Data

insert overwrite table dws_user_action_wide_log partition(dt='2019-12-14')
select
   mid_id,
   goodsid,
   sum(display_count) display_count,
   sum(praise_count) praise_count,
   sum(favorite_count) favorite_count
from
( select
     mid_id,
     goodsid,
    count(*) display_count,
    0 praise_count,
    0 favorite_count
  from
     dwd_display_log
  where
     dt='2019-12-14' and action=2
  group by
     mid_id,goodsid

  union all

  select
     mid_id,
     target_id goodsid,
    0,
    count(*) praise_count,
    0
  from
     dwd_praise_log
  where
     dt='2019-12-14'
  group by
     mid_id,target_id

  union all

  select
     mid_id,
     course_id goodsid,
    0,
    0,
    count(*) favorite_count
  from
     dwd_favorites_log
  where
     dt='2019-12-14'
  group by
     mid_id,course_id
)user_action
group by 
mid_id,goodsid;

21.1.3 Data Import Script

[kgg@hadoop102 bin]$ vi dws_user_action_wide_log.sh
[kgg@hadoop102 bin]$ chmod 777 dws_user_action_wide_log.sh

#!/bin/bash
db=gmall
hive=/opt/module/hive-1.2.1/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

if [[ -n $1 ]]; then
  do_date=$1
else
  do_date=`date -d '-1 day' +%F`
fi

sql="
use $db;
insert overwrite table dws_user_action_wide_log partition(dt='$do_date')
select
   mid_id,
   goodsid,
   sum(display_count) display_count,
   sum(praise_count) praise_count,
   sum(favorite_count) favorite_count
from
( select
     mid_id,
     goodsid,
     count(*) display_count,
    0 praise_count,
    0 favorite_count
   from
     dwd_display_log
   where
    dt='$do_date' and action=2
   group by
     mid_id,goodsid

   union all

   select
     mid_id,
     target_id goodsid,
    0,
     count(*) praise_count,
    0
   from
     dwd_praise_log
   where
    dt='$do_date'
   group by
     mid_id,target_id

   union all

   select
     mid_id,
     course_id goodsid,
    0,
    0,
     count(*) favorite_count
   from
     dwd_favorites_log
   where
    dt='$do_date'
   group by
     mid_id,course_id
)user_action
group by
mid_id,goodsid;
"

$hive -e "$sql"

21.2 DWS Layer

The user action wide table built from the log data serves as the DWS-layer table.

21.3 ADS Layer

21.3.1 Table Creation Statement

drop table if exists ads_new_favorites_mid_day;
create external table ads_new_favorites_mid_day( 
   `dt` string COMMENT 'date',
   `favorites_users` bigint COMMENT 'number of new favoriting users'
) 
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_new_favorites_mid_day';

21.3.2 Importing Data

insert into table ads_new_favorites_mid_day
select
  '2019-12-14' dt,
  count(*) favorites_users
from
(
  select
     mid_id
  from
     dws_user_action_wide_log
  where
     favorite_count>0
  group by
     mid_id
  having
     min(dt)='2019-12-14'
)user_favorite;
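
The query finds new favoriting users by grouping on `mid_id` and keeping only users whose earliest favoriting date, `min(dt)`, equals the target day. The same rule can be sketched on a flat text file, assuming a hypothetical tab-separated export with the columns `mid_id`, `dt`, and `favorite_count`. The function name `count_new_favorite_users` is likewise illustrative.

```shell
#!/bin/bash

# Hypothetical helper: count users whose first favorite falls on the given date.
#   $1: tab-separated rows -> mid_id <TAB> dt <TAB> favorite_count
#   $2: target date (YYYY-MM-DD)
count_new_favorite_users() {
  local rows_file=$1 target_date=$2
  awk -F'\t' -v target="$target_date" '
    $3 > 0 {
      # Track the earliest date per user; YYYY-MM-DD strings sort chronologically.
      if (!($1 in first_dt) || ($2 "") < (first_dt[$1] "")) first_dt[$1] = $2
    }
    END {
      n = 0
      for (id in first_dt) if (first_dt[id] == target) n++
      print n
    }
  ' "$rows_file"
}
```

A user who already favorited something on an earlier day is not counted again, because that user's earliest date no longer matches the target day. Rows with a `favorite_count` of 0 are ignored, mirroring the `favorite_count>0` filter.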

21.3.3 Data Import Script

1) Create the script ads_new_favorites_mid_day.sh

[kgg@hadoop102 bin]$ vim ads_new_favorites_mid_day.sh
Fill in the script with the following content:
#!/bin/bash

# Define variables so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed as the first argument if there is one; otherwise default to yesterday
if [ -n "$1" ] ;then
  do_date=$1
else 
  do_date=`date -d "-1 day" +%F`
fi

echo "===Log date: $do_date==="
sql="
insert into table "$APP".ads_new_favorites_mid_day
select
  '$do_date' dt,
   count(*) favorites_users
from
(
   select
     mid_id
   from
    "$APP".dws_user_action_wide_log
   where
     favorite_count>0
   group by
     mid_id
   having
     min(dt)='$do_date'
)user_favorite;
"

$hive -e "$sql"

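2) Make the script executable

chmod 777 ads_new_favorites_mid_day.sh

3) Run the script

ads_new_favorites_mid_day.sh 2019-02-20

4) Query the result

select * from ads_new_favorites_mid_day;

5) When the script runs

In production, the script is typically scheduled to run daily between 00:30 and 01:00.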

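Chapter 22 Requirement 11: Top 3 Users by Click Count for Each Product

22.1 DWS Layer

The user action wide table built from the log data serves as the DWS-layer table.

22.2 ADS Layer

22.2.1 Table Creation Statement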

drop table if exists ads_goods_count;
create external table ads_goods_count( 
   `dt` string COMMENT 'statistics date',
   `goodsid` string COMMENT 'product',
   `user_id` string COMMENT 'user',
   `goodsid_user_count` bigint COMMENT 'click count of the user on the product'
) 
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_goods_count';

22.2.2 Importing Data

insert into table ads_goods_count
select
  '2019-10-03',
   goodsid,
   mid_id,
   sum_display_count
from(
  select
    goodsid,
    mid_id,
    sum_display_count,
    row_number() over(partition by goodsid order by sum_display_count desc) rk
  from(
   select
     goodsid,
     mid_id,
     sum(display_count) sum_display_count
   from dws_user_action_wide_log
   where display_count>0
   group by goodsid, mid_id
   ) t1
) t2
where rk <= 3;
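
The ranking step, `row_number()` partitioned by `goodsid` and ordered by click count descending with a cutoff at 3, can be mimicked on a flat file with `sort` and `awk`. This is a minimal sketch for checking small samples by hand. It assumes a hypothetical tab-separated file with the columns `goodsid`, `mid_id`, and the summed click count, and the function name `top3_users_per_goods` is illustrative.

```shell
#!/bin/bash

# Hypothetical helper: keep the 3 users with the highest click count per product.
#   $1: tab-separated rows -> goodsid <TAB> mid_id <TAB> click count
# Ties are broken by mid_id here; row_number() in Hive leaves tie order unspecified.
top3_users_per_goods() {
  local rows_file=$1
  # Sort by product, then click count descending, then mid_id as a tiebreaker;
  # awk then keeps the first three rows it sees for each product.
  LC_ALL=C sort -t $'\t' -k1,1 -k3,3nr -k2,2 "$rows_file" |
    awk -F'\t' '++seen[$1] <= 3'
}
```

Because `row_number()` assigns distinct ranks even to tied rows, a product never yields more than three users. Using `rank()` or `dense_rank()` in the query would instead keep all tied users.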

22.2.3 Data Import Script

1) Create the script ads_goods_count.sh

[kgg@hadoop102 bin]$ vim ads_goods_count.sh
Fill in the script with the following content:
#!/bin/bash

db=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop/bin/hadoop

if [[ -n $1 ]]; then
  do_date=$1
else
  do_date=`date -d '-1 day' +%F`
fi

sql="
use $db;
insert into table ads_goods_count
select
  '$do_date',
   goodsid,
   mid_id,
   sum_display_count
from(
   select
    goodsid,
    mid_id,
    sum_display_count,
    row_number() over(partition by goodsid order by sum_display_count desc) rk
   from(
    select
     goodsid,
     mid_id,
     sum(display_count) sum_display_count
    from dws_user_action_wide_log
    where display_count>0
    group by goodsid, mid_id
   ) t1
) t2
where rk <= 3
"
$hive -e "$sql"