[Hadoop Ecosystem: Hive] Hive DML (Data Manipulation Language) [Notes + Code]

阿新 • Published: 2020-12-15

Tags: Hadoop Ecosystem: Hive, java, hive, hdfs, hadoop, big data

5. DML Data Operations

5.1 Data Import

5.1.1 Loading Data into a Table (Load)

1) Syntax

hive>load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

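(1) load data: load data into a table

(2) local: load the data from the local file system into the Hive table; without it, the data is loaded from HDFS

(3) inpath: the path of the data to load

(4) overwrite: overwrite the data already in the table; without it, the data is appended

(5) into table: specifies which table to load into

(6) student: the name of the target table

(7) partition: load into the specified partition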
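To make the optional clauses concrete, here is a minimal Python sketch that assembles a LOAD DATA statement from the pieces listed above. The helper name `build_load_statement` and its parameters are hypothetical and exist only for illustration; the function only builds the statement text and does not talk to Hive.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a Hive LOAD DATA statement from its optional clauses.

    Hypothetical helper for illustration; it only builds a string.
    """
    parts = ["load data"]
    if local:
        parts.append("local")        # read from the local file system, not HDFS
    parts.append(f"inpath '{path}'")
    if overwrite:
        parts.append("overwrite")    # replace existing data instead of appending
    parts.append(f"into table {table}")
    if partition:
        # partition is a dict such as {"month": "201809"}
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"partition ({spec})")
    return " ".join(parts) + ";"


print(build_load_statement("/opt/module/datas/student.txt", "student", local=True))
print(build_load_statement("/user/itstar/hive/student.txt", "default.student",
                           overwrite=True))
```

The clause order in the output matches the syntax line: local comes before inpath, overwrite comes before into table, and partition comes last.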

2) Hands-on example

(0) Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(1) Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

(2) Load an HDFS file into Hive

Upload the file to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/itstar/hive;

Load the data from HDFS

hive (default)>load data inpath '/user/itstar/hive/student.txt' into table default.student;

(3) Load data, overwriting the data already in the table

Upload the file to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/itstar/hive;

Load the data, overwriting the data already in the table

hive (default)>load data inpath '/user/itstar/hive/student.txt' overwrite into table default.student;

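Note: loading data from HDFS is equivalent to an mv of the file into another directory; the file disappears from its original directory.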
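The difference between the two load paths can be modeled with ordinary file operations. The sketch below is an analogy, not Hive itself: it assumes temporary local directories stand in for the source locations and the table's warehouse directory, and the file names and the second row (`lisi`) are made up. A load with local behaves like a copy, while a load from HDFS behaves like a move.

```python
import shutil
import tempfile
from pathlib import Path


def load_local(src: Path, table_dir: Path) -> None:
    """Model of `load data local inpath`: the source file is copied."""
    shutil.copy(src, table_dir / src.name)


def load_hdfs(src: Path, table_dir: Path) -> None:
    """Model of `load data inpath` (no local): the source file is moved."""
    shutil.move(str(src), str(table_dir / src.name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    table_dir = root / "warehouse" / "student"   # stand-in for the table directory
    table_dir.mkdir(parents=True)

    local_file = root / "local_student.txt"      # stand-in for a local file
    staged_file = root / "staged_student.txt"    # stand-in for a file already on HDFS
    local_file.write_text("1\twangwu\n")
    staged_file.write_text("2\tlisi\n")

    load_local(local_file, table_dir)
    load_hdfs(staged_file, table_dir)

    local_still_there = local_file.exists()      # copy keeps the source
    staged_still_there = staged_file.exists()    # move removes the source
    loaded_files = sorted(p.name for p in table_dir.iterdir())

print(local_still_there, staged_still_there, loaded_files)
```

This is why the examples above run dfs -put again before each HDFS load: the previous load has already moved the staged file away.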

5.1.2 Inserting Data into a Table via a Query (Insert)

1) Create a partitioned table

hive (default)> create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';

2) Basic insert

  hive (default)> insert into table student partition(month='201809')  values(1,'wangwu');  

3) Basic insert mode (from the query result of a single table)

hive (default)> insert overwrite table student partition(month='201808')

       select id, name from student where month='201809';

4) Multi-insert mode (one from clause feeding several inserts)

hive (default)> from student

       insert overwrite table student partition(month='201807')

       select id, name where month='201809'

       insert overwrite table student partition(month='201806')

       select id, name where month='201809';
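The point of the multi-insert form is that the source table is read once and each row is offered to every insert branch. The following Python sketch models that idea only; the sample rows, the `multi_insert` helper, and the in-memory "partitions" are hypothetical and do not reflect how Hive executes the query internally.

```python
from collections import defaultdict

# Hypothetical rows of the partitioned student table: (id, name, month)
student = [
    (1, "wangwu", "201809"),
    (2, "lisi", "201808"),
]


def multi_insert(rows, branches):
    """Read rows once; each branch is a (target_partition, predicate) pair."""
    targets = defaultdict(list)
    rows_read = 0
    for row in rows:                          # the single shared "from student"
        rows_read += 1
        for target, predicate in branches:
            if predicate(row):                # the branch's "where" clause
                targets[target].append(row[:2])   # "select id, name"
    return dict(targets), rows_read


result, rows_read = multi_insert(
    student,
    [
        ("201807", lambda r: r[2] == "201809"),
        ("201806", lambda r: r[2] == "201809"),
    ],
)
print(result, rows_read)
```

Both target partitions receive the rows from month 201809, yet each source row is read only once. Two separate insert statements would read the source table twice.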

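5.1.3 Creating a Table and Loading Data in a Query (As Select)

See Section 4.5.1, Creating Tables.

Create a table from a query result (the rows returned by the query are added to the newly created table)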

 create table if not exists student3  as select id, name  from student;

5.1.4 Specifying the Data Path with Location When Creating a Table

1) Create the table and specify its location on HDFS

hive (default)> create table if not exists student5(

       id int, name string

       )

       row format delimited fields terminated by '\t'

       location '/user/hive/warehouse/student5';

2) Upload the data to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;

3) Query the data

hive (default)> select * from student5;

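5.1.5 Importing Data into a Specified Hive Table

Note: export the data with export first, then import it. Because both locations are on HDFS, this is a copy-level operation.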

  hive (default)> export table default.student to  '/user/hive/warehouse/export/student';

  hive (default)> import table student2  partition(month='201809') from '/user/hive/warehouse/export/student';

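5.2 Data Export

5.2.1 Insert Export

1) Export the query result to the local file system. The fields appear to have no separator because the default delimiter is the non-printing \001 (Ctrl-A) character.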

 hive (default)> insert overwrite local directory  '/opt/module/datas/export/student'        
 select *  from student;

2) Export the query result to the local file system with formatting, using "\t" between fields

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             select * from student;

3) Export the query result to HDFS (without local)

hive (default)> insert overwrite directory '/user/itstar/student2'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
             select * from student;

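Note: although the source and the target are both on HDFS, this is not a copy operation; the result files are written by the query.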
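When reading the exported files back in another program, the delimiter matters. The sketch below assumes the unformatted export uses Hive's default text field delimiter \001 (Ctrl-A) and the formatted export uses a tab. The sample lines and the `parse_export_line` helper are hypothetical.

```python
DEFAULT_FIELD_DELIMITER = "\x01"   # Ctrl-A, Hive's default text field delimiter


def parse_export_line(line, delimiter=DEFAULT_FIELD_DELIMITER):
    """Split one exported text line into its column values."""
    return line.rstrip("\n").split(delimiter)


# Hypothetical sample lines, shaped like rows of the partitioned student table
default_line = "1\x01wangwu\x01201809\n"   # export without ROW FORMAT
tab_line = "1\twangwu\t201809\n"           # export with '\t' as delimiter

print(parse_export_line(default_line))
print(parse_export_line(tab_line, delimiter="\t"))
```

Both calls yield the same three values. The default export only looks unseparated because most terminals and editors do not display \001.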

5.2.2 Exporting to the Local File System with a Hadoop Command

 hive (default)> dfs -get /user/hive/warehouse/student/month=201809/000000_0 /opt/module/datas/export/student3.txt;
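The path in the command above follows the layout Hive uses for partitioned tables: each partition lives in a key=value subdirectory under the table directory. Here is a small Python sketch that builds such a path. The `partition_dir` helper is hypothetical, and data file names such as 000000_0 are generated by Hive, so the sketch stops at the directory.

```python
import posixpath


def partition_dir(table_dir, partition):
    """Return the HDFS directory of one partition, built from key=value segments."""
    segments = [f"{col}={val}" for col, val in partition.items()]
    return posixpath.join(table_dir, *segments)


path = partition_dir("/user/hive/warehouse/student", {"month": "201809"})
print(path)
```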

5.2.3 Exporting with Hive Shell Commands

Basic syntax: hive -e 'statement' > file, or hive -f script > file

$ bin/hive -e 'select *  from default.student;' > /opt/module/datas/export/student4.txt;

5.2.4 Exporting to HDFS with Export

 hive (default)> export table default.student to  '/user/hive/warehouse/export/student';

5.2.5 Sqoop Export

5.3 Clearing Table Data (Truncate)

Note: Truncate can only delete data from managed tables; it cannot delete the data in external tables.

hive (default)> truncate table student;
