
Examples of Using the HDFS Java API


  HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster consists mainly of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data; the HDFS architecture diagram describes the basic interaction between the NameNode, the DataNodes, and clients. Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly against the DataNodes.

  Hadoop supports interacting with HDFS directly through shell commands, and it also provides a Java API for operating on HDFS, for example creating, deleting, uploading, downloading, and renaming files.

  File operations in HDFS mainly involve the following classes (a short sketch after the list shows how they fit together):

    Configuration: provides access to configuration parameters

    FileSystem: the file system object

    Path: names a file or directory in a FileSystem. Path strings use a slash as the directory separator; a path string that begins with a slash is absolute

    FSDataInputStream and FSDataOutputStream: the input and output streams of HDFS, respectively
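
A rough sketch of how these classes fit together is shown below. It is not taken from the article's project; the /user/hadoop/demo paths are placeholders chosen only for illustration:

// Minimal sketch (not part of the article's project): typical call chain of the classes above
Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml from the classpath
FileSystem fs = FileSystem.get(conf);            // file system object bound to fs.defaultFS
Path dir = new Path("/user/hadoop/demo");        // absolute path, slash-separated
fs.mkdirs(dir);                                  // create a directory
fs.rename(new Path("/user/hadoop/demo/a.txt"),   // rename (move) a file
          new Path("/user/hadoop/demo/b.txt"));
fs.delete(dir, true);                            // delete recursively
fs.close();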

The following walks through operating on HDFS with the Java API:

1. Project structure


2. pom.xml configuration

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.zjl</groupId>
  <artifactId>myhadoop</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>myhadoop</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.5.0</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
    </dependency>
  </dependencies>
</project>

3. Copy the HDFS-related configuration files from the Hadoop installation directory (core-site.xml, hdfs-site.xml, log4j.properties) into the project's resource directory

[hadoop@hadoop01 ~]$ cd /opt/modules/hadoop-2.6.5/etc/hadoop/
[hadoop@hadoop01 hadoop]$ cp core-site.xml hdfs-site.xml log4j.properties /opt/tools/workspace/myhadoop/src/main/resource/
[hadoop@hadoop01 hadoop]$

(1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- If this is not configured, data is read from the local file system by default -->
        <value>hdfs://hadoop01.zjl.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <!-- Base directory that the Hadoop file system depends on; many other paths derive from it. If the NameNode and DataNode storage locations are not configured in hdfs-site.xml, they default to this path -->
        <value>/opt/modules/hadoop-2.6.5/data/tmp</value>
    </property>
</configuration>
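
Because the project reads this file from the classpath, fs.defaultFS does not have to be set in code. If core-site.xml were missing, the same value could be set programmatically before obtaining the FileSystem; a small sketch, not needed in this project:

// Only needed when core-site.xml is not on the classpath; mirrors the fs.defaultFS entry above
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop01.zjl.com:9000");
FileSystem fs = FileSystem.get(conf);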

(2) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <!-- default value 3 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

(3) log4j.properties: the defaults can be used as-is; this is the configuration needed to print Hadoop's log output. If it is not present, Eclipse will show a warning in the console when the program runs.
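
For reference, a minimal console-only log4j.properties looks roughly like the sketch below; the exact layout pattern is illustrative and not copied from the Hadoop distribution:

# Minimal console logging (illustrative sketch, not the file shipped with Hadoop)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n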

4. Start the HDFS daemons and create a file in the HDFS file system (this file is read by the Java program in step 5)

[hadoop@hadoop01 hadoop]$ cd /opt/modules/hadoop-2.6.5/
[hadoop@hadoop01 hadoop-2.6.5]$ sbin/start-dfs.sh
17/06/21 22:59:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01.zjl.com]
hadoop01.zjl.com: starting namenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-namenode-hadoop01.zjl.com.out
hadoop01.zjl.com: starting datanode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-datanode-hadoop01.zjl.com.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-secondarynamenode-hadoop01.zjl.com.out
17/06/21 23:00:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ jps
3987 NameNode
4377 Jps
4265 SecondaryNameNode
4076 DataNode
3135 org.eclipse.equinox.launcher_1.3.201.v20161025-1711.jar
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -mkdir -p /user/hadoop/mapreduce/wordcount/input
17/06/21 23:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ cat wcinput/wc.input 
hadoop yarn
hadoop mapreduce
hadoop hdfs
yarn nodemanager
hadoop resourcemanager
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -put wcinput/wc.input /user/hadoop/mapreduce/wordcount/input/
17/06/21 23:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$
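
The mkdir step above could also be done through the Java API once the getFileSystem() helper from step 5 exists; a short sketch, not part of the article's HdfsApp:

// Create the input directory via the API instead of `bin/hdfs dfs -mkdir -p`
FileSystem fs = HdfsApp.getFileSystem();
boolean created = fs.mkdirs(new Path("/user/hadoop/mapreduce/wordcount/input"));
System.out.println("created: " + created);
fs.close();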

5. Java code

package com.zjl.myhadoop;

import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * 
 * @author hadoop
 *
 */
public class HdfsApp {

    /**
     * get file system
     * @return
     * @throws Exception
     */
    public static FileSystem getFileSystem() throws Exception {
        //read configuration
        //core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml
        Configuration conf = new Configuration();
        //create file system
        FileSystem fileSystem = FileSystem.get(conf);
        return fileSystem;
    }

    /**
     * read file from hdfs file system, output to the console
     * @param fileName
     * @throws Exception
     */
    public static void read(String fileName) throws Exception {
        //read path
        Path readPath = new Path(fileName);
        //get file system
        FileSystem fileSystem = getFileSystem();
        //open file
        FSDataInputStream inStream = fileSystem.open(readPath);
        try {
            //read file
            IOUtils.copyBytes(inStream, System.out, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            //io close
            IOUtils.closeStream(inStream);
        }
    }

    public static void upload(String inFileName, String outFileName) throws Exception {

        //file input stream, local file
        FileInputStream inStream = new FileInputStream(new File(inFileName));

        //get file system
        FileSystem fileSystem = getFileSystem();
        //write path, hdfs file system
        Path writePath = new Path(outFileName);

        //output stream
        FSDataOutputStream outStream = fileSystem.create(writePath);
        try {
            //write file
            IOUtils.copyBytes(inStream, outStream, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            //io close
            IOUtils.closeStream(inStream);
            IOUtils.closeStream(outStream);
        }
    }

    public static void main(String[] args) throws Exception {
        //1. read file from hdfs to console
//        String fileName = "/user/hadoop/mapreduce/wordcount/input/wc.input";
//        read(fileName);

        //2. upload file from local file system to hdfs file system
        //file input stream, local file
        String inFileName = "/opt/modules/hadoop-2.6.5/wcinput/wc.input";
        String outFileName = "/user/hadoop/put-wc.input";
        upload(inFileName, outFileName);
    }
}
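
The class above covers reading to the console and uploading; the download direction mentioned earlier can be sketched in the same style. The method below is not part of the original HdfsApp, the local target path is whatever the caller chooses, and it additionally needs import java.io.FileOutputStream:

    /**
     * copy a file from hdfs to the local file system (sketch, not in the original class)
     */
    public static void download(String hdfsFileName, String localFileName) throws Exception {
        //get file system
        FileSystem fileSystem = getFileSystem();
        //open the file stored in hdfs
        FSDataInputStream inStream = fileSystem.open(new Path(hdfsFileName));
        //local target file
        FileOutputStream outStream = new FileOutputStream(new File(localFileName));
        try {
            //copy bytes from hdfs to the local file
            IOUtils.copyBytes(inStream, outStream, 4096, false);
        } finally {
            //io close
            IOUtils.closeStream(inStream);
            IOUtils.closeStream(outStream);
        }
    }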

6. Call the method read(fileName); the contents of wc.input are printed to the Eclipse console


7. Open the HDFS file system and browse the /user/hadoop directory


8. Call upload(inFileName, outFileName), then refresh the page from step 7; the file has been uploaded successfully (put-wc.input now appears under /user/hadoop)

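The same check can be done from Java instead of the web UI by listing the target directory; a small sketch, which additionally needs import org.apache.hadoop.fs.FileStatus:

// List /user/hadoop and print each entry, mirroring what the web UI shows
FileSystem fs = HdfsApp.getFileSystem();
for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
    System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
}
fs.close();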
