HBase: The Definitive Guide Study Notes (5): Integrating HBase with MapReduce
Add the dependencies:
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.4.9</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.4.9</version>
</dependency>
Contents of student.txt:
1,Sam,18
2,Tom,16
3,Jetty,25
4,LiLei,56
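The first job reads /student.txt from HDFS, so upload the file there first. A minimal sketch, assuming student.txt sits in the current local directory and the command is run from Hadoop's bin directory:
./hadoop fs -put student.txt /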
Create the HBase table with a single column family, cf1:
create 'student','cf1'
1. A MapReduce job that reads data from a file and writes it into an HBase table
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HadoopConnectTest extends Configured implements Tool {

    public static class Mapper1 extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line has the form id,name,age
            String[] values = value.toString().split(",");
            String id = values[0];
            String name = values[1];
            String age = values[2];
            // Build a Put keyed on the student id and add both columns
            Put put = new Put(id.getBytes());
            put.addColumn("cf1".getBytes(), "name".getBytes(), name.getBytes());
            put.addColumn("cf1".getBytes(), "age".getBytes(), age.getBytes());
            if (!put.isEmpty()) {
                // The output key names the target table
                ImmutableBytesWritable ib = new ImmutableBytesWritable("student".getBytes());
                context.write(ib, put);
            }
        }
    }

    // HDFS base URI
    private static final String HDFS = "hdfs://192.168.30.141:9000";
    // Input file path
    private static final String INPATH = HDFS + "/student.txt";

    @Override
    public int run(String[] args) throws Exception {
        // Configuration object that carries the job settings
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop1:2181,hadoop2:2181,hadoop3:2181");
        conf.set("hbase.rootdir", "hdfs://hadoop1:9000/hbase");
        Job job = Job.getInstance(conf, "HFile bulk load test");
        job.setJarByClass(HadoopConnectTest.class);
        job.setMapperClass(Mapper1.class);
        // TableMapReduceUtil is an HBase utility class that wires up all the
        // configuration a MapReduce job needs to write into HBase. Here the
        // target table is "student" and no reducer class is used.
        TableMapReduceUtil.initTableReducerJob("student", null, job);
        // The map output is written to HBase directly, so disable the reduce phase
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(INPATH));
        return job.waitForCompletion(true) ? 0 : -1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options and invokes run(String[])
        int status = ToolRunner.run(new HadoopConnectTest(), args);
        System.exit(status);
    }
}
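Before submitting, package the class into a jar. A sketch assuming a standard Maven project with the dependencies above (the artifact name is hypothetical; substitute your project's actual jar):
mvn clean package
# artifact name is hypothetical; copy your project's actual jar
cp target/hbase-mr-demo.jar /usr/local/hbase.jar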
Run the jar (if the jar's manifest does not declare a main class, append the fully qualified driver class name, e.g. HadoopConnectTest):
./hadoop jar /usr/local/hbase.jar
Verify:
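In the HBase shell, scanning the table should now show the four rows. Illustrative output, with timestamps elided:
hbase(main):001:0> scan 'student'
ROW    COLUMN+CELL
 1     column=cf1:age, timestamp=..., value=18
 1     column=cf1:name, timestamp=..., value=Sam
 2     column=cf1:age, timestamp=..., value=16
 2     column=cf1:name, timestamp=..., value=Tom
 3     column=cf1:age, timestamp=..., value=25
 3     column=cf1:name, timestamp=..., value=Jetty
 4     column=cf1:age, timestamp=..., value=56
 4     column=cf1:name, timestamp=..., value=LiLei
4 row(s)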
2. A MapReduce job that reads data from an HBase table and writes it to a file
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class HBaseMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // One Result per row; the scan in the driver selects only cf1:name,
        // so each cell's value here is a student name
        for (Cell cell : value.rawCells()) {
            String row = new String(CellUtil.cloneRow(cell));
            String name = new String(CellUtil.cloneValue(cell));
            context.write(new Text(row), new Text(name));
        }
    }
}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HBaseJob {
    public static final String tableName = "student";
    public static final String outputFilePath = "hdfs://hadoop1:9000/output";
    public static Configuration conf = HBaseConfiguration.create();

    static {
        conf.set("hbase.zookeeper.quorum", "hadoop1:2181,hadoop2:2181,hadoop3:2181");
        conf.set("hbase.rootdir", "hdfs://hadoop1:9000/hbase");
        conf.set("hbase.master", "hadoop1:60000");
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        // Scan only the cf1:name column of each row
        Scan scan = new Scan();
        scan.addColumn("cf1".getBytes(), "name".getBytes());

        Job job = Job.getInstance(conf, "hbase_word_count");
        job.setJarByClass(HBaseJob.class);
        // Wire up the scan and the mapper that reads from the "student" table
        TableMapReduceUtil.initTableMapperJob(
                tableName,
                scan,
                HBaseMapper.class,
                Text.class,
                Text.class,
                job);
        // No reducer class is set, so the identity reducer passes the
        // Text/Text pairs straight through to the output file
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(outputFilePath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
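Verify: the job runs a single identity reduce task by default, so the rows land in one output file under /output. Expected contents, assuming the paths above:
./hadoop fs -cat /output/part-r-00000
1	Sam
2	Tom
3	Jetty
4	LiLei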