Big Data Hadoop Learning Notes 003 (Hands-On)

This installment focuses on Hadoop's MapReduce:

MapReduce:

Map: parses the input and distributes the data

Reduce: executes the computation logic
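To make the two roles concrete, here is a worked key-value trace for a single line of input (the sample line is made up for illustration):

    Input line:     "hello world hello"
    Map output:     (hello, 1)  (world, 1)  (hello, 1)
    Shuffle/group:  hello -> [1, 1]    world -> [1]
    Reduce output:  (hello, 2)  (world, 1)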

For example, consider counting access-log records per URL with SQL. MapReduce performs this computation divide-and-conquer style: rather than a single node doing all the work, multiple nodes take part in the computation.

select url, count(*) from access_log group by url;
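The SQL clauses line up with the MapReduce phases roughly as follows (a conceptual mapping for intuition, not something produced by a tool):

    group by url  ->  shuffle: records with the same url key are routed to the same reducer
    count(*)      ->  reduce: each reducer sums the 1s emitted for its key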

For example, suppose four nodes A, B, C, and D each hold a shard of the log, and every shard contains records for url1 through url4:

    A      B      C      D
    url1   url1   url1   url1
    url2   url2   url2   url2
    url3   url3   url3   url3
    url4   url4   url4   url4

When the computation runs (divide and conquer across multiple nodes), each counting node handles one URL:

    node count1 computes the total for url1
    node count2 computes the total for url2
    node count3 computes the total for url3
    node count4 computes the total for url4
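How does every record for the same url reach the same counting node? Hadoop's default HashPartitioner routes a key to a reduce task by hashing it. Below is a minimal, self-contained sketch of that routing formula; the class name PartitionSketch and the 4-reducer setup are illustrative, not part of the original post:

public class PartitionSketch {

    // Same formula as org.apache.hadoop.mapreduce.lib.partition.HashPartitioner:
    // mask off the sign bit, then take the remainder over the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] urls = {"url1", "url2", "url3", "url4"};
        for (String url : urls) {
            // With 4 reduce tasks, all records for a given url -- no matter
            // which of nodes A..D emitted them -- land on the same reducer.
            System.out.println(url + " -> reducer " + partitionFor(url, 4));
        }
    }
}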

The concrete code implementation:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map side: parse the input and distribute the data
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // The core map() method: parse and analyze each input line
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into tokens
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                // Emit (word, 1); the framework groups records with the same key
                // (here the key is the word, the value is the attribute being counted)
                context.write(word, one);
            }
        }
    }

    // Reduce side: execute the computation logic
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the counts grouped under this key
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.addOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
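To try the job, package the class into a jar and run it against HDFS. The jar name and the /user/me paths below are placeholders for illustration, not taken from the original post:

    hadoop jar wordcount.jar WordCount /user/me/input /user/me/output
    hdfs dfs -cat /user/me/output/part-r-00000

Each line of the output file is a word followed by its total count.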