Big Data Hadoop Study Notes 003 (Hands-on)
阿新 • Published: 2020-08-18
This post focuses on Hadoop MapReduce.
MapReduce has two phases:
Map: parses the input data and distributes it
Reduce: executes the computation logic
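To make the two phases concrete, here is a minimal single-process sketch in plain Java, with no Hadoop involved. It assumes word counting as the computation. The class and method names (`MapReducePhases`, `map`, `shuffle`, `reduce`, `run`) are hypothetical, and the explicit `shuffle` step stands in for the grouping by key that the Hadoop framework performs between Map and Reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of map -> shuffle -> reduce (not Hadoop code).
public class MapReducePhases {

    // Map: parse one input line and emit a (word, 1) pair per token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Shuffle: group all emitted values by key (TreeMap keeps keys sorted)
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: run the computation logic (here, a sum) over each key's values
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        groups.forEach((key, values) ->
                result.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    // Runs all three phases over a list of input lines
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello hadoop", "hello mapreduce")));
        // prints {hadoop=1, hello=2, mapreduce=1}
    }
}
```

In a real cluster each phase runs on many nodes at once; only the data flow is shown here.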
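For example, in a database, a SQL statement can count the access-log records grouped by address (URL). MapReduce computes the same kind of result by divide and conquer: instead of a single node doing all the work, multiple nodes take part in the computation.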
select url, count(*) from access_log group by url;
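On a single database node, that query boils down to one pass that groups rows by URL and counts each group. A rough Java equivalent, assuming each log record has already been reduced to its URL string (the `GroupByUrl` class and the sample rows are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-node equivalent of:
//   select url, count(*) from access_log group by url;
public class GroupByUrl {

    // Group identical URLs together and count the records in each group
    static Map<String, Long> countByUrl(List<String> urls) {
        return urls.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical access-log rows, one URL per record
        List<String> accessLog = List.of("url1", "url2", "url1", "url3", "url1");
        System.out.println(countByUrl(accessLog)); // prints {url1=3, url2=1, url3=1}
    }
}
```

This works while the whole log fits on one machine; MapReduce spreads the same grouping and counting across many nodes.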
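For example, suppose the access log is stored in four blocks, A, B, C, and D, each containing records for url1 through url4: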
A B C D
url1 url1 url1 url1
url2 url2 url2 url2
url3 url3 url3 url3
url4 url4 url4 url4
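During the computation (multiple nodes, divide and conquer), each node handles one URL:
node count1 computes url1
node count2 computes url2
node count3 computes url3
node count4 computes url4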
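Which node receives which URL is decided by partitioning the keys. The sketch below assumes the same formula as Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but applies it to `java.lang.String` hash codes; Hadoop hashes the serialized `Text` key, so real assignments can differ. The class name `PartitionedCount` is hypothetical, and routing and counting are collapsed into one loop for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: blocks A-D feed four "count" nodes; each URL is routed to exactly one node.
public class PartitionedCount {

    // Same formula as Hadoop's default HashPartitioner, applied to String.hashCode()
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static List<Map<String, Integer>> count(List<List<String>> blocks, int numReducers) {
        List<Map<String, Integer>> nodes = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            nodes.add(new TreeMap<>());
        }
        // Every block emits its URLs; the partitioner picks the node, which accumulates the count
        for (List<String> block : blocks) {
            for (String url : block) {
                nodes.get(partition(url, numReducers)).merge(url, 1, Integer::sum);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("url1", "url2", "url3", "url4");
        // Blocks A, B, C, D from the example, each holding url1..url4
        List<Map<String, Integer>> nodes = count(List.of(rows, rows, rows, rows), 4);
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println("count" + (i + 1) + ": " + nodes.get(i));
        }
    }
}
```

Because every record for a given URL hashes to the same node, each node can finish its count without consulting the others. The hash does not necessarily reproduce the tidy url1 to count1 mapping of the example; it only guarantees that all records for one URL land together.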
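The concrete implementation is the WordCount program; counting URLs works the same way, with each URL as the key: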
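```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map side: parses and distributes the data
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Core map function: parses one line of input
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into whitespace-separated tokens
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                // Emit (word, 1): the framework groups identical keys (here, the word),
                // and the value is the quantity being counted
                context.write(word, one);
            }
        }
    }

    // Reduce side: sums the counts collected for each key
    public static class IntSumReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        Job job = Job.getInstance(config, "word count");
        // Job setters return void, so they cannot be chained
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReduce.class);
        job.setReducerClass(IntSumReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit code 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```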
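The job registers `IntSumReduce` as the combiner as well as the reducer. A combiner runs the reduce logic locally on each mapper's output before the shuffle, which is safe here because summation is associative and commutative. The Hadoop-free sketch below shows the effect on the number of shuffled records; the class `CombinerEffect`, its methods, and the sample blocks are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of setCombinerClass: pre-summing per block shrinks the shuffle.
public class CombinerEffect {

    // Map output of one block without a combiner: one (word, 1) record per token
    static List<Map.Entry<String, Integer>> mapOnly(List<String> words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Same block with a combiner: the sum runs locally before anything is shuffled
    static List<Map.Entry<String, Integer>> mapWithCombiner(List<String> words) {
        Map<String, Integer> local = new TreeMap<>();
        for (String w : words) {
            local.merge(w, 1, Integer::sum);
        }
        return new ArrayList<>(local.entrySet());
    }

    // Reduce: sum all shuffled values per key
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> shuffled) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : shuffled) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical blocks
        List<List<String>> blocks = List.of(
                List.of("url1", "url1", "url2"),
                List.of("url1", "url2", "url2"));
        List<Map.Entry<String, Integer>> without = new ArrayList<>();
        List<Map.Entry<String, Integer>> with = new ArrayList<>();
        for (List<String> b : blocks) {
            without.addAll(mapOnly(b));
            with.addAll(mapWithCombiner(b));
        }
        System.out.println("shuffled without combiner: " + without.size()); // 6
        System.out.println("shuffled with combiner: " + with.size());       // 4
        System.out.println(reduce(without) + " " + reduce(with)); // {url1=3, url2=3} {url1=3, url2=3}
    }
}
```

Hadoop may apply a combiner zero, one, or several times, so a combiner must never change the final result; here the reduced totals are identical either way.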