serisboy

Hadoop MapReduce code: data deduplication

package com.hadoop.sample;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

	// Shared empty value. Hadoop serializes key and value on every write,
	// so reusing one instance is safe and avoids an allocation per record.
	private static final Text EMPTY = new Text("");

	// Map: copy the input value (one line of text) into the output key and emit it directly.
	// Identical lines become identical keys, so the shuffle groups them together.
	public static class Map extends Mapper<Object, Text, Text, Text> {
		@Override
		public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
			context.write(value, EMPTY);
		}
	}

	// Reduce: copy the input key to the output key and emit it exactly once,
	// ignoring how many values (i.e. duplicates) were grouped under it.
	public static class Reduce extends Reducer<Text, Text, Text, Text> {
		@Override
		public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
			context.write(key, EMPTY);
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: Dedup <in> <out>");
			System.exit(2);
		}
		// Job.getInstance is the Hadoop 2.x+ API; on Hadoop 1.x use new Job(conf, "Dedup").
		Job job = Job.getInstance(conf, "Dedup");
		job.setJarByClass(Dedup.class);
		job.setMapperClass(Map.class);
		// Reduce also works as a combiner: its input and output types match,
		// and emitting each key once gives the same result however often it runs.
		job.setCombinerClass(Reduce.class);
		job.setReducerClass(Reduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
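To see why emitting each line as a key is enough to remove duplicates, here is a minimal Hadoop-free sketch that simulates the job's map, shuffle and reduce phases in plain Java (9+), assuming a single reducer. The class LocalDedup and its methods are hypothetical names for illustration only; a TreeMap stands in for Hadoop's shuffle, which sorts the map output and groups all values sharing a key before reduce() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the Dedup job with a single reducer.
public class LocalDedup {

    // "Map" phase: each input line becomes a key paired with an empty value.
    static List<Map.Entry<String, String>> map(List<String> lines) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            out.add(Map.entry(line, ""));
        }
        return out;
    }

    // "Shuffle" phase: sort by key and collect all values of the same key,
    // standing in for what Hadoop does between map and reduce.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: emit each key once, ignoring how many values it has.
    static List<String> reduce(TreeMap<String, List<String>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static List<String> dedup(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        // Made-up sample input with repeated lines.
        List<String> input = List.of("b c", "a", "b c", "a", "d");
        dedup(input).forEach(System.out::println);
    }
}
```

Running main prints a, b c and d, each on its own line and in sorted order. In the real job the combiner may apply the same one-record-per-key logic to each map task's output, which shrinks the data sent across the network before the shuffle.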
Comments
#1 a331251021 2013-07-31
Senior colleague...
47.        job.setCombinerClass(Reducer.class);  
48.        job.setReducerClass(Reducer.class);
These should be Reduce.class. I happen to be reading Lu Jiaheng's Hadoop实战 (Hadoop in Action) right now too.
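The commenter's correction affects the output, not just style: the default reduce() in the base org.apache.hadoop.mapreduce.Reducer class writes one record per incoming value, so registering Reducer.class would let every duplicate line through. The sketch below is a hypothetical, Hadoop-free comparison of the two behaviours on already-grouped map output; ReducerComparison and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of two reduce strategies applied to
// already-grouped map output (key -> list of values).
public class ReducerComparison {

    // Mirrors the default behaviour of the base Reducer class:
    // one output record per input value, so duplicates survive.
    static List<String> identityReduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String ignored : e.getValue()) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Mirrors Dedup.Reduce: one output record per key.
    static List<String> dedupReduce(Map<String, List<String>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = new TreeMap<>();
        grouped.put("a", List.of("", "", ""));  // line "a" appeared three times
        grouped.put("b", List.of(""));          // line "b" appeared once
        System.out.println(identityReduce(grouped)); // [a, a, a, b]
        System.out.println(dedupReduce(grouped));    // [a, b]
    }
}
```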

