package com.hadoop.sample;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class Dedup {
//map将输入中的value复制到输出数据的key上,并直接输出
public static class Map extends Mapper<Object,Text,Text,Text>{
private static Text line = new Text();
public void map(Object key,Text value,Context context) throws IOException,InterruptedException{
line = value;
context.write(line, new Text(""));
}
}
//reduce将输入中的key复制到输出数据的key上,并直接输出
public static class Reduce extends Reducer<Text,Text,Text,Text>{
public void reduce(Text key,Iterable<Text> values,Context context) throws IOException,InterruptedException{
context.write(key, new Text(""));
}
}
/**
* @param args
*/
public static void main(String[] args) throws Exception{
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if(otherArgs.length != 2){
System.err.println("Usage WordCount <int> <out>");
System.exit(2);
}
Job job = new Job(conf,"Dedup");
job.setJarByClass(Dedup.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
分享到:
相关推荐
赠送源代码:hadoop-mapreduce-client-jobclient-2.6.5-sources.jar; 赠送Maven依赖信息文件:hadoop-mapreduce-client-jobclient-2.6.5.pom; 包含翻译后的API文档:hadoop-mapreduce-client-jobclient-2.6.5-...
赠送源代码:hadoop-yarn-client-2.6.5-sources.jar; 赠送Maven依赖信息文件:hadoop-yarn-client-2.6.5.pom; 包含翻译后的API文档:hadoop-yarn-client-2.6.5-javadoc-API文档-中文(简体)版.zip; Maven坐标:org...
赠送源代码:hadoop-yarn-common-2.6.5-sources.jar 包含翻译后的API文档:hadoop-yarn-common-2.6.5-javadoc-API文档-中文(简体)版.zip 对应Maven信息:groupId:org.apache.hadoop,artifactId:hadoop-yarn-...
hadoop-annotations-3.1.1.jar hadoop-common-3.1.1.jar hadoop-mapreduce-client-core-3.1.1.jar hadoop-yarn-api-3.1.1.jar hadoop-auth-3.1.1.jar hadoop-hdfs-3.1.1.jar hadoop-mapreduce-client-hs-3.1.1.jar ...
赠送源代码:hadoop-common-2.7.3-sources.jar; 赠送Maven依赖信息文件:hadoop-common-2.7.3.pom; 包含翻译后的API文档:hadoop-common-2.7.3-javadoc-API文档-中文(简体)版.zip; Maven坐标:org.apache.hadoop:...
Apache Hadoop (hadoop-3.3.4.tar.gz)项目为可靠、可扩展的分布式计算开发开源软件。官网下载速度非常缓慢,因此将hadoop-3.3.4 版本放在这里,欢迎大家来下载使用! Hadoop 架构是一个开源的、基于 Java 的编程...
hadoop-eclipse-plugin-2.7.3和2.7.7的jar包 hadoop-eclipse-plugin-2.7.3和2.7.7的jar包 hadoop-eclipse-plugin-2.7.3和2.7.7的jar包 hadoop-eclipse-plugin-2.7.3和2.7.7的jar包
hadoop-eclipse-plugin-1.2.1hadoop-eclipse-plugin-1.2.1hadoop-eclipse-plugin-1.2.1hadoop-eclipse-plugin-1.2.1
hadoop-common-2.4.1.jar,是学习基础的Hadoop必须的包
该资源包里面包含eclipse上的hadoop-1.2.1版本插件的jar包和hadoop-1.2.1.tar.gz,亲测可用~~请在下载完该包后解压,将hadoop-1.2.1放置于Eclipse\plugins目录下,然后重启eclipse,将hadoop-1.2.1.tar.gz放到D:\...
赠送源代码:hadoop-mapreduce-client-common-2.6.5-sources.jar; 赠送Maven依赖信息文件:hadoop-mapreduce-client-common-2.6.5.pom; 包含翻译后的API文档:hadoop-mapreduce-client-common-2.6.5-javadoc-API...
hadoop-eclipse-plugin-2.7.4.jar和hadoop-eclipse-plugin-2.7.3.jar还有hadoop-eclipse-plugin-2.6.0.jar的插件都在这打包了,都可以用。
赠送源代码:hadoop-yarn-server-resourcemanager-2.6.0-sources.jar; 赠送Maven依赖信息文件:hadoop-yarn-server-resourcemanager-2.6.0.pom; 包含翻译后的API文档:hadoop-yarn-server-resourcemanager-2.6.0-...
赠送源代码:hadoop-hdfs-client-2.9.1-sources.jar 包含翻译后的API文档:hadoop-hdfs-client-2.9.1-javadoc-API文档-中文(简体)版.zip 对应Maven信息:groupId:org.apache.hadoop,artifactId:hadoop-hdfs-...
flink-shaded-hadoop-3下载
赠送源代码:hadoop-auth-2.5.1-sources.jar; 赠送Maven依赖信息文件:hadoop-auth-2.5.1.pom; 包含翻译后的API文档:hadoop-auth-2.5.1-javadoc-API文档-中文(简体)版.zip; Maven坐标:org.apache.hadoop:hadoop...
hadoop-eclipse-plugin-3.1.1, hadoop eclipse 插件 3.1.1
赠送源代码:hadoop-yarn-api-2.5.1-sources.jar; 赠送Maven依赖信息文件:hadoop-yarn-api-2.5.1.pom; 包含翻译后的API文档:hadoop-yarn-api-2.5.1-javadoc-API文档-中文(简体)版.zip; Maven坐标:org.apache....
找不到与hadoop-2.9.2版本对应的插件,手动生成的hadoop-eclipse-plugin-2.9.2版本,
hadoop2 lzo 文件 ,编译好的64位 hadoop-lzo-0.4.20.jar 文件 ,在mac 系统下编译的,用法:解压后把hadoop-lzo-0.4.20.jar 放到你的hadoop 安装路径下的lib 下,把里面lib/Mac_OS_X-x86_64-64 下的所有文件 拷到 ...