一. 代码
- Hbase In Action(HBase实战)和Hbase:The Definitive Guide(HBase权威指南)两本书中,有很多入门级的代码,可以选择自己感兴趣的check out。地址分别为https://github.com/HBaseinaction https://github.com/larsgeorge/hbase-book。
- 在Win7下运行Hbase与MapReduce集成章节的代码时,出现了错误。比喻这个代码https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java
二. 错误
Exception in thread "main" java.lang.IllegalArgumentException: Pathname /D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar from hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:184) at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:92) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at com.jyz.study.hadoop.hbase.mapreduce.AnalyzeData.main(AnalyzeData.java:249)
三. 跟踪代码
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil
public static void addHBaseDependencyJars(Configuration conf) throws IOException { addDependencyJars(conf, // explicitly pull a class from each module org.apache.hadoop.hbase.HConstants.class, // hbase-common org.apache.hadoop.hbase.protobuf.generated.ClientProtos.class, // hbase-protocol org.apache.hadoop.hbase.client.Put.class, // hbase-client org.apache.hadoop.hbase.CompatibilityFactory.class, // hbase-hadoop-compat org.apache.hadoop.hbase.mapreduce.TableMapper.class, // hbase-server // pull necessary dependencies org.apache.zookeeper.ZooKeeper.class, org.jboss.netty.channel.ChannelFactory.class, com.google.protobuf.Message.class, com.google.common.collect.Lists.class, org.cloudera.htrace.Trace.class); } public static void addDependencyJars(Configuration conf, Class<?>... classes) throws IOException { Path path = findOrCreateJar(clazz, localFs, packagedClasses); conf.set("tmpjars", StringUtils.arrayToString(jars.toArray(new String[jars.size()]))); }
此时tmpjars例如
file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar
JobSubmitter的copyAndConfigureFiles方法
String libjars = conf.get("tmpjars"); if (libjars != null) { FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms); String[] libjarsArr = libjars.split(","); for (String tmpjars: libjarsArr) { Path tmp = new Path(tmpjars); Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication); DistributedCache.addFileToClassPath( new Path(newPath.toUri().getPath()), conf); } }
copyRemoteFiles会copies 这些jar to the jobtracker filesystem and returns the path where itwas copied to。
当集群环境运行时,就会返回
[hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/guava-12.0.1.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/zookeeper-3.4.5.jar]
如果是本地运行时,则返回
[hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar]
后面会使用Hadoop文件系统检查这两批URL。问题就在这里,它没有区分是本地Window文件系统还是集群Hadoop文件系统,应该区分检查。所以提交到集群运行没问题,本地运行出现上述问题。找个时间去Hadoop Jira上create a issue。
四. 代码能跑下去的解决方法
在TableMapReduceUtil里initTableMapperJob,initTableReducerJob都有大量的重构方法,其中可以指定参数
* @param addDependencyJars upload HBase jars and jars for any of the configured * job classes via the distributed cache (tmpjars).
也正是因为addDependencyJars默认为true,才触发了上面的错误
if (addDependencyJars) { addDependencyJars(job); }
所以我们可以将其设置为false。修改https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java 代码为
TableMapReduceUtil.initTableMapperJob(input, scan, ParseMapper.class, // co ParseJson-3-SetMap Setup map phase details using the utility method. ImmutableBytesWritable.class, Put.class, job, false); TableMapReduceUtil.initTableReducerJob(output, // co ParseJson-4-SetReduce Configure an identity reducer to store the parsed data. IdentityTableReducer.class, job, null, null, null, null, false);
运行正常,查看结果,testtable data:json的数据划分为 testtable data:column1 data:column2...符合期望。
相关推荐
hadoop-mapreduce-examples-2.6.5.jar 官方案例源码
hadoop-mapreduce-examples-2.7.1.jar
hadoop-2.7.2-hbase-jar.tar.gz hadoop-2.7.2-hbase-jar.tar.gz hadoop-2.7.2-hbase-jar.tar.gz
hadoop-annotations-3.1.1.jar hadoop-common-3.1.1.jar hadoop-mapreduce-client-core-3.1.1.jar hadoop-yarn-api-3.1.1.jar hadoop-auth-3.1.1.jar hadoop-hdfs-3.1.1.jar hadoop-mapreduce-client-hs-3.1.1.jar ...
被编译的hive-hbase-handler-1.2.1.jar,用于在Hive中创建关联HBase表的jar,解决创建Hive关联HBase时报FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop....
hbase-0.90.5.tar.gz与hadoop0.20.2版本匹配,我在我本地虚拟机已经安装成功可以使用。请放心下载!!!
A.3实验三:熟悉常用的HBase操作 本实验对应第5章的内容。 A.3.1 实验目的 (1)理解HBase在Hadoop体系结构中的角色。(2)熟练使用HBase操作常用的 Shell命令。(3)熟悉HBase操作常用的 Java API。 A.3.2 实验平台 (1...
HBase(hbase-2.4.9-bin.tar.gz)是一个分布式的、面向列的开源数据库,该技术来源于 Fay Chang 所撰写的Google论文“Bigtable:一个结构化数据的分布式存储系统”。就像Bigtable利用了Google文件系统(File System...
Hadoop开发基础 : Google三大论文: MapReduce超大机群上的简单数据处理.doc Hadoop开发基础 : Google三大论文: MapReduce超大机群上的简单数据处理.doc Hadoop开发基础 : Google三大论文: MapReduce超大机群上的简单...
hadoop1.1.2操作例子 包括hbase hive mapreduce相应的jar包
【SpringBoot】Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster报错明细问题解决后记 报错明细 IDEA SpringBoot集成hadoop运行环境,,本地启动项目,GET请求接口触发...
赠送jar包:hadoop-mapreduce-client-jobclient-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.6.5-sources.jar; 赠送...
hadoop中的demo,wordcount列子用到的JAR包 用法: # 在容器里运行WordCount程序,该程序需要2个参数...hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output
Flink-1.11.2与Hadoop3集成JAR包,放到flink安装包的lib目录下,可以避免Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.这个报错,实现...
8、短评:HDFS、MapReduce和HBase三者相辅相成、各有长处 ..... - 34 - 9、HDFS在web开发中的应用................................. - 35 - 10、Mapreduce中value集合的二次排序 ....................... - 38 - 11...
赠送jar包:hbase-hadoop2-compat-1.2.12.jar; 赠送原API文档:hbase-hadoop2-compat-1.2.12-javadoc.jar; 赠送源代码:hbase-hadoop2-compat-1.2.12-sources.jar; 赠送Maven依赖信息文件:hbase-hadoop2-compat-...
赠送jar包:hbase-hadoop2-compat-1.1.3.jar; 赠送原API文档:hbase-hadoop2-compat-1.1.3-javadoc.jar; 赠送源代码:hbase-hadoop2-compat-1.1.3-sources.jar; 赠送Maven依赖信息文件:hbase-hadoop2-compat-...
hadoop和hbase集成所需jar包。例如使用hbase进行MapReduce。 需要更多资源请关注我。
本文将HBase-2.2.1安装在Hadoop-3.1.2上,关于Hadoop-3.1.2的安装,请参见《基于zookeeper-3.5.5安装hadoop-3.1.2》一文。安装环境为64位CentOS-Linux 7.2版本。 本文将在HBase官方提供的quickstart.html文件的指导...
一、实验目的 (1)熟悉Hadoop开发包 (2)编写MepReduce程序 (3)调试和运行MepReduce程序 (4)完成上课老师演示的内容 二、实验环境 Windows 10 VMware Workstation Pro虚拟机 Hadoop环境 Jdk1.8 二、实验内容 1...