Hadoop Study Notes 31: "XXX.jar is not a valid DFS filename" when integrating HBase with MapReduce on Windows 7

zy19982004

Blog category: Hadoop

1. Code

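The books HBase in Action and HBase: The Definitive Guide both contain plenty of introductory code, and you can check out whichever parts interest you. The repositories are https://github.com/HBaseinaction and https://github.com/larsgeorge/hbase-book respectively.
When I ran the code from the chapter on integrating HBase with MapReduce on Windows 7, an error occurred. One example is this class: https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java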

2. Error

Exception in thread "main" java.lang.IllegalArgumentException: Pathname /D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar from hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar is not a valid DFS filename.
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:184)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:92)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
	at com.jyz.study.hadoop.hbase.mapreduce.AnalyzeData.main(AnalyzeData.java:249)
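
Why the path in the message is rejected can be illustrated with a small sketch. To my understanding, HDFS requires an absolute path and refuses any path component that contains a colon, which is exactly what the Windows drive letter D: introduces. The class DfsNameCheck and its method are hypothetical names used only for illustration; this is a simplified approximation of the rule, not Hadoop's actual validation code.

```java
public class DfsNameCheck {

    /**
     * Simplified approximation (an assumption, not the Hadoop source):
     * the path must be absolute and no component may be ".", ".."
     * or contain a colon.
     */
    static boolean looksLikeValidDfsName(String path) {
        if (!path.startsWith("/")) {
            return false;
        }
        for (String component : path.split("/")) {
            if (component.equals(".") || component.equals("..") || component.contains(":")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Path taken from the exception message: the "D:" component contains a colon.
        System.out.println(looksLikeValidDfsName(
            "/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar"));
        // An ordinary HDFS-style path passes the simplified check.
        System.out.println(looksLikeValidDfsName("/tmp/hadoop-yarn/staging/root/.staging"));
    }
}
```

Under this reading, any Windows path that reaches HDFS with its drive letter intact fails the check, whatever the jar's name is.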

3. Tracing the code

org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil

  public static void addHBaseDependencyJars(Configuration conf) throws IOException {
    addDependencyJars(conf,
      // explicitly pull a class from each module
      org.apache.hadoop.hbase.HConstants.class,                      // hbase-common
      org.apache.hadoop.hbase.protobuf.generated.ClientProtos.class, // hbase-protocol
      org.apache.hadoop.hbase.client.Put.class,                      // hbase-client
      org.apache.hadoop.hbase.CompatibilityFactory.class,            // hbase-hadoop-compat
      org.apache.hadoop.hbase.mapreduce.TableMapper.class,           // hbase-server
      // pull necessary dependencies
      org.apache.zookeeper.ZooKeeper.class,
      org.jboss.netty.channel.ChannelFactory.class,
      com.google.protobuf.Message.class,
      com.google.common.collect.Lists.class,
      org.cloudera.htrace.Trace.class);
  }


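  public static void addDependencyJars(Configuration conf,
      Class<?>... classes) throws IOException {
    // ... abridged: localFs, jars and packagedClasses are set up here ...
    for (Class<?> clazz : classes) {
      // locate (or build) the jar that contains clazz on the local file system
      Path path = findOrCreateJar(clazz, localFs, packagedClasses);
      jars.add(path.toString());
    }
    // the collected local jar paths end up in the "tmpjars" property
    conf.set("tmpjars", StringUtils.arrayToString(jars.toArray(new String[jars.size()])));
  }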

At this point tmpjars looks like this, for example:

file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar
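
To make the shape of this value concrete, here is a minimal sketch that splits a tmpjars string on commas and prints the scheme and path of each entry. It uses only java.net.URI; the class TmpJarsInspector and its parse method are hypothetical names for illustration and are not part of Hadoop or HBase.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class TmpJarsInspector {

    /** Splits a comma-separated tmpjars value into one URI per jar. */
    static List<URI> parse(String tmpjars) {
        List<URI> result = new ArrayList<>();
        for (String entry : tmpjars.split(",")) {
            result.add(URI.create(entry.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two of the entries from the value shown above.
        String tmpjars =
            "file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar,"
            + "file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar";
        for (URI jar : parse(tmpjars)) {
            // Every entry carries the "file" scheme, i.e. it lives on the local Windows disk.
            System.out.println(jar.getScheme() + " -> " + jar.getPath());
        }
    }
}
```

The point to notice is that every entry is a local file: URI whose path starts with /D:/, so the jars only become usable by the cluster once they are copied to HDFS.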

The copyAndConfigureFiles method of JobSubmitter:

String libjars = conf.get("tmpjars");
if (libjars != null) {
  FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
  String[] libjarsArr = libjars.split(",");
  for (String tmpjars : libjarsArr) {
    Path tmp = new Path(tmpjars);
    Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
    DistributedCache.addFileToClassPath(
        new Path(newPath.toUri().getPath()), conf);
  }
}

copyRemoteFiles copies these jars to the JobTracker file system and returns the path each one was copied to.

When the job runs in the cluster environment, the result is:

[hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/guava-12.0.1.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/zookeeper-3.4.5.jar]

When the job runs locally, the result is:

[hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar]

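Both sets of URLs are later checked through the Hadoop file system. This is where the problem lies: the check does not distinguish between the local Windows file system and the cluster's Hadoop file system, although it should treat them differently. That is why submitting the job to the cluster works, while running it locally produces the error above. I will create an issue on the Hadoop JIRA when I find the time.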
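
How a local jar can end up with an hdfs:// prefix is easier to see in isolation. The copyAndConfigureFiles snippet calls new Path(newPath.toUri().getPath()), which keeps only the path component. The sketch below reproduces that step with plain java.net.URI, assuming newPath is still the local file: URI and that the bare path is later resolved against the default file system. The class SchemeLossDemo and its methods are hypothetical names for illustration, not Hadoop code.

```java
import java.net.URI;

public class SchemeLossDemo {

    /** Keeps only the path component, mirroring newPath.toUri().getPath(). */
    static String stripScheme(URI jar) {
        return jar.getPath();
    }

    /** Resolves a scheme-less absolute path against a default file system URI (assumed behaviour). */
    static URI qualify(String path, URI defaultFs) {
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        URI localJar = URI.create(
            "file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar");
        URI defaultFs = URI.create("hdfs://192.168.1.200:9000/");

        // The "file" scheme is gone after this step; only /D:/... remains.
        String bare = stripScheme(localJar);
        System.out.println(bare);

        // Resolved against HDFS, the Windows path now looks like an HDFS location.
        System.out.println(qualify(bare, defaultFs));
    }
}
```

The second line printed has the same shape as the URL in the exception message, which is consistent with the idea that the file: scheme is dropped before the path is checked against HDFS.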

4. A workaround that lets the code run

In TableMapReduceUtil, initTableMapperJob and initTableReducerJob each have many overloads, some of which accept this parameter:

   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).

It is precisely because addDependencyJars defaults to true that the error above is triggered:

if (addDependencyJars) {
  addDependencyJars(job);
}

So we can set it to false. Change the code in https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java to:

// The trailing false is addDependencyJars: do not ship jars via tmpjars.
TableMapReduceUtil.initTableMapperJob(input, scan, ParseMapper.class, // co ParseJson-3-SetMap Setup map phase details using the utility method.
    ImmutableBytesWritable.class, Put.class, job, false);
// Same here: the last argument disables addDependencyJars.
TableMapReduceUtil.initTableReducerJob(output, // co ParseJson-4-SetReduce Configure an identity reducer to store the parsed data.
    IdentityTableReducer.class, job, null, null, null, null, false);

The job now runs normally. Checking the results, the data in testtable data:json has been split into testtable data:column1, data:column2, ..., as expected.

2014-03-12 09:48

Comment 1, houseDaine, 2014-05-14:

I made the change the way you described, but I still get "is not a valid DFS filename".

May I add you on QQ?
