No matter what I do, I can't get rid of this error. I know Snappy is a very fast library, and therefore a better compression/decompression choice than the alternatives, and I want to use it for my processing. As far as I know, Google uses it internally in BigTable and MapReduce (basically all of their killer apps). I did my own research; people suggest either not using it or going with java-snappy, but I want to stick with Hadoop's Snappy support. I have the corresponding libraries in my setup (I mean under lib).
Can anyone fix this error? I've noticed that despite the error, the job still completes successfully.
hdfs://localhost:54310/user/hduser/gutenberg
12/06/01 18:18:54 INFO input.FileInputFormat: Total input paths to process : 3
12/06/01 18:18:54 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/06/01 18:18:54 WARN snappy.LoadSnappy: Snappy native library not loaded
12/06/01 18:18:54 INFO mapred.JobClient: Running job: job_201206011229_0008
12/06/01 18:18:55 INFO mapred.JobClient: map 0% reduce 0%
12/06/01 18:19:08 INFO mapred.JobClient: map 66% reduce 0%
12/06/01 18:19:14 INFO mapred.JobClient: map 100% reduce 0%
12/06/01 18:19:17 INFO mapred.JobClient: map 100% reduce 22%
12/06/01 18:19:23 INFO mapred.JobClient: map 100% reduce 100%
12/06/01 18:19:28 INFO mapred.JobClient: Job complete: job_201206011229_0008
12/06/01 18:19:28 INFO mapred.JobClient: Counters: 29
12/06/01 18:19:28 INFO mapred.JobClient: Job Counters
12/06/01 18:19:28 INFO mapred.JobClient: Launched reduce tasks=1
12/06/01 18:19:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22810
12/06/01 18:19:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/01 18:19:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/01 18:19:28 INFO mapred.JobClient: Launched map tasks=3
12/06/01 18:19:28 INFO mapred.JobClient: Data-local map tasks=3
12/06/01 18:19:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14345
12/06/01 18:19:28 INFO mapred.JobClient: File Output Format Counters
12/06/01 18:19:28 INFO mapred.JobClient: Bytes Written=880838
12/06/01 18:19:28 INFO mapred.JobClient: FileSystemCounters
12/06/01 18:19:28 INFO mapred.JobClient: FILE_BYTES_READ=2214849
12/06/01 18:19:28 INFO mapred.JobClient: HDFS_BYTES_READ=3671878
12/06/01 18:19:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3775339
12/06/01 18:19:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=880838
12/06/01 18:19:28 INFO mapred.JobClient: File Input Format Counters
12/06/01 18:19:28 INFO mapred.JobClient: Bytes Read=3671517
12/06/01 18:19:28 INFO mapred.JobClient: Map-Reduce Framework
12/06/01 18:19:28 INFO mapred.JobClient: Map output materialized bytes=1474341
12/06/01 18:19:28 INFO mapred.JobClient: Map input records=77932
12/06/01 18:19:28 INFO mapred.JobClient: Reduce shuffle bytes=1207328
12/06/01 18:19:28 INFO mapred.JobClient: Spilled Records=255962
12/06/01 18:19:28 INFO mapred.JobClient: Map output bytes=6076095
12/06/01 18:19:28 INFO mapred.JobClient: CPU time spent (ms)=12100
12/06/01 18:19:28 INFO mapred.JobClient: Total committed heap usage (bytes)=516882432
12/06/01 18:19:28 INFO mapred.JobClient: Combine input records=629172
12/06/01 18:19:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=361
12/06/01 18:19:28 INFO mapred.JobClient: Reduce input records=102322
12/06/01 18:19:28 INFO mapred.JobClient: Reduce input groups=82335
12/06/01 18:19:28 INFO mapred.JobClient: Combine output records=102322
12/06/01 18:19:28 INFO mapred.JobClient: Physical memory (bytes) snapshot=605229056
12/06/01 18:19:28 INFO mapred.JobClient: Reduce output records=82335
12/06/01 18:19:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2276663296
12/06/01 18:19:28 INFO mapred.JobClient: Map output records=629172

PS: For now I'm working with a small dataset, where fast compression and decompression doesn't really matter. But once I have a working pipeline, I'll run it against large datasets.
Posted on 2012-06-04 19:18:54
You will see this error message if the Snappy shared library (.so) is not on LD_LIBRARY_PATH / java.library.path. If you install the library in the right location, you should no longer see the error message above.
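A quick way to check is to print the JVM's library search path and attempt the same kind of native load that Hadoop's `LoadSnappy` performs. This is a standalone sketch, not Hadoop code: the class name `SnappyCheck` is mine, and whether `libsnappy.so` actually loads depends on your machine.

```java
// Sketch: compare java.library.path against where libsnappy.so sits,
// then try to load it the way the JVM would.
public class SnappyCheck {
    public static void main(String[] args) {
        // LoadSnappy ultimately relies on the JVM's native-library
        // lookup, which searches java.library.path. Print it so you
        // can compare it against the directory holding libsnappy.so.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        try {
            // Looks for libsnappy.so (Linux) on java.library.path.
            System.loadLibrary("snappy");
            System.out.println("snappy native library loaded");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("snappy NOT found: " + e.getMessage());
        }
    }
}
```

Run it with the same `-Djava.library.path=...` (or LD_LIBRARY_PATH) your Hadoop client uses; if it prints "NOT found", the client JVM can't see the .so, which matches the WARN line in your log.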
If you did install the .so in the folder where the Hadoop native library (libhadoop.so) lives, then the "error" above may be specific to the node you submit the job from (as you said, the job itself runs without errors, so this looks like a client-side message).
Could you share some details of your job configuration (the configured output format and the related compression options)?
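For reference, on Hadoop of that vintage the compression options in question usually look something like the following in the job configuration or mapred-site.xml. These are the old `mapred.*` property names; I'm assuming them here since the config wasn't posted, so verify against your Hadoop version.

```xml
<!-- Compress final job output with Snappy -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<!-- Compress intermediate map output as well -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

If SnappyCodec is configured anywhere here but the native library isn't loadable, that would explain the WARN at job submission.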
https://stackoverflow.com/questions/10878038