How do I get Hadoop with Cascading to show debug log output?

Stack Overflow user

Asked 2012-03-14 03:49:00

Answers: 2 · Views: 4.6K · Followers: 0 · Votes: 1

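I'm having trouble getting Hadoop and Cascading 1.2.6 to show me the output that should come from using the Debug filter. The Cascading guide says this is how you can view the current tuples. I'm using this code to try to see any debug output: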

Debug debug = new Debug(Debug.Output.STDOUT, true);
debug.setPrintTupleEvery(1);
debug.setPrintFieldsEvery(1);
assembly = new Each( assembly, DebugLevel.VERBOSE, debug );

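I'm fairly new to Hadoop and Cascading, so I may simply not be looking in the right place, or I may be missing some simple log4j setting (I have not changed any of the defaults that ship with Cloudera hadoop-0.20.2-cdh3u3).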

Here is the WordCount example class I'm using (copied from the Cascading user guide), with the debug statements added:

package org.cascading.example;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.Aggregator;
import cascading.operation.Debug;
import cascading.operation.DebugLevel;
import cascading.operation.Function;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tuple.Fields;

import java.util.Properties;

public class WordCount {
    public static void main(String[] args) {
        String inputPath = args[0];
        String outputPath = args[1];

        // define source and sink Taps.
        Scheme sourceScheme = new TextLine( new Fields( "line" ) );
        Tap source = new Hfs( sourceScheme, inputPath );

        Scheme sinkScheme = new TextLine( new Fields( "word", "count" ) );
        Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );

        // the 'head' of the pipe assembly
        Pipe assembly = new Pipe( "wordcount" );

        // For each input Tuple
        // using a regular expression
        // parse out each word into a new Tuple with the field name "word"
        String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
        Function function = new RegexGenerator( new Fields( "word" ), regex );

        assembly = new Each( assembly, new Fields( "line" ), function );

        Debug debug = new Debug(Debug.Output.STDOUT, true);
        debug.setPrintTupleEvery(1);
        debug.setPrintFieldsEvery(1);
        assembly = new Each( assembly, DebugLevel.VERBOSE, debug );

        // group the Tuple stream by the "word" value
        assembly = new GroupBy( assembly, new Fields( "word" ) );

        // For every Tuple group
        // count the number of occurrences of "word" and store result in
        // a field named "count"
        Aggregator count = new Count( new Fields( "count" ) );
        assembly = new Every( assembly, count );

        // initialize app properties, tell Hadoop which jar file to use
        Properties properties = new Properties();
        FlowConnector.setApplicationJarClass( properties, WordCount.class );

        // plan a new Flow from the assembly using the source and sink Taps
        FlowConnector flowConnector = new FlowConnector();
        FlowConnector.setDebugLevel( properties, DebugLevel.VERBOSE );
        Flow flow = flowConnector.connect( "word-count", source, sink, assembly );

        // execute the flow, block until complete
        flow.complete();

        // Ask Cascading to create a GraphViz DOT file
        // brew install graphviz # install viewer to look at dot file
        flow.writeDOT("build/flow.dot");
    }
}

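It runs fine; I just can't find any debug statements showing the words anywhere. I've looked through the HDFS filesystem with hadoop dfs -ls and through the JobTracker web UI. The mapper's log output in the JobTracker has nothing under stdout: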

Task Logs: 'attempt_201203131143_0022_m_000000_0'



stdout logs


stderr logs
2012-03-13 14:32:24.642 java[74752:1903] Unable to load realm info from SCDynamicStore


syslog logs
2012-03-13 14:32:24,786 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2012-03-13 14:32:25,278 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-03-13 14:32:25,617 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-03-13 14:32:25,903 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : null
2012-03-13 14:32:25,945 INFO cascading.tap.hadoop.MultiInputSplit: current split input path: hdfs://localhost/usr/tnaleid/shakespeare/input/comedies/cymbeline
2012-03-13 14:32:25,980 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2012-03-13 14:32:25,988 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2012-03-13 14:32:26,002 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2012-03-13 14:32:26,246 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2012-03-13 14:32:26,247 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2012-03-13 14:32:27,623 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2012-03-13 14:32:28,274 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2012-03-13 14:32:28,310 INFO org.apache.hadoop.mapred.Task: Task:attempt_201203131143_0022_m_000000_0 is done. And is in the process of commiting
2012-03-13 14:32:28,337 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201203131143_0022_m_000000_0' done.
2012-03-13 14:32:28,361 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

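Finally, I also wrote out the DOT file, and it does not contain the Debug statement I expected (although such statements may simply have been stripped out).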

Is there a log file I'm missing, or a configuration setting I need to set?

Tags: debugging, logging, hadoop, stdout, cascading

2 Answers

Stack Overflow user

Answered 2012-03-14 05:43:41

I got an answer to this from the mailing list.

Change the assembly to look like this:

assembly = new Each( assembly, new Fields( "line" ), function );

// simpler debug statement
assembly = new Each( assembly, new Debug("hello", true) );

assembly = new GroupBy( assembly, new Fields( "word" ) );

This prints the following under stderr in the job details UI:

Task Logs: 'attempt_201203131143_0028_m_000000_0'



stdout logs


stderr logs
2012-03-13 16:21:41.304 java[78617:1903] Unable to load realm info from SCDynamicStore
hello: ['word']
hello: ['CYMBELINE']
<SNIP>

I had tried this, taken straight from the documentation, but it did not work for me (even though I had also set the FlowConnector debugLevel to VERBOSE):

assembly = new Each( assembly, DebugLevel.VERBOSE, new Debug() );

It seems to be related to the DebugLevel.VERBOSE argument from the documentation, because I still get no output when I try this:

assembly = new Each( assembly, DebugLevel.VERBOSE, new Debug("hello", true) );

Dropping the DebugLevel argument also produces output:

assembly = new Each( assembly, new Debug() );

I can also switch the output to stdout by doing the following:

assembly = new Each( assembly, new Debug(Debug.Output.STDOUT) );

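My bet is that I still have something misconfigured for the VERBOSE debug level, or that 1.2.6 no longer matches the documentation, but at least I can now see output in the logs.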
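
One possible explanation, which this thread does not confirm: in the WordCount class from the question, the Properties object is never handed to the connector (it is built with new FlowConnector()), so the FlowConnector.setDebugLevel( properties, DebugLevel.VERBOSE ) call may never reach the planner, and Debug operations tagged VERBOSE may then be dropped at a default level. The sketch below only models that idea in plain Java with no Cascading dependency. The names Level, KEY, plannerLevel, and isRetained are hypothetical and are not Cascading's API, and the filtering rule is an assumption about how such a planner might behave.

```java
import java.util.Properties;

// Toy model only: every name here is hypothetical and is NOT Cascading's API.
class DebugLevelSketch {
    // Ordered from least to most verbose.
    enum Level { NONE, DEFAULT, VERBOSE }

    // Hypothetical property key used only by this sketch.
    static final String KEY = "example.debuglevel";

    // Read the planner level from the properties, falling back to DEFAULT when unset.
    static Level plannerLevel(Properties properties) {
        return Level.valueOf(properties.getProperty(KEY, Level.DEFAULT.name()));
    }

    // Assumed rule: an operation tagged opLevel is kept only if the
    // planner level is at least as verbose as the operation's level.
    static boolean isRetained(Level opLevel, Level plannerLevel) {
        return opLevel.compareTo(plannerLevel) <= 0;
    }

    public static void main(String[] args) {
        Properties configured = new Properties();
        configured.setProperty(KEY, Level.VERBOSE.name());

        // Case 1: the configured properties never reach the planner.
        Level unseen = plannerLevel(new Properties());
        System.out.println("VERBOSE op kept without properties: "
                + isRetained(Level.VERBOSE, unseen));

        // Case 2: the configured properties are passed through.
        Level seen = plannerLevel(configured);
        System.out.println("VERBOSE op kept with properties: "
                + isRetained(Level.VERBOSE, seen));
    }
}
```

If this is what is happening, the matching change in the question's code would be to set the debug level on the properties first and then pass them to the connector's constructor, for example new FlowConnector( properties ). Whether that restores the VERBOSE output on 1.2.6 has not been tested here.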

Votes: 2

Stack Overflow user

Answered 2012-08-02 14:15:27

Have you tried setting

flow.setDebugLevel( DebugLevel.VERBOSE );

Votes: 0

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/9691005
