首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >reduce步骤似乎停留在99%

reduce步骤似乎停留在99%
EN

Stack Overflow用户
提问于 2015-09-10 21:54:29
回答 1查看 2.1K关注 0票数 2

我正在尝试运行一个连接到其他5个表的配置单元查询。其中一个表非常大(有150亿条记录),但由于其中一个join子句,我实际上只从该表中查找了800万条记录。

我一直在cloudera的日志控制台中看到这样的情况...

代码语言:javascript
复制
INFO  : 2015-09-10 09:51:43,209 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 437512.26 sec

我读到过桌子倾斜,然后一个减速器成了瓶颈,但我不知道如何检查桌子的倾斜程度,如果是这样的话。这会是问题所在吗?

编辑:

下面是查询中的解释计划,表c是大表...

代码语言:javascript
复制
STAGE DEPENDENCIES:
  Stage-8 is a root stage , consists of Stage-1
  Stage-1
  Stage-9 depends on stages: Stage-1
  Stage-6 depends on stages: Stage-9
  Stage-0 depends on stages: Stage-6

STAGE PLANS:
  Stage: Stage-8
    Conditional Operator

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
              value expressions: marketing_contact_id (type: int), ent_cust_id (type: string), campaign_master_id (type: string), event_id (type: string), event_timestamp (type: string)
          TableScan
            alias: b
            Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
              value expressions: event_type (type: string), event_type_category (type: string), event_type_subcategory (type: string), description (type: string)
          TableScan
            alias: c
            filterExpr: (loaddate > 20150401) (type: boolean)
            Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
              value expressions: display_event_id (type: string), category (type: string), time (type: string), user_id (type: string), ip (type: string), buy_id (type: double), ad_id (type: string), creative_id (type: string), creative_version (type: double), creative_size_id (type: string), site_id (type: string), page_id (type: string), keyword (type: string), country_id (type: double), state_province (type: string), browser_id (type: double), browser_version (type: double), os_id (type: double), local_user_id (type: string), sv1 (type: string), browser_type (type: string), country (type: string), os_type (type: string), state_province_name (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
               Inner Join 0 to 2
          keys:
            0 event_type_id (type: int)
            1 event_type_id (type: int)
            2 event_type_id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62
          Statistics: Num rows: 9714994464 Data size: 407286747990 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-9
    Map Reduce Local Work
      Alias -> Map Local Tables:
        e 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        e 
          TableScan
            alias: e
            Statistics: Num rows: 7080 Data size: 3695175 Basic stats: PARTIAL Column stats: NONE
            HashTable Sink Operator
              keys:
                0 _col30 (type: string)
                1 ad_id (type: string)

  Stage: Stage-6
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col30 (type: string)
                1 ad_id (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62, _col67, _col68, _col69, _col70, _col72, _col73, _col74
              Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col41 (type: double), _col59 (type: string), _col42 (type: double), _col29 (type: double), _col23 (type: string), _col60 (type: string), _col37 (type: double), _col31 (type: string), _col33 (type: string), _col32 (type: double), _col22 (type: string), _col24 (type: int), _col27 (type: string), _col36 (type: string), _col48 (type: string), _col43 (type: double), _col61 (type: string), _col35 (type: string), _col34 (type: string), _col38 (type: string), _col62 (type: string), _col58 (type: string), _col25 (type: string), _col26 (type: string), _col67 (type: int), _col68 (type: int), _col69 (type: string), _col70 (type: string), _col72 (type: string), _col73 (type: string), _col74 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40
                Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
EN

回答 1

Stack Overflow用户

发布于 2015-09-10 22:14:51

如果您有倾斜的表,请尝试使用此属性: hive.optimize.skewjoin=true

此外,您是否可以粘贴作业日志,这将使问题更加清晰

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/32503845

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档