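I am running one of my HQL queries, which has a few joins, unions, and INSERT OVERWRITE operations. It works fine if I run it only once.
If I run the same job a second time, I get the exception below. Can anyone help me figure out in which situation this exception occurs?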
Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:364)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:440)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:433)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
Posted on 2015-05-21 19:37:09
Avoid Hive's parallel execution by setting the following property to false:
hive.exec.parallel
Let me know if this works for you.
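A minimal sketch of applying this for a single session in the Hive CLI or Beeline, instead of changing it cluster-wide in hive-site.xml. It assumes parallel execution had been switched on for your environment (Hive's own default for hive.exec.parallel is false), so the last statement restores it for any later queries in the same script:

```sql
-- Print the value of the property currently in effect for this session
SET hive.exec.parallel;

-- Run independent stages of the job one after another instead of concurrently
SET hive.exec.parallel=false;

-- ... run the query that fails with the KryoException here ...

-- Assumption: parallel execution was enabled before; turn it back on
SET hive.exec.parallel=true;
```

Scoping the change this way limits the slowdown from serialized stages to the one query that needs it.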
Posted on 2016-07-14 13:49:25
I tried set hive.exec.parallel = false; and after that the query ran successfully, although more slowly. My code is:
SELECT
CASE WHEN a.did IS NOT NULL THEN a.did ELSE b.did END AS device_id,
CASE WHEN a.did IS NOT NULL THEN a.package ELSE b.package END AS package,
CASE WHEN a.did IS NOT NULL THEN a.channel ELSE b.channel END AS channel,
CASE WHEN a.did IS NOT NULL THEN a.time ELSE b.time END AS time
FROM
(SELECT
a1.package,
a1.did,
MIN(a1.source) AS channel,
MIN(a1.time) AS time
FROM
(SELECT * FROM thetable
WHERE date_hour = "20160601"
AND source_type IN ('A', 'B', 'C')
) a1
JOIN
(SELECT
package AS package,
did AS did,
MIN(time) AS time
FROM thetable
WHERE date_hour = "20160601"
AND source_type IN ('A', 'B', 'C')
GROUP BY package, did
) min
ON (a1.package = min.package
AND a1.did = min.did
AND a1.time = min.time)
GROUP BY a1.package, a1.did
) a
FULL OUTER JOIN
(SELECT
a1.package,
a1.did,
MIN(a1.source) AS channel,
MIN(a1.time) AS time
FROM
(SELECT * FROM thetable
WHERE date_hour = "20160601"
AND source_type IN ('D')
) a1
JOIN
(SELECT
package AS package,
did AS did,
MIN(time) AS time
FROM thetable
WHERE date_hour = "20160601"
AND source_type IN ('D')
GROUP BY package, did
) min
ON (a1.package = min.package
AND a1.did = min.did
AND a1.time = min.time)
GROUP BY a1.package, a1.did
) b
ON (a.package = b.package AND a.did = b.did);
https://stackoverflow.com/questions/29946841