I am working on a Databricks cluster with 240 GB of memory and 64 cores. These are the settings I have defined.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as fs
from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark.sql.functions import count
from pyspark.sql.functions import col, countDistinct
from geospark.utils import GeoSparkKryoRegistrator, KryoSerializer
from geospark.register import upload_jars
from geospark.register import GeoSparkRegistrator
spark.conf.set("spark.sql.shuffle.partitions", 1000)
#Recommended settings for using GeoSpark
spark.conf.set("spark.driver.memory", "20g")
spark.conf.set("spark.network.timeout", "1000s")
spark.conf.set("spark.driver.maxResultSize", "10g")
spark.conf.set("spark.serializer", KryoSerializer.getName)
spark.conf.set("spark.kryo.registrator", GeoSparkKryoRegistrator.getName)
upload_jars()
SparkContext.setSystemProperty("geospark.global.charset","utf8")
I am processing a large dataset, and this is the error I get after it has been running for a few hours.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 10.0 failed 4 times, most recent failure: Lost task 3.3 in stage 10.0 (TID 6054, 10.17.21.12, executor 7):
ExecutorLostFailure (executor 7 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 170684 ms
Posted on 2021-12-14 08:05:29
Leave the heartbeat interval at its default (10 s) and increase the network timeout interval (default 120 s) to 300 s (300000 ms) and see whether that helps. Use spark.conf set and get:
spark.conf.set("spark.sql.<name-of-property>", <value>)
spark.conf.set("spark.network.timeout", 300000 )或者在笔记本上运行这个脚本。
Or run this script in a notebook instead:
%scala
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
|#!/bin/bash
|
|cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
|[driver] {
| "spark.network.timeout" = "300000"
|}
|EOF
""".stripMargin, true)发布于 2021-12-14 10:35:15
Posted on 2021-12-14 10:35:15
The error tells you that a worker timed out because it was taking too long, so there is probably a bottleneck somewhere in the background. Check the Spark UI for executor 7, task 3, and stage 10, and also take a look at the query you have been running.
You also want to review these settings for a better configuration:
spark.conf.set("spark.databricks.io.cache.enabled", True) # delta caching
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", True) # adaptive query execution for skewed data
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1) # setting treshhold on broadcasting
spark.conf.set("spark.databricks.optimizer.rangeJoin.binSize", 20) #range optimizer请随时给我们更多关于星火用户界面的信息,我们可以更好地帮助你找到问题的方式。还有,你在做什么样的查询?
Posted on 2021-12-14 18:15:16
Could you try the following options?
df.repartition(1000)
--conf spark.network.timeout=10000000
--conf spark.executor.heartbeatInterval=10000000
https://stackoverflow.com/questions/60944923
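The two --conf flags belong on the submit command line (or in the cluster's Spark config), while the repartition call goes into the job itself. A hedged sketch of combining them; the script name, input path and DataFrame are illustrative assumptions:

# spark-submit \
#   --conf spark.network.timeout=10000000 \
#   --conf spark.executor.heartbeatInterval=10000000 \
#   my_geospark_job.py

df = spark.read.parquet("/mnt/data/large_dataset")  # illustrative input path
# More, smaller tasks so each one finishes well within the heartbeat/network
# timeouts instead of a few huge tasks running for hours.
df = df.repartition(1000)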