Q: How to integrate Ganglia for Spark 2.1 job metrics, Spark ignoring Ganglia metrics

Stack Overflow user

Asked 2017-07-26 19:57:57

Answers: 4  Views: 1.8K  Followers: 0  Votes: 1

I am trying to integrate the metrics of my Spark 2.1 jobs into Ganglia.

My spark-defaults.conf looks like this:

*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name Name
*.sink.ganglia.host $MASTERIP
*.sink.ganglia.port $PORT

*.sink.ganglia.mode unicast
*.sink.ganglia.period 10
*.sink.ganglia.unit seconds

When I submit the job, I see these warnings:

Warning: Ignoring non-spark config property: *.sink.ganglia.host=host
Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name
Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast
Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
Warning: Ignoring non-spark config property: *.sink.ganglia.period=10
Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649
Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds

My environment details are:

Hadoop : Amazon 2.7.3 - emr-5.7.0  
Spark  : Spark 2.1.1, 
Ganglia: 3.7.2

If you have any input, or can suggest an alternative to Ganglia, please reply.

apache-spark

spark-streaming

emr

amazon-emr

ganglia

4 Answers

Stack Overflow user

Answered 2018-03-01 03:13:24

According to the Spark docs:

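The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.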

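So instead of putting these settings in spark-defaults.conf, move them to $SPARK_HOME/conf/metrics.properties. spark-submit only accepts keys that start with `spark.` from spark-defaults.conf, which is why each `*.sink.ganglia.*` line is reported as a non-spark config property and ignored.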
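As a rough sketch of this advice (not part of the original answer), the Python helper below writes the sink settings to a separate metrics.properties file and builds a spark-submit argument list that points `spark.metrics.conf` at it. The function names, the temporary path, the host value `10.0.0.1`, and `my_job.py` are hypothetical placeholders; only the property keys and the `spark.metrics.conf` property come from the thread.

```python
import os
import tempfile

# Sink settings from the question. Host and port are placeholder values.
GANGLIA_SINK = {
    "*.sink.ganglia.class": "org.apache.spark.metrics.sink.GangliaSink",
    "*.sink.ganglia.name": "Name",
    "*.sink.ganglia.host": "10.0.0.1",  # assumption: the master node IP
    "*.sink.ganglia.port": "8649",
    "*.sink.ganglia.mode": "unicast",
    "*.sink.ganglia.period": "10",
    "*.sink.ganglia.unit": "seconds",
}


def write_metrics_properties(path, settings):
    """Write the settings as key=value lines, one property per line."""
    with open(path, "w") as f:
        for key, value in settings.items():
            f.write(f"{key}={value}\n")


def spark_submit_args(metrics_path, app):
    """Build a spark-submit command that uses a custom metrics file location."""
    return ["spark-submit", "--conf", f"spark.metrics.conf={metrics_path}", app]


# Hypothetical usage: write the file to a temp directory and show the command.
metrics_path = os.path.join(tempfile.mkdtemp(), "metrics.properties")
write_metrics_properties(metrics_path, GANGLIA_SINK)
print(" ".join(spark_submit_args(metrics_path, "my_job.py")))
```

On a real cluster the file would more likely live at $SPARK_HOME/conf/metrics.properties, in which case the `--conf` override is not needed.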

Votes: 1

Stack Overflow user

Answered 2018-04-27 07:05:57

For EMR specifically, you need to put these settings in /etc/spark/conf/metrics.properties on the master node.

Spark on EMR does include the Ganglia library:

$ ls -l /usr/lib/spark/external/lib/spark-ganglia-lgpl_*
-rw-r--r-- 1 root root 28376 Mar 22 00:43 /usr/lib/spark/external/lib/spark-ganglia-lgpl_2.11-2.3.0.jar

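Also, your example is missing the equals sign (=) between the configuration names and values; I'm not sure whether that is a problem. Below is a sample configuration that worked for me.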

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name=AMZN-EMR
*.sink.ganglia.host=$MASTERIP
*.sink.ganglia.port=8649

*.sink.ganglia.mode=unicast
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
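If you want to convert the whitespace-separated entries from the question into the `key=value` form used above, a minimal Python sketch might look like the following. The function name `normalize_properties` and the sample text are illustrative only. As far as I know, the Java properties format also accepts whitespace as a key/value separator, so the missing `=` may not be the cause on its own; this simply makes the file match the working sample.

```python
def normalize_properties(text):
    """Rewrite 'key value' lines as 'key=value'; keep blank and comment lines."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blanks, comments, and lines that already use '=' unchanged.
        if not stripped or stripped.startswith("#") or "=" in stripped:
            out.append(stripped)
            continue
        # Split on the first run of whitespace into key and value.
        parts = stripped.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        out.append(f"{key}={value}")
    return "\n".join(out)


sample = """*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.port 8649

*.sink.ganglia.mode unicast"""
print(normalize_properties(sample))
```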

Votes: 1

Stack Overflow user

Answered 2017-07-27 05:02:36

From this page: https://spark.apache.org/docs/latest/monitoring.html

Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions:

GangliaSink: Sends metrics to a Ganglia node or multicast group.
**To install the GangliaSink you’ll need to perform a custom build of Spark**. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build user

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45326305
