How to monitor Apache Flink on AWS EMR (Elastic MapReduce)?

Stack Overflow user
Asked 2019-03-06 15:24:30
Answers: 2 · Views: 1.3K · Followers: 0 · Votes: 4

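I have Flink set up with a job running on EMR, and I am now trying to add monitoring by sending metrics to Prometheus.

I have run into a problem running Flink on EMR. I use Terraform to provision EMR (and run Ansible afterwards to download and run the job). Out of the box, EMR's Flink distribution does not appear to include the optional jars (flink-metrics-prometheus, flink-cep, and so on).

The Flink documentation says:

"In order to use this reporter you must copy /opt/flink-metrics-prometheus-1.6.1.jar into the /lib folder of your Flink distribution." https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/metrics.html#prometheuspushgateway-orgapacheflinkmetricsprometheusprometheuspushgatewayreporter

However, when I log in to the EMR master node, neither /etc/flink nor /usr/lib/flink has a directory called opt, and I cannot find flink-metrics-prometheus-1.6.1.jar anywhere.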
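
For readers who want to repeat that check, here is a minimal sketch of the search. The helper name `find_flink_jars` is hypothetical, and the directories in the usage comment are simply the two paths mentioned above; other EMR releases may lay things out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search the given directories for jars matching a name pattern.
#   $1      jar file name pattern (quoted, so the shell does not expand it early)
#   $2...   directories to search
find_flink_jars() {
  pattern="$1"
  shift
  # Errors from unreadable subdirectories are silenced; matches are printed one per line.
  find "$@" -name "$pattern" 2>/dev/null
}

# Example on the EMR master node (not executed here):
#   find_flink_jars 'flink-metrics-prometheus-*.jar' /usr/lib/flink /etc/flink
```

If this prints nothing, the jar is not under those directories and would have to be obtained some other way.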

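I know Flink has other optional libraries, such as flink-cep, that you normally have to copy into place before you can use them, but I don't know how to do that on EMR.

This is the exception I get. I believe it occurs because the metrics jar cannot be found on the classpath.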

java.lang.ClassNotFoundException: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.flink.runtime.metrics.MetricRegistryImpl.<init>(MetricRegistryImpl.java:144)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createMetricRegistry(ClusterEntrypoint.java:419)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:276)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:227)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:191)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
    at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:137)

The EMR resource in Terraform:

resource "aws_emr_cluster" "emr_flink" {
  name          = "ce-emr-flink-arn"
  release_label = "emr-5.20.0" # 5.21.0 is not found, could be a region thing
  applications  = ["Flink"]

  ec2_attributes {
    key_name                          = "ce_test"
    subnet_id                         = "${aws_subnet.ce_test_subnet_public.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
    emr_managed_master_security_group = "${aws_security_group.allow_all_vpc.id}"
    emr_managed_slave_security_group  = "${aws_security_group.allow_all_vpc.id}"
    additional_master_security_groups  = "${aws_security_group.external_connectivity.id}"
    additional_slave_security_groups  = "${aws_security_group.external_connectivity.id}"
  }

  ebs_root_volume_size = 100
  master_instance_type = "m4.xlarge"
  core_instance_type   = "m4.xlarge"
  core_instance_count  = 2

  service_role = "${aws_iam_role.iam_emr_service_role.arn}"

  configurations_json = <<EOF
[
  {
    "Classification": "flink-conf",
    "Properties": {
        "parallelism.default": "8",
        "state.backend": "RocksDB",
        "state.backend.async": "true",
        "state.backend.incremental": "true",
        "state.savepoints.dir": "file:///savepoints",
        "state.checkpoints.dir": "file:///checkpoints",
        "web.submit.enable": "true",
        "metrics.reporter.promgateway.class": "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter",
        "metrics.reporter.promgateway.host": "${aws_instance.monitoring.private_ip}",
        "metrics.reporter.promgateway.port": "9091",
        "metrics.reporter.promgateway.jobName": "ce-test",
        "metrics.reporter.promgateway.randomJobNameSuffix": "true",
        "metrics.reporter.promgateway.deleteOnShutdown": "false"
    }
  }
]
EOF
}

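I suspect I may have to download the jar during the bootstrap phase, but I wanted to check first whether there are any examples of doing this.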


2 Answers

Stack Overflow user

Accepted answer

Posted 2019-03-06 21:00:01

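I haven't used Terraform, but note that in EMR you typically need to provision (set up the jars on) both the master and the slaves. One way to find out where EMR thinks the jars should go is to log in to a slave while a job is running, run ps auxwww | grep java, look for the TaskManager process, see which jars were added to the classpath at startup, and find where those jars live on the server. At least, that has worked for me in the past.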
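
The inspection described above can be sketched roughly as follows. The function name `list_classpath_entries` is hypothetical, and the sketch assumes the classpath is passed to the JVM with a `-cp` or `-classpath` flag; if the launcher passes it through the CLASSPATH environment variable instead, this prints nothing and the process environment would need to be inspected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read process command lines on stdin (for example from
# `ps auxwww`) and print every classpath entry, one per line.
list_classpath_entries() {
  # Pick the token that follows -cp / -classpath, then split it on ':'.
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "-classpath" || $i == "-cp") print $(i + 1)
  }' | tr ':' '\n' | grep -v '^$'
}

# Example on a worker node while a job is running (not executed here):
#   ps auxwww | grep '[j]ava' | list_classpath_entries | grep -i flink
```

The directories that show up in the output are candidates for where an extra jar such as the Prometheus reporter would need to be placed on that node.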

Votes: 2

Stack Overflow user

Posted 2019-06-08 08:32:34

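I chose EMR release emr-5.24.0 and succeeded in monitoring with the InfluxDB .jar.

I copied the .jar file into the /usr/lib/flink/lib folder and restarted the Flink cluster with the following bash command (run with sudo privileges).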

/usr/lib/flink/bin/stop-cluster.sh && /usr/lib/flink/bin/start-cluster.sh

I think you can solve your problem by following the same steps with the Prometheus jar, which is present in this release:

[ec2-user@ip-10-0-11-17 ~]$ cd /usr/lib/flink/opt/flink-metrics-
flink-metrics-datadog-1.8.0.jar     flink-metrics-influxdb-1.8.0.jar    flink-metrics-slf4j-1.8.0.jar
flink-metrics-graphite-1.8.0.jar    flink-metrics-prometheus-1.8.0.jar  flink-metrics-statsd-1.8.0.jar


[ec2-user@ip-10-0-11-17 ~]$ ll /usr/lib/flink/opt/flink-metrics-prometheus-1.8.0.jar
-rw-r--r-- 1 root root 101984 may 14 19:21 /usr/lib/flink/opt/flink-metrics-prometheus-1.8.0.jar


[ec2-user@ip-10-0-11-17 ~]$ uname -a
Linux ip-10-0-11-17 4.14.114-83.126.amzn1.x86_64 #1 SMP Tue May 7 02:26:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
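
Applied to the Prometheus reporter, the copy step might look like the sketch below. The function name `install_prometheus_reporter` is hypothetical, and the default path /usr/lib/flink is taken from the listing above; this has only been reasoned through against that layout, not verified on every EMR release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the Prometheus reporter jar(s) from <flink_home>/opt
# into <flink_home>/lib. Returns 0 if at least one jar was copied, 1 otherwise.
#   $1  Flink home directory (defaults to /usr/lib/flink, as in the listing above)
install_prometheus_reporter() {
  local flink_home="${1:-/usr/lib/flink}"
  local status=1
  local jar
  for jar in "$flink_home"/opt/flink-metrics-prometheus-*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    cp "$jar" "$flink_home/lib/" && status=0
  done
  return "$status"
}

# Example (not executed here); needs root on EMR, followed by a Flink restart:
#   install_prometheus_reporter /usr/lib/flink
```

Since the accepted answer points out that jars usually have to be provisioned on both the master and the slaves, the same copy would presumably need to run on every node before Flink is restarted.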
Votes: 1

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/55026568
