
Attaching an AWS EMR cluster to a remote Jupyter notebook using sparkmagic

Stack Overflow user
Asked 2020-12-30 11:10:41
1 answer · 1.3K views · 0 following · Score 2

I am trying to connect an AWS EMR cluster (emr-5.29.0) to a Jupyter notebook that I am working on from my local Windows machine. I launched the cluster with Hive 2.3.6, Pig 0.17.0, Hue 4.4.0, Livy 0.6.0, and Spark 2.4.4, with a public subnet. I found that this can be done with Azure HDInsight, so I am hoping something similar is possible with EMR. The problem I am running into is passing the right values in the config.json file. How should I attach the EMR cluster?

I can get AWS's native EMR notebooks to work, but I thought I could go the local-development route and ran into a roadblock.

{
    "kernel_python_credentials" : {
      "username": "{IAM ACCESS KEY ID}", # not sure about the username for the cluster
      "password": "{IAM SECRET ACCESS KEY}", # I use putty to ssh into the cluster with the pem key, so again not sure about the password for the cluster
      "url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com", # as per the AWS blog When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy
      "auth": "None"
    },
  
    "kernel_scala_credentials" : {
      "username": "{IAM ACCESS KEY ID}",
      "password": "{IAM SECRET ACCESS KEY}",
      "url": "{Master public DNS}",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "{}",
      "password": "{}",
      "url": "{}"
    },
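One way to sanity-check the credential blocks above before starting Jupyter: Livy on EMR runs with no authentication by default, so with "auth": "None" the username/password fields should simply be empty, and the url should be a full Livy endpoint rather than a bare hostname. A minimal sketch of that check (the function name and rules are my own, not sparkmagic's):

```python
# Hypothetical sanity check for a sparkmagic kernel-credentials block.
# Assumes Livy on EMR has no auth by default, so IAM access keys are
# NOT Livy logins and should not appear here.
def check_kernel_credentials(creds):
    problems = []
    if "{IAM" in creds.get("username", "") or "{IAM" in creds.get("password", ""):
        problems.append(
            "IAM keys are AWS API credentials, not Livy logins; leave these empty"
        )
    url = creds.get("url", "")
    if not url.startswith("http"):
        problems.append(
            "url should be a full Livy endpoint, e.g. http://<master-dns>:8998"
        )
    return problems

# The fragment from the question trips both checks:
creds = {
    "username": "{IAM ACCESS KEY ID}",
    "password": "{IAM SECRET ACCESS KEY}",
    "url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com",
    "auth": "None",
}
for problem in check_kernel_credentials(creds):
    print(problem)
```

With empty credentials and a full `http://...:8998` url, the check passes, which matches the configuration that eventually worked in the updates below.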

Update 1/4/2021

On 1/4 I got sparkmagic working with my local Jupyter notebook. I used these docs as a reference (reference 1, reference 2, reference 3) to set up local port forwarding (avoid using sudo if possible).

 sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com
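Before pointing sparkmagic at the tunnel, it can help to confirm that the forwarded port is actually listening. A small sketch, assuming the `ssh -L 8998:...` tunnel from the command above is running (the helper function is my own):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the ssh tunnel up, Livy should answer on the local end:
print("Livy reachable through tunnel:", port_open("localhost", 8998))
```

If this prints False, the tunnel (or the cluster's security group) is the problem, not the sparkmagic config.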

Configuration details — Release label: emr-5.32.0. Hadoop distribution: Amazon 2.10.1. Applications: Hive 2.3.7, Livy 0.7.0, JupyterHub 1.1.0, Spark 2.4.7, Zeppelin 0.8.2.

Updated config file

{
    "kernel_python_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "kernel_scala_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "logging_config": {
      "version": 1,
      "formatters": {
        "magicsFormatter": { 
          "format": "%(asctime)s\t%(levelname)s\t%(message)s",
          "datefmt": ""
        }
      },
      "handlers": {
        "magicsHandler": { 
          "class": "hdijupyterutils.filehandler.MagicsFileHandler",
          "formatter": "magicsFormatter",
          "home_path": "~/.sparkmagic"
        }
      },
      "loggers": {
        "magicsLogger": { 
          "handlers": ["magicsHandler"],
          "level": "DEBUG",
          "propagate": 0
        }
      }
    },
    "authenticators": {
      "Kerberos": "sparkmagic.auth.kerberos.Kerberos",
      "None": "sparkmagic.auth.customauth.Authenticator", 
      "Basic_Access": "sparkmagic.auth.basic.Basic"
    },
  
    "wait_for_idle_timeout_seconds": 15,
    "livy_session_startup_timeout_seconds": 60,
  
    "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",
  
    "ignore_ssl_errors": false,
  
    "session_configs": {
      "driverMemory": "1000M",
      "executorCores": 2
    },
  
    "use_auto_viz": true,
    "coerce_dataframe": true,
    "max_results_sql": 2500,
    "pyspark_dataframe_encoding": "utf-8",
    
    "heartbeat_refresh_seconds": 5,
    "livy_server_heartbeat_timeout_seconds": 60,
    "heartbeat_retry_seconds": 1,
  
    "server_extension_default_kernel_name": "pysparkkernel",
    "custom_headers": {},
    
    "retry_policy": "configurable",
    "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
    "configurable_retry_policy_max_retries": 8
  }
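The retry keys in this config also determine how long sparkmagic keeps trying before giving up with "maximum retry encountered". A sketch of how they appear to combine, assuming the "configurable" policy sleeps according to `retry_seconds_to_sleep_list` and repeats the last entry once the list is exhausted, up to `configurable_retry_policy_max_retries` (this mirrors my reading of sparkmagic's behavior, not its exact code):

```python
# Values from the config above.
sleep_list = [0.2, 0.5, 1, 3, 5]
max_retries = 8

def seconds_to_sleep(retry_count):
    # retry_count is 1-based: the first retry sleeps sleep_list[0];
    # past the end of the list, reuse the last entry (assumption).
    index = min(retry_count, len(sleep_list)) - 1
    return sleep_list[index]

total = sum(seconds_to_sleep(n) for n in range(1, max_retries + 1))
print(f"worst-case wait before 'maximum retry encountered': {total:.1f}s")
```

Under these assumptions the eight retries sleep 0.2, 0.5, 1, 3, 5, 5, 5, 5 seconds, so the error surfaces within roughly half a minute, which matches how quickly it appears in the second update below.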

Second update 1/9

Back to square one. I keep getting this error and have spent days debugging. Not sure what I did before that made things work. I also checked my security group configuration and it looks fine, with SSH open on port 22.

An error was encountered:
Error sending http request and maximum retry encountered.

1 answer

Stack Overflow user

Answered 2021-01-11 11:20:35

I created a local port forward (SSH tunnel) on port 8998 to the Livy server, and it works like magic.

sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com

I did not change my config.json file from the 1/4 update.
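With the tunnel up and the config pointing at http://localhost:8998, sparkmagic drives the cluster through Livy's REST API: it POSTs to /sessions with a session kind plus the `session_configs` values from config.json. A hedged sketch of that request (field names follow the Livy REST API; only run the commented request with the tunnel active):

```python
import json
from urllib import request

def build_session_payload(kind="pyspark", session_configs=None):
    """Assemble a Livy POST /sessions body from sparkmagic-style settings."""
    payload = {"kind": kind}
    payload.update(session_configs or {})
    return payload

payload = build_session_payload(
    session_configs={"driverMemory": "1000M", "executorCores": 2}
)
print(json.dumps(payload))

# With the ssh tunnel running, this creates a session by hand:
# req = request.Request(
#     "http://localhost:8998/sessions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read())
```

Creating a session by hand like this is a quick way to separate tunnel/Livy problems from sparkmagic configuration problems.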

Score: 0
Original content provided by Stack Overflow; translation supported by Tencent Cloud.
Original link: https://stackoverflow.com/questions/65506021
