
Attaching an AWS EMR cluster to a remote Jupyter notebook using sparkmagic

Stack Overflow user
Asked 2020-12-30 11:10:41
1 answer · 1.3K views · 0 following · Score 2

I am trying to connect an AWS EMR cluster (emr-5.29.0) to a Jupyter notebook that I am working on from my local Windows machine. I launched the cluster with Hive 2.3.6, Pig 0.17.0, Hue 4.4.0, Livy 0.6.0, and Spark 2.4.4, with a public subnet. I found that this can be done with Azure HDInsight, so I am hoping something similar is possible with EMR. The problem I am running into is passing the right values in the config.json file. How should I attach the EMR cluster?

I can get AWS's native EMR notebooks to work, but I thought I could go the local-development route and ran into a roadblock.

{
    "kernel_python_credentials" : {
      "username": "{IAM ACCESS KEY ID}", # not sure about the username for the cluster
      "password": "{IAM SECRET ACCESS KEY}", # I use putty to ssh into the cluster with the pem key, so again not sure about the password for the cluster
      "url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com", # as per the AWS blog When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy
      "auth": "None"
    },
  
    "kernel_scala_credentials" : {
      "username": "{IAM ACCESS KEY ID}",
      "password": "{IAM SECRET ACCESS KEY}",
      "url": "{Master public DNS}",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "{}",
      "password": "{}",
      "url": "{}"
    },
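One way to sanity-check the credential blocks above before starting Jupyter: Livy on EMR runs with no authentication by default, so with "auth": "None" the username/password fields should simply be empty, and the url should be a full Livy endpoint rather than a bare hostname. A minimal sketch of that check (the function name and rules are my own, not sparkmagic's):

```python
# Hypothetical sanity check for a sparkmagic kernel-credentials block.
# Assumes Livy on EMR has no auth by default, so IAM access keys are
# NOT Livy logins and should not appear here.
def check_kernel_credentials(creds):
    problems = []
    if "{IAM" in creds.get("username", "") or "{IAM" in creds.get("password", ""):
        problems.append(
            "IAM keys are AWS API credentials, not Livy logins; leave these empty"
        )
    url = creds.get("url", "")
    if not url.startswith("http"):
        problems.append(
            "url should be a full Livy endpoint, e.g. http://<master-dns>:8998"
        )
    return problems

# The fragment from the question trips both checks:
creds = {
    "username": "{IAM ACCESS KEY ID}",
    "password": "{IAM SECRET ACCESS KEY}",
    "url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com",
    "auth": "None",
}
for problem in check_kernel_credentials(creds):
    print(problem)
```

With empty credentials and a full `http://...:8998` url, the check passes, which matches the configuration that eventually worked in the updates below.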

Update 1/4/2021

On 1/4 I got sparkmagic working with my local Jupyter notebook. I used these docs as a reference (reference 1, reference 2, reference 3) to set up local port forwarding (avoid using sudo if possible).

 sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com
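Before pointing sparkmagic at the tunnel, it can help to confirm that the forwarded port is actually listening. A small sketch, assuming the `ssh -L 8998:...` tunnel from the command above is running (the helper function is my own):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the ssh tunnel up, Livy should answer on the local end:
print("Livy reachable through tunnel:", port_open("localhost", 8998))
```

If this prints False, the tunnel (or the cluster's security group) is the problem, not the sparkmagic config.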

Configuration details — Release label: emr-5.32.0. Hadoop distribution: Amazon 2.10.1. Applications: Hive 2.3.7, Livy 0.7.0, JupyterHub 1.1.0, Spark 2.4.7, Zeppelin 0.8.2.

Updated config file

{
    "kernel_python_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "kernel_scala_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "logging_config": {
      "version": 1,
      "formatters": {
        "magicsFormatter": { 
          "format": "%(asctime)s\t%(levelname)s\t%(message)s",
          "datefmt": ""
        }
      },
      "handlers": {
        "magicsHandler": { 
          "class": "hdijupyterutils.filehandler.MagicsFileHandler",
          "formatter": "magicsFormatter",
          "home_path": "~/.sparkmagic"
        }
      },
      "loggers": {
        "magicsLogger": { 
          "handlers": ["magicsHandler"],
          "level": "DEBUG",
          "propagate": 0
        }
      }
    },
    "authenticators": {
      "Kerberos": "sparkmagic.auth.kerberos.Kerberos",
      "None": "sparkmagic.auth.customauth.Authenticator", 
      "Basic_Access": "sparkmagic.auth.basic.Basic"
    },
  
    "wait_for_idle_timeout_seconds": 15,
    "livy_session_startup_timeout_seconds": 60,
  
    "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",
  
    "ignore_ssl_errors": false,
  
    "session_configs": {
      "driverMemory": "1000M",
      "executorCores": 2
    },
  
    "use_auto_viz": true,
    "coerce_dataframe": true,
    "max_results_sql": 2500,
    "pyspark_dataframe_encoding": "utf-8",
    
    "heartbeat_refresh_seconds": 5,
    "livy_server_heartbeat_timeout_seconds": 60,
    "heartbeat_retry_seconds": 1,
  
    "server_extension_default_kernel_name": "pysparkkernel",
    "custom_headers": {},
    
    "retry_policy": "configurable",
    "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
    "configurable_retry_policy_max_retries": 8
  }
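The retry keys in this config also determine how long sparkmagic keeps trying before giving up with "maximum retry encountered". A sketch of how they appear to combine, assuming the "configurable" policy sleeps according to `retry_seconds_to_sleep_list` and repeats the last entry once the list is exhausted, up to `configurable_retry_policy_max_retries` (this mirrors my reading of sparkmagic's behavior, not its exact code):

```python
# Values from the config above.
sleep_list = [0.2, 0.5, 1, 3, 5]
max_retries = 8

def seconds_to_sleep(retry_count):
    # retry_count is 1-based: the first retry sleeps sleep_list[0];
    # past the end of the list, reuse the last entry (assumption).
    index = min(retry_count, len(sleep_list)) - 1
    return sleep_list[index]

total = sum(seconds_to_sleep(n) for n in range(1, max_retries + 1))
print(f"worst-case wait before 'maximum retry encountered': {total:.1f}s")
```

Under these assumptions the eight retries sleep 0.2, 0.5, 1, 3, 5, 5, 5, 5 seconds, so the error surfaces within roughly half a minute, which matches how quickly it appears in the second update below.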

Second update 1/9

Back to square one. I keep getting this error and have spent days debugging. Not sure what I did before that made things work. I also checked my security group configuration and it looks fine, with SSH open on port 22.

An error was encountered:
Error sending http request and maximum retry encountered.

1 answer

Stack Overflow user

Answered 2021-01-11 11:20:35

I created a local port forward (SSH tunnel) on port 8998 to the Livy server, and it works like magic.

sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com

I did not change my config.json file from the 1/4 update.
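With the tunnel up and the config pointing at http://localhost:8998, sparkmagic drives the cluster through Livy's REST API: it POSTs to /sessions with a session kind plus the `session_configs` values from config.json. A hedged sketch of that request (field names follow the Livy REST API; only run the commented request with the tunnel active):

```python
import json
from urllib import request

def build_session_payload(kind="pyspark", session_configs=None):
    """Assemble a Livy POST /sessions body from sparkmagic-style settings."""
    payload = {"kind": kind}
    payload.update(session_configs or {})
    return payload

payload = build_session_payload(
    session_configs={"driverMemory": "1000M", "executorCores": 2}
)
print(json.dumps(payload))

# With the ssh tunnel running, this creates a session by hand:
# req = request.Request(
#     "http://localhost:8998/sessions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read())
```

Creating a session by hand like this is a quick way to separate tunnel/Livy problems from sparkmagic configuration problems.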

Score: 0
Original content provided by Stack Overflow; translation supported by Tencent Cloud.
Original link: https://stackoverflow.com/questions/65506021
