I am trying to connect an AWS EMR cluster (emr-5.29.0) to a Jupyter notebook that I am working on from my local Windows machine. I have launched a cluster with Hive 2.3.6, Pig 0.17.0, Hue 4.4.0, Livy 0.6.0, and Spark 2.4.4, with a public subnet. I found that this can be done with Azure HDInsight, so I am hoping something similar is possible with EMR. The problem I am running into is passing the correct values in the config.json file. How should I attach to the EMR cluster?
I can work in an AWS-native EMR notebook, but I thought I would go the local development route and hit a roadblock.
{
"kernel_python_credentials" : {
"username": "{IAM ACCESS KEY ID}", # not sure about the username for the cluster
"password": "{IAM SECRET ACCESS KEY}", # I use putty to ssh into the cluster with the pem key, so again not sure about the password for the cluster
"url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com", # as per the AWS blog When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy
"auth": "None"
},
"kernel_scala_credentials" : {
"username": "{IAM ACCESS KEY ID}",
"password": "{IAM SECRET ACCESS KEY}",
"url": "{Master public DNS}",
"auth": "None"
},
"kernel_r_credentials": {
"username": "{}",
"password": "{}",
"url": "{}"
}
}

Update 1/4/2021
On 1/4 I got sparkmagic working with my local Jupyter notebook. I used these docs as references (reference 1, reference 2, and reference 3) to set up local port forwarding (avoiding sudo if possible).
sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com

Configuration details
Release label: emr-5.32.0
Hadoop distribution: Amazon 2.10.1
Applications: Hive 2.3.7, Livy 0.7.0, JupyterHub 1.1.0, Spark 2.4.7, Zeppelin 0.8.2
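With the tunnel up, Livy's REST API should answer on http://localhost:8998 (the local end of the forward). A quick stdlib-only sanity check is sketched below; it assumes only the tunnel endpoint above and Livy's standard GET /sessions REST endpoint:

```python
import json
import urllib.request

# Local end of the SSH tunnel to the EMR master node's Livy server.
LIVY_URL = "http://localhost:8998"

def sessions_endpoint(base_url: str) -> str:
    """Build the URL for Livy's session-listing REST endpoint."""
    return base_url.rstrip("/") + "/sessions"

def list_sessions(base_url: str = LIVY_URL) -> dict:
    """GET /sessions from Livy; raises if the tunnel is down."""
    with urllib.request.urlopen(sessions_endpoint(base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the tunnel is healthy, `list_sessions()` should come back with a JSON body listing the active Livy sessions (an empty cluster reports zero sessions); a connection error here means the problem is the tunnel or security group, not sparkmagic.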
Updated config file
{
"kernel_python_credentials" : {
"username": "",
"password": "",
"url": "http://localhost:8998"
},
"kernel_scala_credentials" : {
"username": "",
"password": "",
"url": "http://localhost:8998",
"auth": "None"
},
"kernel_r_credentials": {
"username": "",
"password": "",
"url": "http://localhost:8998"
},
"logging_config": {
"version": 1,
"formatters": {
"magicsFormatter": {
"format": "%(asctime)s\t%(levelname)s\t%(message)s",
"datefmt": ""
}
},
"handlers": {
"magicsHandler": {
"class": "hdijupyterutils.filehandler.MagicsFileHandler",
"formatter": "magicsFormatter",
"home_path": "~/.sparkmagic"
}
},
"loggers": {
"magicsLogger": {
"handlers": ["magicsHandler"],
"level": "DEBUG",
"propagate": 0
}
}
},
"authenticators": {
"Kerberos": "sparkmagic.auth.kerberos.Kerberos",
"None": "sparkmagic.auth.customauth.Authenticator",
"Basic_Access": "sparkmagic.auth.basic.Basic"
},
"wait_for_idle_timeout_seconds": 15,
"livy_session_startup_timeout_seconds": 60,
"fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",
"ignore_ssl_errors": false,
"session_configs": {
"driverMemory": "1000M",
"executorCores": 2
},
"use_auto_viz": true,
"coerce_dataframe": true,
"max_results_sql": 2500,
"pyspark_dataframe_encoding": "utf-8",
"heartbeat_refresh_seconds": 5,
"livy_server_heartbeat_timeout_seconds": 60,
"heartbeat_retry_seconds": 1,
"server_extension_default_kernel_name": "pysparkkernel",
"custom_headers": {},
"retry_policy": "configurable",
"retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
"configurable_retry_policy_max_retries": 8
}

Second update 1/9
Back to square one. I keep getting this error and have spent days debugging it. I do not know what I did before that made things work. I also checked my security group configuration and it looks fine, with SSH open on port 22.
An error was encountered:
Error sending http request and maximum retry encountered.

Posted on 2021-01-11 11:20:35
Created a local port forward (SSH tunnel) on port 8998 to the Livy server, and it works like magic.
sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com

Did not change my config.json file from the 1/4 update.
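Since a malformed config.json was part of the trouble here, it can help to lint the file before launching Jupyter. The sketch below only checks the three kernel credential sections used in the config above; the default file location ~/.sparkmagic/config.json comes from sparkmagic's documentation:

```python
import json
from pathlib import Path

# The three kernel sections that sparkmagic's config.json defines.
REQUIRED_KERNEL_SECTIONS = (
    "kernel_python_credentials",
    "kernel_scala_credentials",
    "kernel_r_credentials",
)

def check_sparkmagic_config(text: str) -> list:
    """Return a list of problems found in a sparkmagic config.json body."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = []
    for section in REQUIRED_KERNEL_SECTIONS:
        creds = cfg.get(section)
        if creds is None:
            problems.append(f"missing section: {section}")
        elif not creds.get("url"):
            problems.append(f"{section} has no url")
    return problems

# Typical usage against the default config location:
# problems = check_sparkmagic_config(
#     Path("~/.sparkmagic/config.json").expanduser().read_text())
```

An empty list means the file parses and every kernel section points at a Livy URL; a stray comment or trailing character (easy to pick up when pasting configs, as above) shows up immediately as an "invalid JSON" problem.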
https://stackoverflow.com/questions/65506021