
How can I use Spark on BigInsights on Cloud Enterprise clusters with Python > 2.6.6?

Stack Overflow user
Asked on 2016-12-18 09:04:07

The Python version on BigInsights on Cloud is currently 2.6.6. How can I use a different version of Python if my Spark jobs are running on YARN?

Note that BigInsights on Cloud users do not have root access.


1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-12-18 09:04:07

Install Anaconda

This script installs Anaconda on a BigInsights on Cloud 4.2 Enterprise cluster. Note that these instructions do not work for Basic clusters, because on those you can only log in to the shell node and not to any of the other nodes.

SSH into the master management node, then run the following (changing the values for your environment):

export BI_USER=snowch
export BI_PASS=changeme
export BI_HOST=bi-hadoop-prod-4118.bi.services.us-south.bluemix.net

Next, run the following commands. The script tries to be as idempotent as possible, so it doesn't matter if you run it multiple times:

# abort if the script encounters an error, an unset variable, or a failure in a pipeline
set -euo pipefail

CLUSTER_NAME=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters | python -c 'import sys, json; print(json.load(sys.stdin)["items"][0]["Clusters"]["cluster_name"]);')
echo Cluster Name: $CLUSTER_NAME

CLUSTER_HOSTS=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters/${CLUSTER_NAME}/hosts | python -c 'import sys, json; items = json.load(sys.stdin)["items"]; hosts = [ item["Hosts"]["host_name"] for item in items ]; print(" ".join(hosts));')
echo Cluster Hosts: $CLUSTER_HOSTS

wget -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh

# Install anaconda if it isn't already installed
[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b

# You can install your pip modules using something like this:
# ${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary

# Install anaconda on all of the cluster nodes
for CLUSTER_HOST in ${CLUSTER_HOSTS}; 
do 
   if [[ "$CLUSTER_HOST" != "$BI_HOST" ]];
   then
      echo "*** Processing $CLUSTER_HOST ***"
      ssh $BI_USER@$CLUSTER_HOST "wget -q -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh"
      ssh $BI_USER@$CLUSTER_HOST "[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b"

      # You can install your pip modules on each node using something like this:
      # ssh $BI_USER@$CLUSTER_HOST "${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary"

      # Set the PYSPARK_PYTHON path on all of the nodes
      ssh $BI_USER@$CLUSTER_HOST "grep '^export PYSPARK_PYTHON=' ~/.bash_profile || echo export PYSPARK_PYTHON=${HOME}/anaconda2/bin/python2.7 >> ~/.bash_profile"
      ssh $BI_USER@$CLUSTER_HOST "sed -i -e 's;^export PYSPARK_PYTHON=.*$;export PYSPARK_PYTHON=${HOME}/anaconda2/bin/python2.7;g' ~/.bash_profile"
      ssh $BI_USER@$CLUSTER_HOST "cat ~/.bash_profile"
   fi
done

echo 'Finished installing'
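
Once the loop finishes, it is worth double-checking that every node really got the same Anaconda install. The following is a minimal sketch of such a check; it reuses the $BI_USER and $CLUSTER_HOSTS variables set above and assumes Anaconda was installed into ~/anaconda2 as in the script:

# Verify the Anaconda Python and the PYSPARK_PYTHON export on every node
for CLUSTER_HOST in ${CLUSTER_HOSTS};
do
   echo "*** Checking $CLUSTER_HOST ***"
   # should print the Python 2.7.x version shipped with Anaconda
   ssh $BI_USER@$CLUSTER_HOST "${HOME}/anaconda2/bin/python --version"
   # should show the export line added to ~/.bash_profile by the script above
   ssh $BI_USER@$CLUSTER_HOST "grep '^export PYSPARK_PYTHON=' ~/.bash_profile || echo 'PYSPARK_PYTHON not set on this node'"
done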

Running pyspark jobs

If you are using pyspark, you can use the Anaconda Python by setting the following variables before running your pyspark command:

export SPARK_HOME=/usr/iop/current/spark-client
export HADOOP_CONF_DIR=/usr/iop/current/hadoop-client/conf

# set these to the folders where you installed anaconda
export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7

spark-submit --master yarn --deploy-mode client ...

# NOTE: --deploy-mode cluster does not seem to use the PYSPARK_PYTHON setting
...
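
To confirm which interpreter the job actually picks up, a tiny throwaway job can be submitted with those variables exported. This is just a minimal sketch (the file name version_check.py and its contents are illustrative, not part of the original answer); it prints the Python version seen by the driver and by the executors, both of which should report the Anaconda 2.7 build:

cat > version_check.py <<'EOF'
# Report the Python version used by the driver and by the executors
import sys
from pyspark import SparkContext

sc = SparkContext(appName="python-version-check")
print("driver python:   %s" % sys.version)
executor_versions = sc.parallelize(range(4), 4).map(lambda _: sys.version).distinct().collect()
print("executor python: %s" % executor_versions)
sc.stop()
EOF

spark-submit --master yarn --deploy-mode client version_check.py

As the note above says, --deploy-mode cluster does not seem to honour PYSPARK_PYTHON; one commonly suggested workaround in that case (not covered by the original answer) is to pass the interpreter path through Spark's YARN configuration, e.g. --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7.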

Zeppelin (optional)

If you are using Zeppelin (set up following these instructions for BigInsights on Cloud), set the following variables in zeppelin_env.sh:

# set these to the folders where you installed anaconda
export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7
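
A quick way to apply this is to append the two exports to the Zeppelin environment file and then restart Zeppelin. The sketch below is only illustrative: the Zeppelin install location (assumed here to be ~/zeppelin) and the exact config file path depend on how Zeppelin was set up, so adjust them to match your installation:

# Assumed Zeppelin location - change this to wherever Zeppelin is installed
ZEPPELIN_HOME=${HOME}/zeppelin
ZEPPELIN_ENV=${ZEPPELIN_HOME}/conf/zeppelin-env.sh

# Add the exports only if they are not already present
grep -q '^export PYSPARK_PYTHON=' "$ZEPPELIN_ENV" || \
   echo 'export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7' >> "$ZEPPELIN_ENV"
grep -q '^export PYSPARK_DRIVER_PYTHON=' "$ZEPPELIN_ENV" || \
   echo 'export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7' >> "$ZEPPELIN_ENV"

# Restart Zeppelin so the interpreter picks up the new environment
${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh restart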
The original content of this page was provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/41206889