
Installing the AWS Glue ETL library

Asked by a Stack Overflow user on 2020-04-05 23:15:01 · 1 answer · 1.4K views · 1 vote

Problem

After setting up the AWS Glue library, I am facing the following error:

PS C:\Users\[user]\Documents\[company]\projects\code\data-lake\etl\tealium> python visitor.py
20/04/05 19:33:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "visitor.py", line 9, in <module>
    glueContext = GlueContext(sc.getOrCreate())
  File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 45, in __init__
  File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 66, in _get_glue_scala_context
TypeError: 'JavaPackage' object is not callable
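For context on what this traceback usually means: when the Glue Scala jars are not on the JVM classpath, py4j resolves any unknown dotted name under `sc._jvm` to a generic `JavaPackage` placeholder, and calling that placeholder raises exactly this `TypeError`. A minimal pure-Python simulation of that lookup behavior (py4j is not required to run it; the class below is illustrative, not py4j's actual implementation):

```python
# Simulates how py4j resolves dotted JVM names: any name that does not
# correspond to a loaded class falls back to a JavaPackage placeholder,
# and a placeholder is not callable -- hence the TypeError above.
class JavaPackage:
    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Every sub-name of an unknown package is itself a package.
        return JavaPackage(f"{self._name}.{attr}")

jvm = JavaPackage("")  # root of the simulated JVM namespace
GlueContextClass = jvm.com.amazonaws.services.glue.GlueContext

try:
    GlueContextClass()  # jar missing -> placeholder, not a real class
except TypeError as e:
    print(e)  # 'JavaPackage' object is not callable
```

In a real session, the fix is making sure the aws-glue-libs jars are visible to Spark, not changing the Python code.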

Scenario

I am trying to install the AWS Glue ETL library in a virtual environment using Pipenv, so the following .env file holds the environment variables:

HADOOP_HOME="C:\Users\[user]\AppData\Local\Spark\winutils"
SPARK_HOME="C:\Users\[user]\AppData\Local\Spark\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_231"
PATH="${HADOOP_HOME}\bin"
PATH="${SPARK_HOME}\bin:${PATH}"
PATH="${JAVA_HOME}\bin:${PATH}"
SPARK_CONF_DIR="C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\conf"
PYTHONPATH="${SPARK_HOME}/python/:${PYTHONPATH}"
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
PYTHONPATH="C:/Users/[user]/Documents/[company]/projects/code/aws-glue-libs-glue-1.0/PyGlue.zip:${PYTHONPATH}" 
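One common cause of the 'JavaPackage' error with aws-glue-libs is that the Glue jars never make it onto Spark's classpath. The repository's setup scripts normally generate a spark-defaults.conf under SPARK_CONF_DIR that points at the jars; a sketch of what that file would need to contain (the jar directory name and paths are assumptions based on the layout in the .env file above):

```properties
# SPARK_CONF_DIR\spark-defaults.conf -- hypothetical paths matching the
# directory layout from the .env file above
spark.driver.extraClassPath    C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\jarsv1\*
spark.executor.extraClassPath  C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\jarsv1\*
```

If that file is missing or points at an empty directory, `sc._jvm` cannot find the Glue classes and `GlueContext(...)` fails with the error shown above.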

My code is initially very simple; I only create the Glue context, as follows:

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.conf import SparkConf

sc = SparkContext()

glueContext = GlueContext(sc.getOrCreate())

print(glueContext)
print(sc)

Do you have any idea what the problem might be?

1 Answer

Answered by a Stack Overflow user on 2022-07-21 18:06:19

Try this:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Resolve the job arguments (args was undefined in the original snippet)
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext.getOrCreate()
sc.setLogLevel('INFO')
glueContext = GlueContext(sc)
logger = glueContext.get_logger()
spark = glueContext.spark_session  # spark_session is a property, not a method
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
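For context, `getResolvedOptions` (from `awsglue.utils`) is what populates the `args` dict used above: it pulls `--KEY value` pairs out of `sys.argv`. A rough stdlib sketch of that behavior (`resolve_options` is my own simplified stand-in; the real helper also handles Glue's reserved arguments and stricter validation):

```python
# Simplified stand-in for awsglue.utils.getResolvedOptions:
# collect "--KEY value" pairs for the requested option names.
def resolve_options(argv, option_names):
    parsed = {}
    it = iter(argv[1:])
    for token in it:
        if token.startswith('--'):
            name = token[2:]
            value = next(it, None)
            if name in option_names and value is not None:
                parsed[name] = value
    missing = [n for n in option_names if n not in parsed]
    if missing:
        raise KeyError(f"missing required arguments: {missing}")
    return parsed

args = resolve_options(['job.py', '--JOB_NAME', 'my-etl-job'], ['JOB_NAME'])
print(args['JOB_NAME'])  # my-etl-job
```

When a Glue job runs on AWS, the service injects `--JOB_NAME` (among others) into the process arguments, which is why the answer's code can read `args['JOB_NAME']` without defining it anywhere.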

Also, if you create a new Glue job in the AWS console, it gives you boilerplate code that addresses your problem.

1 vote

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61050740
