When submitting a Dataflow job to GCP, I get the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 482, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 266, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 402, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 818, in _import_module
    return __import__(import_name)
ImportError: No module named tensorflow_transform

I had assumed that requirements such as tensorflow-transform and apache-beam were preinstalled, and this worked fine a few months ago.
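Posted on 2019-03-19 05:26:19
Here is the solution, posted for anyone facing the same problem.
Assuming a single file contains all the Beam steps, a setup.py file needs to be placed in the same directory as the file being run.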
import setuptools

setuptools.setup(
    name='whatever-name',
    version='0.0.1',
    # Packages listed here are installed on each Dataflow worker.
    install_requires=[
        'apache-beam==2.10.0',
        'tensorflow-transform==0.12.0'
    ],
    packages=setuptools.find_packages(),
)

In my Python file,
options = PipelineOptions()
had to be changed to:
options = PipelineOptions(setup_file="./setup.py")
https://stackoverflow.com/questions/55227224
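One caveat with the answer above: a relative path such as "./setup.py" is resolved against the current working directory, so it only works when the job is launched from the directory that holds setup.py. A minimal sketch of one way around that, using only the standard library, is shown below. The helper names resolve_setup_file and setup_file_args are hypothetical and not part of Apache Beam; the only Beam-specific piece is the --setup_file flag, which is the command-line form of the setup_file option used in the answer.

```python
import os


def resolve_setup_file(pipeline_file, name="setup.py"):
    """Return the absolute path of `name` located next to `pipeline_file`.

    Hypothetical helper: it fails early on the submitting machine instead of
    letting a missing setup.py surface later as a worker-side ImportError.
    """
    directory = os.path.dirname(os.path.abspath(pipeline_file))
    candidate = os.path.join(directory, name)
    if not os.path.isfile(candidate):
        raise IOError("expected %s next to the pipeline file" % candidate)
    return candidate


def setup_file_args(pipeline_file):
    """Build the --setup_file flag as a list of command-line style arguments."""
    return ["--setup_file=" + resolve_setup_file(pipeline_file)]
```

Assuming the pipeline script sits beside setup.py, the result can be passed as the flags list, for example options = PipelineOptions(setup_file_args(__file__)), which should be equivalent to the keyword form in the answer but independent of where the script is launched from.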