I deployed a Dataflow template with Python from my local virtual environment, and it threw a bunch of puzzling errors. The output was:
Discarding https://files.pythonhosted.org/packages/e0/e6/d14b4a2b54ef065b1a2c576537abe805c1af0c94caef70d365e2d78fc528/pyarrow-0.15.1.tar.gz#sha256=7ad074690ba38313067bf3bbda1258966d38e2037c035d08b9ffe3cce07747a5 (from https://pypi.org/simple/pyarrow/). Command errored out with exit status 1: 'C:\Users\PhuongAnhNguyenVenef\AppData\Local\Programs\Python\Python37\python.exe' 'C:\Users\PhuongAnhNguyenVenef\AppData\Local\Programs\Python\Python37\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\PHUONG~1\AppData\Local\Temp\pip-build-env-kj19czll\overlay' --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel setuptools_scm 'cython >= 0.29' Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement pyarrow<3.0.0,>=0.15.1 (from apache-beam[gcp])
ERROR: No matching distribution found for pyarrow<3.0.0,>=0.15.1
Here is my requirements file:
apache-beam[gcp]==2.28.0
pandas
numpy
google-cloud==0.34.0
google-cloud-storage==1.33.0
google-cloud-bigquery==1.28.0
pyarrow==0.17.1
fsspec==0.8.4
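geopy~=1.21.0
When I run the Dataflow job locally with DirectRunner, it runs successfully without any errors. I can also install the entire requirements file without problems. After the first failure, I commented out pyarrow and fsspec and reinstalled CPython, but the problem persisted. I then tried deploying the template from Cloud Shell. The deployment succeeded, but the job failed with the following error:
"Error syncing pod d18cc4b816792b6af6e1c00dd0ced7fb ("dataflow-create-utilities-177a775e-04131228-8n1u-harness-2rxs_default(d18cc4b816792b6af6e1c00dd0ced7fb)"), skipping: failed to "StartContainer" for "python" with CrashLoopBackOff: "back-off 5m0s restarting failed container=python pod=dataflow-create-utilities-177a775e-04131228-8n1u-harness-2rxs_default(d18cc4b816792b6af6e1c00dd0ced7fb)""
This also seems to be caused by dependency installation:
ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
So my question is: how can I deploy this template with these packages? The strange part is that DirectRunner and pip install both work, but deploying the template does not.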
Edit: I changed the pyarrow version to 1.0.1, which is exactly the version I need. The problem persists.
Answered 2021-12-03 17:48:05
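Please check the SDK version and its corresponding worker dependencies at this link: https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies
In setup.py, try to list only the packages that are not already preinstalled for your apache-beam[gcp] version. Also, enable the dataflow.googleapis.com/worker-startup log in Cloud Logging to see exactly which dependency conflicts occur when your workers install the packages.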
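As a rough sketch of the "only list what is not preinstalled" idea, the helper below filters a requirements list against a set of package names that the worker image already ships. The function names (requirement_name, extra_requirements) and the contents of the preinstalled set are hypothetical and only for illustration. The real set has to be copied from the dependency page linked above for your SDK version.

```python
import re


def _normalize(name):
    """Normalize a package name the way pip compares names (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line):
    """Return the normalized package name of a requirements line, or None
    for blank lines and comments."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    # The name ends at the first extras bracket, version operator,
    # environment marker, or whitespace.
    name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
    return _normalize(name)


def extra_requirements(lines, preinstalled):
    """Keep only the requirement lines whose package is not preinstalled."""
    skip = {_normalize(p) for p in preinstalled}
    kept = []
    for line in lines:
        name = requirement_name(line)
        if name is not None and name not in skip:
            kept.append(line.strip())
    return kept


if __name__ == "__main__":
    requirements = [
        "apache-beam[gcp]==2.28.0",
        "pyarrow==0.17.1",
        "fsspec==0.8.4",
        "geopy~=1.21.0",
    ]
    # ASSUMPTION: placeholder set, not the real worker image contents.
    # Fill it in from the sdk-worker-dependencies page for your SDK version.
    preinstalled = {"apache-beam", "pyarrow"}
    for req in extra_requirements(requirements, preinstalled):
        print(req)
```

The resulting list could then be passed as install_requires to setuptools.setup() in setup.py, so that the workers install only the packages they do not already have.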
https://stackoverflow.com/questions/67081547