
Vertex AI Pipelines: how to use an output from a custom kfp component as an input to google_cloud_pipeline_components?

Stack Overflow user
Asked on 2021-11-17 14:51:02
1 answer · 172 views · 0 following · Score: 1

I am writing Python code for a Vertex AI pipeline using kfp components. I have a step that creates a system.Dataset object, as follows:

# kfp v2 SDK imports (assumed; not shown in the original snippet)
from kfp.v2.dsl import component, Dataset, Output

@component(base_image="python:3.9", packages_to_install=["google-cloud-bigquery", "pandas", "pyarrow", "fsspec", "gcsfs"])
def create_dataframe(
    project: str,
    region: str,
    destination_dataset: str,
    destination_table_name: str,
    dataset: Output[Dataset],
):

    from google.cloud import bigquery

    client = bigquery.Client(project=project, location=region)
    dataset_ref = bigquery.DatasetReference(project, destination_dataset)
    table_ref = dataset_ref.table(destination_table_name)
    table = client.get_table(table_ref)

    train = client.list_rows(table).to_dataframe()
    train.drop("<list_of_columns>", axis=1, inplace=True)
    train['class'] = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

    train.to_csv(dataset.uri)
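For reference (a sketch, not from the original post): with the KFP v2 SDK the Output[Dataset] annotation injects an artifact object at runtime, so the component body can address the output either by its generated GCS URI or by a mounted local path; the values below are illustrative.

# (Sketch, inside the component body above.) The injected `dataset` artifact exposes:
#   dataset.uri   e.g. "gs://<pipeline_root>/.../create-dataframe/dataset"  -- generated GCS URI
#   dataset.path  e.g. "/gcs/<pipeline_root>/.../create-dataframe/dataset"  -- same object via Cloud Storage FUSE
#   dataset.name  -- the MLMD artifact resource name later visible in executor_output.json
train.to_csv(dataset.uri)     # write via the GCS URI (needs gcsfs, as installed above)
# train.to_csv(dataset.path)  # or write via the mounted local path with plain file I/O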

Then, I use the dataset as input to AutoMLTabularTrainingJobRunOp:

# prebuilt Google Cloud pipeline components (assumed import; aliased as gcc_aip below)
from google_cloud_pipeline_components import aiplatform as gcc_aip

df = create_dataframe(project=project,
                      region=region,
                      destination_dataset=destination_dataset,
                      destination_table_name=destination_table_name,
)

# Training with AutoML
training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
    project=project,
    display_name="train-automl-task",
    optimization_prediction_type="classification",
    column_transformations=[
        "<nested_dict>",
    ],
    dataset=df.outputs["dataset"],
    target_column="class",
    budget_milli_node_hours=1000,
)
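For completeness, a sketch of how these two steps would typically be wired together, compiled, and submitted; the names PIPELINE_ROOT, "bq-to-automl", and the parameter values are illustrative and not from the original post.

from kfp.v2 import compiler, dsl
from google.cloud import aiplatform

PIPELINE_ROOT = "gs://my_bucket/pipeline_root"  # illustrative staging location

@dsl.pipeline(name="bq-to-automl", pipeline_root=PIPELINE_ROOT)
def pipeline(project: str, region: str, destination_dataset: str, destination_table_name: str):
    # Custom component defined above.
    df = create_dataframe(project=project,
                          region=region,
                          destination_dataset=destination_dataset,
                          destination_table_name=destination_table_name)
    # Prebuilt Google Cloud pipeline component, fed with the custom component's output artifact.
    gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project,
        display_name="train-automl-task",
        optimization_prediction_type="classification",
        dataset=df.outputs["dataset"],
        target_column="class",
        budget_milli_node_hours=1000,
    )

compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

# Assumes aiplatform.init(project=..., location=...) was called beforehand.
aiplatform.PipelineJob(
    display_name="bq-to-automl",
    template_path="pipeline.json",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={"project": "my_project", "region": "my_region",
                      "destination_dataset": "my_dataset",
                      "destination_table_name": "my_table"},
).run()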

Looking at the logs, I found this error:

"Traceback (most recent call last): "

" File "/opt/python3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main "

" "__main__", mod_spec) "

" File "/opt/python3.7/lib/python3.7/runpy.py", line 85, in _run_code "

" exec(code, run_globals) "

" File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/remote/aiplatform/remote_runner.py", line 284, in <module> "

" main() "

" File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/remote/aiplatform/remote_runner.py", line 280, in main "

" print(runner(args.cls_name, args.method_name, executor_input, kwargs)) "

" File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/remote/aiplatform/remote_runner.py", line 236, in runner "

" prepare_parameters(serialized_args[METHOD_KEY], method, is_init=False) "

" File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/remote/aiplatform/remote_runner.py", line 205, in prepare_parameters "

" value = cast(value, param_type) "

" File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/remote/aiplatform/remote_runner.py", line 176, in cast "

" return annotation_type(value) "

" File "/opt/python3.7/lib/python3.7/site-packages/google/cloud/aiplatform/datasets/dataset.py", line 81, in __init__ "

" self._gca_resource = self._get_gca_resource(resource_name=dataset_name) "

" File "/opt/python3.7/lib/python3.7/site-packages/google/cloud/aiplatform/base.py", line 532, in _get_gca_resource "

" location=self.location, "

" File "/opt/python3.7/lib/python3.7/site-packages/google/cloud/aiplatform/utils/__init__.py", line 192, in full_resource_name "

" raise ValueError(f"Please provide a valid {resource_noun[:-1]} name or ID") "

"ValueError: Please provide a valid dataset name or ID "

So I looked at the source code at line 192 of google/cloud/aiplatform/utils/__init__.py and found that the resource name should look like "projects/.../locations/.../datasets/12345" or "projects/.../locations/.../metadataStores/.../contexts/12345".

After running create_dataframe and opening the executor_output.json file created in my bucket, I found that the artifact name seems to be in the right format:

{"artifacts": {"dataset": {"artifacts": [{"name": "projects/my_project/locations/my_region/metadataStores/default/artifacts/1299...", "uri": "my_bucket/object_folder", "metadata": {"name": "reshaped-training-dataset"}}]}}}

I also tried to set a human-readable name for the dataset in the metadata, but without success. Any suggestions would be very helpful.


1 Answer

Stack Overflow user

Answered on 2021-11-18 21:06:13

You can add a dataset parameter annotated as Input[Dataset] to the component that consumes the artifact, as in the example below:

from kfp.v2.dsl import component, Dataset, Input

@component(base_image="python:3.9", packages_to_install=["pandas", "gcsfs"])
def consume_dataset(dataset: Input[Dataset]):   # illustrative consuming component
    import pandas as pd
    train = pd.read_csv(dataset.uri)
    # ... use the dataframe ...

# wired up in the pipeline as:
# consume_dataset(dataset=df.outputs["dataset"])

You can also check more documentation on pipelines and on pipelines with kfp.

Score: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/70006594
