Failed precondition for Vertex AI pipeline

Stack Overflow user
Asked on 2022-02-23 22:57:22
2 answers · 538 views · 0 followers · Score 1

I have been following this video: https://www.youtube.com/watch?v=1ykDWsnL2LE&t=310s

using the code at https://codelabs.developers.google.com/vertex-pipelines-intro#5 (I followed the video for the last two steps, which is not a problem for google_cloud_pipeline_components version 0.1.1).

I created a pipeline in Vertex AI and ran it, using the following code to launch the pipeline (taken from the video linked above rather than from the codelab excerpt):

# run pipeline
response = api_client.create_run_from_job_spec(
    "tab_classif_pipeline.json", pipeline_root = PIPELINE_ROOT,
    parameter_values = {
        "project" : PROJECT_ID,
        "display_name" : DISPLAY_NAME
    }
)

In the GCP logs, I get the following error:

google.api_core.exceptions.FailedPrecondition: 400 BigQuery Dataset location `eu` must be in the same location as the service location `us-central1`.

The error occurs at the dataset_create_op step:

dataset_create_op = gcc_aip.TabularDatasetCreateOp(
    project = project, display_name = display_name, bq_source = bq_source
)

My dataset is configured in the EU (multi-region), so I don't know where us-central1 comes from (or what the service location is).
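As a plain illustration of what this FailedPrecondition amounts to (the function below is mine, not part of any Google library): many Vertex AI components fall back to a default location when none is passed, and that default is us-central1, which then fails to match a dataset held in the eu multi-region.

```python
# Illustrative stand-in (NOT the real gcc_aip.TabularDatasetCreateOp):
# shows how an omitted location falls back to a default, which is how
# "us-central1" can appear without ever being written in the pipeline.
def create_dataset(project: str, bq_source: str,
                   location: str = "us-central1") -> dict:
    return {"project": project, "bq_source": bq_source, "location": location}

# The call in the question never passes a location ...
op = create_dataset(
    "marketingtown",
    "bq://marketingtown.MLOp_pipeline_temp.lookalike_training_set",
)
print(op["location"])  # us-central1 -> mismatches the dataset's 'eu' location

# ... while passing it explicitly pins the service location:
op = create_dataset(
    "marketingtown",
    "bq://marketingtown.MLOp_pipeline_temp.lookalike_training_set",
    location="europe-west2",
)
print(op["location"])  # europe-west2
```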

Here is all the code I used:

PROJECT_ID = "marketingtown"
BUCKET_NAME = f"gs://lookalike_model"

from typing import NamedTuple
import kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (Artifact, Input, InputPath, Model, Output,
                        OutputPath, ClassificationMetrics,
                        Metrics, component)
from kfp.v2.components.types.artifact_types import Dataset
from kfp.v2.google.client import AIPlatformClient
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip

# set environment variables
PATH = %env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
REGION = "europe-west2"

# cloud storage path where artifacts are created by the pipeline
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

import time
DISPLAY_NAME = f"lookalike_model_pipeline_{str(int(time.time()))}"
print(DISPLAY_NAME)

@kfp.dsl.pipeline(name = "lookalike-model-training-v2",
                  pipeline_root = PIPELINE_ROOT)
def pipeline(
    bq_source : str = f"bq://{PROJECT_ID}.MLOp_pipeline_temp.lookalike_training_set",
    display_name : str = DISPLAY_NAME,
    project : str = PROJECT_ID,
    gcp_region : str = "europe-west2",
    api_endpoint : str = "europe-west2-aiplatform.googleapis.com",
    thresholds_dict_str : str = '{"auPrc" : 0.3}'
):
    dataset_create_op = gcc_aip.TabularDatasetCreateOp(
        project = project, display_name = display_name, bq_source = bq_source
    )

    training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project,
        display_name=display_name,
        optimization_prediction_type="classification",
        budget_milli_node_hours=1000,
        column_transformations=[
            {"categorical": {"column_name": "agentId"}},
            {"categorical": {"column_name": "postcode"}},
            {"categorical": {"column_name": "isMobile"}},
            {"categorical": {"column_name": "gender"}},
            {"categorical": {"column_name": "timeOfDay"}},
            {"categorical": {"column_name": "sale"}},
        ],
        dataset=dataset_create_op.outputs["dataset"],  # dataset from previous step
        target_column="sale",
    )

    # outputted evaluation metrics (classification_model_eval_metrics is a
    # custom component from the codelab, defined earlier in the notebook)
    model_eval_task = classification_model_eval_metrics(
        project,
        gcp_region,
        api_endpoint,
        thresholds_dict_str,
        training_op.outputs["model"],
    )

    # if the deployment threshold is met, deploy
    with dsl.Condition(
        model_eval_task.outputs["dep_decision"] == "true",
        name="deploy_decision",
    ):
        endpoint_op = gcc_aip.EndpointCreateOp(
            project=project,
            location=gcp_region,
            display_name="train-automl-beans",
        )

        # deploys the model to an endpoint
        gcc_aip.ModelDeployOp(
            model=training_op.outputs["model"],
            endpoint=endpoint_op.outputs["endpoint"],
            min_replica_count=1,
            max_replica_count=1,
            machine_type="n1-standard-4",
        )

compiler.Compiler().compile(
    pipeline_func = pipeline, package_path = "tab_classif_pipeline.json"
)

# client used to submit the run (instantiated as in the codelab)
api_client = AIPlatformClient(project_id = PROJECT_ID, region = REGION)

# run pipeline
response = api_client.create_run_from_job_spec(
    "tab_classif_pipeline.json", pipeline_root = PIPELINE_ROOT,
    parameter_values = {
        "project" : PROJECT_ID,
        "display_name" : DISPLAY_NAME
    }
)

2 Answers

Stack Overflow user

Accepted answer

Posted on 2022-03-09 12:14:43

I solved this by adding a location argument to TabularDatasetCreateOp:

dataset_create_op = gcc_aip.TabularDatasetCreateOp(
    project = project,
    display_name = display_name,
    bq_source = bq_source,
    location = gcp_region
)

I now have the same issue with the model training job, but I have learned that many of the functions in the code above take a location argument, which otherwise defaults to us-central1. I will update this if I get any further.
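One way to avoid hunting for each missing location argument is to fill it in centrally. A minimal sketch (the helper below is mine, not part of the Vertex SDK), assuming each component accepts a location keyword:

```python
GCP_REGION = "europe-west2"

def with_location(**kwargs) -> dict:
    """Return component kwargs with the shared region filled in,
    unless the caller overrides it explicitly."""
    kwargs.setdefault("location", GCP_REGION)
    return kwargs

# Default applied when no location is given:
print(with_location(project="marketingtown")["location"])   # europe-west2
# An explicit override is kept:
print(with_location(project="marketingtown",
                    location="europe-west4")["location"])   # europe-west4
```

Inside the pipeline, the dataset step would then read, for example, gcc_aip.TabularDatasetCreateOp(**with_location(project=project, display_name=display_name, bq_source=bq_source)), so no component silently falls back to us-central1.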

Score: 0

Stack Overflow user

Posted on 2022-03-04 09:45:13

As confirmed by @scottlucas, this problem can be solved by upgrading to the latest version of google-cloud-aiplatform with pip install --upgrade google-cloud-aiplatform.

Upgrading to the latest library ensures that the official documentation, available as reference, stays aligned with the actual product.
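To see which versions are actually installed before and after the upgrade, the standard-library importlib.metadata can be queried (package names as used in the question; google_cloud_pipeline_components is distributed on PyPI as google-cloud-pipeline-components):

```python
from importlib import metadata

# Print the installed version of each relevant distribution, if present.
for pkg in ("google-cloud-aiplatform", "google-cloud-pipeline-components"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "is not installed")
```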

Posting this answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.

Score: 1
Original question:

https://stackoverflow.com/questions/71245000
