Following up on this question. I'm using Kedro v0.18.2. I'm trying to use TemplatedConfigLoader, so I created a globals.yml under conf/base as follows:
paths:
  base_path: s3://my_project
datasets:
  pdf: base.PDFDataSet
  png: pillow.ImageDataSet
  csv: pandas.CSVDataSet
  excel: pandas.ExcelDataSet
data_folders:
  raw: 01_raw
  intermediate: 02_intermediate
  primary: 03_primary
  feature: 04_feature
  model_input: 05_model_input
  models: 06_models
  model_output: 07_model_output
  reporting: 08_reporting

I followed the documentation and uncommented part of settings.py:
"""Project settings. There is no need to edit this file unless you want to change values
from the Kedro defaults. For further information, including these default values, see
https://kedro.readthedocs.io/en/stable/kedro_project_setup/settings.html."""
# Instantiated project hooks.
# from certifai.hooks import ProjectHooks
# HOOKS = (ProjectHooks(),)
# Installed plugins for which to disable hook auto-registration.
# DISABLE_HOOKS_FOR_PLUGINS = ("kedro-viz",)
# Class that manages storing KedroSession data.
# from kedro.framework.session.store import ShelveStore
# SESSION_STORE_CLASS = ShelveStore
# Keyword arguments to pass to the `SESSION_STORE_CLASS` constructor.
# SESSION_STORE_ARGS = {
# "path": "./sessions"
# }
# Class that manages Kedro's library components.
# from kedro.framework.context import KedroContext
# CONTEXT_CLASS = KedroContext
# Directory that holds configuration.
# CONF_SOURCE = "conf"
# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}
# Class that manages the Data Catalog.
# from kedro.io import DataCatalog
# DATA_CATALOG_CLASS = DataCatalog

catalog.yml looks like this:
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset: ${datasets.png}
label_images_png:
  <<: *label_images
  filename_suffix: .png
label_images_jpg:
  <<: *label_images
  filename_suffix: .jpg
label_images_jpeg:
  <<: *label_images
  filename_suffix: .jpeg
label_images_pdf:
  <<: *label_images
  dataset: base.PDFDataSet
  filename_suffix: .pdf
my_project_label_extracts:
  type: PartitionedDataSet
  path: s3://my_project/data/01_raw/label_extracts
  dataset: pandas.ExcelDataSet

My test script looks like this:
from kedro.config import ConfigLoader
from kedro.framework.project import settings
from pathlib import Path
from kedro.extras.datasets import pillow
project_path = Path(__file__).parent.parent.parent
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="base")
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
images_dataset = pillow.ImageDataSet.from_config("label_images_png", conf_catalog["label_images_png"])
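images_loader = images_dataset.load()
images_loader["00337180800086"]().show()

With hardcoded values in catalog.yml, the script runs and displays the image, but with the templated configuration it does not work. Am I missing something?
Apologies if this question is a duplicate.

Posted on 2022-07-25 11:18:36
The first bug I noticed is in the catalog entry: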
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset: ${datasets.png}

You are missing the type key for the dataset. The correct entry should be:
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset:
    type: ${datasets.png}

If you now run the script with TemplatedConfigLoader, you should hopefully no longer get the error you mentioned:
from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.project import settings
from pathlib import Path
from kedro.extras.datasets import pillow
project_path = Path(__file__).parent.parent.parent
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = TemplatedConfigLoader(conf_source=conf_path, env="base", globals_pattern="*globals.yml")
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
images_dataset = pillow.ImageDataSet.from_config("label_images_png", conf_catalog["label_images_png"])
images_loader = images_dataset.load()
images_loader["00337180800086"]().show()

To make communication easier, you may want to join the Kedro Discord channel, where we can respond to you in real time: https://discord.gg/akJDeVaxnB.
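Note that the script in the question builds a plain ConfigLoader, while the one above builds a TemplatedConfigLoader; only the latter fills in the ${...} placeholders from globals.yml. To make that substitution step concrete, here is a minimal standard-library sketch of the idea. It is not Kedro's actual implementation, and the helper names `lookup`, `resolve` and `find_unresolved` are hypothetical, introduced only for illustration. The dictionaries mirror part of the globals.yml and the corrected catalog entry from this thread.

```python
import re
from typing import Any, Dict, List

# Matches ${dotted.key} placeholders such as ${paths.base_path}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(globals_dict: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'datasets.png' through nested dicts."""
    value: Any = globals_dict
    for part in dotted_key.split("."):
        value = value[part]
    return value


def resolve(config: Any, globals_dict: Dict[str, Any]) -> Any:
    """Recursively replace ${...} placeholders inside string values."""
    if isinstance(config, dict):
        return {key: resolve(value, globals_dict) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve(value, globals_dict) for value in config]
    if isinstance(config, str):
        return PLACEHOLDER.sub(
            lambda match: str(lookup(globals_dict, match.group(1))), config
        )
    return config


def find_unresolved(config: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of nested dict values that still contain a placeholder."""
    if isinstance(config, dict):
        found: List[str] = []
        for key, value in config.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            found.extend(find_unresolved(value, child))
        return found
    if isinstance(config, str) and PLACEHOLDER.search(config):
        return [prefix]
    return []


# Subset of the globals.yml from the question.
globals_dict = {
    "paths": {"base_path": "s3://my_project"},
    "datasets": {"png": "pillow.ImageDataSet"},
    "data_folders": {"raw": "01_raw"},
}

# The corrected label_images_png entry, as a loader without templating would see it.
raw_entry = {
    "type": "PartitionedDataSet",
    "path": "${paths.base_path}/data/${data_folders.raw}/label_images",
    "dataset": {"type": "${datasets.png}"},
    "filename_suffix": ".png",
}

resolved_entry = resolve(raw_entry, globals_dict)
print(find_unresolved(raw_entry))
print(resolved_entry["path"])
```

A check in the spirit of `find_unresolved` on the loaded catalog dictionary is a quick way to tell whether templating was applied at all: if any value still contains a literal ${...}, the loader in use did not substitute the globals.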
https://stackoverflow.com/questions/73105524