首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >great_expectations在ADLS Gen2上创建csv文件数据源

great_expectations在ADLS Gen2上创建csv文件数据源
EN

Stack Overflow用户
提问于 2022-07-04 13:25:35
回答 1查看 144关注 0票数 0

我想对我的ADLS great_expectation Gen2中的csv文件运行csv测试套件。在我的ADLS中,我有一个名为"data“的容器,其中我在mypath/test/mydata.csv上有一个文件。我使用InferredAssetAzureDataConnector。我能够创建和测试/验证数据源配置,但我认为存在一个“沉默”问题,没有被发现。

问题是我不能基于这个数据源创建一个测试套件。当我运行great_expectations suite new

我选择(3)使用分析器创建套件,然后stacktrace):

  • 选择我新创建的数据源,然后选择

  • ,而不是向我展示数据源上的可用文件,它崩溃时会出现以下错误(参见下面完整的

  • )

代码语言:javascript
复制
TypeError: __init__() missing 1 required positional argument: 'data_asset_name'

当我使用本地数据源(InferredAssetFilesystemDataConnector)执行此操作时,它将显示数据源上的可用文件,以便在CLI中进行选择。

该错误是否意味着它无法在ADLS上找到csv文件,从而没有显示名称?我该怎么解决这个问题?

创建数据源的代码:

代码语言:javascript
复制
import great_expectations as ge
from great_expectations.cli.datasource import sanitize_yaml_and_save_datasource, check_if_datasource_name_exists
context = ge.get_context()
datasource_name = "my_datasource_name"


example_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
  class_name: SparkDFExecutionEngine
  azure_options:
      account_url: https://<ACCOUNT-NAME>.blob.core.windows.net
      credential: <ACCOUNT-KEY>
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetAzureDataConnector
    azure_options:
        account_url: https://<ACCOUNT-NAME>.blob.core.windows.net
        credential: <ACCOUNT-KEY>
    container: data
    name_starts_with: mypath/test
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.csv)
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    assets:
      my_runtime_asset_name:
        batch_identifiers:
          - runtime_batch_identifier_name
"""
print(example_yaml)
# Test the yml:
context.test_yaml_config(yaml_config=example_yaml)

通过木星笔记本创建数据源后的输出:

代码语言:javascript
复制
Attempting to instantiate class from config...
    Instantiating as a Datasource, since class_name is Datasource
    Successfully instantiated Datasource


ExecutionEngine class name: SparkDFExecutionEngine
Data Connectors:
    default_inferred_data_connector_name : InferredAssetAzureDataConnector
    Available data_asset_names (0 of 0):
    Unmatched data_references (0 of 0):[]
    default_runtime_data_connector_name:RuntimeDataConnector
    default_runtime_data_connector_name : RuntimeDataConnector
    Available data_asset_names (1 of 1):
        my_runtime_asset_name (0 of 0): []
    Unmatched data_references (0 of 0):[]
<great_expectations.datasource.new_datasource.Datasource at 0x1cdc9e01f70>

全错误堆栈:

代码语言:javascript
复制
Traceback (most recent call last):
  File "c:\coding\python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\coding\python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\coding\myrepo\venv\Scripts\great_expectations.exe\__main__.py", line 7, in <module>
  File "C:\coding\myrepo\venv\lib\site-packages\great_expectations\cli\cli.py", line 190, in main
    cli()
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\coding\myrepo\venv\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\coding\myrepo\venv\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\coding\myrepo\venv\lib\site-packages\great_expectations\cli\suite.py", line 151, in suite_new
    _suite_new_workflow(
  File "C:\coding\myrepo\venv\lib\site-packages\great_expectations\cli\suite.py", line 335, in _suite_new_workflow
    raise e
  File "C:\coding\myrepo\venv\lib\site-packages\great_expectations\cli\suite.py", line 279, in _suite_new_workflow
    toolkit.add_citation_with_batch_request(
  File "C:\coding\myrepo\venv\lib\site-packages\great_expectations\cli\toolkit.py", line 1020, in add_citation_with_batch_request
    and BatchRequest(**batch_request)
TypeError: __init__() missing 1 required positional argument: 'data_asset_name'
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-07-05 11:58:34

我犯了个错误.使用下面的模式,它工作得完美无缺:

代码语言:javascript
复制
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*\.csv)
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/72857502

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档