
With Bigtable and Python, what causes exceptions like google.api_core.exceptions.Aborted: 409 while reading a table?

Stack Overflow user
Asked on 2021-10-29 12:50:59

I get this exception when using read_rows on a table. The table holds feature rows for documents: each document has 300 to 800 features, and there are about 2 million documents. The row_key is the feature and the columns are the ids of documents that have that feature. There are billions of rows.

I am using the Python Bigtable SDK, with Python 3.6.8 and google-cloud-bigtable 2.3.3.

I get this exception when reading rows with table.read_rows(start_key=foo#xy, end_key=foo#xz), where foo#xy and foo#xz come from table.sample_row_keys(). I got about 200 prefixes from sample_row_keys, and I successfully processed the first five before hitting this error. I run the table.read_rows() calls in a ThreadPool.

If you have run into a similar exception and investigated it, what was the cause, and what did you do to prevent it?
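For context, the setup described above (sample the row keys, pair them into contiguous ranges, then read each range in a ThreadPool worker) can be sketched roughly like this. This is a reconstruction, not the asker's actual code; the helper names and the pool size are assumptions, and the commented driver needs real Bigtable credentials to run:

```python
from multiprocessing.pool import ThreadPool  # used in the commented driver below


def make_ranges(sample_keys):
    """Pair consecutive sample row keys into contiguous (start, end)
    ranges; b"" means an unbounded start or end. If the last sample
    key already marks the end of the table, the final range is empty
    and harmless."""
    bounds = [b""] + list(sample_keys) + [b""]
    return list(zip(bounds[:-1], bounds[1:]))


def count_rows(table, start, end):
    """Read one key range; `table` is a google.cloud.bigtable Table."""
    n = 0
    for _row in table.read_rows(start_key=start, end_key=end):
        n += 1  # real code would process the row here
    return n


# Driver, roughly as described in the question (needs credentials):
# keys = [s.row_key for s in table.sample_row_keys()]
# with ThreadPool(8) as pool:
#     counts = pool.starmap(count_rows,
#                           [(table, s, e) for s, e in make_ranges(keys)])
```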

Traceback (most recent call last):
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/api_core/grpc_helpers.py", line 106, in __next__
    return next(self._wrapped)
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/grpc/_channel.py", line 809, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.ABORTED
        details = "Error while reading table 'projects/arxiv-production/instances/docsim/tables/docsim' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)"
        debug_error_string = "{"created":"@1635477504.521060666","description":"Error received from peer ipv4:172.217.0.42:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Error while reading table 'projects/arxiv-production/instances/docsim/tables/docsim' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)","grpc_status":10}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bdc34/docsim/docsim/loading/all_common_hashes.py", line 53, in <module>
    for hash, n, c, dt in pool.imap_unordered( do_prefix, jobs ):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/bdc34/docsim/docsim/loading/all_common_hashes.py", line 33, in do_prefix
    for hash, common, papers in by_prefix(db, start, end):
  File "/home/bdc34/docsim/docsim/loading/all_common_hashes.py", line 15, in by_prefix
    for row in db.table.read_rows(start_key=start, end_key=end):
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 485, in __iter__
    response = self._read_next_response()
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 474, in _read_next_response
    return self.retry(self._read_next, on_error=self._on_error)()
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/api_core/retry.py", line 288, in retry_wrapped_func
    on_error=on_error,
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 470, in _read_next
    return six.next(self.response_iterator)
  File "/home/bdc34/docsim/venv/lib64/python3.6/site-packages/google/api_core/grpc_helpers.py", line 109, in __next__
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.Aborted: 409 Error while reading table 'projects/testproject/instances/testinstance/tables/testtable' : 
Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)

2 Answers

Stack Overflow user

Accepted answer

Answered on 2021-11-11 17:33:20

I solved this by calling read_rows over much smaller ranges. The prefix ranges from table.sample_row_keys() held on the order of 15B rows each. I split each range in half 5 times to produce smaller ranges.

I bisected a range by padding the start and end row_keys to the same length, converting them to integers, and finding the midpoint.
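The bisection step described here (pad to equal length, convert to integers, take the midpoint, recurse) could be sketched as follows. A minimal sketch, not the answerer's actual code; the function names are made up, and it assumes NUL-padding preserves the lexicographic order of the keys:

```python
def midpoint(start: bytes, end: bytes) -> bytes:
    """Pad both keys to the same width with NUL bytes, interpret them
    as big-endian integers, and return the midpoint as a key."""
    width = max(len(start), len(end))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(end.ljust(width, b"\x00"), "big")
    return ((lo + hi) // 2).to_bytes(width, "big")


def bisect_range(start: bytes, end: bytes, times: int = 5):
    """Split (start, end) in half `times` times, yielding 2**times
    sub-ranges. The original start/end survive at the outer boundaries,
    so only the interior cut points are synthesized keys."""
    if times == 0:
        return [(start, end)]
    mid = midpoint(start, end)
    return bisect_range(start, mid, times - 1) + bisect_range(mid, end, times - 1)
```

Each resulting (start, end) pair can then be passed to a separate table.read_rows(start_key=..., end_key=...) call, keeping every response small enough to be consumed before the server times out the stream.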


Stack Overflow user

Answered on 2021-11-01 14:21:19

This error can have different causes. You may want to make sure you are not facing a hotspotting scenario here.

Also check that you are reading many different rows across the table, and that you are creating as few clients as possible. Performance will also suffer if you read a row-key range that is large but contains only a small number of rows. You will find more general advice on troubleshooting performance issues here.

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69769190
