
BigQuery "INTERNAL: request failed: internal error" when retrieving __TABLES__ from PySpark

Stack Overflow user
Asked 2021-06-28 19:08:46
1 answer · 136 views · 0 followers · 0 votes

I am trying to query the __TABLES__ metadata in BigQuery from PySpark. I use the code below to query this system view:

from pyspark.sql import SparkSession


spark = SparkSession.builder\
    .config('parentProject', 'my-parent-project')\
    .config('spark.jars.packages', 'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.18.1')\
    .getOrCreate()

spark.read.format('bigquery')\
            .option("credentials", my_key)\
            .option("project", 'my-parent-project') \
            .option('table', 'my-dataset.__TABLES__') \
            .load()

This worked until 2021/06/25. The following day, I suddenly started getting this error message for the same code:

: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.InternalException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.cloud.spark.bigquery.repackaged.com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:413)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:721)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed

(The stack trace is even longer than this; ask for it if needed and I will post the rest.)

Has something changed in BigQuery? The error message is not much help for troubleshooting. Any suggestions? For additional context: with the same code I can query other tables in the same dataset.

I observed this behaviour with Spark 3.0.1 and the BigQuery connector com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.18.1. I saw the same behaviour with Spark 2.4.3 and the connector com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.18.1.
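For reference, the legacy __TABLES__ view has a partial standard-SQL counterpart, INFORMATION_SCHEMA.TABLES, which can be queried instead. The sketch below is a hypothetical pure-Python helper (not part of the connector API) that just builds such a query string; `my-project` and `my-dataset` are placeholders:

```python
# Hypothetical helper: builds a standard-SQL query listing a dataset's tables
# via INFORMATION_SCHEMA.TABLES (a supported, partial alternative to the
# legacy __TABLES__ metadata view). Project/dataset names are placeholders.
def tables_metadata_query(project, dataset):
    return (
        f"SELECT table_name, table_type, creation_time "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES`"
    )

query = tables_metadata_query("my-project", "my-dataset")
# The resulting string would be passed to a SQL-capable read,
# e.g. spark.read.format('bigquery')...load(query)
```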


1 Answer

Stack Overflow user

Answered 2021-06-28 22:27:04

Note that __TABLES__ is not an actual BigQuery table but a view over its metadata. One way to work around this issue is to run the following:

from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .config('parentProject', 'my-parent-project')\
    .config('viewsEnabled', 'true')\
    .config('materializationDataset', DATASET)\
    .config('spark.jars.packages', 'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.21.1')\
    .getOrCreate()

tables_df = spark.read.format('bigquery')\
            .option("credentials", my_key)\
            .load("SELECT * FROM my-project.my-dataset.__TABLES__")
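Equivalently, the connector's `viewsEnabled` and `materializationDataset` settings can be supplied as per-read options rather than session-level config. The helper below is a hypothetical sketch that only assembles the option map; `DATASET` and the credentials value are placeholders, as in the answer above:

```python
# Hypothetical helper assembling per-read options for a SQL-based load.
# viewsEnabled must be "true" for the connector to accept a SQL query, and
# materializationDataset names the dataset where the connector materializes
# the query result into a temporary table before reading it.
def bigquery_read_options(dataset, credentials):
    return {
        "viewsEnabled": "true",
        "materializationDataset": dataset,
        "credentials": credentials,
    }

opts = bigquery_read_options("my_dataset", "my-base64-key")
# Applied in PySpark roughly as:
#   reader = spark.read.format("bigquery")
#   for k, v in opts.items():
#       reader = reader.option(k, v)
#   tables_df = reader.load("SELECT * FROM my-project.my-dataset.__TABLES__")
```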
0 votes
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/68162281
