I wrote a Cloud Bigtable writer for Spark Streaming following the example at https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/spark-streaming.
In my case I consume messages from Kafka, apply some transformations, and then need to write the results to my Bigtable instance. Initially, with all the dependency versions from that example, any attempt to reach Bigtable over the connection failed with an UNAUTHENTICATED error caused by a timeout:
Refreshing the OAuth token Retrying failed call. Failure #1, got: Status{code=UNAUTHENTICATED, description=Unexpected failure get auth token,
cause=java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.getHeader(RefreshingOAuth2CredentialsInterceptor.java:290)

I then bumped the bigtable-hbase-1.x-hadoop dependency to something more recent, 1.9.0, and the table-administration calls started authenticating, but the actual saveAsNewAPIHadoopDataset() call still failed with a further UNAUTHENTICATED error:
Retrying failed call. Failure #1, got: Status{code=UNAUTHENTICATED, description=Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential.
See https://developers.google.com/identity/sign-in/web/devconsole-project., cause=null} on channel 34.
Trailers: Metadata(www-authenticate=Bearer realm="https://accounts.google.com/",bigtable-channel-id=34)

I found that removing the conf.set(BigtableOptionsFactory.BIGTABLE_HOST_KEY, BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT) setting from the setBatchConfigOptions() method lets the call authenticate against the default host, and a few Kafka messages get processed, but the job then stalls, hangs, and eventually throws a No route to host error:
2019-07-25 17:29:12 INFO JobScheduler:54 - Added jobs for time 1564093750000 ms
2019-07-25 17:29:21 INFO JobScheduler:54 - Added jobs for time 1564093760000 ms
2019-07-25 17:29:31 INFO JobScheduler:54 - Added jobs for time 1564093770000 ms
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress.
2019-07-25 17:29:38 WARN AbstractRetryingOperation:130 - Retrying failed call.
Failure #1, got: Status{code=UNAVAILABLE, description=io exception, cause=com.google.bigtable.repackaged.io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: batch-bigtable.googleapis.com/2607:f8b0:400f:801:0:0:0:200a:443

I assume this is a dependency-version problem, since the example is fairly old, but I can't find any newer examples of writing to Bigtable from Spark Streaming. I also haven't found a version combination that works with bigtable-hbase-2.x-hadoop.
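For context, the batch configuration in question looks roughly like the sketch below. The project, instance, and table IDs and the `setBatchConfigOptions` helper shape are placeholders (the original code is not shown in the post); only the `BigtableOptionsFactory`/`BigtableOptions` keys and the `saveAsNewAPIHadoopDataset()` call are taken from the question text.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import com.google.cloud.bigtable.hbase.BigtableOptionsFactory
import com.google.cloud.bigtable.config.BigtableOptions

// Hypothetical helper mirroring the setBatchConfigOptions() mentioned above.
// Removing the BIGTABLE_HOST_KEY line falls back to the default (non-batch)
// data host, which is what made authentication start working here.
def setBatchConfigOptions(conf: Configuration): Unit = {
  conf.set(BigtableOptionsFactory.PROJECT_ID_KEY, "my-project")    // placeholder
  conf.set(BigtableOptionsFactory.INSTANCE_ID_KEY, "my-instance")  // placeholder
  conf.set(BigtableOptionsFactory.BIGTABLE_HOST_KEY,
           BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT)       // batch-bigtable.googleapis.com
  conf.set(TableOutputFormat.OUTPUT_TABLE, "my-table")             // placeholder
}

// The write itself then happens per micro-batch, along these lines:
// stream.foreachRDD { rdd =>
//   rdd.map(record => (new ImmutableBytesWritable(rowKey(record)), toPut(record)))
//      .saveAsNewAPIHadoopDataset(jobConf)
// }
```

This is a cluster-bound configuration sketch, not runnable standalone; `rowKey` and `toPut` stand in for whatever record-to-`Put` mapping the job uses.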
目前的POM:
<scala.version>2.11.0</scala.version>
<spark.version>2.3.3</spark.version>
<hbase.version>1.3.1</hbase.version>
<bigtable.version>1.9.0</bigtable.version>
<dependencies>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.7.1</version>
  </dependency>
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>26.0-jre</version>
  </dependency>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-logging</artifactId>
    <version>1.74.0</version>
    <exclusions>
      <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-2.x-hadoop</artifactId>
    <version>${bigtable.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigtable</artifactId>
    <version>0.95.0-alpha</version>
  </dependency>
</dependencies>

Answered 2019-08-05 17:08:30:
The authentication problems in batch mode were a known issue in the Bigtable API; version 1.12.0, released recently, resolves them. The NoRouteToHostException only occurred when running locally and turned out to be a corporate-firewall issue; it went away after setting -Dhttps.proxyHost and -Dhttps.proxyPort.
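Concretely, the fix from the answer amounts to pinning the Bigtable client at 1.12.0, e.g. (only the version changes; host and port in the proxy flags below are placeholders):

```xml
<bigtable.version>1.12.0</bigtable.version>

<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-2.x-hadoop</artifactId>
  <version>${bigtable.version}</version>
</dependency>
```

When running locally behind a corporate firewall, the proxy settings can be passed to the driver JVM, for example via `spark-submit --driver-java-options "-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"`.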
https://stackoverflow.com/questions/57221392