The code below works when I run it locally, but when I run it in Azure Databricks it hangs forever and never finishes. I know the endpoint and sasToken are correct because it works locally, but it does not work when I run it directly from a notebook. Any ideas?
import com.azure.storage.blob.BlobClientBuilder
import java.io.InputStream
val input: InputStream = new BlobClientBuilder()
.endpoint(s"https://<storage-account>.blob.core.windows.net")
.sasToken("<sas-token>")
.containerName("<container-name>")
.blobName("<blob-name>")
.buildClient()
.openInputStream()

Posted on 2022-07-05 13:12:15
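For context, the stream returned by openInputStream() is an ordinary java.io.InputStream and can be drained like any other. A generic, Azure-independent sketch (readAll and the in-memory demo stream are illustrative, not part of the Azure SDK):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets

// Generic helper: read an InputStream fully into a UTF-8 String, then close it.
def readAll(in: InputStream): String =
  try scala.io.Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()

// Demo with an in-memory stream standing in for the blob's stream.
val demo = new ByteArrayInputStream("hello blob".getBytes(StandardCharsets.UTF_8))
println(readAll(demo)) // prints "hello blob"
```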
I solved this by using a shaded jar (https://maven.apache.org/plugins/maven-shade-plugin/) in my application. The example at https://github.com/anuchandy/azure-sdk-in-data-bricks helped me get this set up; see below for an updated version. I can now prefix my imports with the shaded group id I created in the POM plugin configuration, so my code in Databricks knows exactly which dependency to use when reading from blob storage.
import <MY.GROUP.ID>.com.azure.storage.blob.BlobClientBuilder
import java.io.InputStream
val input: InputStream = new BlobClientBuilder()
.endpoint(s"https://<storage-account>.blob.core.windows.net")
.sasToken("<sas-token>")
.containerName("<container-name>")
.blobName("<blob-name>")
.buildClient()
.openInputStream()

Azure Blob Storage dependency:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.14.0</version>
</dependency>

Maven Shade plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.4</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<minimizeJar>true</minimizeJar>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass><MY.MAIN.CLASS></mainClass>
</transformer>
<!--Transforms META-INF/services (essential for azure-core relocation)-->
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
<relocations>
<relocation>
<pattern>com.fasterxml.jackson</pattern>
<shadedPattern>${project.groupId}.shaded.com.fasterxml.jackson</shadedPattern>
</relocation>
<!--In Databricks 10.2 you may also need to relocate reactor netty classes-->
<relocation>
<pattern>io.netty</pattern>
<shadedPattern>${project.groupId}.shaded.io.netty</shadedPattern>
</relocation>
<relocation>
<pattern>reactor</pattern>
<shadedPattern>${project.groupId}.shaded.reactor</shadedPattern>
</relocation>
<relocation>
<!--Databricks brings its own version of azure-core which may be incompatible with blob storage version. Relocate azure-core so we don't collide with it-->
<pattern>com.azure</pattern>
<shadedPattern>${project.groupId}.shaded.com.azure</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>

Posted on 2022-07-05 06:31:51
Make sure to check whether secure transfer is enabled. Go to your Azure storage account -> Settings, where you will find the secure transfer configuration. If it is not enabled, enable it. Secure transfer only allows requests to the storage account over secure connections, which improves the account's security.
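Besides the portal, the same setting can be flipped with the Azure CLI; a sketch with placeholder account and resource-group names:

```shell
# Enable "Secure transfer required" (HTTPS-only) on a storage account.
# <storage-account> and <resource-group> are placeholders for your own names.
az storage account update \
  --name <storage-account> \
  --resource-group <resource-group> \
  --https-only true
```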
Are there other options?

There is a different option: reading the Azure Storage file directly from the Databricks notebook.
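One such approach, sketched below, is to skip the SDK entirely and let Spark read the blob over the legacy WASB driver, passing the SAS token through Spark configuration (the container, account, SAS token, and blob path are placeholders):

```scala
// In a Databricks notebook, `spark` is already available.
// Register the SAS token for this container so the WASB driver can authenticate.
spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account>.blob.core.windows.net",
  "<sas-token>")

// Read the blob directly as a DataFrame of lines.
val df = spark.read.text(
  "wasbs://<container-name>@<storage-account>.blob.core.windows.net/<blob-name>")
df.show()
```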
https://stackoverflow.com/questions/72751120