I am trying to submit a Dataproc job that will consume data from a Kerberized cluster. The current working solution is to have the JAAS config file and the keytab on the machine the job is submitted from:
gcloud dataproc jobs submit pyspark \
--cluster MY-CLUSTER --region us-west1 --project MY_PROJECT \
--files my_keytab_file.keytab,my_jaas_file.conf \
--properties spark.driver.extraJavaOptions=-Djava.security.auth.login.config=my_jaas_file.conf,spark.executor.extraJavaOptions=-Djava.security.auth.login.config=my_jaas_file.conf \
gs://CODE_BUCKET/path/to/python/main.py
Contents of my_jaas_file.conf:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
debug=true
useKeyTab=true
serviceName="kafka"
keyTab="my_keytab_file.keytab"
principal="principal@MY.COMPANY.COM";
};
Consumer code:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("MY_APP") \
    .master("yarn") \
    .getOrCreate()
df = spark.read.format("kafka") \
    .option("kafka.bootstrap.servers", "BOOTSTRAP_SERVERS_LIST[broker:port,broker:port,broker:port]") \
    .option("kafka.sasl.mechanism", "GSSAPI") \
    .option("kafka.security.protocol", "SASL_SSL") \
    .option("kafka.group.id", "PREDEFINED_CG") \
    .option("subscribe", "MY_TOPIC") \
    .option("startingOffsets", "earliest") \
    .option("endingOffsets", "latest") \
    .load()
df.show()
The files are copied to GCS and from there, I believe, into the YARN working directories. The JVM is able to pick them up, and authentication succeeds.
However, this setup is not viable for me, because I will not have access to the keytab file. The keytab will be part of the deployment process and will be available at a fixed disk location on the master and worker nodes. A service will pick up the keytab and write a ticket cache file, which is meant to become the source of the Kerberos authentication.
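Since a locally renewed ticket cache becomes the only credential source, it can help to sanity-check it from the driver before starting the Spark session. A minimal sketch, assuming the cache path from the JAAS config below; `ccache_usable` and the 8-hour freshness window are my own additions, not part of the original setup:

```python
import os
import stat
import time

def ccache_usable(path, max_age_s=8 * 3600):
    """Return True if the ticket cache exists, is readable by other
    users (executors may not run as the cache's owner), and was
    renewed recently enough that its tickets are plausibly valid."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    readable_by_others = bool(st.st_mode & stat.S_IROTH)
    fresh = (time.time() - st.st_mtime) < max_age_s
    return readable_by_others and fresh

# e.g. ccache_usable("/path/to/keytab/krb5_ccache")
```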
I tried creating a JAAS config file on the master and on every worker node:
nano /path/to/keytab/my_jaas_file.config
# variant 1
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
debug=true
useKeyTab=true
serviceName="kafka"
keyTab="/path/to/keytab/my_keytab_file.keytab"
principal="principal@MY.COMPANY.COM";
};
# variant 2
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
debug=true
useTicketCache=true
ticketCache="/path/to/keytab/krb5_ccache"
serviceName="kafka"
principal="principal@MY.COMPANY.COM";
};
and submitting the Dataproc job with this configuration:
gcloud dataproc jobs submit pyspark \
--cluster MY-CLUSTER --region us-west1 --project MY_PROJECT \
--properties spark.driver.extraJavaOptions=-Djava.security.auth.login.config=file:///path/to/keytab/my_jaas_file.config,spark.executor.extraJavaOptions=-Djava.security.auth.login.config=file:///path/to/keytab/my_jaas_file.config \
gs://CODE_BUCKET/path/to/python/main.py
The JAAS config file is correctly picked up and read from disk by the Spark processes; I verified this by intentionally deleting it from one node, and the job then failed with a "not found" error. Neither the keytab file nor the ticketCache file is picked up, however, and the following error is produced:
org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
After digging into the Krb5LoginModule documentation, this seems to be the default behavior:
When multiple mechanisms to retrieve a ticket or key is provided, the preference order looks like this:
1. ticket cache
2. keytab
3. shared state
4. user prompt
Variant 1:
Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
I have tried multiple ways of defining the keytab / ccache file in the JAAS config:
keyTab="file:/path/to/keytab/my_keytab_file.keytab"
keyTab="file:///path/to/keytab/my_keytab_file.keytab"
keyTab="local:/path/to/keytab/my_keytab_file.keytab"
but none of them seems to pick up the much-needed keytab file.
Spark and Dataproc do a lot of things behind the scenes.
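One way to cut through that opacity is to probe, from inside the executors themselves, which OS user the tasks run as and whether that user can read the paths in question. A debugging sketch; `probe` and `probe_on_executors` are hypothetical helper names, and the paths mirror the JAAS config above:

```python
import getpass
import os

def probe(path):
    """Runs on an executor: report the OS user the task runs as and
    whether `path` exists and is readable to that user."""
    return (path, getpass.getuser(), os.path.exists(path),
            os.access(path, os.R_OK))

def probe_on_executors(spark, paths):
    """Fan the probe out across the cluster, one partition per path,
    so the tasks are likely to land on different executors."""
    return spark.sparkContext.parallelize(paths, len(paths)).map(probe).collect()

# Usage inside the job:
# for row in probe_on_executors(spark, [
#         "/path/to/keytab/my_keytab_file.keytab",
#         "/path/to/keytab/krb5_ccache"]):
#     print(row)
```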
Posted on 2022-06-29 09:18:05
Managed to solve it!
It turns out the ccache / keytab files were not readable by other users:
sudo chmod 744 /path/to/keytab/my_jaas_file.config
sudo chmod 744 /path/to/keytab/krb5_ccache
sudo chmod 744 /path/to/keytab/my_keytab_file.keytab
The job runs as root on the driver, but the executors do not run as root; they likely run as the yarn or hadoop user.
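The fix above can be wrapped into a small script so every node applies the same permissions after deployment. A sketch; `fix_krb_perms` is a hypothetical helper and the file names are taken from the setup above. Note that 744 makes the keytab world-readable; if the executor user is known, group-based access (e.g. a matching group plus `chmod 640`) is a tighter alternative:

```shell
#!/usr/bin/env bash
# Make the Kerberos artifacts readable by the (non-root) executor user:
# owner keeps rwx, group and others get read-only.
fix_krb_perms() {
  local dir="$1"
  local f
  for f in my_jaas_file.config krb5_ccache my_keytab_file.keytab; do
    chmod 744 "$dir/$f"
  done
}

# On every node:
# fix_krb_perms /path/to/keytab
```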
Hope this helps other wandering souls!
https://stackoverflow.com/questions/72776534