试图通过Ambari (v2.7.3.0) (HDP 3.1.0.0-78)将客户端节点添加到集群中,并看到奇怪的错误
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
BeforeAnyHook().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
fetch_nonlocal_groups = params.fetch_nonlocal_groups,
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/accounts.py", line 90, in action_create
shell.checked_call(command, sudo=True)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
stdout:
2019-11-25 13:07:57,644 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1
2019-11-25 13:07:57,651 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2019-11-25 13:07:57,652 - Group['livy'] {}
2019-11-25 13:07:57,654 - Group['spark'] {}
2019-11-25 13:07:57,654 - Group['ranger'] {}
2019-11-25 13:07:57,654 - Group['hdfs'] {}
2019-11-25 13:07:57,654 - Group['zeppelin'] {}
2019-11-25 13:07:57,655 - Group['hadoop'] {}
2019-11-25 13:07:57,655 - Group['users'] {}
2019-11-25 13:07:57,656 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,658 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:57,971 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
Command failed after 1 tries问题似乎是
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd起因于
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']在将主机添加到集群之前,手动添加ambari 1.1.repo和yum-安装hdp-select会显示相同的错误消息,这一事实进一步加强了这一点,只是将其截断到这里显示的stdout/err的部分。
跑步时
[root@HW001 .ssh]# /usr/bin/hdp-select versions
3.1.0.0-78从ambari服务器节点,我可以看到命令正在运行。
查看钩子脚本试图运行/访问的内容,我看到了
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
-rw-r--r-- 1 root root 1.2K Nov 25 10:51 /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
[root@client001~]# ls -lha /var/lib/ambari-agent/data/command-632.json
-rw------- 1 root root 545K Nov 25 13:07 /var/lib/ambari-agent/data/command-632.json
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY
total 0
drwxr-xr-x 4 root root 34 Nov 25 10:51 .
drwxr-xr-x 8 root root 147 Nov 25 10:51 ..
drwxr-xr-x 2 root root 34 Nov 25 10:51 files
drwxr-xr-x 2 root root 188 Nov 25 10:51 scripts
[root@client001~]# ls -lha /var/lib/ambari-agent/data/structured-out-632.json
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
[root@client001~]# ls -lha /var/lib/ambari-agent/tmp
total 96K
drwxrwxrwt 3 root root 4.0K Nov 25 13:06 .
drwxr-xr-x 10 root root 267 Nov 25 10:50 ..
drwxr-xr-x 6 root root 4.0K Nov 25 13:06 ambari_commons
-rwx------ 1 root root 1.4K Nov 25 13:06 ambari-sudo.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 create-python-wrap.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 10:50 os_check_type1574715018.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:12 os_check_type1574716360.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:29 os_check_type1574717391.py
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 os_check_type1574723161.py
-rwxr-xr-x 1 root root 16K Nov 25 10:50 setupAgent1574715020.py
-rwxr-xr-x 1 root root 16K Nov 25 11:12 setupAgent1574716361.py
-rwxr-xr-x 1 root root 16K Nov 25 11:29 setupAgent1574717392.py
-rwxr-xr-x 1 root root 16K Nov 25 13:06 setupAgent1574723163.py注意,这里有ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory。但不确定这是否正常。
有谁知道是什么原因导致了这种情况,或者从这一点引发的任何调试提示?
更新01:在错误跟踪的最后一行附近添加一些日志打印行,即。File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages,我打印代码和stdout:
2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory那到底怎么回事?它希望hdp-select已经存在,但是如果我事先手动安装这个二进制文件,那么ambari就会抱怨。当我手动安装它(使用与其他现有集群节点相同的repo文件)时,我看到的是.
0
Packages:
accumulo-client
accumulo-gc
accumulo-master
accumulo-monitor
accumulo-tablet
accumulo-tracer
atlas-client
atlas-server
beacon
beacon-client
beacon-server
druid-broker
druid-coordinator
druid-historical
druid-middlemanager
druid-overlord
druid-router
druid-superset
falcon-client
falcon-server
flume-server
hadoop-client
hadoop-hdfs-client
hadoop-hdfs-datanode
hadoop-hdfs-journalnode
hadoop-hdfs-namenode
hadoop-hdfs-nfs3
hadoop-hdfs-portmap
hadoop-hdfs-secondarynamenode
hadoop-hdfs-zkfc
hadoop-httpfs
hadoop-mapreduce-client
hadoop-mapreduce-historyserver
hadoop-yarn-client
hadoop-yarn-nodemanager
hadoop-yarn-registrydns
hadoop-yarn-resourcemanager
hadoop-yarn-timelinereader
hadoop-yarn-timelineserver
hbase-client
hbase-master
hbase-regionserver
hive-client
hive-metastore
hive-server2
hive-server2-hive
hive-server2-hive2
hive-webhcat
hive_warehouse_connector
kafka-broker
knox-server
livy-client
livy-server
livy2-client
livy2-server
mahout-client
oozie-client
oozie-server
phoenix-client
phoenix-server
pig-client
ranger-admin
ranger-kms
ranger-tagsync
ranger-usersync
shc
slider-client
spark-atlas-connector
spark-client
spark-historyserver
spark-schema-registry
spark-thriftserver
spark2-client
spark2-historyserver
spark2-thriftserver
spark_llap
sqoop-client
sqoop-server
storm-client
storm-nimbus
storm-slider-client
storm-supervisor
superset
tez-client
zeppelin-server
zookeeper-client
zookeeper-server
Aliases:
accumulo-server
all
client
hadoop-hdfs-server
hadoop-mapreduce-server
hadoop-yarn-server
hive-server
Command failed after 1 tries更新02:从File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 322打印一些自定义日志(打印err_msg、code、out、err的值),即。
....
312 if throw_on_failure and not code in returns:
313 err_msg = Logger.filter_text("Execution of '{0}' returned {1}. {2}".format(command_alias, c ode, all_output))
314
315 #TODO remove
316 print("\n----------\nMY LOGS\n----------\n")
317 print(err_msg)
318 print(code)
319 print(out)
320 print(err)
321
322 raise ExecutionFailed(err_msg, code, out, err)
323
324 # if separate stderr is enabled (by default it's redirected to out)
325 if stderr == subprocess32.PIPE:
326 return code, out, err
327
328 return code, out
....我明白了
Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
6
usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-816.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-816.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-26 10:25:46,928 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed因此,它似乎无法创建hive用户(尽管在此之前创建yarn-ats用户似乎没有问题)
发布于 2019-11-26 21:18:54
在放弃并尝试自己手动创建蜂巢用户之后,我看到
[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
<nothing>
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)这个现有用户的uid看起来是这样的,并且不在/etc/passwd文件中,这使我认为,已经有一些已有的Active Directory用户(这个客户端节点通过已安装的SSSD与其同步)已经有了这个名称。检查我们的AD用户,结果证明这是真的。
在重新运行客户机主机Ambari之前,暂时停止与AD (service sssd stop)的同步(因为不确定是否可以让服务器根据单个用户来忽略AD同步),我解决了这个问题。
https://stackoverflow.com/questions/59041580
复制相似问题