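Is it possible to list the contents of a directory in HDFS using WebHDFS?
It should work the way hdfs dfs -ls normally does, but through WebHDFS instead.
How can I list a WebHDFS directory with Python 2.6?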
Posted on 2016-06-23 11:24:37
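You can use the LISTSTATUS operation. It is documented under "List a Directory" in the WebHDFS REST API documentation.
With curl, the request looks like this:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
The response is a FileStatuses JSON object with the following JSON schema: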
{
"name" : "FileStatuses",
"properties":
{
"FileStatuses":
{
"type" : "object",
"properties":
{
"FileStatus":
{
"description": "An array of FileStatus",
"type" : "array",
"items" : fileStatusProperties
}
}
}
}
}
fileStatusProperties (used by the items field) has the following JSON schema:
var fileStatusProperties =
{
"type" : "object",
"properties":
{
"accessTime":
{
"description": "The access time.",
"type" : "integer",
"required" : true
},
"blockSize":
{
"description": "The block size of a file.",
"type" : "integer",
"required" : true
},
"group":
{
"description": "The group owner.",
"type" : "string",
"required" : true
},
"length":
{
"description": "The number of bytes in a file.",
"type" : "integer",
"required" : true
},
"modificationTime":
{
"description": "The modification time.",
"type" : "integer",
"required" : true
},
"owner":
{
"description": "The user who is the owner.",
"type" : "string",
"required" : true
},
"pathSuffix":
{
"description": "The path suffix.",
"type" : "string",
"required" : true
},
"permission":
{
"description": "The permission represented as a octal string.",
"type" : "string",
"required" : true
},
"replication":
{
"description": "The number of replication of a file.",
"type" : "integer",
"required" : true
},
"type":
{
"description": "The type of the path object.",
"enum" : ["FILE", "DIRECTORY"],
"required" : true
}
}
};
In Python, you can use pywebhdfs to get at the file names, as follows:
from pprint import pprint
from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')  # Use your own host/port/user_name config
data = hdfs.list_dir("dir/dir")  # Use your preferred directory, without the leading "/"
file_statuses = data["FileStatuses"]
pprint(file_statuses)  # Display the whole dict
for item in file_statuses["FileStatus"]:
    print(item["pathSuffix"])  # Display the entry's file name
Instead of printing each entry, you can use the items however you need. file_statuses is just a Python dict, so you can work with it like any other dict, as long as you use the correct keys.
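If installing pywebhdfs is not an option, the same LISTSTATUS request can be made with the standard library alone. The sketch below assumes an unsecured cluster (simple authentication through the `user.name` query parameter, no Kerberos) and a NameNode reachable over plain HTTP. Port 50070 is the Hadoop 2.x default used in the example above; Hadoop 3 defaults to 9870. The helpers `liststatus_url`, `parse_liststatus` and `list_dir` are hypothetical names, not part of any library. The code imports from `urllib.request` on Python 3 and falls back to `urllib2` on Python 2.6/2.7.

```python
from __future__ import print_function

import json

try:  # Python 3
    from urllib.parse import quote, urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2.6 / 2.7
    from urllib import quote, urlencode
    from urllib2 import urlopen


def liststatus_url(host, port, path, user_name=None):
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path (hypothetical helper)."""
    params = [("op", "LISTSTATUS")]
    if user_name is not None:
        # Assumes simple (pseudo) authentication via the user.name parameter.
        params.append(("user.name", user_name))
    # quote() keeps "/" unescaped, so the HDFS path maps onto the URL path.
    return "http://%s:%s/webhdfs/v1%s?%s" % (
        host, port, quote(path), urlencode(params))


def parse_liststatus(body):
    """Return (pathSuffix, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_dir(host, port, path, user_name=None):
    """Fetch and parse a directory listing; needs a reachable NameNode."""
    response = urlopen(liststatus_url(host, port, path, user_name))
    try:
        return parse_liststatus(response.read().decode("utf-8"))
    finally:
        response.close()
```

Usage, with a placeholder host name: `for name, kind in list_dir("namenode.example.com", 50070, "/user/hdfs", user_name="hdfs"): print(kind, name)`. Unlike pywebhdfs's `list_dir`, this sketch expects the path with its leading "/". Keeping URL building and JSON parsing in separate functions means the parsing can be checked against a saved response without a running cluster.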
https://stackoverflow.com/questions/37989520