Suppose I have a string in the form host:port, where the :port part is optional. How can I reliably extract the two components?
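The host can be any of the following: a hostname (localhost, www.google.com), an IPv4 literal address (1.2.3.4), or an IPv6 literal address ([aaaa:bbbb::cccc]).
In other words, this is the standard format used on the internet (e.g. in URIs; the complete grammar is at https://www.rfc-editor.org/rfc/rfc3986#section-3.2, excluding the userinfo component).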
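Once the host has been split from the port, the three forms above can be told apart with the standard ipaddress module. A minimal sketch, assuming IPv6 literals keep their brackets as in the expected outputs below; host_kind is a hypothetical helper, and anything that is not a valid IP literal is simply treated as a hostname without further checks:

```python
import ipaddress


def host_kind(host: str) -> str:
    """Classify a host as 'ipv6', 'ipv4' or 'hostname' (hypothetical helper)."""
    if host.startswith("[") and host.endswith("]"):
        # RFC 3986 writes IPv6 literals in brackets; validate what is inside.
        ipaddress.IPv6Address(host[1:-1])  # raises ValueError if malformed
        return "ipv6"
    try:
        ipaddress.IPv4Address(host)  # raises ValueError if not dotted-quad IPv4
        return "ipv4"
    except ValueError:
        pass
    # Everything else is treated as a registered name; no further validation.
    return "hostname"
```

Unlike a purely shape-based check, IPv4Address rejects something like 999.1.1.1, so that string falls through to "hostname" here.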
So, some possible inputs and the desired outputs:
'localhost' -> ('localhost', None)
'my-example.com:1234' -> ('my-example.com', 1234)
'1.2.3.4' -> ('1.2.3.4', None)
'[0abc:1def::1234]' -> ('[0abc:1def::1234]', None)
Posted on 2017-10-22 23:15:21
Here is my final attempt, with thanks to the other answerers for the inspiration:
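def parse_hostport(s, default_port=None):
    # endswith() also copes with an empty string, where s[-1] would raise IndexError
    if s.endswith(']'):
        # IPv6 literal (with no port)
        return (s, default_port)
    # Split on the last colon only, so "[::1]:80" keeps its brackets intact
    out = s.rsplit(":", 1)
    if len(out) == 1:
        # No port
        port = default_port
    else:
        try:
            port = int(out[1])
        except ValueError:
            raise ValueError("Invalid host:port '%s'" % s) from None
    return (out[0], port)
Posted on 2018-11-06 13:10:21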
It's Python, batteries included. You already mentioned that the format is the standard one used in URIs, so how about urllib.parse?
import urllib.parse

def parse_hostport(hp):
    # urlparse() and urlsplit() only recognise the netloc (host:port) after a leading "//"
    result = urllib.parse.urlsplit('//' + hp)
    return result.hostname, result.port
This should handle any valid host:port you can throw at it.
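One difference from the question's expected output: .hostname lowercases the host and strips the brackets from an IPv6 literal, so '[0abc:1def::1234]' comes back as '0abc:1def::1234'. Also, .port raises ValueError for an invalid port (and, since Python 3.6, for an out-of-range one) instead of returning a value. A sketch of a wrapper that puts the brackets back, assuming you want the question's exact format; parse_hostport_urllib is a hypothetical name:

```python
import urllib.parse


def parse_hostport_urllib(hp):
    """Split 'host[:port]' via urlsplit(), re-adding IPv6 brackets (hypothetical helper)."""
    # The netloc is only recognised after "//", so prefix it.
    result = urllib.parse.urlsplit("//" + hp)
    host = result.hostname  # lowercased, with IPv6 brackets stripped
    if host is not None and ":" in host:
        # Only an IPv6 literal can still contain ':' at this point;
        # restore the brackets to match the question's expected output.
        host = f"[{host}]"
    # Accessing .port raises ValueError for a bad or out-of-range port.
    return host, result.port
```

The lowercasing is kept on purpose, since hostnames are case-insensitive; drop the bracket step if you prefer urllib's bare IPv6 form.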
Posted on 2017-10-22 17:45:14
This should handle the entire parse in a single regular expression.
import re

regex = re.compile(r'''
    (                   # first capture group = Addr
      \[                # literal open bracket                       IPv6
        [:a-fA-F0-9]+   # one or more of these characters
      \]                # literal close bracket
      |                 # ALTERNATELY
      (?:               #                                            IPv4
        \d{1,3}\.       # one to three digits followed by a period
      ){3}              # ...repeated three times
      \d{1,3}           # followed by one to three digits
      |                 # ALTERNATELY
      [-a-zA-Z0-9.]+    # one or more hostname chars ([-\w\d\.])     Hostname
    )                   # end first capture group
    (?:
      :                 # a literal :
      (                 # second capture group = PORT
        \d+             # one or more digits
      )                 # end second capture group
    )?                  # ...or not.''', re.X)
Then all that's needed is to convert the second group to an int.
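def parse_hostport(hp):
    # Uses the compiled regex defined above.
    # fullmatch() rejects trailing junk such as "host:abc"; match() would silently drop it
    m = regex.fullmatch(hp)
    if m is None:
        raise ValueError("Invalid host:port %r" % hp)
    addr, port = m.group(1, 2)
    try:
        return (addr, int(port))
    except TypeError:
        # port is None
        return (addr, None)
https://stackoverflow.com/questions/46876770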