I need to check whether a string conforms to the following rules: http://www.w3.org/TR/widgets/#zip-rel-path
Zip-rel-path = [locale-folder] *folder-name file-name /
    [locale-folder] 1*folder-name
locale-folder = %x6C %x6F %x63 %x61 %x6C %x65 %x73
    "/" lang-tag "/"
folder-name = file-name "/"
file-name = 1*allowed-char
allowed-char = safe-char / zip-UTF8-char
zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4
safe-char = ALPHA / DIGIT / SP / "$" / "%" /
    "'" / "-" / "_" / "@" / "~" /
    "(" / ")" / "&" / "+" / "," /
    "=" / "[" / "]" / "."
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
    %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
    %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
lang-tag = primary-subtag *( "-" subtag )
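primary-subtag = 1*8low-alpha
subtag = 1*8(alphanum)
alphanum = low-alpha / DIGIT
low-alpha = %x61-7a

A code example for the rules above would help, since I am not familiar with ABNF. I don't need a way to parse ABNF itself. I just need the rules above translated by hand, by someone who is used to or understands ABNF, into Python code, as a regular expression or in any other form. In practice that means taking a string and validating it against the rules above, ideally as a function that takes a string and returns true or false depending on whether the rules match. So, to put it as a question: what would this look like when implemented in Python?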
From the UTF-8 document I can see that most of the rules above simply check whether the string is UTF-8: https://www.rfc-editor.org/rfc/rfc3629
UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1 = %x00-7F
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
    %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
    %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF

Posted on 2012-08-29 17:31:19
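I had a go at writing a parser for you.
I agree that the bulk of the grammar is a test for well-formed UTF-8, which is redundant once the value is already in a string (UTF-8 is the encoding on the file system; unicode is the internal representation of valid UTF-8). That simplifies things considerably.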
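To illustrate why the UTF8-2 / UTF8-3 / UTF8-4 rules become redundant, here is a minimal Python 3 sketch, assuming the path arrives as raw bytes. The helper name `is_well_formed_utf8` is made up for this example. A strict decode rejects the byte sequences that the RFC 3629 grammar excludes (overlong forms, surrogates, values above U+10FFFF), so after decoding, a zip-UTF8-char is simply any non-ASCII character.

```python
def is_well_formed_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        # errors="strict" is the default; spelled out here for clarity.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        b"file.txt",          # plain ASCII
        b"\xc2\xa5.txt",      # UTF8-2: U+00A5
        b"\xc0\xaf",          # overlong form, lead byte below %xC2
        b"\xed\xa0\x80",      # surrogate, excluded by %xED %x80-9F
        b"\xf4\x90\x80\x80",  # above U+10FFFF, excluded by %xF4 %x80-8F
    ]
    for raw in samples:
        print(raw, is_well_formed_utf8(raw))
```

Once the bytes have been decoded this way, only the safe-char set and the path structure still need checking.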
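As far as I can tell, the BNF says:
- an optional locales/ prefix followed by a lang-tag, which is a mix of lowercase-letter and alphanumeric subtags (1 to 8 characters each) joined by "-"
- a path of "/"-separated folder names and a file name, each built from the allowed characters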
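The reading above can also be sketched as a single regular expression that returns true or false. This is a hedged Python 3 sketch, not a definitive translation: the names `SAFE`, `NAME`, `LANG`, `ZIP_REL_PATH_RE` and `is_zip_rel_path` are invented here, and it assumes the input is an already-decoded `str`, so zip-UTF8-char is approximated as "any non-ASCII, non-surrogate code point".

```python
import re

# safe-char from the grammar: ALPHA / DIGIT / SP and the listed punctuation.
SAFE = r"A-Za-z0-9 $%'\-_@~()&+,=\[\]."
# file-name = 1*allowed-char; non-ASCII code points stand in for zip-UTF8-char.
NAME = "[" + SAFE + r"\u0080-\uD7FF\uE000-\U0010FFFF]+"
# lang-tag = primary-subtag *( "-" subtag )
LANG = r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*"

# Assumption: a path that starts with "locales/" must carry a valid lang-tag.
# The literal ABNF would also accept "locales" as an ordinary folder name.
ZIP_REL_PATH_RE = re.compile(
    r"(?:locales/" + LANG + r"/|(?!locales/))"  # optional locale-folder
    + NAME + r"(?:/" + NAME + r")*"             # names separated by "/"
    + r"/?"                                     # 1*folder-name form ends in "/"
)


def is_zip_rel_path(path: str) -> bool:
    """Return True if path matches the sketch of Zip-rel-path above."""
    return ZIP_REL_PATH_RE.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in ["my/path/to/file.txt", "locales/en-us/file.txt",
                      "bad^file", "locales/EN/bad/locale", ""]:
        print(repr(candidate), is_zip_rel_path(candidate))
```

`fullmatch` is used instead of `^...$` so that a trailing newline is not silently accepted. Rejecting `locales/` paths with a malformed lang-tag is a design choice of this sketch; drop the `(?!locales/)` lookahead if the stricter reading is not wanted.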
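With that in mind, here is a simple implementation. It captures the parsed output, which I did for debugging purposes; feel free to remove that if you don't need it. An error in the path causes the ZipRelPath constructor to raise a ValueError: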
    import re

    # Python 2 code: it relies on u"" literals and the __unicode__ hook.
    class ZipRelPath:
        # safe-char plus any non-ASCII character (stands in for zip-UTF8-char)
        FILE_NAME_RE = re.compile(u"^[a-zA-Z0-9 \$\%\'\-_@~\(\)&+,=\[\]\.\u0080-\uFFFF]+$")
        # primary-subtag followed by zero or more "-" subtag
        LANG_TAG_RE = re.compile("^[a-z]{1,8}(\-[a-z0-9]{1,8})*$")
        LOCALES = "locales/"

        def __init__(self, path):
            self.path = path          # remaining, not yet consumed input
            self.lang_tag = None
            self.folders = []
            self.file_name = None
            self._parse_locales()
            self._parse_folders()

        def _parse_locales(self):
            """Consumes any leading 'locales' and lang-tag"""
            if self.path.startswith(ZipRelPath.LOCALES):
                self.path = self.path[len(ZipRelPath.LOCALES):]
                self._parse_lang_tag()

        def _parse_lang_tag(self):
            """Parses, consumes and validates the lang-tag"""
            self.lang_tag, _, self.path = self.path.partition("/")
            if not self.path:
                raise ValueError("lang-tag missing closing /")
            if not ZipRelPath.LANG_TAG_RE.match(self.lang_tag):
                raise ValueError(u"'%s' is not a valid language tag" % self.lang_tag)

        def _parse_folders(self):
            """Handles the folders and file-name after the locale"""
            while (self.path):
                self._parse_folder_or_file()
            if not self.folders and not self.file_name:
                raise ValueError("Missing folder or file name")

        def _parse_folder_or_file(self):
            """Each call consumes a single path entry, validating it"""
            folder_or_file, _, self.path = self.path.partition("/")
            if not ZipRelPath.FILE_NAME_RE.match(folder_or_file):
                raise ValueError(u"'%s' is not a valid file or folder name" % folder_or_file)
            if self.path:
                # more input follows, so this entry is a folder
                self.folders.append(folder_or_file)
            else:
                self.file_name = folder_or_file

        def __unicode__(self):
            return u"ZipRelPath [lang-tag: %s, folders: %s, file_name: %s" % (self.lang_tag, self.folders, self.file_name)

There is also a short set of tests:
    GOOD = [
        "$%'-_@~()&+,=[].txt9",
        "my/path/to/file.txt",
        "locales/en/file.txt",
        "locales/en-us/file.txt",
        "locales/en-us-abc123-xyz/file.txt",
        "locales/abcdefgh-12345678/file.txt",
        "locales/en/my/path/to/file.txt",
        u"my\u00A5\u0160\u039E\u04FE\u069E\u0BCC\uFFFD/path/to/file.txt"
    ]

    BAD = [
        "",
        "/starts/with/slash",
        "bad^file",
        "locales//bad/locale",
        "locales/en123/bad/locale",
        "locales/EN/bad/locale",
        "locales/en-US/bad/locale",
    ]

    # Every good path must parse without raising.
    for path in GOOD:
        print unicode(ZipRelPath(path))

    # Every bad path must raise ValueError; anything else is a test failure.
    for path in BAD:
        try:
            parsed = ZipRelPath(path)
            raise Exception("Illegal path {0} was accepted by {1}".format(path, parsed))
        except ValueError as exception:
            print "Incorrect path '{0}' fails with: {1}".format(path, exception)

This produces:
    ZipRelPath [lang-tag: None, folders: [], file_name: $%'-_@~()&+,=[].txt9
    ZipRelPath [lang-tag: None, folders: ['my', 'path', 'to'], file_name: file.txt
    ZipRelPath [lang-tag: en, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en-us, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en-us-abc123-xyz, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: abcdefgh-12345678, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en, folders: ['my', 'path', 'to'], file_name: file.txt
    ZipRelPath [lang-tag: None, folders: [u'my\xa5\u0160\u039e\u04fe\u069e\u0bcc\ufffd', u'path', u'to'], file_name: file.txt
    Incorrect path '' fails with: Missing folder or file name
    Incorrect path '/starts/with/slash' fails with: '' is not a valid file or folder name
    Incorrect path 'bad^file' fails with: 'bad^file' is not a valid file or folder name
    Incorrect path 'locales//bad/locale' fails with: '' is not a valid language tag
    Incorrect path 'locales/en123/bad/locale' fails with: 'en123' is not a valid language tag
    Incorrect path 'locales/EN/bad/locale' fails with: 'EN' is not a valid language tag
    Incorrect path 'locales/en-US/bad/locale' fails with: 'en-US' is not a valid language tag

If any of your test cases fail, let me know and I'll see whether I can fix it.
Posted on 2012-08-21 00:31:59
You should probably try pyparsing. There is a quick example on the pyparsing website that you can easily adapt to your purpose.
https://stackoverflow.com/questions/12041439