I'm currently scraping Gmail data with the Gmail API. Some of the emails I'm scraping contain vulgar fractions like these:
8⅜
6⅞
7¾
7⅞
The HTML output the Gmail API returns for the vulgar fractions above looks like this:
8=E2=85=9C
6=E2=85=9E
7=C2=BE
7=E2=85=9E
How can I convert these strings back into something like '8 3/8' so that I can process them in Python?
Posted on 2021-05-29 21:59:04
The strings are encoded with quoted-printable encoding, a way of representing non-ASCII bytes using only ASCII characters. You can decode them to a str like this:
import quopri
s = '8=E2=85=9C'
f = quopri.decodestring(s).decode('utf-8')
print(f)
which prints
8⅜
This is the str '8' followed by the Unicode character VULGAR FRACTION THREE EIGHTHS (U+215C).
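To make the decoding step visible, here is a small illustrative check (not part of the original answer) that prints the raw bytes returned by `quopri.decodestring` and then the Unicode name of every character after decoding those bytes as UTF-8:

```python
import quopri
import unicodedata

# Quoted-printable -> bytes: each =XX escape becomes one byte
raw = quopri.decodestring('8=E2=85=9C')
print(raw)  # b'8\xe2\x85\x9c'

# The three bytes E2 85 9C are the UTF-8 encoding of a single character
text = raw.decode('utf-8')
for ch in text:
    print(f'U+{ord(ch):04X} {unicodedata.name(ch)}')
# U+0038 DIGIT EIGHT
# U+215C VULGAR FRACTION THREE EIGHTHS
```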
We can decompose the string further using Unicode normalisation (the NFKD form):
import unicodedata as ud
decomposed = ud.normalize('NFKD', f)
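print(decomposed)
which outputs
83⁄8
The integer part and the numerator now run together, so the fraction character has to be normalised on its own. Combining these techniques, we can extract all the parts of each string and convert them to ints or Fractions: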
import fractions
import quopri
import unicodedata as ud
values = """\
8=E2=85=9C
6=E2=85=9E
7=C2=BE
7=E2=85=9E
"""
for value in values.splitlines():
    string_ = quopri.decodestring(value).decode('utf-8')
    # Assume each string is composed solely of one or more digits,
    # with the fraction character at the end
    int_part = int(string_[:-1])
    normalised = ud.normalize('NFKD', string_[-1])
    # Note that the separator character here is chr(8260),
    # the 'FRACTION SLASH' character, not the ASCII 'SOLIDUS'
    numerator, _, denominator = normalised.partition('⁄')
    fractional_part = fractions.Fraction(*map(int, (numerator, denominator)))
    print(f'Integer part {int_part}, fractional part {fractional_part!r}')
print()
Result:
Integer part 8, fractional part Fraction(3, 8)
Integer part 6, fractional part Fraction(7, 8)
Integer part 7, fractional part Fraction(3, 4)
Integer part 7, fractional part Fraction(7, 8)
In general, Fraction instances can be converted to float or str:
>>> ff = fractions.Fraction(15, 8)
>>> ff
Fraction(15, 8)
>>> str(ff)
'15/8'
>>> float(ff)
1.875
https://stackoverflow.com/questions/67752044
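The question asked for plain strings like '8 3/8'. Because str() of a Fraction already gives '3/8', the pieces from the answer can be assembled into that form, and the integer and fractional parts can be added into a single Fraction. The sketch below makes the same assumption as the answer's loop (the decoded text is one or more digits followed by exactly one vulgar-fraction character; other input raises ValueError). The function names `split_mixed_number`, `to_plain_text` and `to_number` are hypothetical helpers, not part of any library:

```python
import fractions
import quopri
import unicodedata as ud


def split_mixed_number(qp_value):
    """Split a quoted-printable string such as '8=E2=85=9C' into (int, Fraction).

    Assumption: the decoded text is digits followed by one vulgar-fraction character.
    """
    text = quopri.decodestring(qp_value).decode('utf-8')
    int_part = int(text[:-1])
    # NFKD maps e.g. '⅜' to '3' + U+2044 FRACTION SLASH + '8'
    numerator, _, denominator = ud.normalize('NFKD', text[-1]).partition('\u2044')
    return int_part, fractions.Fraction(int(numerator), int(denominator))


def to_plain_text(qp_value):
    """Return the mixed number as plain ASCII text, e.g. '8 3/8'."""
    int_part, frac = split_mixed_number(qp_value)
    return f'{int_part} {frac}'


def to_number(qp_value):
    """Return the whole value as a single Fraction, e.g. Fraction(67, 8)."""
    int_part, frac = split_mixed_number(qp_value)
    return int_part + frac


for value in ['8=E2=85=9C', '6=E2=85=9E', '7=C2=BE', '7=E2=85=9E']:
    print(to_plain_text(value), float(to_number(value)))
# 8 3/8 8.375
# 6 7/8 6.875
# 7 3/4 7.75
# 7 7/8 7.875
```

Keeping the result as a Fraction until the last moment avoids rounding; convert to float only when you actually need a decimal value.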