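I'm building a simple iCalendar generator in C# and found the content folding described in Section 4.1 of RFC 2445 to be quite a headache (for me :-).
http://www.apps.ietf.org/rfc/rfc2445.html#sec-4.1
For long lines, you escape certain characters (backslash, semicolon, comma, and newline, I believe) and then fold the line so that no line is longer than 75 octets. I found a few straightforward ways to do this online. The simplest is to replace the offending characters with their escaped versions and then insert a CRLF every 75 characters. Something like: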
// too simple, could break at an escape sequence boundary or multi-byte character may overflow 75 octets
txt = txt.Replace(@"\", "\\\\").Replace(";", "\\;").Replace(",", "\\,").Replace("\r\n", "\\n");
var regex = new System.Text.RegularExpressions.Regex( ".{75}");
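var escape_and_folded = regex.Replace( txt, "$0\r\n ");
I see two problems. First, the CRLF could be inserted in the middle of an escape sequence. For example, an insertion could turn the escaped newline sequence "\n" into "\CRLF" (with the "n" ending up on the next line). The second problem is multi-byte characters: since the count is done per character, a line may end up longer than 75 octets.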
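Both problems are easy to reproduce. The following is a minimal sketch, assuming UTF-8 output; the class and method names (NaiveFoldDemo, NaiveFold, LongestLineInOctets) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class NaiveFoldDemo
{
    // Same idea as the snippet above: break after every 75 UTF-16 chars.
    public static string NaiveFold(string txt)
    {
        return Regex.Replace(txt, ".{75}", "$0\r\n ");
    }

    // Longest physical line of the folded text, measured in UTF-8 octets.
    public static int LongestLineInOctets(string folded)
    {
        int longest = 0;
        foreach (var line in folded.Split(new[] { "\r\n" }, StringSplitOptions.None))
            longest = Math.Max(longest, Encoding.UTF8.GetByteCount(line));
        return longest;
    }

    public static void Main()
    {
        // 80 copies of U+00E9, which takes 2 octets in UTF-8:
        // the first physical line holds 75 chars, i.e. 150 octets.
        string accented = new string('\u00e9', 80);
        Console.WriteLine(LongestLineInOctets(NaiveFold(accented))); // 150

        // 74 filler chars, then an escaped newline (backslash + 'n'):
        // the break lands between the backslash and the 'n'.
        string escaped = new string('a', 74) + "\\n" + "tail";
        Console.WriteLine(NaiveFold(escaped).Contains("\\\r\n n")); // True
    }
}
```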
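A simple solution is to walk the string character by character, escaping and folding as you go, but that feels like brute force. Does anyone have a more elegant solution?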
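Answered 2012-10-29 19:54:00
First, make sure you are using RFC 5545. RFC 2445 is obsolete. You can find my PHP implementation here:
https://github.com/fruux/sabre-vobject/blob/master/lib/Property.php#L252
In PHP we have the mb_strcut function. I'm not sure whether .NET has an equivalent, but it would at least make things a lot simpler. So far I haven't run into problems with an escape sequence (\) being split in two. A good parser will unfold the lines first and only then deal with unescaping, especially since which characters must be escaped depends on the actual property (, or ; is sometimes escaped and sometimes not).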
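As far as I know there is no direct mb_strcut counterpart in .NET, but a similar cut can be sketched by measuring the UTF-8 size of each character before taking it. This is only a sketch under that assumption: CharsThatFit and FoldLine are hypothetical helper names, the input is assumed to be a single, already escaped content line without CRLF, and escape sequences are not treated specially (per the point above that parsers unfold before unescaping).

```csharp
using System;
using System.Text;

public static class Utf8Folding
{
    // Hypothetical helper (not a .NET API): how many chars, starting at 'start',
    // fit into maxOctets of UTF-8 without splitting a character or surrogate pair.
    public static int CharsThatFit(string s, int start, int maxOctets)
    {
        int octets = 0;
        int i = start;
        while (i < s.Length)
        {
            bool pair = char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]);
            int width = pair ? 2 : 1;
            int size = Encoding.UTF8.GetByteCount(s.Substring(i, width));
            if (octets + size > maxOctets) break;
            octets += size;
            i += width;
        }
        return i - start;
    }

    // Folds one logical line so that no physical line exceeds 'max' octets.
    public static string FoldLine(string line, int max = 75)
    {
        // A UTF-8 character needs up to 4 octets, plus 1 for the leading space.
        if (max < 5) throw new ArgumentOutOfRangeException(nameof(max));

        var sb = new StringBuilder();
        int pos = 0;
        int budget = max; // the first physical line may use all 'max' octets
        while (true)
        {
            int take = CharsThatFit(line, pos, budget);
            sb.Append(line, pos, take);
            pos += take;
            if (pos >= line.Length) break;
            sb.Append("\r\n ");  // CRLF plus the single folding space
            budget = max - 1;    // the space counts toward the next line
        }
        return sb.ToString();
    }
}
```

For example, Utf8Folding.FoldLine(new string('\u00e9', 40)) puts 37 characters (74 octets) on the first line and the remaining 3 on a continuation line, instead of cutting the 38th character in half.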
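Answered 2014-11-27 15:09:30
I tried your solution. It works, except that it also folds some lines shorter than 75 octets. So I rewrote the code the traditional way (i.e. without regular expressions, though I do miss them), as shown below.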
// Requires "using System.Text;" and, being an extension method, an enclosing static class.
public static string FoldLines(this string value, int max, string newline = "\r\n")
{
    var lines = value.Split(new string[] { newline }, System.StringSplitOptions.RemoveEmptyEntries);
    using (var ms = new System.IO.MemoryStream(value.Length))
    {
        var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
        var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
        foreach (var line in lines)
        {
            var bytes = Encoding.UTF8.GetBytes(line);
            var len = Encoding.UTF8.GetByteCount(line);
            if (len <= max)
            {
                ms.Write(bytes, 0, len);
                ms.Write(crlf, 0, crlf.Length);
            }
            else
            {
                var blen = len / max; //calculate block length
                var rlen = len % max; //calculate remaining length
                var b = 0;
                while (b < blen)
                {
                    ms.Write(bytes, (b++) * max, max);
                    ms.Write(crlfs, 0, crlfs.Length);
                }
                if (rlen > 0)
                {
                    ms.Write(bytes, blen * max, rlen);
                    ms.Write(crlf, 0, crlf.Length);
                }
            }
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
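}
Notes:
I tried to keep it concise: the string is processed as octets rather than characters, with the limit given by max. It is best to call FoldLines on the generated VCALENDAR object. Text values such as DESCRIPTION are escaped separately with the following helpers:
public static string Replace(this string value, IEnumerable<Tuple<string, string>> pairs)
{
    foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
    return value;
}
public static string EscapeStrings(this string value)
{
    return value.Replace(new List<Tuple<string, string>>
    {
        new Tuple<string, string>(@"\", "\\\\"),
        new Tuple<string, string>(";", @"\;"),
        new Tuple<string, string>(",", @"\,"),
        new Tuple<string, string>("\r\n", @"\n"),
    });
}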
Answered 2016-05-25 19:22:32
reexmonkey's solution writes 76 characters on lines folded in the middle, because it doesn't subtract the extra space character added with crlfs.
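The off-by-one is easiest to see by listing the physical line lengths for a concrete input. The sketch below only does the arithmetic; PhysicalLineLengths and its reserveSpace flag are hypothetical names for illustration, and a 200-octet content line with max = 75 is an assumed example.

```csharp
using System;
using System.Collections.Generic;

public static class FoldArithmetic
{
    // Lengths in octets of each physical line (continuation lines include
    // their leading space) when a content line of 'len' octets is folded.
    // reserveSpace = false: every chunk is 'max' octets (the earlier approach).
    // reserveSpace = true: chunks after the first are 'max - 1' octets.
    public static List<int> PhysicalLineLengths(int len, int max, bool reserveSpace)
    {
        var lengths = new List<int>();
        int offset = 0;
        bool first = true;
        while (offset < len)
        {
            int chunk = (first || !reserveSpace) ? max : max - 1;
            chunk = Math.Min(chunk, len - offset);
            lengths.Add(first ? chunk : chunk + 1); // +1 for the folding space
            offset += chunk;
            first = false;
        }
        return lengths;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", PhysicalLineLengths(200, 75, false))); // 75,76,51
        Console.WriteLine(string.Join(",", PhysicalLineLengths(200, 75, true)));  // 75,75,52
    }
}
```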
I rewrote the folding function to fix this bug:
// Requires "using System.Text;".
public static string FoldLines(string value, int max, string newline = "\r\n")
{
    var lines = value.Split(new string[] { newline }, System.StringSplitOptions.RemoveEmptyEntries);
    using (var ms = new System.IO.MemoryStream(value.Length))
    {
        var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
        var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
        foreach (var line in lines)
        {
            var bytes = Encoding.UTF8.GetBytes(line);
            var len = Encoding.UTF8.GetByteCount(line);
            if (len <= max)
            {
                ms.Write(bytes, 0, len);
                ms.Write(crlf, 0, crlf.Length);
            }
            else
            {
                var offset = 0; //current offset position (in octets)
                var count = max; //octets to take
                while (offset + count < len)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlfs, 0, crlfs.Length);
                    offset += count;
                    count = max - 1; //leave room for the leading space
                }
                count = len - offset; //remaining octets
                if (count > 0)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlf, 0, crlf.Length);
                }
            }
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}
Also, I added an extra tuple to the EscapeStrings function:
public static string ReplaceText(string value, IEnumerable<Tuple<string, string>> pairs)
{
    foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
    return value;
}
public static string EscapeStrings(string value)
{
    return ReplaceText(value, new List<Tuple<string, string>>
    {
        new Tuple<string, string>(@"\", "\\\\"),
        new Tuple<string, string>(";", @"\;"),
        new Tuple<string, string>(",", @"\,"),
        new Tuple<string, string>("\r\n", @"\n"),
        new Tuple<string, string>("\n", @"\n"),
    });
}
https://stackoverflow.com/questions/13055298