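I am building a small desktop application that removes extra spaces and line breaks from a string. You know how it goes: sometimes you copy text from a PDF, paste it into Google Translate, and instead of looking the way it did in the PDF, the text is broken up by extra line breaks or spaces. So I wrote a simple application for myself that strips out the extra spaces and line breaks and joins the text into a single paragraph.
Here is my code, with comments on what I found while debugging: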
List<string> content = new List<string>();
TextRange textRange = new TextRange(RichTb1.Document.ContentStart, RichTb1.Document.ContentEnd);
TextRange joinText = new TextRange(RichTb2.Document.ContentStart, RichTb2.Document.ContentEnd);
string[] lines = textRange.Text.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
// Up to here everything is fine: "lines" contains every line I put into RichTb1.
content.AddRange(lines);
// Validation only: proceed if the first line from RichTb1 is not empty.
string match1 = content.ElementAt(0);
if (!string.IsNullOrWhiteSpace(match1))
{
    // **Here is the problem: this removes the empty lines, but it also removes some non-empty strings; see the example below.**
    content = content.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList();
    joinText.Text = content.Aggregate((i, j) => i + " " + j);
}
This is the result it produces. For example, take some text like this:
"Chapter 4 illustrates the growing recognition
of
the
benefits
of
community
management
of
natural
resources.
To
ensure
that
such
approaches
do
not
exclude
poor
people,
**women,
the
elderly**
and
other
marginalized
groups,
governments
and
other
organizations
that
sponsor
community-based
projects
need
to
involve
all
groups
in
decision-making
and
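implementation."
The result from my application is:
"Chapter 4 illustrates the growing recognition of the benefits community management natural resources. To ensure that such approaches do not exclude poor people, **women, elderly** and other marginalized groups, governments organizations sponsor community-based projects need to involve all groups in decision-making implementation."
As you can see (this is only one example), it removes some words that it should not. For instance, in the bold text above, the word "the" is missing, although it is present in the original text. I can also see that word in my "lines" array. But once execution reaches the problem line, it removes strings (words) that it should keep.
What is the problem? Thanks in advance.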
Posted on 2014-10-20 21:07:20
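Distinct() returns only the distinct elements of the sequence, so every line (here, every word) that repeats an earlier one is dropped. Just remove it and the problem goes away.
See the MSDN documentation here: http://msdn.microsoft.com/en-us/library/system.linq.enumerable.distinct(v=vs.95).aspx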
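A minimal sketch of that fix, assuming the text is available as a plain string (for example `textRange.Text` from the question). The class and method names `LineJoiner` and `JoinLines` are hypothetical, and the `Trim()` call and the switch from `Aggregate` to `string.Join` are additions that are not part of the original code:

```csharp
using System;
using System.Linq;

public static class LineJoiner
{
    // Hypothetical helper: joins the non-blank lines of `text` into one paragraph.
    public static string JoinLines(string text)
    {
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        // No Distinct() here: repeated words such as "of" or "the" must survive.
        var nonBlank = lines
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim()); // assumption: stray spaces around each line are unwanted

        return string.Join(" ", nonBlank);
    }
}
```

With this sketch, `LineJoiner.JoinLines("benefits\nof\n\ncommunity\r\nmanagement\nof\nnatural")` keeps both occurrences of "of". Using `string.Join` also sidesteps the `InvalidOperationException` that `Aggregate` without a seed throws on an empty sequence.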
Posted on 2014-10-20 21:25:02
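Even though an answer has already been accepted, I would suggest a less fancy approach. A plain StringBuilder is more efficient and more reliable: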
// Requires: using System.Linq; (for Contains) and using System.Text; (for StringBuilder).
// "text" is the input string.
StringBuilder sb = new StringBuilder(text.Length);
bool firstSpace = true;
char[] dont = { '\n', '\r' };
for (int i = 0; i < text.Length; i++)
{
    char c = text[i];
    if (dont.Contains(c)) c = ' '; // replace new-line characters with a single space
    bool isWhiteSpace = Char.IsWhiteSpace(c);
    // append non-whitespace always, whitespace only if the previous character was not whitespace
    bool append = !isWhiteSpace || firstSpace;
    firstSpace = !isWhiteSpace;
    if (append) sb.Append(c);
}
string withOneSpaceAndNoLines = sb.ToString();
https://stackoverflow.com/questions/26474788