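My problem is this. Suppose I have a string:

"Quick Brown Fox Jumps over the lazy dog"

It has 8 words, and I have some other strings that I have to compare against it.

For example, if the user-supplied threshold (the ratio of matching words, as a percentage) is 60%, that means

= 8 * 60 / 100 (here 8 is the total number of words in the string above and 60 is the threshold)
= 4.8

This means at least 4 words should match, so the result should contain only the strings in which at least 4 words match.

How can I do this kind of fuzzy matching in C#? Please help me.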
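
The threshold arithmetic above can be sketched in C# as follows. This is only an illustration: the names `ThresholdSketch` and `RequiredMatches` are hypothetical, and rounding 4.8 down to 4 follows the "at least 4 words" reading in the question.

```csharp
using System;

static class ThresholdSketch
{
    // Hypothetical helper: minimum number of matching words for a given
    // total word count and a threshold expressed in percent.
    // Assumes non-negative inputs, so 8 words at 60% gives 4 (4.8 rounded down).
    public static int RequiredMatches(int totalWords, int thresholdPercent)
    {
        return totalWords * thresholdPercent / 100; // integer division truncates
    }

    static void Main()
    {
        Console.WriteLine(RequiredMatches(8, 60)); // prints 4
    }
}
```

Rounding up instead (5 words for 4.8) would be a stricter reading of the same threshold; which one is wanted depends on the requirement.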

Posted on 2015-11-12 07:44:51
I suggest comparing dictionaries, not strings.

So the implementation is:

using System;
using System.Collections.Generic;
using System.Linq;

// The methods below are static members; declare them inside a class.

// Splits the text into words and counts how often each word occurs (case-insensitive).
public static Dictionary<String, int> WordsToCounts(String value) {
  if (String.IsNullOrEmpty(value))
    return new Dictionary<String, int>(StringComparer.OrdinalIgnoreCase);

  return value
    .Split(' ', '\r', '\n', '\t')
    .Select(item => item.Trim(',', '.', '?', '!', ':', ';', '"'))
    .Where(item => !String.IsNullOrEmpty(item))
    .GroupBy(item => item, StringComparer.OrdinalIgnoreCase)
    .ToDictionary(chunk => chunk.Key,
                  chunk => chunk.Count(),
                  StringComparer.OrdinalIgnoreCase);
}

// Returns the fraction (0.0 to 1.0) of the words in left that are also present in right.
public static Double DictionaryPercentage(
  IDictionary<String, int> left,
  IDictionary<String, int> right) {

  // Two missing dictionaries match completely; a single missing one matches nothing.
  if (null == left)
    return (null == right) ? 1.0 : 0.0;
  else if (null == right)
    return 0.0;

  int all = left.Sum(pair => pair.Value);

  if (all <= 0)
    return 0.0;

  double found = 0.0;

  foreach (var pair in left) {
    int count;

    if (!right.TryGetValue(pair.Key, out count))
      count = 0;

    // A word counts at most as many times as it occurs in left.
    found += count < pair.Value ? count : pair.Value;
  }

  return found / all;
}

public static Double StringPercentage(String left, String right) {
  return DictionaryPercentage(WordsToCounts(left), WordsToCounts(right));
}

For the sample you provided, this gives:

// Requires: using System.Globalization; (for CultureInfo)
String original = "Quick Brown Fox Jumps over the lazy dog";

String[] extracts = new String[] {
  "This is un-match string with above string.",
  "Quick Brown fox Jumps.",
  "brown fox jumps over the lazy.",
  "quick brown fox over the dog.",
  "fox jumps over the lazy dog.",
  "jumps over the.",
  "lazy dog.",
};

var data = extracts
  .Select(item => new {
     text = item,
     perCent = StringPercentage(original, item) * 100.0
   })
  //.Where(item => item.perCent >= 60.0) // uncomment this to apply threshold
  .Select(item => String.Format(CultureInfo.InvariantCulture,
     "\"{0}\" \t {1:F2}%",
     item.text, item.perCent));

String report = String.Join(Environment.NewLine, data);

Console.Write(report);

The report is:

"This is un-match string with above string." 0.00%
"Quick Brown fox Jumps." 50.00%
"brown fox jumps over the lazy." 75.00%
"quick brown fox over the dog." 75.00%
"fox jumps over the lazy dog." 75.00%
"jumps over the." 37.50%
"lazy dog." 25.00%

Posted on 2015-11-12 07:15:40
The regex should look something like this:

(\bWord1\b|\bWord2\b|\bWord3\b|\betc\b)

Then you just count the matches and compare that count with the number of words.

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
string sentence = "Quick Brown Fox Jumps over the lazy dog";
string[] words = sentence.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);

// Builds (\bQuick\b|\bBrown\b|...); the words are inserted as-is, without escaping.
Regex regex = new Regex("(" + string.Join("|", words.Select(x => @"\b" + x + @"\b")) + ")", RegexOptions.IgnoreCase);

string input = "Quick Brown fox Jumps";
int threshold = 60;

var matches = regex.Matches(input);

// Integer division: 8 * 60 / 100 evaluates to 4, so 4 matches are enough.
bool isMatch = words.Length*threshold/100 <= matches.Count;

Console.WriteLine(isMatch);

https://stackoverflow.com/questions/33665871
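
As a possible refinement of the regex approach, the sketch below escapes each word with `Regex.Escape` and counts every distinct word only once, so an input that repeats a single word cannot reach the threshold on its own. This is an assumption about the desired behavior, not part of the original answer, and the names `RegexMatchSketch` and `MeetsThreshold` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexMatchSketch
{
    // Hypothetical helper: true when at least thresholdPercent of the distinct
    // words in sentence occur in input (case-insensitive, each word counted once).
    public static bool MeetsThreshold(string sentence, string input, int thresholdPercent)
    {
        string[] words = sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        if (words.Length == 0)
            return false;

        // Regex.Escape keeps characters such as '.' or '+' from being read as regex syntax.
        string pattern = @"\b(?:" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        // Collect the distinct matched words so that a repeated word counts only once.
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match match in regex.Matches(input))
            found.Add(match.Value);

        // Same integer arithmetic as in the answer: 8 * 60 / 100 evaluates to 4.
        return words.Length * thresholdPercent / 100 <= found.Count;
    }

    static void Main()
    {
        string sentence = "Quick Brown Fox Jumps over the lazy dog";
        Console.WriteLine(MeetsThreshold(sentence, "Quick Brown fox Jumps", 60)); // True
        Console.WriteLine(MeetsThreshold(sentence, "lazy dog.", 60));             // False
    }
}
```

With this variant an input such as "the the the the" counts as one matching word rather than four.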