文章/答案/技术大牛

发布

社区首页 >问答首页 >以最高的相似性将两组数据中的项匹配。

问以最高的相似性将两组数据中的项匹配。
EN

Stack Overflow用户

提问于 2018-02-16 11:29:53

回答 2查看 321关注 0票数 0

任务:我有两列有产品名称。我需要从B列中找到最相似的单元格，用于单元格A1，然后是A2、A3等。

输入：

Col A | Col B
-------------
Red   | Blackwell
Black | Purple      
White | Whitewater     
Green | Reddit

输出：

Red = Reddit / 66%相似

黑色= Blackwell / 71%相似

白色=白水/ 66%相似

绿色= Reddit / 30%相似

我认为Levenstein距离可以帮助排序，但我不知道如何应用它。

事先谢谢，任何信息都有帮助。

php

matching

levenshtein-distance

回答 2

Stack Overflow用户

回答已采纳

发布于 2018-02-16 13:13:54

使用嵌套循环

<?php

// Arrays of words
$colA = ['Red', 'Black', 'White', 'Green'];
$colB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];

// loop through words to find the closest
foreach ($colA as $a) {

    // Current max number of matches
    $maxMatches = -1;
    $bestMatch = '';

    foreach ($colB as $b) {

        // Calculate the number of matches
        $matches = similar_text($a, $b, $percent);

        if ($matches > $maxMatches) {

            // Found a better match, update
            $maxMatches = $matches;
            $bestMatch = $b;
            $matchPercentage = $percent;

        }

    }

    echo "$a = $bestMatch / " . 
        number_format($matchPercentage, 2) . 
        "% similar\n";
}

第一个循环迭代第一个数组的元素，为每个元素初始化找到的最佳匹配和匹配字符的数量。

内循环遍历可能匹配的数组，寻找最佳匹配，对于每个候选对象，它检查相似之处(您可以在这里使用levenshtein而不是similar_text，但后者比较方便，因为它为您计算百分比)，如果当前单词比更新该变量的当前最佳匹配更匹配，则后者更方便。

对于外循环中的每一个单词，我们响应找到的最佳匹配和百分比。按要求格式化。

票数 3

Stack Overflow用户

发布于 2018-02-16 13:36:57

我不知道您在哪里得到这些期望的百分比，所以我将只使用php函数生成的值，您可以决定是否要对它们执行任何计算。

levenshtein()根本不提供您在问题中所要求的匹配。我认为使用similar_text()会更明智。

代码：(演示)

$arrayA=['Red','Black','White','Green'];
$arrayB=['Blackwell','Purple','Whitewater','Reddit'];

// similar text
foreach($arrayA as $a){
    $temp=array_combine($arrayB,array_map(function($v)use($a){similar_text($v,$a,$percent); return $percent;},$arrayB));  // generate assoc array of assessments
    arsort($temp);  // sort descending 
    $result[]="$a is most similar to ".key($temp)." (sim-score:".number_format(current($temp))."%)";  // access first key and value
}
var_export($result);

echo "\n--\n";
// levenstein doesn't offer the desired matching
foreach($arrayA as $a){
    $temp=array_combine($arrayB,array_map(function($v)use($a){return levenshtein($v,$a);},$arrayB));  // generate assoc array of assessments
    arsort($temp);  // sort descending 
    $result2[]="$a is most similar to ".key($temp)." (lev-score:".current($temp).")";  // access first key and value
}
var_export($result2);

输出：

array (
  0 => 'Red is most similar to Reddit (sim-score:67%)',
  1 => 'Black is most similar to Blackwell (sim-score:71%)',
  2 => 'White is most similar to Whitewater (sim-score:67%)',
  3 => 'Green is most similar to Purple (sim-score:36%)',
)
--
array (
  0 => 'Red is most similar to Whitewater (lev-score:9)',
  1 => 'Black is most similar to Whitewater (lev-score:9)',
  2 => 'White is most similar to Blackwell (lev-score:8)',
  3 => 'Green is most similar to Blackwell (lev-score:8)',
)

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/48825825

复制

相似问题

问以最高的相似性将两组数据中的项匹配。
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问以最高的相似性将两组数据中的项匹配。EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问以最高的相似性将两组数据中的项匹配。
EN