
Remove similar elements from an array

Stack Overflow user
Asked on 2016-12-12 08:29:52
8 answers · 627 views · 0 followers · 7 votes
Array
(
    [0] => The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tape drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.
    [1] => The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tapes drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.
    [2] => The N2225 and N2226 SAS/SATA HBAs support SAS data transfer rates of 3, 6, and 12 Gbps per lane and SATA transfer rates of 3 and 6 Gbps per lane, and they enable maximum connectivity and performance in a low-profile (N2225) or full-height (N2226) form factor.
    [3] => Rigorous testing of the N2225 and N2226 SAS/SATA HBAs by Lenovo through the ServerProven® program ensures a high degree of confidence in storage subsystem compatibility and reliability. Providing an additional peace of mind, these controllers are covered under Lenovo warranty.
    [4] => The following tables list the compatibility information for the N2225 and N2226 SAS/SATA HBAs and System x®, iDataPlex®, and NeXtScale™ servers.
    [5] => For more information about the System x servers, including older servers that support the N2225 and N2226 adapters, see the following ServerProven® website:
    [6] => The following table lists the external storage systems that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in storage solutions.
    [7] => The following table lists the external tape backup units that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in tape backup solutions.
    [8] => For more information about the specific versions and service levels that are supported and any other prerequisites, see the ServerProven website:
    [9] => The N2225 and N2226 SAS/SATA HBAs carry a one-year limited warranty. When installed in a supported System x server, the adapters assume your system’s base warranty and any Lenovo warranty upgrade.
)

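The elements I want to remove are not exactly identical, so array_unique does not catch them. Some elements are made obsolete by another element that contains exactly the same data plus more, and sometimes only a few words differ.

How can I filter these?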

8 Answers

Stack Overflow user

Accepted answer

Posted on 2016-12-30 11:27:06

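First of all, the problem is not that simple, and it is not formulated well enough: you don't want to remove identical elements, you want to remove similar ones, so your first problem is deciding which elements count as similar.

Since the difference can occur at any point in the string, it is not enough to require that the strings start with the same set of characters. For example, take these two sentences (adapted from your question):
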
Rigorous testing of the N2225 and N2226 SAS/SATA HBAs by Lenovo through the ServerProven® program ensures a high degree of confidence in storage subsystem compatibility and reliability. Providing an additional peace of mind, these controllers are covered under Lenovo warranty.
The rigorous testing of the N2225 and N2226 SAS/SATA HBAs by Lenovo through the ServerProven® program ensures a high degree of confidence in storage subsystem compatibility and reliability. Providing an additional peace of mind, these controllers are covered under Lenovo warranty.

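They are very similar, yet they do not start with the same string. One way to measure similarity is the Smith-Waterman-Gotoh algorithm, for which implementations are available.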
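A minimal sketch of the point above, assuming shortened versions of the two sentences. The helper `commonPrefixLength` is made up for this illustration, and PHP's built-in `similar_text()` is used only as a convenient stand-in similarity measure:

```php
<?php
// Illustrative helper (not from the original answer): length of the shared prefix.
function commonPrefixLength(string $a, string $b): int
{
    $limit = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $limit && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i;
}

// Shortened versions of the two sentences, for readability.
$first  = 'Rigorous testing of the N2225';
$second = 'The rigorous testing of the N2225';

$prefix = commonPrefixLength($first, $second); // 'R' vs 'T': nothing in common
similar_text($first, $second, $percent);       // percentage is written to $percent

printf("shared prefix: %d chars, similarity: %.2f%%\n", $prefix, $percent);
```

A prefix check sees nothing in common here, while a whole-string measure still reports the two strings as close.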

-- Later edit --

Here is an implementation built on PHP's native similar_text() function:

/**
 * @param mixed $array          input array
 * @param int $minSimilarity    minimum similarity for an item to be removed (percentage)
 * @return array
 */
function applyFilter ($array, $minSimilarity = 90) {
    $result = [];

    foreach ($array as $outerValue) {
        $append = true;
        foreach ($result as $key => $innerValue) {
            $similarity = null;
            similar_text($innerValue, $outerValue, $similarity);
            if ($similarity >= $minSimilarity) {
                if (strlen($outerValue) > strlen($innerValue)) {
                    // always keep the longer one
                    $result[$key] = $outerValue;
                }
                $append = false;
                break;
            }
        }

        if ($append) {
            $result[] = $outerValue;
        }
    }

    return $result;
}

$test = [
    'The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tape drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.',
    'The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tapes drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.',
    'The N2225 and N2226 SAS/SATA HBAs support SAS data transfer rates of 3, 6, and 12 Gbps per lane and SATA transfer rates of 3 and 6 Gbps per lane, and they enable maximum connectivity and performance in a low-profile (N2225) or full-height (N2226) form factor.',
    'Rigorous testing of the N2225 and N2226 SAS/SATA HBAs by Lenovo through the ServerProven® program ensures a high degree of confidence in storage subsystem compatibility and reliability. Providing an additional peace of mind, these controllers are covered under Lenovo warranty.',
    'The following tables list the compatibility information for the N2225 and N2226 SAS/SATA HBAs and System x®, iDataPlex®, and NeXtScale™ servers.',
    'For more information about the System x servers, including older servers that support the N2225 and N2226 adapters, see the following ServerProven® website:',
    'The following table lists the external storage systems that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in storage solutions.',
    'The following table lists the external tape backup units that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in tape backup solutions.',
    'For more information about the specific versions and service levels that are supported and any other prerequisites, see the ServerProven website:',
    'The N2225 and N2226 SAS/SATA HBAs carry a one-year limited warranty. When installed in a supported System x server, the adapters assume your system’s base warranty and any Lenovo warranty upgrade.',
];

var_dump(applyFilter($test));
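For reference, the percentage that `similar_text()` writes into its third argument is the number of matching characters times two, divided by the combined length of both strings, times 100. A small sketch with short, made-up strings shows the relationship:

```php
<?php
$a = 'World';
$b = 'Word';

// similar_text() returns the number of matching characters...
$matching = similar_text($a, $b, $percent);

// ...and fills $percent with matching * 2 / (len(a) + len(b)) * 100.
$manual = $matching * 2 / (strlen($a) + strlen($b)) * 100;

printf("matching: %d, percent: %.2f, manual: %.2f\n", $matching, $percent, $manual);
```

Because the score is normalised by the combined length, a short string that is fully contained in a much longer one can still fall below the threshold.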

-- End of later edit --

Here is the complete working code using the Smith-Waterman-Gotoh algorithm:

class SmithWatermanGotoh
{
    private $gapValue;
    private $substitution;

    /**
     * Constructs a new Smith Waterman metric.
     *
     * @param gapValue
     *            a non-positive gap penalty
     * @param substitution
     *            a substitution function
     */
    public function __construct($gapValue=-0.5,
                $substitution=null)
    {
        if($gapValue > 0.0) throw new Exception("gapValue must be <= 0");
        //if(empty($substitution)) throw new Exception("substitution is required");
        if (empty($substitution)) $this->substitution = new SmithWatermanMatchMismatch(1.0, -2.0);
        else $this->substitution = $substitution;
        $this->gapValue = $gapValue;
    }

    public function compare($a, $b)
    {
        if (empty($a) && empty($b)) {
            return 1.0;
        }

        if (empty($a) || empty($b)) {
            return 0.0;
        }

        $maxDistance = min(mb_strlen($a), mb_strlen($b))
                * max($this->substitution->max(), $this->gapValue);
        return $this->smithWatermanGotoh($a, $b) / $maxDistance;
    }

    private function smithWatermanGotoh($s, $t)
    {
        $v0 = [];
        $v1 = [];
        $t_len = mb_strlen($t);
        $max = $v0[0] = max(0, $this->gapValue, $this->substitution->compare($s, 0, $t, 0));

        for ($j = 1; $j < $t_len; $j++) {
            $v0[$j] = max(0, $v0[$j - 1] + $this->gapValue,
                    $this->substitution->compare($s, 0, $t, $j));

            $max = max($max, $v0[$j]);
        }

        // Find max
        for ($i = 1; $i < mb_strlen($s); $i++) {
            $v1[0] = max(0, $v0[0] + $this->gapValue, $this->substitution->compare($s, $i, $t, 0));

            $max = max($max, $v1[0]);

            for ($j = 1; $j < $t_len; $j++) {
                $v1[$j] = max(0, $v0[$j] + $this->gapValue, $v1[$j - 1] + $this->gapValue,
                        $v0[$j - 1] + $this->substitution->compare($s, $i, $t, $j));

                $max = max($max, $v1[$j]);
            }

            for ($j = 0; $j < $t_len; $j++) {
                $v0[$j] = $v1[$j];
            }
        }

        return $max;
    }
}

class SmithWatermanMatchMismatch
{
    private $matchValue;
    private $mismatchValue;

    /**
     * Constructs a new match-mismatch substitution function. When two
     * characters are equal a score of <code>matchValue</code> is assigned. In
     * case of a mismatch a score of <code>mismatchValue</code>. The
     * <code>matchValue</code> must be strictly greater than
     * <code>mismatchValue</code>
     *
     * @param matchValue
     *            value when characters are equal
     * @param mismatchValue
     *            value when characters are not equal
     */
    public function __construct($matchValue, $mismatchValue) {
        if($matchValue <= $mismatchValue) throw new Exception("matchValue must be > mismatchValue");

        $this->matchValue = $matchValue;
        $this->mismatchValue = $mismatchValue;
    }

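    public function compare($a, $aIndex, $b, $bIndex) {
        // Note: $a[$aIndex] reads a single byte, while the caller's loop bounds
        // come from mb_strlen(), so multibyte characters are compared byte by byte.
        return ($a[$aIndex] === $b[$bIndex] ? $this->matchValue
                : $this->mismatchValue);
    }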

    public function max() {
        return $this->matchValue;
    }

    public function min() {
        return $this->mismatchValue;
    }
}

/**
 * @param mixed $array          input array
 * @param int $minSimilarity    minimum similarity for an item to be removed (percentage)
 * @return array
 */
function applyFilter ($array, $minSimilarity = 90) {
    $swg = new SmithWatermanGotoh();

    $result = [];

    foreach ($array as $outerValue) {
        $append = true;
        foreach ($result as $key => $innerValue) {
            $similarity = $swg->compare($innerValue, $outerValue) * 100;
            if ($similarity >= $minSimilarity) {
                if (strlen($outerValue) > strlen($innerValue)) {
                    // always keep the longer one
                    $result[$key] = $outerValue;
                }
                $append = false;
                break;
            }
        }

        if ($append) {
            $result[] = $outerValue;
        }
    }

    return $result;
}


$test = [
    'The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tape drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.',
    'The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tapes drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.',
    'The N2225 and N2226 SAS/SATA HBAs support SAS data transfer rates of 3, 6, and 12 Gbps per lane and SATA transfer rates of 3 and 6 Gbps per lane, and they enable maximum connectivity and performance in a low-profile (N2225) or full-height (N2226) form factor.',
    'Rigorous testing of the N2225 and N2226 SAS/SATA HBAs by Lenovo through the ServerProven® program ensures a high degree of confidence in storage subsystem compatibility and reliability. Providing an additional peace of mind, these controllers are covered under Lenovo warranty.',
    'The following tables list the compatibility information for the N2225 and N2226 SAS/SATA HBAs and System x®, iDataPlex®, and NeXtScale™ servers.',
    'For more information about the System x servers, including older servers that support the N2225 and N2226 adapters, see the following ServerProven® website:',
    'The following table lists the external storage systems that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in storage solutions.',
    'The following table lists the external tape backup units that are currently offered by Lenovo that can be used with the N2225 and N2226 SAS/SATA HBAs in tape backup solutions.',
    'For more information about the specific versions and service levels that are supported and any other prerequisites, see the ServerProven website:',
    'The N2225 and N2226 SAS/SATA HBAs carry a one-year limited warranty. When installed in a supported System x server, the adapters assume your system’s base warranty and any Lenovo warranty upgrade.',
];

var_dump(applyFilter($test));

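Now you only need to adjust the $minSimilarity parameter to suit your data. For example, in your case, keeping the default of 90% removes the first element (it is 99.86% similar to the second). Setting a lower value (80%) also removes the 8th element.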
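One way to pick a threshold is to look at the pairwise scores first. The sketch below is an assumption-laden illustration: `similarityReport` is a hypothetical helper, the sample strings are shortened fragments, and it uses `similar_text()` rather than the Smith-Waterman-Gotoh class, so its percentages will differ from the figures quoted above.

```php
<?php
/**
 * Hypothetical helper: lists every pair of items with its similar_text()
 * percentage, highest first, to help choose a value for $minSimilarity.
 */
function similarityReport(array $items): array
{
    $items = array_values($items);
    $count = count($items);
    $report = [];
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            similar_text($items[$i], $items[$j], $percent);
            $report[] = ['a' => $i, 'b' => $j, 'percent' => $percent];
        }
    }
    // Sort so the most similar pairs come first.
    usort($report, function ($x, $y) {
        return $y['percent'] <=> $x['percent'];
    });
    return $report;
}

$sample = [
    'servers and tape drives and RAID storage systems',
    'servers and tapes drives and RAID storage systems',
    'one-year limited warranty',
];

foreach (similarityReport($sample) as $row) {
    printf("[%d] vs [%d]: %.2f%%\n", $row['a'], $row['b'], $row['percent']);
}
```

A clear gap between the top pairs and the rest suggests where to set the cut-off.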

Hope this helps!

11 votes

Stack Overflow user

Posted on 2016-12-12 08:53:56

You can still use array_filter with a custom callback that uses substr_count to check whether a value occurs more than once in the array.

$input = array("a","b","c","d","ax","cz");

$str = implode("|",array_unique($input));

$output = array_filter($input, function($var) use ($str){
                        return substr_count($str, $var) == 1;
                    });

print_r($output);
1 vote

Stack Overflow user

Posted on 2016-12-12 09:52:58

Assuming the shorter value always appears at the very beginning of the longer one, you could do this:

$arr = ["Some Text.", "Some Text. And more details."];

foreach($arr as $key => $value) {

    // Look for the value in every element
    foreach($arr as $key2 => $value2) {

        // Remove element if its value appears at the beginning of another element
        if ($key !== $key2 && strpos($value2, $value) === 0) {
            unset($arr[$key]);
            continue 2;
        }
    }
}

// Re-index array 
$arr = array_values($arr);

This works just as well if the elements are in the reverse order.
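To check the ordering claim, the same loop can be wrapped in a function and run on both orderings. The function name `removePrefixDuplicates` is made up for this sketch; the logic is the loop from the answer, unchanged:

```php
<?php
// Hypothetical wrapper around the loop above: drops every element whose
// value appears at the beginning of another element, then re-indexes.
function removePrefixDuplicates(array $arr): array
{
    foreach ($arr as $key => $value) {
        foreach ($arr as $key2 => $value2) {
            if ($key !== $key2 && strpos($value2, $value) === 0) {
                unset($arr[$key]);
                continue 2;
            }
        }
    }
    return array_values($arr);
}

$forward  = removePrefixDuplicates(["Some Text.", "Some Text. And more details."]);
$reversed = removePrefixDuplicates(["Some Text. And more details.", "Some Text."]);

var_dump($forward === $reversed); // both orderings keep only the longer string
```

Note that this only handles exact prefixes; a one-word difference, as in the question's first two elements, is not detected.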

1 vote

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41096810