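I'm struggling with the following problem: in a series of sentences/news titles, I'm trying to find the ones that are very similar (sharing about 3 or 4 words) and put them into a new array. So, for this original array/list: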
'Title1: Hackers expose trove of snagged Snapchat images',
'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
'Title3: Family says goodbye at funeral for 16-year-old',
'Title4: New Jersey officials talk about Ebola quarantine',
'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
'Title6: Hackers expose Snapchat images'
the result should be:
Array
(
    [0] => Title1: Hackers expose trove of snagged Snapchat images
    [1] => Array
        (
            [duplicate] => Title6: Hackers expose Snapchat images
        )
    [2] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [3] => Array
        (
            [duplicate] => Title4: New Jersey officials talk about Ebola quarantine
        )
    [4] => Title3: Family says goodbye at funeral for 16-year-old
    [5] => Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands
)
Here is my code:
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$z = 1;
foreach ($titles as $feed)
{
    $feed_A = explode(' ', $feed);
    for ($i=$z; $i<count($titles); $i++)
    {
        $feed_B = explode(' ', $titles[$i]);
        $intersect_A_B = array_intersect($feed_A, $feed_B);
        if(count($intersect_A_B)>3)
        {
            $titluri[] = $feed;
            $titluri[]['duplicate'] = $titles[$i];
        }
        else
        {
            $titluri[] = $feed;
        }
    }
    $z++;
}
It outputs this awkward result, which is close to, but not quite, the desired result:
Array
(
    [0] => Title1: Hackers expose trove of snagged Snapchat images
    [1] => Title1: Hackers expose trove of snagged Snapchat images
    [2] => Title1: Hackers expose trove of snagged Snapchat images
    [3] => Title1: Hackers expose trove of snagged Snapchat images
    [4] => Title1: Hackers expose trove of snagged Snapchat images
    [5] => Array
        (
            [duplicate] => Title6: Hackers expose Snapchat images
        )
    [6] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [7] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [8] => Array
        (
            [duplicate] => Title4: New Jersey officials talk about Ebola quarantine
        )
    [9] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [10] => Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine
    [11] => Title3: Family says goodbye at funeral for 16-year-old
    [12] => Title3: Family says goodbye at funeral for 16-year-old
    [13] => Title3: Family says goodbye at funeral for 16-year-old
    [14] => Title4: New Jersey officials talk about Ebola quarantine
    [15] => Title4: New Jersey officials talk about Ebola quarantine
    [16] => Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands
)
Any suggestions would be greatly appreciated!
Posted on 2014-10-12 20:52:01
Here is my solution, inspired by @DomWeldon, without the repeated entries:
<?php
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array(); // unless it's declared elsewhere
$duplicateTitles = array(); // keys of titles already recorded as duplicates
// loop through each title in the array
foreach ($titles as $key => $originalFeed)
{
    if(!in_array($key, $duplicateTitles)){
        $titluri[] = $originalFeed; // every title not already marked as a duplicate is listed in the new array
        $feed_A = explode(' ', $originalFeed);
        foreach ($titles as $newKey => $comparisonFeed)
        {
            // iterate through the array again and see if they intersect
            if ($key != $newKey) { // but don't compare a title against itself!
                $feed_B = explode(' ', $comparisonFeed);
                $intersect_A_B = array_intersect($feed_A, $feed_B);
                // do they share more than three words?
                if(count($intersect_A_B)>3)
                {
                    // yes, add a duplicate entry and remember its key
                    $titluri[]['duplicate'] = $comparisonFeed;
                    $duplicateTitles[] = $newKey;
                }
            }
        }
    }
}
Posted on 2014-10-12 20:20:00
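I think this code might be what you're after (comments included). If not, let me know: it was written in a hurry and is untested. Also, you may want to consider an alternative approach, since nested foreach loops could cause performance problems on large sites.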
<?php
$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
$titluri = array(); // unless it's declared elsewhere
// loop through each title in the array
foreach ($titles as $key => $originalFeed)
{
    $titluri[] = $originalFeed; // all feeds are listed in the new array
    $feed_A = explode(' ', $originalFeed);
    foreach ($titles as $newKey => $comparisonFeed)
    {
        // iterate through the array again and see if they intersect
        if ($key != $newKey) { // but don't compare a title against itself!
            $feed_B = explode(' ', $comparisonFeed);
            $intersect_A_B = array_intersect($feed_A, $feed_B);
            // do they share more than three words?
            if(count($intersect_A_B)>3)
            {
                // yes, add a duplicate entry
                $titluri[]['duplicate'] = $comparisonFeed;
            }
        }
    }
}
https://stackoverflow.com/questions/26329429
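On the performance concern about nested foreach loops: one possible alternative, sketched here rather than taken from either answer, is to split each title only once and keep a word-to-titles index, so a new title is only compared against earlier titles that share at least one word with it. The function name `groupSimilarTitles` and the `$minShared` parameter are hypothetical names introduced for illustration. The sketch assumes PHP 7 or later, integer array keys, and the same threshold as the answers (`count(...) > 3`, i.e. at least 4 shared words). Unlike the answers, it counts each distinct word once and attaches a duplicate only to the earliest matching title.

```php
<?php
// Hypothetical helper (not from the answers above): groups similar titles
// using a word => title-keys index instead of comparing every pair of titles.
function groupSimilarTitles(array $titles, int $minShared = 4): array
{
    $index  = array(); // word => keys of "primary" titles that contain it
    $groups = array(); // primary key => array('title' => ..., 'duplicates' => array(...))

    foreach ($titles as $key => $title) {
        // Split once per title; array_unique so a repeated word counts once.
        $words = array_unique(explode(' ', $title));

        // Count the words shared with each primary title seen so far.
        $shared = array();
        foreach ($words as $word) {
            foreach ($index[$word] ?? array() as $primaryKey) {
                $shared[$primaryKey] = ($shared[$primaryKey] ?? 0) + 1;
            }
        }

        // Pick the earliest primary title that reaches the threshold.
        ksort($shared);
        $match = null;
        foreach ($shared as $primaryKey => $count) {
            if ($count >= $minShared) {
                $match = $primaryKey;
                break;
            }
        }

        if ($match !== null) {
            // Similar to an earlier title: record it as a duplicate, do not index it.
            $groups[$match]['duplicates'][] = $title;
            continue;
        }

        // Otherwise this title becomes a new primary and its words are indexed.
        $groups[$key] = array('title' => $title, 'duplicates' => array());
        foreach ($words as $word) {
            $index[$word][] = $key;
        }
    }

    // Flatten into the shape used in the question.
    $titluri = array();
    foreach ($groups as $group) {
        $titluri[] = $group['title'];
        foreach ($group['duplicates'] as $duplicate) {
            $titluri[] = array('duplicate' => $duplicate);
        }
    }
    return $titluri;
}

$titles = array(
    'Title1: Hackers expose trove of snagged Snapchat images',
    'Title2: New Jersey officials order symptom-less NBC News crew into Ebola quarantine',
    'Title3: Family says goodbye at funeral for 16-year-old',
    'Title4: New Jersey officials talk about Ebola quarantine',
    'Title5: New Far Cry 4 Trailer Welcomes You to Kyrat Lowlands',
    'Title6: Hackers expose Snapchat images'
);
print_r(groupSimilarTitles($titles));
```

For the six sample titles this should give the nesting asked for in the question. Whether it is actually faster depends on the data: if some word appears in most titles, the index lookups can still approach quadratic work, so it is worth measuring on real feeds before adopting it.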