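I have some data that needs to be cleaned up. This should be a common problem, but I have not found a solution yet. The data looks like the following and should be converted to:
One special case is a double occurrence:
If there is no "stop" character and the word "the" is not at the end, leave the string as it is:
So several words (the, a, la) need to be moved to the beginning, and there are several "stop" characters: :, -, (, ), and the end of the string.
I tried to solve this with preg_replace but could not come up with a working solution. I am sure this is possible for someone more experienced, and I would really appreciate your help!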
Final solution, based on elclanrs' answer
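$tests = array(
    "Easiest, The",
    "Heaviest,The",
    "Night, The - Is black",
    "Trip,A - Go west",
    "Muse, La: 3 chansons",
    "Passion, La (OMG)",
    "Johnny - One to go for, The",
    "Peace, The \"Great one\"",
    "Chuck, the fighter",
    "Mason, the hero ",
    "Internet Generation, The - Dream, A",
);

// Swap "<title>, <article>" inside each segment delimited by a stop character.
$patt = '/([^,:"(-]+)\s*?,\s*?([^,:"(-]+)/';

foreach ($tests as $test) {
    // Modify only strings that contain a stop character or end in ", <word>".
    if (preg_match('/(([:"(-]+)\s*?)|,\s*?\w+\s*?$/', $test)) {
        $out = preg_replace($patt, '$2 $1 ', $test); // move the article to the front
        $out = preg_replace('/\s+/', ' ', $out);     // collapse repeated whitespace
        $out = preg_replace('/\s+:/', ':', $out);    // no space before a colon
        echo trim($out) . PHP_EOL;
    } else {
        echo "Not modified: " . $test . PHP_EOL;
    }
}
This produces: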
The Easiest
The Heaviest
The Night - Is black
A Trip - Go west
La Muse: 3 chansons
La Passion (OMG)
Johnny - The One to go for
The Peace "Great one"
Not modified: Chuck, the fighter
Not modified: Mason, the hero
The Internet Generation - A Dream
So I simply skip the strings that do not need to be modified and remove all unnecessary whitespace.
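The solution above needs a separate preg_match guard because its pattern swaps any two comma-separated parts. As a hedged alternative sketch, the second capture can be limited to the articles named in the question (the, a, la) and a lookahead can require a stop character or the end of the string after the article. The function name moveArticle is hypothetical, and the article list and stop characters are assumptions taken from the question and the pattern above.

```php
<?php
// Hypothetical helper, not part of the original posts.
// Moves a trailing article to the front of its segment: "Night, The" -> "The Night".
function moveArticle(string $title): string
{
    // Assumptions: the articles from the question and the stop characters
    // used in the pattern above.
    $articles = 'the|a|la';
    $stops    = ':"(\-';

    // Title part (lazy), comma, article, then a lookahead that demands a
    // stop character or the end of the string after optional whitespace.
    $pattern = '/([^,' . $stops . ']+?)\s*,\s*(' . $articles . ')'
             . '(?=\s*(?:[' . $stops . ']|$))/i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Keep whitespace that preceded the title part (for example after "-").
            $head = ltrim($m[1]);
            $lead = substr($m[1], 0, strlen($m[1]) - strlen($head));
            return $lead . $m[2] . ' ' . $head;
        },
        $title
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $title;
}

echo moveArticle("Night, The - Is black"), PHP_EOL;               // The Night - Is black
echo moveArticle("Internet Generation, The - Dream, A"), PHP_EOL; // The Internet Generation - A Dream
echo moveArticle("Chuck, the fighter"), PHP_EOL;                  // Chuck, the fighter
```

Because the lookahead does not consume the stop character, the text after the article is left untouched, so no extra whitespace cleanup is needed in this variant.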
Posted on 2013-09-13 09:39:13
Here is a possible solution:
$tests = array(
    "Easiest, The",
    "Night, The - Is black",
    "Trip,A - Go west",
    "Muse, La: 3 chansonss",
    "Passion, La (OMG)",
    "Johnny - One to go for, The",
    "Peace, The \"Great one\""
);

// Two runs of non-stop characters separated by a comma; they are swapped below.
$patt = '/([^,:"(-]+)\s*?,\s*?([^,:"(-]+)/';

foreach ($tests as $test) {
    echo preg_replace($patt, '$2 $1 ', $test) . '<br>';
}
This will print:
The Easiest
The Night - Is black
A Trip - Go west
La Muse : 3 chansonss
La Passion (OMG)
Johnny - The One to go for
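The Peace "Great one"
If you have more rules, you will have to update the token [^,:"(-]. It is not perfect, as you can see from the space before the :, but I will leave that and the special cases to you.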
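As a rough sketch of what "updating the token" could look like, the character class can be built from a list of stop characters, and the stray space before the colon can be removed in a follow-up step. The helper names buildSwapPattern and swapAndTidy are hypothetical, and the "/" stop character and the "Album" input in the second call are made-up illustrations, not data from the question.

```php
<?php
// Hypothetical helper: builds the answer's pattern from a list of stop characters.
function buildSwapPattern(array $stops): string
{
    // The comma is always excluded; preg_quote() escapes characters such as
    // "-", "(" and the "/" delimiter so they are literal inside the class.
    $class = ',';
    foreach ($stops as $char) {
        $class .= preg_quote($char, '/');
    }
    return '/([^' . $class . ']+)\s*?,\s*?([^' . $class . ']+)/';
}

// Hypothetical helper: swaps the two parts, then tidies the whitespace.
function swapAndTidy(string $title, array $stops = array(':', '"', '(', '-')): string
{
    $out = preg_replace(buildSwapPattern($stops), '$2 $1 ', $title);
    $out = preg_replace('/\s+/', ' ', $out);  // collapse repeated whitespace
    $out = preg_replace('/ :/', ':', $out);   // drop the space before ":"
    return trim($out);
}

echo swapAndTidy("Muse, La: 3 chansonss"), PHP_EOL; // La Muse: 3 chansonss

// Assumed extra rule: treat "/" as a stop character as well.
echo swapAndTidy("Album, The / Remastered", array(':', '"', '(', '-', '/')), PHP_EOL; // The Album / Remastered
```

Like the pattern it is built from, this sketch swaps any two comma-separated parts, so an input such as "Chuck, the fighter" would still be rewritten unless it is filtered out beforehand.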
https://stackoverflow.com/questions/18782673