I'm trying to generate n-grams from a string in PHP, and for that I'm using the following function: https://gist.github.com/Xeoncross/5366393
function Bigrams($word){
    $ngrams = array();
    $len = strlen($word);
    for($i=0;$i+1<$len;$i++){
        $ngrams[$i]=$word[$i].$word[$i+1];
    }
    return $ngrams;
}
$word = "abcdefg";
print_r(Bigrams($word));
This works fine and returns the expected n-grams:
[0] => ab
[1] => bc
[2] => cd
[3] => de
[4] => ef
[5] => fg
However, for some Unicode characters the result is not what I expect.
For example, $word = "Lòri" returns:
[0] => L�
[1] => ò
[2] => �r
[3] => ri
Or $word = "пожалуйста" returns:
[0] => п
[1] => ��
[2] => о
[3] => ��
[4] => ж
[5] => ��
[6] => а
[7] => ��
[8] => л
Any idea how to solve this?
Posted on 2017-04-22 16:41:19
Use the Unicode-aware (multibyte) string functions:
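To see why the byte-oriented version breaks, here is a small check, assuming the string is UTF-8 encoded and PHP 7.0 or later (for the "\u{...}" escape, which is used only to make the encoding of "ò" unambiguous). strlen() and $word[$i] operate on bytes, so a two-byte character such as "ò" gets cut in half:

```php
<?php
// Assumption: UTF-8 encoding. "\u{00F2}" is "ò", so $word is "Lòri".
$word = "L\u{00F2}ri";

// strlen() counts bytes; mb_strlen() counts characters.
echo strlen($word), "\n";              // 5, because "ò" occupies two bytes
echo mb_strlen($word, 'UTF-8'), "\n";  // 4

// Indexing with [] returns a single byte, here only the first half of "ò".
echo bin2hex($word[1]), "\n";                          // c3
// mb_substr() returns the whole character.
echo bin2hex(mb_substr($word, 1, 1, 'UTF-8')), "\n";   // c3b2
```

That is why the function needs mb_strlen and mb_substr in place of strlen and byte indexing: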
function Bigrams($word){
    $ngrams = array();
    $len = mb_strlen($word);
    for($i=0;$i+1<$len;$i++){
        $ngrams[$i]=mb_substr($word, $i, 2);
    }
    return $ngrams;
}
$word = "пожалуйста";
print_r(Bigrams($word));
Result:
Array
(
[0] => по
[1] => ож
[2] => жа
[3] => ал
[4] => лу
[5] => уй
[6] => йс
[7] => ст
[8] => та
)
https://stackoverflow.com/questions/43561626
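As a possible variation on the answer above, the string can be split into characters once and the n-grams built from the resulting array, which avoids calling mb_substr on every iteration. This is only a sketch: the function name ngrams_utf8 and its $n parameter are made up for illustration, the input is assumed to be valid UTF-8, and PHP 7.0 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper, not part of the original answer:
// builds n-grams over UTF-8 characters instead of bytes.
function ngrams_utf8(string $text, int $n = 2): array
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters. Returns false on invalid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $n < 1) {
        return [];
    }

    $ngrams = [];
    $last = count($chars) - $n; // last valid start index
    for ($i = 0; $i <= $last; $i++) {
        // Join $n consecutive characters starting at position $i.
        $ngrams[] = implode('', array_slice($chars, $i, $n));
    }
    return $ngrams;
}

print_r(ngrams_utf8("пожалуйста"));
```

On PHP 7.4 and later, mb_str_split($text, 1, 'UTF-8') could replace the preg_split call. Either way, characters here mean Unicode code points, so a letter written with a separate combining mark would still be split into two pieces.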