Magento: improving the search engine (inflections, removing irrelevant words, etc.)

Stack Overflow user
Asked on 2013-03-08 09:01:23
Answers: 1 · Views: 199 · Followers: 0 · Votes: 0

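I'm wondering whether I can detect inflections (e.g. dogs/dog), remove unimportant words (in "made in the usa", both "in" and "the" are unimportant), and so on, without hard-coding so many scenarios in one big block of PHP code. I can process the search string to a certain extent, but the result looks unsanitary and ugly.

Any suggestions or recommendations for making it a "smart" search engine?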


1 Answer

Stack Overflow user

Answered on 2013-03-08 09:02:52

Use this class:

class Inflection
{
    static $plural = array(
    '/(quiz)$/i' => "$1zes",
    '/^(ox)$/i' => "$1en",
    '/([ml])ouse$/i' => "$1ice",
    '/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
    '/(x|ch|ss|sh)$/i' => "$1es",
    '/([^aeiouy]|qu)y$/i' => "$1ies",
    '/(hive)$/i' => "$1s",
    '/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
    '/(shea|lea|loa|thie)f$/i' => "$1ves",
    '/sis$/i' => "ses",
    '/([ti])um$/i' => "$1a",
    '/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
    '/(bu)s$/i' => "$1ses",
    '/(alias)$/i' => "$1es",
    '/(octop)us$/i' => "$1i",
    '/(ax|test)is$/i' => "$1es",
    '/(us)$/i' => "$1es",
    '/s$/i' => "s",
    '/$/' => "s"
    );

    static $singular = array(
    '/(quiz)zes$/i' => "$1",
    '/(matr)ices$/i' => "$1ix",
    '/(vert|ind)ices$/i' => "$1ex",
    '/^(ox)en$/i' => "$1",
    '/(alias)es$/i' => "$1",
    '/(octop|vir)i$/i' => "$1us",
    '/(cris|ax|test)es$/i' => "$1is",
    '/(shoe)s$/i' => "$1",
    '/(o)es$/i' => "$1",
    '/(bus)es$/i' => "$1",
    '/([ml])ice$/i' => "$1ouse",
    '/(x|ch|ss|sh)es$/i' => "$1",
    '/(m)ovies$/i' => "$1ovie",
    '/(s)eries$/i' => "$1eries",
    '/([^aeiouy]|qu)ies$/i' => "$1y",
    '/([lr])ves$/i' => "$1f",
    '/(tive)s$/i' => "$1",
    '/(hive)s$/i' => "$1",
    '/(li|wi|kni)ves$/i' => "$1fe",
    '/(shea|loa|lea|thie)ves$/i'=> "$1f",
    '/(^analy)ses$/i' => "$1sis",
    '/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => "$1sis",
    '/([ti])a$/i' => "$1um",
    '/(n)ews$/i' => "$1ews",
    '/(h|bl)ouses$/i' => "$1ouse",
    '/(corpse)s$/i' => "$1",
    '/(us)es$/i' => "$1",
    '/s$/i' => ""
    );

    static $irregular = array(
    'move' => 'moves',
    'foot' => 'feet',
    'goose' => 'geese',
    'sex' => 'sexes',
    'child' => 'children',
    'man' => 'men',
    'tooth' => 'teeth',
    'person' => 'people',
    'admin' => 'admin'
    );

    static $uncountable = array(
    'sheep',
    'fish',
    'deer',
    'series',
    'species',
    'money',
    'rice',
    'information',
    'equipment'
    );

    public static function pluralize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular singular forms (e.g. "man" => "men")
        foreach ( self::$irregular as $pattern => $result )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        // check for matches using regular expressions; the first matching rule wins
        foreach ( self::$plural as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function singularize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular plural forms (e.g. "men" => "man")
        foreach ( self::$irregular as $result => $pattern )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        // check for matches using regular expressions; the first matching rule wins
        foreach ( self::$singular as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function pluralize_if($count, $string)
    {
        if ($count == 1)
            return "1 $string";
        else
            return $count . " " . self::pluralize($string);
    }
}
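
As a rough illustration of how a singularizer could be applied to a search string, here is a minimal sketch. The function `normalizeSearchTerms`, the stop-word list, and the stand-in singularizer are all hypothetical; none of them come from the class above or from Magento. The sketch lowercases the query, drops stop words, singularizes what is left, and removes duplicates.

```php
<?php
// Hypothetical helper, not part of the Inflection class or of Magento.
// $singularize is any callable that maps a word to its singular form.
function normalizeSearchTerms($query, array $stopWords, $singularize)
{
    // Split on anything that is not a letter or digit; drop empty pieces.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    $terms = array();
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue; // skip unimportant words such as "in" and "the"
        }
        $terms[] = call_user_func($singularize, $word);
    }

    // Remove duplicates while keeping the original order.
    return array_values(array_unique($terms));
}

// Illustrative stop-word list (an assumption; tune it for your catalog).
$stopWords = array('a', 'an', 'and', 'in', 'of', 'the');

// Naive stand-in singularizer so the sketch runs on its own.
$naiveSingular = function ($word) {
    return preg_replace('/s$/i', '', $word);
};

print_r(normalizeSearchTerms('Dogs made in the USA', $stopWords, $naiveSingular));
```

With the class above loaded, the stand-in could presumably be swapped for `array('Inflection', 'singularize')`. Passing the singularizer as a callable keeps the stop-word step independent of any particular inflection rule set.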

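If you have time, you can go with a standard approach to inflection: http://en.wikipedia.org/wiki/Inflection

You can combine arrays with XML to hold all of the inflection data. Also take a look at how CodeIgniter makes inflection very friendly: http://ellislab.com/codeigniter/user-guide/helpers/inflector_helper.html

Many frameworks support inflection out of the box, but they focus mainly on English. For other languages you would have to write your own... If needed, you can also draw on unicode.org and other inflection standards for other languages.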
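
One way the "arrays plus XML" idea might look is sketched below. The XML layout (the `inflections`, `singular`, and `uncountable` elements and their attributes) is an assumption made up for illustration, as are the functions `loadInflectionRules` and `singularizeWith`. The sketch relies on PHP's SimpleXML extension and loads the rules into plain arrays shaped like the ones in the class above.

```php
<?php
// Hypothetical XML layout; element and attribute names are assumptions.
$xml = <<<'XML'
<inflections>
    <singular pattern="/(quiz)zes$/i" replacement="$1"/>
    <singular pattern="/([^aeiouy]|qu)ies$/i" replacement="$1y"/>
    <singular pattern="/s$/i" replacement=""/>
    <uncountable>sheep</uncountable>
    <uncountable>fish</uncountable>
</inflections>
XML;

// Read the XML into arrays: pattern => replacement, plus uncountable words.
function loadInflectionRules($xmlString)
{
    $doc = simplexml_load_string($xmlString);
    if ($doc === false) {
        throw new InvalidArgumentException('Invalid inflection XML');
    }

    $rules = array('singular' => array(), 'uncountable' => array());
    foreach ($doc->singular as $rule) {
        $rules['singular'][(string) $rule['pattern']] = (string) $rule['replacement'];
    }
    foreach ($doc->uncountable as $word) {
        $rules['uncountable'][] = (string) $word;
    }
    return $rules;
}

// Apply the first matching rule, in document order.
function singularizeWith(array $rules, $word)
{
    if (in_array(strtolower($word), $rules['uncountable'], true)) {
        return $word;
    }
    foreach ($rules['singular'] as $pattern => $replacement) {
        if (preg_match($pattern, $word)) {
            return preg_replace($pattern, $replacement, $word);
        }
    }
    return $word;
}

$rules = loadInflectionRules($xml);
echo singularizeWith($rules, 'puppies'), "\n";
echo singularizeWith($rules, 'sheep'), "\n";
```

Keeping the rules in a data file means another language's rule set can be supplied as a separate XML file without touching the PHP code, at the cost of parsing the file on load.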

Votes: 0

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/15284938
