Retrieving the position and number of lexeme occurrences from a tsvector

Stack Overflow user

Asked 2014-08-22 11:12:36

Answers: 1 · Views: 918 · Followers: 0 · Votes: 6

Is there any way to get, from a tsvector, the position and the number of occurrences of a word in a sentence?

Something like this:

SELECT *
FROM get_position(to_tsvector('english', 'The Fat Rats'), to_tsquery('Rats'));

would return 3

and

SELECT *
FROM get_occurrences(to_tsvector('english', 'The Fat Rats'), to_tsquery('Rats'));

would return 1.

sql

postgresql

full-text-search

tsvector

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-08-23 16:31:04

The text representation of a tsvector contains the list of positions at which each lexeme occurs:

test=# select to_tsvector ( 'english', 'new bar in New York' );
        to_tsvector
----------------------------
 'bar':2 'new':1,4 'york':5
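
On PostgreSQL 9.6 or later, the same information can also be read without parsing the text form: the built-in function unnest(tsvector) expands a tsvector into one row per lexeme, with its positions as an array. The following is a minimal sketch that assumes such a server version; the occurrences column alias is only illustrative.

```sql
-- Assumes PostgreSQL 9.6+, where unnest(tsvector) is available.
-- It returns one row per lexeme with the columns lexeme, positions, weights.
SELECT lexeme
,      positions                                   -- smallint[] of word positions
,      cardinality ( positions ) AS occurrences    -- number of positions listed
FROM unnest ( to_tsvector ( 'english', 'new bar in New York' ) );
```

For the vector shown above, the row for new should carry the positions {1,4} and therefore two occurrences.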

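Here is a demonstration function based on that. It accepts text arguments and converts them to tsvector internally, but it can easily be rewritten to accept a tsvector.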

CREATE OR REPLACE FUNCTION lexeme_occurrences (
    IN _document text
,   IN _word text
,   IN _config regconfig
,   OUT lexeme_count int
,   OUT lexeme_positions int[]
) RETURNS RECORD
AS $$
DECLARE
    -- Full tsvector of the document, positions included
    _lexemes tsvector := to_tsvector ( _config, _document );
    -- The searched word normalized to a lexeme, with positions stripped
    _searched_lexeme tsvector := strip ( to_tsvector ( _config, _word ) );
    -- Matches the quoted lexeme followed by its comma-separated position list
    _occurrences_pattern text := _searched_lexeme::text || ':([0-9,]+)';
    -- E.g. '1,4'; NULL when the lexeme does not occur in the document
    _occurrences_list text := substring ( _lexemes::text, _occurrences_pattern );
BEGIN
    SELECT
        count ( a )
    ,   array_agg ( a::int )
    FROM regexp_split_to_table ( _occurrences_list, ',' ) a
    WHERE _searched_lexeme::text != '' -- preventing false positives
    INTO
        lexeme_count
    ,   lexeme_positions;
    RETURN;
END $$ LANGUAGE plpgsql;

Example usage:

select * from lexeme_occurrences ( 'The Fat Rats', 'rat', 'english' );
 lexeme_count | lexeme_positions
--------------+-----------------
            1 | {3}
(1 row)
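
The rewrite that accepts a tsvector directly could look roughly like the sketch below. It is not part of the original answer: the name lexeme_occurrences_tsv is hypothetical, and it swaps the regular-expression parsing for the built-in unnest(tsvector), so it assumes PostgreSQL 9.6 or later. One difference to keep in mind: if _word normalizes to several lexemes, the positions of all of them are combined.

```sql
-- Hypothetical variant, assuming PostgreSQL 9.6+ for unnest(tsvector).
CREATE OR REPLACE FUNCTION lexeme_occurrences_tsv (
    IN _lexemes tsvector
,   IN _word text
,   IN _config regconfig
,   OUT lexeme_count int
,   OUT lexeme_positions int[]
) RETURNS RECORD
AS $$
    SELECT
        count ( p )::int                     -- 0 when the lexeme is absent
    ,   array_agg ( p::int ORDER BY p )      -- NULL when the lexeme is absent
    FROM unnest ( _lexemes ) AS d            -- one row per lexeme in the document
    JOIN unnest ( to_tsvector ( _config, _word ) ) AS w
        ON w.lexeme = d.lexeme               -- keep only the searched lexeme(s)
    CROSS JOIN LATERAL unnest ( d.positions ) AS p;  -- one row per position
$$ LANGUAGE sql;

-- Same example as above, with the document already converted to a tsvector
SELECT *
FROM lexeme_occurrences_tsv (
    to_tsvector ( 'english', 'The Fat Rats' ), 'rat', 'english' );
```

With this input the call should give the same count of 1 and positions {3} as the text-based function. Because the positions come back as an array and any weight labels arrive in a separate column, this variant does not depend on the text form containing only digits and commas after the colon, which the pattern ':([0-9,]+)' does assume.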

Votes: 6

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/25445670
