
Question: Search only the first 500 letters, excluding HTML tags?

Stack Overflow user

Asked 2010-11-09 03:09:08

Answers: 4 | Views: 263 | Followers: 0 | Votes: 0

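How can I search only the first 500 characters of a page, excluding HTML tags?

Below is what I have come up with so far. It searches the text for occurrences of the keyword: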

    SELECT *
    FROM root_pages

    WHERE root_pages.pg_cat_id = '2'
    AND root_pages.parent_id != root_pages.pg_id
    AND root_pages.pg_hide != '1'
    AND root_pages.pg_url != 'cms'
    AND (root_pages.pg_content_1 REGEXP '[[:<:]]".$search."[[:>:]]'
      OR root_pages.pg_content_2 REGEXP '[[:<:]]".$search."[[:>:]]')

    ORDER BY root_pages.pg_created DESC

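How can I add more conditions to it: only the first 500 letters, excluding HTML tags?

It would be perfect if it could search for the keyword in the first paragraph only. Is that possible?

Edit:

Thanks for everyone's help! Here is my solution: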

    # Whole-word match in SQL: searching for "rid" should not match "arid", but should match "a rid".
    # MySQL's REGEXP supports the [[:<:]] and [[:>:]] word-boundary markers for this
    # (before MySQL 8.0.4; from 8.0.4 on, MySQL uses ICU regular expressions, which use \b instead).
    # Note: $search is inserted into the SQL verbatim; escape it first to avoid SQL injection.
    # The parentheses keep the OR from bypassing the other WHERE conditions.
    $sql = "
    SELECT *
    FROM root_pages

    WHERE root_pages.pg_cat_id = '2'
    AND root_pages.parent_id != root_pages.pg_id
    AND root_pages.pg_hide != '1'
    AND root_pages.pg_url != 'cms'
    AND (root_pages.pg_content_1 REGEXP '[[:<:]]".$search."[[:>:]]'
      OR root_pages.pg_content_2 REGEXP '[[:<:]]".$search."[[:>:]]')

    ORDER BY root_pages.pg_created DESC
    ";

    # use the instantiated db connection object from init.php to run the query
    $items = $connection -> fetch_all($sql);
    $total_item = $connection -> num_rows($sql);

    # initialise both so they are defined even when the query returns no rows
    $match = array();
    $total_match = 0;

    if ($total_item > 0)
    {
        foreach($items as $item)
        {
            # use pg_content_2 unless it is empty, otherwise pg_content_1, with the HTML tags stripped
            if (empty($item['pg_content_2'])) $pg_content = strip_tags($item['pg_content_1']);
            else $pg_content = strip_tags($item['pg_content_2']);

            # keep the first 500 characters only (substr counts bytes; use mb_substr for multibyte text)
            $pg_content = substr($pg_content, 0, 500);

            # whole-word match; preg_quote escapes regex special characters in $search
            if (preg_match("/\b".preg_quote($search, '/')."\b/", $pg_content))
            {
                $match[] = $pg_content;
            }
        }

        $total_match = count($match);
    }

    if ($total_match > 0)
    {
        # escape $search before writing it into an XML attribute
        echo '<result message="'.$total_match.' matches found! Please wait while redirecting." search="'.htmlspecialchars($search).'"/>';
    }
    else
    {
        echo '<error elementid="input" message="Sorry no results are found."/>';
    }

Tags: regex, mysql, full-text-search, php

4 Answers

Stack Overflow user

Accepted answer

Answered 2010-11-09 03:20:26

Regarding:

How can I add more conditions: only the first 500 letters, excluding HTML tags?

You can't do this with MySQL alone (at least not with a solution that works 100% of the time). For more details, see Parsing Html The Cthulhu Way and this SO answer.

PHP, with strip_tags and substr, will help you achieve what you want.
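A minimal sketch of that PHP-side approach, assuming the page HTML is already in a string: strip the tags, keep the first 500 characters, then do a whole-word match. The function name matchesInPreview and its $limit parameter are hypothetical; preg_quote is used so that characters such as "." in the search term are matched literally.

```php
<?php
// Hypothetical helper: does $search occur as a whole word within the
// first $limit characters of the page's text (HTML tags removed)?
function matchesInPreview(string $html, string $search, int $limit = 500): bool
{
    // Remove the HTML tags, keeping only the text content.
    $text = strip_tags($html);

    // Keep the first $limit bytes (substr is byte-based; mb_substr counts characters).
    $preview = substr($text, 0, $limit);

    // Whole-word match; preg_quote escapes regex metacharacters in $search.
    $pattern = '/\b' . preg_quote($search, '/') . '\b/';

    return preg_match($pattern, $preview) === 1;
}

var_dump(matchesInPreview('<p>Find a rid here</p>', 'rid')); // bool(true)
var_dump(matchesInPreview('<p>An arid land</p>', 'rid'));    // bool(false)
```

Doing the trimming in PHP means the SQL can stay a coarse pre-filter, while the exact "first 500 characters" rule is applied to each returned row.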

Votes: 0

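Stack Overflow user

Answered 2010-11-09 03:28:45

It isn't as simple as stripping or skipping tags: you'll often find that the first 500 characters are inside the <head>, usually in a <style> or <script> block.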

Also, simply removing the tags breaks text like this, where only a tag separates the words:

separate<br>words
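A quick PHP check of that failure mode, for illustration only:

```php
<?php
// strip_tags removes the <br> but puts no whitespace in its place,
// so the two words are merged into one.
$text = strip_tags('separate<br>words');
var_dump($text); // string(13) "separatewords"

// A whole-word search for "words" therefore no longer matches.
var_dump(preg_match('/\bwords\b/', $text)); // int(0)
```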

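If you want to do this properly, I suggest writing an XSLT stylesheet in text output mode (<xsl:output method="text"/>) that converts the HTML to plain text by adding whitespace around block-level elements, removing scripts, the <head>, and so on.

A simpler approach is to preprocess the HTML with a series of regexps instead of XSLT.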
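A rough sketch of that regexp-based preprocessing in PHP, under the assumption that the markup is reasonably well formed (regexps are not a general HTML parser). The function name htmlToSearchText and the list of block-level tags are illustrative choices, not a complete list.

```php
<?php
// Hypothetical helper: turn an HTML page into searchable plain text.
function htmlToSearchText(string $html): string
{
    // 1. Drop elements whose content is never visible text.
    $html = preg_replace('#<(head|script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);

    // 2. Put spaces around line breaks and block-level tags so adjacent words stay apart.
    $html = preg_replace('#</?(br|p|div|h[1-6]|li|ul|ol|tr|td|th|table|blockquote)\b[^>]*>#i', ' $0 ', $html);

    // 3. Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // 4. Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo htmlToSearchText('<head><title>T</title><style>p{}</style></head><p>separate<br>words</p>'), "\n";
// separate words
```

Running this once when a page is saved, rather than on every search, keeps the per-query work down to a plain text match.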

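Once the HTML has been converted to usable text, store that text in an extra column in the database and search against it. You can even put a FULLTEXT index on it.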
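A sketch of the extra-column idea, assuming a hypothetical pg_text column on root_pages that is filled with the plain-text version of each page whenever the page is saved. The query uses MATCH ... AGAINST on a FULLTEXT index and passes $search as a bound parameter instead of concatenating it into the SQL; the helper name buildFullTextSearch and the index name ft_pg_text are made up for illustration.

```php
<?php
// One-time schema change (MySQL; FULLTEXT indexes on InnoDB need MySQL 5.6 or later):
//   ALTER TABLE root_pages ADD COLUMN pg_text TEXT;
//   ALTER TABLE root_pages ADD FULLTEXT INDEX ft_pg_text (pg_text);

// Hypothetical helper: build a parameterised full-text query over the pg_text column.
// Returns the SQL string and the values for its four ? placeholders, in order.
function buildFullTextSearch(string $search): array
{
    $sql = 'SELECT * FROM root_pages'
         . ' WHERE pg_cat_id = ? AND parent_id != pg_id'
         . ' AND pg_hide != ? AND pg_url != ?'
         . ' AND MATCH(pg_text) AGAINST(? IN NATURAL LANGUAGE MODE)'
         . ' ORDER BY pg_created DESC';

    return [$sql, ['2', '1', 'cms', $search]];
}

// Usage with PDO (the $pdo connection is assumed to exist):
//   list($sql, $params) = buildFullTextSearch('rid');
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
//   $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```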

Votes: 1

Stack Overflow user

Answered 2010-11-09 03:16:52

If paragraphs are marked up with p elements:

    ... REGEXP '<p[^>]*>[^<]*".$search."[^<]*</p>'

Don't forget to escape regular-expression special characters in $search.
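The first-paragraph idea can also be sketched on the PHP side, assuming, as this answer does, that paragraphs are <p> elements and that the markup is simple enough for a regex. The function name firstParagraphContains is hypothetical; preg_quote takes care of escaping $search for the PHP regex.

```php
<?php
// Hypothetical helper: is $search a whole word in the first <p> element?
function firstParagraphContains(string $html, string $search): bool
{
    // Capture the content of the first <p ...>...</p>
    // (s: dot also matches newlines, i: tag names are case-insensitive).
    if (!preg_match('#<p\b[^>]*>(.*?)</p\s*>#is', $html, $m)) {
        return false; // no paragraph found
    }

    // Drop inline tags such as <b> inside the paragraph.
    $text = strip_tags($m[1]);

    // Whole-word match with $search escaped for the regex.
    return preg_match('/\b' . preg_quote($search, '/') . '\b/', $text) === 1;
}

var_dump(firstParagraphContains('<p>Intro text</p><p>Find a rid</p>', 'rid'));      // bool(false)
var_dump(firstParagraphContains('<p class="lead">A <b>rid</b> here</p>', 'rid')); // bool(true)
```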

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/4127099
