How do I add search for static pages to a Rails application without a database?

Stack Overflow user
Asked on 2013-08-03 01:26:50
1 answer, 1.6K views, 0 followers, 2 votes

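Does anyone know of any good gems or documentation on how to index static pages in a Rails app in order to add search functionality? So far my searching has led me to Sunspot and Cobweb, but both seem more complex than what I'm trying to accomplish.

Here is an example of my views directory: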

views
|
|__ Folder_1
|  |
|  |__ View-1
|  |__ View-2
|
|__ Folder_2
   |
   |__ View-3
   |__ View-4

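Each folder is a controller, with its views as the defined actions, in case that makes any difference to how this should be set up. The end goal is to return a list of links to the pages that contain the search terms.

Edit:

The purpose of each search query is to crawl the HTML content of all the static pages and return a list of links to pages that match any of the searched terms that are not stop words. I also plan to improve relevance based on how often a search term occurs in a static page and where the word is placed.

Example:

Search query: "scrambled eggs recipe" would return links to pages containing "scrambled", "eggs", or "recipe", with the most relevant links at the top of the returned list: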

Search Results:
Page 1 (Most relevant because includes all 3 terms)
Page 2 (Includes 2 terms)
Page 3 (Includes 1 term)

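Preferably, the search would only try to match the searched terms against the text of each view, so that if a user enters 'div' as a search term it does not return every page just because div elements exist in the HTML content.

Answer:

After a few weeks of learning Ruby, this is what I came up with. Basically, I iterate over each subdirectory of /app/views/, read every file in that subdirectory, process the text to remove HTML tags and common stop words, and store the result in a search index hash.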

search_controller.rb

class SearchController < ApplicationController
  # Include the sanitize helper so strip_tags is available in the controller.
  include ActionView::Helpers::SanitizeHelper

  # Note: the *_filter callbacks were removed in Rails 5.1; use prepend_before_action there.
  prepend_before_filter :search

  def search
    if params[:q]
      stopwords = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "aren't", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "can't", "cannot", "could", "couldn't", "did", "didn't", "do", "does", "doesn't", "doing", "don't", "down", "during", "each", "few", "for", "from", "further", "had", "hadn't", "has", "hasn't", "have", "haven't", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "isn't", "it", "it's", "its", "itself", "let's", "me", "more", "most", "mustn't", "my", "myself", "no", "nor", "not", "of", "off", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "shan't", "she", "she'd", "she'll", "she's", "should", "shouldn't", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "wasn't", "we", "we'd", "we'll", "we're", "we've", "were", "weren't", "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why", "why's", "with", "won't", "would", "wouldn't", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself"]
      #cleanse all stop words from search query
      @search_terms = strip_tags(params[:q]).downcase.split.delete_if{|x| stopwords.include?(x)}

      #declare empty index hash
      @search_index = {}

      #filter through each view and add view text to search entry
      Rails.root.join('app', "views").entries.each do |view_dir| 
        unless %w(. .. search shared layouts).include?(view_dir.to_s) 
          Rails.root.join('app', "views", view_dir.to_s).entries.each do |view| 
            unless %w(. ..).include?(view.to_s)
              #add relative path for view and processed contents to search index hash as key, value pair
              view_path = "/" + view_dir.to_s + "/" + view.to_s.gsub('.html.erb', '')
              view_text = strip_tags(IO.read(Rails.root.join('app', "views", view_dir.to_s, view.to_s)))
              @search_index[view_path] = view_text.downcase.squish.split.delete_if{|x| stopwords.include?(x)}.join(" ")
            end
          end
        end
      end

    end
  end

end
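
The controller above builds @search_index but stops before matching and ordering results. As a rough sketch of the ranking described in the question (pages matching more terms first, then pages where the terms occur more often), something like the following could work. The method name rank_pages and the sample data are hypothetical, and the index is assumed to have the same shape as @search_index (path => cleaned, lowercased text).

```ruby
# Hypothetical helper: rank pages from an index shaped like @search_index
# (path => cleaned, lowercased page text) against a list of search terms.
def rank_pages(search_index, search_terms)
  scored = search_index.map do |path, text|
    words = text.split
    # How often each search term occurs as a whole word on this page.
    counts = search_terms.map { |term| words.count(term) }
    matched = counts.count { |c| c > 0 } # number of distinct terms found
    [path, matched, counts.sum]
  end

  # Drop pages with no matching term at all.
  hits = scored.reject { |_path, matched, _freq| matched.zero? }
  # Most distinct terms first, then highest total frequency.
  ranked = hits.sort_by { |_path, matched, freq| [-matched, -freq] }
  ranked.map(&:first)
end

# Made-up sample index for illustration.
sample_index = {
  "/recipes/scrambled_eggs" => "scrambled eggs recipe whisk eggs butter",
  "/recipes/omelette"       => "omelette recipe eggs cheese",
  "/pages/about"            => "about site recipe collection",
  "/pages/contact"          => "contact form email"
}

ranked_paths = rank_pages(sample_index, %w[scrambled eggs recipe])
puts ranked_paths
```

Because matching is done on whole words of the already-stripped text, a query such as "div" only matches pages whose visible text contains that word. Word placement (for example, weighting headings more heavily) is not covered by this sketch.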

If anyone has improvements or suggestions, I'd love to hear them!


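1 Answer

Stack Overflow user

Answered on 2013-08-03 01:44:56

If you have a predefined list of search terms and the views they match, you can implement a limited version of static page search using a hard-coded term index: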

# app/controllers/searches_controller.rb
class SearchesController < ApplicationController
  def index
    query = params[:query]

    # Convert query string to lowercase tokens, e.g. 
    # /search?query=cAT+aNd+doG => ['cat', 'and', 'dog']
    # to_s guards against a missing query parameter (nil).
    terms = query.to_s.downcase.split

    # Match each search term against the index, collecting all matching pages.
    @pages = terms.collect do |term|
      get_search_index[term]
    end

    # Remove nil objects resulting from terms not matching anything.
    @pages.compact!

    # Flatten all nested arrays into one array of pages for easy looping.
    @pages.flatten!
  end

  private
    def get_search_index
      @@index ||= {
        "homepage" => [
          {:path => root_path, :name => "Home"}
        ],
        "home" => [
          {:path => root_path, :name => "Home"}
        ],
        "user" => [
          {:path => new_user_path, :name => "Create New User"}, 
          {:path => users_path, :name => "User Index"}
        ]
      }
    end
end

And now the view:

# app/views/searches/index.html.erb
Search results:
<ol>
<% @pages.each do |page| %>
    <li><%= link_to page[:name], page[:path] %></li>
<% end %>
</ol>

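Now you can go to /searches?query=some+User+page, and both the new user form and the user index page should show up in the search results (because the "user" term matches).

You can also extend this static approach to make it fancier. For instance, instead of hard-coding the terms, you could take a body of text from each static page and split it into terms. For example: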

# Get a corpus, lowercase it, replace punctuation with whitespace, and tokenize.
@@homepage_terms = "Block of homepage text.".downcase.gsub(/[^a-z0-9]/, ' ').split
@@about_page_terms = "Block of about page text.".downcase.gsub(/[^a-z0-9]/, ' ').split

def get_search_index
  # Memoize the index so we only build it once.
  @@index ||= build_search_index
end

def build_search_index
  index = {}
  @@homepage_terms.each do |term|
    index[term] ||= []
    index[term] << {path: root_path, name: "Home"}
  end
  @@about_page_terms.each do |term|
    index[term] ||= []
    index[term] << {path: about_path, name: "About page"} # assumes an about_path route helper exists
  end
  index
end

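Your build_search_index function can get as complex and feature-rich as you want. Keep in mind that what you are essentially doing is reinventing the wheel: Solr and other search backends exist for exactly this purpose. I leave ranking, reading view files from disk, and HTML sanitization as an exercise for the reader :)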
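
To make the HTML sanitization part a little more concrete, here is a minimal sketch in plain Ruby that generalizes build_search_index to a list of pages. The tokenize helper, the pages array, and its bodies are made up for illustration, and the regex-based tag stripping is deliberately naive (HTML entities, scripts, and comments are not handled); inside a Rails app, strip_tags from ActionView::Helpers::SanitizeHelper would likely be the more robust choice.

```ruby
# Naive tokenizer for illustration: drop tags, lowercase, split into unique terms.
def tokenize(html)
  text = html.gsub(/<[^>]*>/, " ")            # remove HTML tags so "div" is not indexed
  text = text.downcase.gsub(/[^a-z0-9]/, " ") # replace punctuation with whitespace
  text.split.uniq                             # one index entry per term per page
end

# Build an inverted index: term => [{path:, name:}, ...]
def build_search_index(pages)
  index = {}
  pages.each do |page|
    tokenize(page[:body]).each do |term|
      index[term] ||= []
      index[term] << { path: page[:path], name: page[:name] }
    end
  end
  index
end

# Hypothetical pages; in the answer's code the paths would come from route helpers.
sample_pages = [
  { path: "/", name: "Home",
    body: "<div><h1>Welcome home</h1><p>Fresh recipes daily.</p></div>" },
  { path: "/about", name: "About page",
    body: "<div><p>About our recipes.</p></div>" }
]

term_index = build_search_index(sample_pages)
p term_index["recipes"].map { |page| page[:name] }
p term_index.key?("div")
```

As in the hard-coded version, looking up a term that no page contains returns nil, so the compact! call in the controller still applies.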

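If you want something more dynamic, i.e. something that adjusts automatically as your pages change, it will look similar, except that the hash is generated by scanning the views folder. The dynamic approach is considerably more complicated, though. For example, where do you get the human-readable page titles for the search results? That metadata is not stored anywhere in Rails. Also, how do you know which pages are GET pages? What if you have secured pages, or pages you don't want to appear in search results? On top of that, you have to interpret the ERB. Rendering static views server-side is not impossible, but what if those pages differ per user? And what about internationalization?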
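
For a sense of what the scanning step might look like, here is a rough sketch under several simplifying assumptions: templates are named *.html.erb one directory deep, partials (names starting with an underscore) and a hypothetical SKIPPED_DIRS list are excluded, and ERB tags are simply discarded rather than rendered. It sidesteps the harder questions above (titles, GET-only pages, per-user content) and uses the path itself as the only label.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical list of view folders that should never appear in results.
SKIPPED_DIRS = %w[layouts shared search].freeze

# Returns { "/folder/view" => "visible text" } for every non-partial template.
def scan_views(views_root)
  pages = {}
  Dir.glob(File.join(views_root, "*", "*.html.erb")).sort.each do |file|
    dir  = File.basename(File.dirname(file))
    name = File.basename(file, ".html.erb")
    next if SKIPPED_DIRS.include?(dir) || name.start_with?("_")

    source = File.read(file)
    text = source.gsub(/<%.*?%>/m, " ") # drop ERB tags without evaluating them
    text = text.gsub(/<[^>]*>/, " ")    # then drop the remaining HTML tags
    pages["/#{dir}/#{name}"] = text.downcase.split.join(" ")
  end
  pages
end

# Demo against a throwaway directory so the sketch runs outside a Rails app.
demo_pages = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "recipes"))
  FileUtils.mkdir_p(File.join(root, "layouts"))
  File.write(File.join(root, "recipes", "eggs.html.erb"),
             "<h1>Scrambled Eggs</h1><%= render 'tips' %>")
  File.write(File.join(root, "recipes", "_tips.html.erb"), "<p>Tips</p>")
  File.write(File.join(root, "layouts", "application.html.erb"),
             "<html><%= yield %></html>")
  scan_views(root)
end

p demo_pages
```

In a Rails app, views_root would presumably be Rails.root.join("app", "views").to_s. Any text produced by the ERB itself, such as partial content or helper output, is lost with this approach, which is one of the costs of not rendering the views.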

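To me, a hand-curated solution like the one I gave above seems like the best option. If your site is like most, your static pages won't change much, so maintenance should not be a big problem.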

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/18028331
