I'm trying to do some basic text matching in Ruby using ActiveRecord.
Here is my code so far:
require 'active_record'
require 'yaml'
require 'pg'
require 'pry'
require 'fileutils'

$config = '
adapter: postgresql
database: edgar
username: YYYYY
password:
host: 127.0.0.1'

ActiveRecord::Base.establish_connection(YAML.load($config))

class Doc < ActiveRecord::Base; end
class Eightk < ActiveRecord::Base; end

directory = "disease"        # name of the output directory
FileUtils.mkpath(directory)  # creates the directory if it doesn't exist

cancer = Eightk.where("text ILIKE '%cancer%'")
death  = Eightk.where("text ILIKE '%death%'")

cancer.each do |filing|  # each filing is one Eightk record
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }  # block form closes the file
  puts "Storing #{filing.doc_id}..."
end

death.each do |filing|
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }
  puts "Storing #{filing.doc_id}..."
end

I have a long list of conditions I'd like to search for.
Thanks!

Posted on 2014-09-01 04:13:31

Maybe something like:
keywords = %w(cancer death anotherone)
records = Eightk.where(keywords.map { |w| "(text ILIKE '%#{w}%')" }.join(' OR '))
records.each do |filing|
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }
end

Otherwise, you can use SIMILAR TO or POSIX regular expressions (http://www.postgresql.org/docs/8.1/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP), which let you match with a regular expression.
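For example:

Eightk.where("text SIMILAR TO '%(#{keywords.join('|')})%'")

Note that, unlike ILIKE, SIMILAR TO is case-sensitive. POSIX regular expressions also let you check for the start and end of a word, so you can match whole words only (e.g. match "death,", "death " or "death.", but not "deathbed").

I'll leave the regex details to someone who is better at regex :)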
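Both queries above interpolate the keywords directly into the SQL string, so a keyword containing a quote or a LIKE wildcard (% or _) would break the query or change what it matches. The following is a minimal sketch of one way around that, assuming PostgreSQL and ActiveRecord's ? placeholders. escape_like, keyword_conditions and whole_word_pattern are hypothetical helpers, not part of ActiveRecord; they only build the SQL fragment and bind values, so the file runs without a database. The regex helper also illustrates the whole-word matching mentioned above, using PostgreSQL's \m and \M word-boundary escapes.

```ruby
# Hypothetical helpers (not part of ActiveRecord): they only build SQL
# fragments and bind values, so this file runs without a database.

# Escape LIKE wildcards (\, % and _) so a keyword is matched literally.
# Assumes PostgreSQL's default LIKE escape character, the backslash.
def escape_like(word)
  word.gsub(/[\\%_]/) { |c| "\\#{c}" }
end

# Builds "text ILIKE ? OR text ILIKE ? ..." plus one "%keyword%" bind value
# per keyword, returned as a single array.
def keyword_conditions(keywords, column = "text")
  clause = keywords.map { "#{column} ILIKE ?" }.join(" OR ")
  [clause, *keywords.map { |w| "%#{escape_like(w)}%" }]
end

# Builds a PostgreSQL regex matching any keyword as a whole word.
# \m and \M are PostgreSQL's start-of-word and end-of-word constraint escapes.
def whole_word_pattern(keywords)
  # Backslash-escape regex metacharacters so keywords are matched literally.
  escaped = keywords.map { |w| w.gsub(/[\\.^$|?*+()\[\]{}]/) { |c| "\\#{c}" } }
  "\\m(#{escaped.join('|')})\\M"
end

keywords = %w(cancer death anotherone)
p keyword_conditions(keywords)
# prints ["text ILIKE ? OR text ILIKE ? OR text ILIKE ?", "%cancer%", "%death%", "%anotherone%"]
puts whole_word_pattern(keywords)
# prints \m(cancer|death|anotherone)\M

# With the connection from the question established, either result could be
# passed to where; ~* is PostgreSQL's case-insensitive regex match operator:
#   Eightk.where(*keyword_conditions(keywords))
#   Eightk.where("text ~* ?", whole_word_pattern(keywords))
```

The ILIKE helper keeps the answer's OR-of-ILIKE shape and only moves the keywords into bind values. For the regex version, PostgreSQL treats letters, digits and underscore as word characters, so "death_rate" would not count as the word "death"; it is worth checking that this matches what you expect in the filings.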
https://stackoverflow.com/questions/25598679