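I have data in LISP format that I need to process in RapidMiner. I am new to both LISP and RapidMiner. RapidMiner does not accept LISP (I guess because it is a programming language), so I probably need to convert the LISP format to CSV or something similar. A small sample of the code: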
(def-instance Adelphi
  (state newyork)
  (control private)
  (no-of-students thous:5-10)
  ...)
(def-instance Arizona-State
  (state arizona)
  (control state)
  (no-of-students thous:20+)
  ...)
(def-instance Boston-College
  (state massachusetts)
  (location suburban)
  (control private:roman-catholic)
  (no-of-students thous:5-10)
  ...)
I would appreciate any suggestions.
Posted 2011-12-01 06:42:08
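You can take advantage of the fact that Lisp's parser (the reader) is available to Lisp users. One problem with this data is that some values contain colons, which Common Lisp uses as the package-name separator. I wrote some working Common Lisp code that solves your problem, but I had to work around that issue by defining the appropriate packages.
Here is the code. It will of course have to be extended, following the same pattern it already uses, to cover everything left out of the sample in the question: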
;; Packages for the prefixes that appear before a colon in the data, so that
;; the reader accepts values such as thous:5-10 and private:roman-catholic.
(defpackage #:thous
  (:export #:5-10 #:20+))

(defpackage #:private
  (:export #:roman-catholic))

;; One record per def-instance; (:conc-name nil) makes the accessors
;; plain NAME, STATE, LOCATION, and so on.
(defstruct (college (:conc-name nil))
  (name "")
  (state "")
  (location "")
  (control "")
  (no-of-students ""))

(defun data->college (name data)
  (let ((college (make-college :name (write-to-string name :case :capitalize))))
    (loop for (key value) in data
          ;; Strip the | escape characters the printer may put around
          ;; symbol names such as 5-10.
          for string = (remove #\| (write-to-string value :case :downcase))
          do (case key
               (state (setf (state college) string))
               (location (setf (location college) string))
               (control (setf (control college) string))
               (no-of-students (setf (no-of-students college) string))))
    college))

;; Read def-instance forms until end of file.
(defun read-data (stream)
  (loop for (def-instance name . data) = (read stream nil nil)
        while def-instance
        collect (data->college name data)))

(defun print-college-as-csv (college stream)
  (format stream
          "~a~{,~a~}~%"
          (name college)
          (list (state college)
                (location college)
                (control college)
                (no-of-students college))))

;; Write a header row, then one row per college.
(defun data->csv (in out)
  (let ((header (make-college :name "College"
                              :state "state"
                              :location "location"
                              :control "control"
                              :no-of-students "no-of-students")))
    (print-college-as-csv header out)
    (dolist (college (read-data in))
      (print-college-as-csv college out))))

(defun data-file-to-csv (input-file output-file)
  (with-open-file (in input-file)
    (with-open-file (out output-file
                         :direction :output
                         :if-does-not-exist :create
                         :if-exists :supersede)
      (data->csv in out))))

The main function is data-file-to-csv. After loading this code, you can call it from a Common Lisp REPL with (data-file-to-csv "path-to-input-file" "path-to-output-file").
Edit: some additional thoughts
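Actually, rather than adding package definitions for every value that contains a colon, it would be easier to run a regular-expression search-and-replace that puts double quotes (") around all values. Lisp would then read them as strings right away. In that case, the line for string = (remove #\| (write-to-string value :case :downcase)) can be deleted, and string replaced with value in every clause of the case statement.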
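The quoting step could be sketched roughly as follows. This is only an illustration, written in Python because standard Common Lisp ships no regex library; the function name quote_values and the pattern are assumptions, and the pattern assumes every attribute has the form (key value) with no spaces, quotes, or parentheses inside the key or the value.

```python
import re

# Matches "(key value)" anywhere in the text, but never the
# "(def-instance name" head itself.
ATTRIBUTE = re.compile(r'\((?!def-instance\s)([^\s()"]+)\s+([^\s()"]+)\)')


def quote_values(text):
    """Wrap each attribute value in double quotes so the Lisp reader sees a string."""
    return ATTRIBUTE.sub(r'(\1 "\2")', text)


if __name__ == "__main__":
    sample = "(def-instance Adelphi\n  (state newyork)\n  (no-of-students thous:5-10))"
    print(quote_values(sample))
```

The rewritten text can then be saved and passed to data-file-to-csv as the input file.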
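Because the data is so regular, there is actually no need to parse the Lisp definitions properly at all. Instead, you can just extract the data with regular expressions. A language particularly well suited to regex-based transformation of text files, such as AWK or Perl, should be a good fit for this job.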
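A minimal sketch of that regex-only route, in Python rather than the AWK or Perl mentioned above. The function name lisp_to_csv and both patterns are assumptions for illustration; the column list mirrors the four fields handled by the Lisp code, and names and values are copied as written (no case conversion).

```python
import csv
import io
import re

# Columns to emit, in the same order as the Lisp version.
FIELDS = ["state", "location", "control", "no-of-students"]

# "(def-instance Name" marks the start of a record; the name is captured.
INSTANCE = re.compile(r'\(def-instance\s+([^\s()]+)')
# "(key value)" pairs inside a record; assumes no spaces inside either part.
ATTRIBUTE = re.compile(r'\(([^\s()]+)\s+([^\s()]+)\)')


def lisp_to_csv(text):
    """Return CSV text with one row per def-instance form found in text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["College"] + FIELDS)
    # Splitting on a pattern with one capturing group yields
    # [before, name1, body1, name2, body2, ...].
    parts = INSTANCE.split(text)
    for name, body in zip(parts[1::2], parts[2::2]):
        attrs = dict(ATTRIBUTE.findall(body))
        # Attributes missing from a record become empty cells.
        writer.writerow([name] + [attrs.get(field, "") for field in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "(def-instance Adelphi\n  (state newyork)\n  (control private))\n"
        "(def-instance Boston-College\n  (location suburban))\n"
    )
    print(lisp_to_csv(sample), end="")
```

Attributes not listed in FIELDS are silently dropped, so the list would need extending for the parts of the data left out of the question's sample.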
https://stackoverflow.com/questions/8324659