
Scala regex help on a UCI dataset

Stack Overflow user
Asked 2015-10-13 06:37:29

Hi guys, I am parsing some data from newsgroups.tar.gz in Scala.

This is the text I am trying to process:

val inputData = """xref: cantaloupe.srv.cs.cmu.edu alt.atheism:51121 soc.motss:139944 rec.scouting:5318
newsgroups: alt.atheism,soc.motss,rec.scouting
path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!wupost!uunet!newsgate.watson.ibm.com!yktnews.watson.ibm.com!watson!watson.ibm.com!strom
from: strom@watson.ibm.com (rob strom)
subject: re: [soc.motss, et al.] "princeton axes matching funds for boy scouts"
sender: @watson.ibm.com
message-id: <1993apr05.180116.43346@watson.ibm.com>
date: mon, 05 apr 93 18:01:16 gmt
distribution: usa
references: <c47efs.3q47@austin.ibm.com> <1993mar22.033150.17345@cbnewsl.cb.att.com> <n4hy.93apr5120934@harder.ccr-p.ida.org>
organization: ibm research
lines: 15

in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes:

|> [1] however, i hate economic terrorism and political correctness
|> worse than i hate this policy.  


|> [2] a more effective approach is to stop donating
|> to any organizating that directly or indirectly supports gay rights issues
|> until they end the boycott on funding of scouts.  

can somebody reconcile the apparent contradiction between [1] and [2]?

-- 
rob strom, strom@watson.ibm.com, (914) 784-7641
ibm research, 30 saw mill river road, p.o. box 704, yorktown heights, ny  10598"""

This is the output I need:

in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes:

|> [1] however, i hate economic terrorism and political correctness
|> worse than i hate this policy.  


|> [2] a more effective approach is to stop donating
|> to any organizating that directly or indirectly supports gay rights issues
|> until they end the boycott on funding of scouts.  

can somebody reconcile the apparent contradiction between [1] and [2]?

This is what I have tried:

val docParser = """([\\s\\S]+\\lines: \\d*)([\\s\\S]*\\n\\n)([\\s\\S]*)""".r
val docParser(metadata, content, footer) = inputText

But I get the following error:

scala.MatchError: [Ljava.lang.String;@62f8fff1 (of class [Ljava.lang.String;)

However, the pattern does seem to work in an online regex builder.

Any ideas?
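One observation on the error text: `[Ljava.lang.String;` is how the JVM prints the class of a string array, so the value being matched may have been an `Array[String]` rather than a single `String`. The question does not show how `inputText` is defined, so this is only a guess. A minimal sketch, with a hypothetical `inputLines` array and a made-up `linesHeader` pattern, of joining the lines back into one string before matching:

```scala
object ArrayScrutineeSketch {
  // Hypothetical input: the post already split into lines (an assumption,
  // the question does not show where inputText comes from).
  val inputLines: Array[String] =
    Array("organization: ibm research", "lines: 15", "", "body text")

  // Default JVM toString of a String array: "[Ljava.lang.String;@<hash>".
  val printed: String = inputLines.toString

  // Join the lines back into one String before applying a regex extractor.
  val inputText: String = inputLines.mkString("\n")

  // (?s) lets '.' cross newlines; group 1 is the line count, group 2 the body.
  val linesHeader = """(?s).*?lines: (\d+)\n\n(.*)""".r

  val result: Option[(Int, String)] = inputText match {
    case linesHeader(n, body) => Some((n.toInt, body))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    println(printed)
    println(result)
  }
}
```

Matching with a `case _` fallback, instead of a pattern `val`, also avoids the `MatchError` when the text does not fit the pattern.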


1 Answer

Stack Overflow user

Answered 2015-10-13 06:42:04

I have never programmed in Scala before, but from what I can see in expressions.htm, you have to escape things like the digit class twice.

So \d becomes \\d in Scala, and so on.
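For what it is worth, the doubling depends on the kind of string literal. In an ordinary double-quoted Scala string, `\d` has to be written `\\d`, but inside triple quotes (as in the question) backslashes are taken literally, so `\\d` there asks the regex engine for a literal backslash followed by `d`. The sketch below illustrates that, then tries a single-backslash version of the parser on a shortened stand-in for the post. The object name, the sample text, and the choice to cut the footer at the `-- ` signature line are my assumptions, not something stated in the question.

```scala
object DocParserSketch {
  // Raw (triple-quoted) literal: a single backslash reaches the regex engine.
  val rawDigits = """\d+""".r
  // Ordinary literal: the backslash is doubled to survive string escaping.
  val escapedDigits = "\\d+".r
  // Doubling inside triple quotes asks for a literal backslash followed by 'd'.
  val overEscaped = """\\d""".r

  // Shortened, hypothetical stand-in for the post in the question.
  val inputData: String = Seq(
    "from: strom@watson.ibm.com (rob strom)",
    "organization: ibm research",
    "lines: 15",
    "",
    "in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes:",
    "",
    "|> [1] however, i hate economic terrorism and political correctness",
    "",
    "can somebody reconcile the apparent contradiction between [1] and [2]?",
    "",
    "-- ",
    "rob strom, strom@watson.ibm.com, (914) 784-7641"
  ).mkString("\n")

  // (?s) lets '.' cross newlines. Group 1: headers up to "lines: N";
  // group 2: the body; group 3: everything after the "-- " signature line.
  val docParser = """(?s)(.*?\nlines: \d+)\n\n(.*?)\n+-- ?\n(.*)""".r

  val parsed: Option[(String, String, String)] = inputData match {
    case docParser(metadata, content, footer) => Some((metadata, content, footer))
    case _                                    => None
  }

  def main(args: Array[String]): Unit =
    parsed match {
      case Some((_, content, _)) => println(content)
      case None                  => println("no match")
    }
}
```

A regex used as an extractor has to match the whole input, which is why the last group swallows the rest of the text. Posts without a `-- ` signature line would fall through to the `None` branch and would need a looser pattern.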

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/33095773
