I have several log lines like the ones below:
endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688
ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651
pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713
204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970
pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669
scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124
I need to extract the path from each of the log lines above. This is the code I tried:
val pattern = "\\s+([^\\s]+)\\s+HTTP".r
val result = pattern.findFirstIn(log)
This is the output I get:
/images/ HTTP
/history/gemini/gemini-spacecraft.txt HTTP
/images/launch-logo.gif HTTP
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP
/images/WORLD-logosmall.gif HTTP
/history/mercury/mr-3/mr-3.html HTTP
How can I get rid of the HTTP in the paths above?
Posted on 2019-07-11 18:46:47
Posted on 2019-07-11 18:48:35
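Your match is in the first capturing group (). You can shorten the pattern to:
\s(\S+)\s+HTTP
In Scala:
val pattern = "\\s(\\S+)\\s+HTTP".r
You can use findAllIn and read capture group 1 for each log line: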
val pattern = "\\s(\\S+)\\s+HTTP".r
val strings = List(
"""endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688 """,
"""ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651 """,
"""pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713 """,
"""204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970 """,
"""pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669 """,
"""scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124"""
)
strings.foreach { log =>
val m = pattern.findAllIn(log).group(1)
println(m)
}
Result:
/images/
/history/gemini/gemini-spacecraft.txt
/images/launch-logo.gif
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif
/images/WORLD-logosmall.gif
/history/mercury/mr-3/mr-3.html
To match this line from the comments:
columbia.acc.brad.ac.uk -10/7/1995:00:52:36 -0400 "GET /ksc.html" 200 7067
you can use:
\S+ (/(?:[^/\s]+/)*[^\s"]+)
https://stackoverflow.com/questions/56995462
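As a quick check of the second pattern, the sketch below runs it against one of the original log lines and against the line from the comments. The comment line is written here with a straight closing quote after the path, which is an assumption about how that line originally looked. The names PathPatternCheck, pathPattern, and extractPath are made up for this example. It uses findFirstMatchIn, which returns an Option, so a line without a path should yield None instead of throwing.

```scala
import scala.util.matching.Regex

object PathPatternCheck {
  // Second pattern from the answer: the path starts with "/" right after a
  // space and runs until the next whitespace or double quote.
  val pathPattern: Regex = """\S+ (/(?:[^/\s]+/)*[^\s"]+)""".r

  // Returns the first capture group, or None when the line contains no path.
  def extractPath(line: String): Option[String] =
    pathPattern.findFirstMatchIn(line).map(_.group(1))

  def main(args: Array[String]): Unit = {
    val lines = List(
      // A line that includes the HTTP version after the path.
      """scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124""",
      // The line from the comments (closing quote assumed to be a plain ").
      """columbia.acc.brad.ac.uk -10/7/1995:00:52:36 -0400 "GET /ksc.html" 200 7067"""
    )
    lines.foreach(line => println(extractPath(line)))
  }
}
```

Because this pattern does not depend on the literal HTTP following the path, it handles both kinds of line, at the cost of treating any space-preceded token that starts with "/" as a path.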
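Going back to the first pattern, the difference between the question's attempt and the answer can be seen side by side. findFirstIn returns the whole match, including the leading whitespace and the trailing HTTP, while group(1) on the result of findFirstMatchIn returns only the captured path. This is a minimal sketch; the object name WholeMatchVsGroup and the choice of sample line are made up for illustration.

```scala
object WholeMatchVsGroup {
  // First pattern from the answer: the path is the only capture group.
  val pattern = "\\s(\\S+)\\s+HTTP".r

  val log =
    """pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713"""

  // Whole match: leading whitespace, the path, whitespace, and the literal HTTP.
  val wholeMatch: Option[String] = pattern.findFirstIn(log)

  // First capture group only: the path itself.
  val path: Option[String] = pattern.findFirstMatchIn(log).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(wholeMatch)
    println(path)
  }
}
```

Both values are Options, so a log line that does not match the pattern gives None rather than an exception.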