
Parsing server logs with the rex package in R

Stack Overflow user
Asked 2021-04-28 08:47:35
Answers: 1 · Views: 51 · Followers: 0 · Votes: 2

I have server log data in the following format that I want to parse.

Here are the first two lines:

test <- c("5638052581 \"Norway|Oslo County|Oslo|3163036322|503858711|160449504|y|\" n - - [31/Oct/2019:13:00:01 +0000] \"GET /P04_AL?args=app_01&distributor=p4&player=app&playeros=ios&referrer=1&station=1&codec=aac&quality=low&deviceid=1D6A84DA-92A6-4AD1-A2A3-1AB20D2263B2&listenerid=61D1F2EB-7B35-4434-9D8B-A6D074BE28F0&userid=fczUdjf5yEU8j4JlZHG4JXABgiZ2&aw_0_1st.audience=%5B%22P7ActiveListeners%22,%20%22p5hitsactive%22,%20%22P6ActiveListeners%22,%20%22P4ActiveListeners%22,%20%22AppInstalledP4%22%5D HTTP/1.1\" 200 4305805 \"-\" \"AppleCoreMedia//1.0.0.17B84 (iPhone; U; CPU OS 13_2 like Mac OS X; nb_no)\" 702", "616118387 \"Netherlands|North Holland|Haarlem|631068861|616118387|862817723||\" n - - [31/Oct/2019:13:00:01 +0000] \"GET /P04_MH HTTP/1.1\" 200 519546 \"-\" \"MultiRoomAudioPlayer//5.1\" 6")

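I tried to use the rex package as shown below, but I keep getting "unexpected input" errors. What am I doing wrong? Could someone help me with this? Below is my attempt for a single record (the first element of the vector):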

library(rex)
re_logic <- rex(
  
  capture(name = "process_id", digits),
  "`\´",
  capture(name = "country", non_spaces),
  "|",
  capture(name = "county", non_spaces),
  "|",
  capture(name = "city", non_spaces), 
  "|",
  capture(name = "x1", digits), 
  "|",
  capture(name = "x2", digits),
  "|",
  capture(name = "x3", digits),
  "|",
  capture(name = "process_name", alpha),
  "`n - -´",
  spaces,
  "[",
  capture(name = "accept_date", except_some_of("]")),
  "]",
  spaces,
  "`\´",
  capture(name = "http_request", non_quotes),
  "`\´",
  spaces,
  capture(name = "status_code", digits),
  spaces,
  capture(name = "bytes_read", some_of("+", digit)),
  "`" \"´",
  capture(name = "actconn", digits),
  "`//´",
  spaces,
  "(",
  capture(name = "Tr", non_quotes),
  ";" )
  

# sample view
re_matches(test, re_logic) %>% as_tibble()

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-04-28 10:20:06

You can use:

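library(rex)

re_logic <- rex(
  # leading numeric ID, then the opening quote of the location field
  capture(name = "process_id", digits),
  spaces, quote,
  # location names may contain spaces ("Oslo County"), so match anything but "|"
  capture(name = "country", except_some_of("|")),
  "|",
  capture(name = "county", except_some_of("|")),
  "|",
  capture(name = "city", except_some_of("|")),
  "|",
  capture(name = "x1", digits),
  "|",
  capture(name = "x2", digits),
  "|",
  capture(name = "x3", digits),
  "|",
  # may be empty, as in the second record ("||")
  capture(name = "process_name", zero_or_more(alpha)),
  "|", quote, spaces, "n", spaces, "-", spaces, "-", spaces,
  # timestamp inside square brackets
  "[",
  capture(name = "accept_date", except_some_of("]", "[")),
  "]",
  # quoted request line, then status code and byte count
  spaces, quote,
  capture(name = "http_request", non_quotes),
  quote, spaces,
  capture(name = "status_code", digits),
  spaces,
  capture(name = "bytes_read", some_of("+", digit)),
  # skip the next quoted field ("-"), then open the client/user-agent string
  spaces, quote, non_quotes, quote, spaces, quote,
  # client name up to the first "/"
  capture(name = "actconn", except_some_of(quote, "/")),
  "/", non_spaces,
  # optional "(platform; ..." part, absent in the second record
  maybe(
    spaces, "(",
    capture(name = "Tr", except_some_of(";"))
  )
)
re_matches(test, re_logic)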

Regex demo

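Notes:

  • I used quote to match either a ' or a " character.
  • Instead of non_spaces, I matched the geographic names with any characters other than | (the except_some_of("|") pattern), because names such as "Oslo County" and "North Holland" contain spaces.
  • The Tr part is optional (the second record has no "(...;" section), so the chain of patterns for that group has to be wrapped in a maybe() clause.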
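If you would rather not depend on rex, roughly the same pattern can be written directly as a PCRE regular expression and applied with base R's regexpr(perl = TRUE), which reports named capture groups through its capture.start, capture.length and capture.names attributes. This is a sketch, not part of the accepted answer: log_pattern and parse_with_named_groups are hypothetical names made up for illustration, and the pattern matches only a literal " where rex's quote also accepts '.

```r
# Sketch: a base-R alternative to the rex pattern (no extra packages needed).
# The two sample records are the lines from the question.
test <- c(
  "5638052581 \"Norway|Oslo County|Oslo|3163036322|503858711|160449504|y|\" n - - [31/Oct/2019:13:00:01 +0000] \"GET /P04_AL?args=app_01&distributor=p4&player=app&playeros=ios&referrer=1&station=1&codec=aac&quality=low&deviceid=1D6A84DA-92A6-4AD1-A2A3-1AB20D2263B2&listenerid=61D1F2EB-7B35-4434-9D8B-A6D074BE28F0&userid=fczUdjf5yEU8j4JlZHG4JXABgiZ2&aw_0_1st.audience=%5B%22P7ActiveListeners%22,%20%22p5hitsactive%22,%20%22P6ActiveListeners%22,%20%22P4ActiveListeners%22,%20%22AppInstalledP4%22%5D HTTP/1.1\" 200 4305805 \"-\" \"AppleCoreMedia//1.0.0.17B84 (iPhone; U; CPU OS 13_2 like Mac OS X; nb_no)\" 702",
  "616118387 \"Netherlands|North Holland|Haarlem|631068861|616118387|862817723||\" n - - [31/Oct/2019:13:00:01 +0000] \"GET /P04_MH HTTP/1.1\" 200 519546 \"-\" \"MultiRoomAudioPlayer//5.1\" 6"
)

# Hypothetical name: the same fields as the rex answer, written as raw PCRE.
log_pattern <- paste0(
  "^(?<process_id>\\d+)\\s+\"",
  "(?<country>[^|]+)\\|(?<county>[^|]+)\\|(?<city>[^|]+)\\|",
  "(?<x1>\\d+)\\|(?<x2>\\d+)\\|(?<x3>\\d+)\\|",
  "(?<process_name>[[:alpha:]]*)\\|\"\\s+n\\s+-\\s+-\\s+",
  "\\[(?<accept_date>[^\\]\\[]+)\\]\\s+",
  "\"(?<http_request>[^\"]+)\"\\s+",
  "(?<status_code>\\d+)\\s+(?<bytes_read>[+\\d]+)\\s+",
  "\"[^\"]+\"\\s+",                     # skip the quoted "-" field
  "\"(?<actconn>[^\"/]+)/\\S+",
  "(?:\\s+\\((?<Tr>[^;]+))?"            # optional "(platform;" part
)

# Hypothetical helper: one row per input line, one column per named group.
parse_with_named_groups <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  starts <- attr(m, "capture.start")
  stops <- starts + attr(m, "capture.length") - 1
  # substring() recycles x down each column of the start/stop matrices
  out <- matrix(substring(x, starts, stops), nrow = length(x))
  colnames(out) <- attr(m, "capture.names")
  out[m == -1, ] <- NA  # lines where the whole pattern failed to match
  as.data.frame(out, stringsAsFactors = FALSE)
}

parsed <- parse_with_named_groups(test, log_pattern)
parsed[, c("process_id", "country", "city", "status_code", "actconn", "Tr")]
```

Writing the pattern by hand makes every escape explicit (\\|, \\[, \\(), which is exactly the bookkeeping rex does for you; for a pattern this long, the rex version is arguably easier to read and maintain.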
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/67296836
