I need to split the 'fiction_works' column (see image) into three separate columns: "work", "author", and "year".
Image: https://i.stack.imgur.com/nRat5.jpg
I tried this, but it only separates "work" and "author". I really don't understand how to split out the year.
separated <- separate(total, col = 'fiction_works', into = c('work', 'author'), sep = ",")
I'm doing my best to improve my R skills, but I can't figure this one out. Any help is much appreciated. Thanks in advance.
Posted on 2022-09-21 16:31:31
This can be done quite easily with dplyr and str_extract, using regular expressions.
Reproducible data
library(tidyverse)
df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
                                   "A ton image (1998), Louise L. Lambrichs",
                                   "About A Boy (1998), Nick Horriby"))
Solution
df2 <- df %>%
  mutate(Work = str_extract(string = fiction_works, pattern = ".+(?=\\s\\()"),
         Author = str_extract(string = fiction_works, pattern = "(?<=,\\s).+"),
         Year = str_extract(string = fiction_works, pattern = "[0-9]+")) %>%
  select(Work:Year)
df2
               Work              Author Year
1 The A.B.C Murders     Agatha Christie 1936
2       A ton image Louise L. Lambrichs 1998
3       About A Boy        Nick Horriby 1998
If the titles contain numbers you may run into problems, but I can't tell from the posted image whether that applies to your data.
Posted on 2022-09-21 16:40:37
library(tidyverse)
df %>%
extract(fiction_works, c("work", "year", "author"), "(.*?) [(](\\d+)[), ]+(.*)")
               work year              author
1 The A.B.C Murders 1936     Agatha Christie
2       A ton image 1998 Louise L. Lambrichs
3       About A Boy 1998        Nick Horriby
Posted on 2022-09-21 16:50:33
Using base R
read.csv(text = sub("\\)", "", sub("\\s*\\(", ",", df$fiction_works)),
         header = FALSE, col.names = c("work", "year", "author"))
-output
               work year               author
1 The A.B.C Murders 1936      Agatha Christie
2       A ton image 1998  Louise L. Lambrichs
3       About A Boy 1998         Nick Horriby
https://stackoverflow.com/questions/73803796