I have a data frame and need to select a few columns from it. For one of those columns, however, I only need part of each value.
> df <- data.frame(doc_name = c('AXX_1324', 'BXX_3423', 'AXX_2343', 'BXX_3453', 'AXX_9872','AXX_9876'),
+ Branch = c('AMM','GGM','AMM','CBB','GGM','GGM'),
+ Revenue = rnorm(6,50,5))
> df
doc_name Branch Revenue
1 AXX_1324 AMM 55.95013
2 BXX_3423 GGM 43.63848
3 AXX_2343 AMM 47.31363
4 BXX_3453 CBB 47.59680
5 AXX_9872 GGM 46.94639
6 AXX_9876 GGM 45.28648
> df %>% select(doctype = substr(df$doc_name,1,3),Revenue)
Error: Unknown columns `AXX`, `BXX`, `AXX`, `BXX`, `AXX` and `AXX`
Call `rlang::last_error()` to see a backtrace
Expected output:
doctype Revenue
AXX 55.95013
BXX 43.63848
AXX 47.31363
BXX 47.59680
AXX 46.94639
AXX 45.28648
I also tried substring instead of substr, but got the same error. Can anyone tell me how to do this?
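Answered 2020-03-20 17:55:44
Use mutate to create the new column, then select to choose the columns you want. select only picks existing columns by name or position, so the character vector returned by substr was treated as a list of column names, which is what produced the error.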
library(dplyr)
df %>%
mutate(doctype = substr(doc_name,1,3)) %>%
select(doctype, Revenue)
# doctype Revenue
#1 AXX 54.25022
#2 BXX 45.37344
#3 AXX 54.46791
#4 BXX 45.29495
#5 AXX 52.69476
#6 AXX 49.09013
As @hendrikvanb mentioned, we can also use transmute here:
df %>% transmute(doctype = substr(doc_name,1,3), Revenue)
Answered 2020-03-21 01:57:49
We can use separate, which by default splits on any run of non-alphanumeric characters (here, the underscore):
library(tidyr)
library(dplyr)
df %>%
separate(doc_name, into = c('doctype', 'other')) %>%
select(doctype, Revenue)
# doctype Revenue
#1 AXX 50.77699
#2 BXX 47.04387
#3 AXX 39.87008
#4 BXX 54.13617
#5 AXX 46.59901
#6 AXX 34.37392
Or use str_remove, in case the prefixes have different lengths:
library(stringr)
df %>%
transmute(doctype = str_remove(doc_name, "_.*"), Revenue)
https://stackoverflow.com/questions/60771887
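To see why the prefix length matters, here is a minimal sketch comparing the fixed-width and pattern-based approaches from the answers above. The data frame df2, its values, and the output column names are made up for illustration and are not from the original question. The seed is only there to make Revenue reproducible, and the base R sub call is an extra alternative that the answers did not mention.

```r
library(dplyr)
library(stringr)

# Hypothetical data: prefixes of different lengths before the underscore
set.seed(42)  # assumption: fixed seed so Revenue is reproducible
df2 <- data.frame(
  doc_name = c("AXX_1324", "BX_3423", "CXXX_2343"),
  Revenue  = rnorm(3, 50, 5),
  stringsAsFactors = FALSE  # keep doc_name as character on R < 4.0
)

res <- df2 %>%
  transmute(
    fixed_width = substr(doc_name, 1, 3),       # always the first three characters
    by_pattern  = str_remove(doc_name, "_.*"),  # drop the first underscore and everything after it
    base_r      = sub("_.*", "", doc_name),     # base R equivalent of the str_remove call
    Revenue
  )
res
```

With these inputs, substr returns "BX_" for "BX_3423" and cuts "CXXX_2343" down to "CXX", while the two pattern-based columns keep the whole prefix before the underscore.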