
R remove numbers from data frame entries containing only numbers

Stack Overflow user

Asked on 2017-12-02 06:21:14

3 answers · 12.9K views · 0 followers · 6 votes

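I'm reading a data frame from an online CSV file, but whoever created the file accidentally typed some numbers into a column that should contain only city names. Here is a sample of the cities.data table: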

City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

Desired output after removing the rows whose City column contains only numbers:

City        Population   Foo   Bar
Seattle     10           foo1  bar1
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

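I want to remove the rows whose City column contains only numbers. Both "Kent 98042" and "98042 Kent" are fine because they contain a city name, but 98125 is not a city, so that row should be removed.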

I can't use is.numeric, because the numbers are read from the CSV file as strings. I tried a regular expression:

cities.data <- cities.data[which(grepl("[0-9]+", cities.data$City) == FALSE), ]

but this removes every row that contains any digit, not just the rows that contain only digits. For example, I get:

City        Population   Foo   Bar
Seattle     10           foo1  bar1

"Kent 98042" is removed even though I want to keep that row. Any suggestions? Please and thank you!

Tags: regex, dataframe, filter, dplyr

3 answers

Stack Overflow user

Accepted answer

Posted on 2017-12-02 06:31:59

Using base R:

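df <- data.frame(City = c('Seattle', '98125', 'Kent 98042'),
                 Population = c(10, 20, 30),
                 Foo = c('foo1', 'foo2', 'foo3'))
# Keep the rows whose City is NOT made up solely of digits.
# grepl() returns a logical vector, so this also works when nothing matches
# (df[-grep(...), ] would return zero rows if grep() found no match).
df2 <- df[!grepl('^\\d+$', df$City), ]
df2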

This gives:

        City Population  Foo
1    Seattle         10 foo1
3 Kent 98042         30 foo3

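The idea is to match ^\d+$ (digits only) and drop those rows. Note the anchors on both ends: ^ and $ make the pattern match only when the entire string consists of digits.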
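To see why the anchors matter, here is a small sketch comparing the unanchored pattern from the question with the anchored one on the four City values from the question's sample (the vector `city` is simply that sample typed in by hand). It also shows a pitfall of negative `grep()` indexing: when nothing matches, `grep()` returns `integer(0)`, and indexing with `-integer(0)` selects nothing rather than everything, which is why `!grepl()` is the safer filter.

```r
# City values from the question's sample table
city <- c("Seattle", "98125", "Kent 98042", "98042 Kent")

# Unanchored: TRUE whenever the string contains at least one digit
grepl("[0-9]+", city)       # returns FALSE TRUE TRUE TRUE

# Anchored at both ends: TRUE only when the whole string is digits
grepl("^[0-9]+$", city)     # returns FALSE TRUE FALSE FALSE

# Keep only the entries that are not purely numeric
city[!grepl("^[0-9]+$", city)]   # returns "Seattle" "Kent 98042" "98042 Kent"

# Pitfall: with no matches, negative grep() indexing drops everything
no_digits <- c("Seattle", "Kent")
no_digits[-grep("^[0-9]+$", no_digits)]   # returns character(0)
```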

Votes: 1

Stack Overflow user

Posted on 2017-12-02 06:33:36

If you don't need the numbers in the City column at all:

# replace all numbers with empty string
cities.data$City <- gsub("[0-9]+", "", cities.data$City) 
# drop observations that are only empty strings
cities.data <- cities.data[cities.data$City!="",]

Edit: this should handle all the cases in the updated example, where the digits can appear anywhere in the string.
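A sketch of this approach applied to all four sample rows from the question (the data frame is a hand-typed copy of that sample). Deleting the digits also alters the rows that are kept: "Kent 98042" becomes "Kent " with a trailing space, and a value such as "98125 " would become " " rather than "" and slip past the empty-string check. Wrapping the result in `trimws()`, an addition that is not part of the answer above, handles both cases.

```r
cities.data <- data.frame(
  City = c("Seattle", "98125", "Kent 98042", "98042 Kent"),
  Population = c(10, 20, 30, 30),
  Foo = c("foo1", "foo2", "foo3", "foo4"),
  Bar = c("bar1", "bar2", "bar3", "bar4"),
  stringsAsFactors = FALSE
)

# Strip every run of digits, then trim leftover leading/trailing whitespace
cities.data$City <- trimws(gsub("[0-9]+", "", cities.data$City))

# Rows whose City was digits only are now empty strings; drop them
cities.data <- cities.data[cities.data$City != "", ]
cities.data$City   # returns "Seattle" "Kent" "Kent"
```

Note that the original numbers are gone from the kept rows; if you need to preserve "Kent 98042" as written, filter with an anchored pattern instead of rewriting the column.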

Votes: 3

Stack Overflow user

Posted on 2017-12-02 06:26:09

df = read.table(text = "
City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent98042  30           foo3  bar2
", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

df %>% filter(is.na(as.numeric(City)))

#        City Population  Foo  Bar
# 1   Seattle         10 foo1 bar1
# 2 Kent98042         30 foo3 bar2

The idea is that as.numeric, applied to a character value, returns a non-NA result only when the string is a number; every other string becomes NA.

If you prefer base R, you can use: df[is.na(as.numeric(df$City)), ]
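A base-R sketch of the same idea with two safeguards the answer does not spell out: `suppressWarnings()` silences the "NAs introduced by coercion" warning that `as.numeric()` emits for non-numeric strings, and `as.character()` guards against `City` having been read as a factor (the `read.csv()` default before R 4.0.0), in which case `as.numeric()` would return the factor's integer codes and never NA. The sample data is hand-typed from the question, and `city_is_number` is just an illustrative name. Keep in mind that strings such as "1e5" or " 98125 " also parse as numbers.

```r
df <- data.frame(City = c("Seattle", "98125", "Kent 98042", "98042 Kent"),
                 Population = c(10, 20, 30, 30),
                 stringsAsFactors = FALSE)

# TRUE where City parses as a number. as.numeric() gives NA (plus a
# warning, silenced here) for strings that are not numbers; as.character()
# makes this safe even if City is a factor.
city_is_number <- !is.na(suppressWarnings(as.numeric(as.character(df$City))))

# Keep the rows whose City is not a number
result <- df[!city_is_number, ]
result
```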

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/47602313
