stringr::str_sub output is unexpected

Stack Overflow user
Asked 2016-08-29 15:30:28
2 answers, 265 views, 0 followers, score 2

Consider the following data.frame:

df <- structure(
  list(
    sufix = c("atizado", "atoria", "atório", "auta", "áutico", "ável"),
    min_stem_len = c(4, 5, 3, 5, 4, 2),
    replacement = c("", "", "", "", "", ""),
    exceptions = list(
      character(0), character(0), character(0), character(0), character(0),
      c("afável", "razoável", "potável", "vulnerável")
    )
  ),
  .Names = c("sufix", "min_stem_len", "replacement", "exceptions"),
  row.names = 21:26,
  class = c("tbl_df", "tbl", "data.frame")
)

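This data.frame has a column of strings, sufix. Given a word, word <- "amável", I want to extract, for each entry of df$sufix, the ending of the word that has the same length as that entry.

I am using the following code:
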
library(stringr)
word <- "amável"
str_sub(word, start = -stringr::str_length(df$sufix))

But this produces the following result:

> str_sub(word, start = -stringr::str_length(df$sufix))
[1] "amável" "mável"  "mável"  "vel"    "mável"  "vel"   
> df$sufix
[1] "atizado" "atoria"  "atório"  "auta"    "áutico"  "ável"

I expected the last element of the resulting vector to be "ável", because:

> str_length("ável")
[1] 4
> str_sub(word, start = -4)
[1] "ável"
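
For reference, the result I have in mind can be cross-checked in base R without going through stringr at all. This is only a sketch: suffix_lens is a hypothetical helper vector holding the character counts of the six entries of df$sufix, and a requested length longer than the word is assumed to mean "return the whole word".

```r
word <- "amável"

# Character counts of the six suffixes in df$sufix (typed in by hand so the
# snippet is self-contained).
suffix_lens <- c(7, 6, 6, 4, 6, 4)

n <- nchar(word)

# First character of each wanted ending, clamped to 1 so that lengths longer
# than the word give back the whole word.
first <- pmax(n - suffix_lens + 1, 1)

# substring() is vectorised over `first`, so this yields one ending per suffix.
expected <- substring(word, first, n)
expected
```

The fourth and sixth elements are the ones that should line up with the str_sub(word, start = -4) call above.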

Here is a simpler, reproducible example:

set.seed(100)
a <- sample(1:10, 10000, replace = TRUE)
res <- rep("ábc", 10000) %>% str_sub(start = -a)
# Each result should have min(a, 3) characters; count the mismatches.
sum(ifelse(a > 3, 3, a) != str_length(res))
## [1] 2504

2 Answers

Stack Overflow user

Accepted answer

Posted 2017-03-21 14:01:22

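This has been fixed in the development branch of stringi, see https://github.com/gagolews/stringi/issues/227 (stringr's str_sub relies on stri_sub from stringi). Once the update is on CRAN, anyone in the "general public" should be able to reproduce the correct behavior:
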
str_sub(word, start = -stringr::str_length(df$sufix))
## [1] "amável" "amável" "amável" "ável"   "amável" "ável"  
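
If you want to know whether your installed stringi already behaves this way, a small self-check along these lines may help. It is a sketch, not part of the fix itself: the test string, the lens vector, and the fixed flag are made up for illustration, and the snippet only inspects this one recycling case.

```r
library(stringi)

# Hypothetical self-check: take endings of lengths 1 to 4 from a
# 3-character string that contains a non-ASCII letter. The single string is
# recycled against the vector of lengths, which is the case discussed here.
lens <- 1:4
res <- stri_sub("ábc", from = -lens)

# A requested length longer than the string should return the whole string.
expected <- c("c", "bc", "ábc", "ábc")

# TRUE means the installed version handles this case as expected.
fixed <- identical(res, expected)
fixed
```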
Score: 1

Stack Overflow user

Posted 2016-08-29 19:57:08

Note that all of the results are wrong except the first.

They should be:

[1] "amável" "amável" "amável" "ável"   "amável" "ável" 

This can easily be achieved by repeating the word explicitly:

library(stringi)
# The column is named sufix (one "f") in the question's data.frame.
stri_sub(rep(word, 6), from = -stri_length(df$sufix))

I would bet you can adapt your stringr code in the same way.
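
A stringr version of the same idea might look like the sketch below. The sufix vector is copied from the question so the snippet runs on its own, and length(sufix) replaces the hard-coded 6. I have not confirmed that this avoids the problem on an affected installation, so treat it as an assumption rather than a verified workaround.

```r
library(stringr)

word <- "amável"

# Suffixes copied from df$sufix in the question.
sufix <- c("atizado", "atoria", "atório", "auta", "áutico", "ável")

# Repeat the word once per suffix so that str_sub() receives two vectors of
# equal length and does not have to recycle the single string itself.
res <- str_sub(rep(word, length(sufix)), start = -str_length(sufix))
res
```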

### Edited to add ###

I see what you mean now. There is indeed some strange behavior, most likely related to how the special character á is handled. See the example below:

df <- data.frame(suffix = c("Lorem","ipsum","dolor","sit","amet","consectetur","adipiscing", "elit","Donec","arcu")) 
df$len <- stri_length(df$suffix)

Then look at the odd behavior in the 7th element of the result:

stri_sub("amavel", from = -df$len)
##  [1] "mavel"  "mavel"  "mavel"  "vel"    "avel"   "amavel" "amavel" "avel"  
##  [9] "mavel"  "avel" 

# Compared to
stri_sub("amável", from = -df$len)
##  [1] "mável"  "mável"  "mável"  "vel"    "ável"   "amável" "mável"  "ável"  
##  [9] "mável"  "ável"

Strangely, in the latter case the result is corrected if you use rep:

stri_sub(rep("amável", 10), from = -df$len)
## [1] "mável"  "mável"  "mável"  "vel"    "ável"   "amável" "amável" "ável"  
## [9] "mável"  "ável"

# note how the 7th element is now correct.

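So although it is a little cumbersome, the solution given above should work.

I tried looking at the code of stri_sub, which calls C_stri_sub, but that was a dead end for me. Perhaps someone more familiar with C and/or string manipulation can help?

### Second edit ###

It seems to me that the problem lies in how the string is recycled inside the stri_sub call. Here is an alternative to the code you added in your edit:
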
set.seed(100)
a <- sample(1:10, 10000, replace = TRUE)
res <- stri_sub(rep("ábc", 10000), from = -a)
sum(ifelse(a > 3, 3, a) != stri_length(res))
## [1] 0
Score: 1

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/39209945