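I have been using the stringi package for a while and everything has worked well.
I recently wanted to put some regular expressions into a function and store that function in a separate file. The code works when the function is defined directly in the script, but when the file is sourced I do not get the expected result.
Here is the code to reproduce the problem: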
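library(stringi)

# Replace every character that is not a listed letter, digit, or punctuation mark with a space
clean <- function(text){
  stri_replace_all_regex(str = text,
                         pattern = "(?i)[^a-zàâçéèêëîïôûùüÿñæœ0-9,\\.\\?!']",
                         replacement = " ")
}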
text <- "A sample text with some french accent é, è, â, û and some special characters |, [, ( that needs to be cleaned."
clean(text) # OK
[1] "A sample text with some french accent é, è, â, û and some special characters , , that needs to be cleaned."
source("clean.r")
clean(text) # KO
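[1] "A sample text with some french accent , , , and some special characters , , that needs to be cleaned."
I would like to remove all characters except letters, accented letters, and the punctuation marks ?, !, , and .
If the function is defined directly in the script, the code works. If it is sourced, it gives a different result.
I also tried stringr and ran into the same problem. My file is saved with UTF-8 encoding.
I do not understand why this happens; any help is much appreciated.
Thanks.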
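One possible explanation, stated here as an assumption rather than something the question confirms: if the file is saved as UTF-8 but read under the session's 1252 code page (the locale listed in the session info), each accented letter in the pattern would arrive as two unrelated characters, so the character class would no longer list é, è, â, or û. A minimal base R sketch of that byte-level effect, with variable names chosen purely for illustration:

```r
# U+00E9 (e with acute accent), written as an escape so this snippet is ASCII-only.
e_acute <- "\u00e9"

# In UTF-8 the character is stored as two bytes: c3 a9.
utf8_bytes <- charToRaw(enc2utf8(e_acute))

# Reinterpret those same two bytes as Windows-1252 (one character per byte),
# then convert the result to UTF-8 so it can be inspected.
misread <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")

nchar(e_acute, type = "chars")  # one character as written
nchar(misread, type = "chars")  # two characters after the misreading
```

Under that assumption, the sourced pattern would contain the two misread characters instead of the accented letter, which would be consistent with the accents being stripped only in the sourced case.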
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringi_1.1.5 data.table_1.10.4
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 yaml_2.1.14
Posted on 2018-08-03 22:55:00
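Try converting the text to ASCII first. This changes the characters and may give you the same behavior as when the function is defined directly in R.
+1 to Felipe Alvarenga: https://stackoverflow.com/a/45941699/2069472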
text <- "Ábcdêãçoàúü"
iconv(text, to = "ASCII//TRANSLIT")
https://stackoverflow.com/questions/46303700
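A different route, not part of the answer above and offered only as a sketch: keep the accented letters but write them as \uXXXX escapes, so the sourced file holds only ASCII bytes and reads the same whatever file encoding is assumed. The names clean_escaped and accented are hypothetical; the pattern is otherwise the one from the question. The commented-out line shows the encoding argument of source(), which assumes clean.r really is saved as UTF-8.

```r
library(stringi)

# The accented letters from the question's pattern, written as Unicode escapes
# (a-grave, a-circumflex, c-cedilla, ... , ae and oe ligatures).
accented <- paste0(
  "\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea\u00eb",
  "\u00ee\u00ef\u00f4\u00fb\u00f9\u00fc\u00ff",
  "\u00f1\u00e6\u0153"
)

# Hypothetical variant of clean(): same character class, ASCII-only source text.
clean_escaped <- function(text) {
  pattern <- paste0("(?i)[^a-z", accented, "0-9,\\.\\?!']")
  stri_replace_all_regex(str = text, pattern = pattern, replacement = " ")
}

# Alternative that leaves the original file untouched (assumes it is UTF-8):
# source("clean.r", encoding = "UTF-8")

clean_escaped("\u00e9, \u00e8 and | [ (")
```

Unlike the transliteration to ASCII, this keeps the accented letters in the cleaned text, which is what the question asks for.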