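I am trying to use the jieba package in R to segment Chinese sentences from a "content" column into words, and then create a new corresponding column "words", where each row contains the segmented words of the matching row in the "content" column.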
df$content (3 rows):
我喜歡吃雞翅
我不喜歡吃雞
哇這是什麼醬做得雞翅?
desired df$words (3 rows):
我 喜歡 吃 雞翅
我 不 喜歡 吃 雞
哇 這 是 什麼 醬 做 得 雞翅?
The words column should have 3 rows, each holding the segmented version of the matching row in the content column.
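The jieba package does a great job segmenting Chinese words, but I am having trouble keeping the segmented words of each sentence within one row. The jieba segmenter seems to segment all the words in the "content" column at once and then treats every single word as its own row. I am really stuck on how to solve this. Do I need to change the number of recycled vectors? Any help would be greatly appreciated.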
Here is my code:
df$words <- qseg <= df$content
This returns the error:
Error: Assigned data `df$words <- qseg <= df$content` must be compatible with existing data.
x Existing data has 29175 rows.
x Assigned data has 1327701 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
15. stop(fallback)
14. signal_abort(cnd)
13. cnd_signal(error_assign_incompatible_size(nrow, value, j, i_arg, value_arg))
12. (function (cnd) { cnd_signal(error_assign_incompatible_size(nrow, value, j, i_arg, value_arg)) ...
11. signalCondition(cnd)
10. signal_abort(cnd)
9. abort(message, class = c(class, "vctrs_error"), ...)
8. stop_vctrs(x_size = x_size, y_size = size, x_arg = x_arg, class = c("vctrs_error_incompatible_size", "vctrs_error_recycle_incompatible_size"))
7. stop_recycle_incompatible_size(x_size = 1327701L, size = 29175L, x_arg = "")
6. vec_recycle(value[[j]], nrow)
5. withCallingHandlers(for (j in seq_along(value)) { if (!is.null(value[[j]])) { value[[j]] <- vec_recycle(value[[j]], nrow) } ...
4. vectbl_recycle_rhs(value, fast_nrow(x), length(j), i_arg = NULL, value_arg)
3. tbl_subassign(x, i = NULL, as_string(name), list(value), i_arg = NULL, j_arg = name, value_arg = substitute(value))
2. `$<-.tbl_df`(`*tmp*`, testing, value = c("网友", "爆料", "网友", "在", "宝鸡", "贴", "吧", "发帖", "称", "有人", "在", "铁路", "门口", "摆放", "花圈", "灵堂", "抗议", "据", "未", "证实", "消息", "说", "期间", "新", "与", ...
1. `$<-`(`*tmp*`, testing, value = c("网友", "爆料", "网友", "在", "宝鸡", "贴", "吧", "发帖", "称", "有人", "在", "铁路", "门口", "摆放", "花圈", "灵堂", "抗议", "据", "未", "证实", "消息", "说", "期间", "新", "与", "争执", ...
Posted on 2020-10-27 04:21:29
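Update: I got it working! For anyone who runs into this problem with jieba in the future, use the seg_file() function from the "chinese.misc" package: https://cran.r-project.org/web/packages/chinese.misc/chinese.misc.pdf. It works perfectly with just one line of code! I spent two days frustrated over this, and it turned out that jieba was simply the wrong package for the job. Thank goodness for the chinese.misc package!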
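For anyone who would rather stay with jieba: the size mismatch in the error suggests the segmenter was handed the whole column and returned one flat vector of words (1327701 of them) instead of one entry per row (29175). A minimal sketch of a per-row approach follows. It assumes the jiebaR package with its worker() and segment() functions; segment_rows is a hypothetical helper name introduced here for illustration, and the three sample sentences are the ones from the question. I have not run this against the original data.

```r
# Apply a tokenizer to each row separately and join its tokens with spaces,
# so the result always has exactly one string per input row.
# `segment_rows` is a hypothetical helper, not part of any package.
segment_rows <- function(content, tokenize) {
  vapply(
    content,
    function(sentence) paste(tokenize(sentence), collapse = " "),
    character(1),
    USE.NAMES = FALSE
  )
}

# Sample rows taken from the question.
df <- data.frame(
  content = c("我喜歡吃雞翅", "我不喜歡吃雞", "哇這是什麼醬做得雞翅?"),
  stringsAsFactors = FALSE
)

# Assumption: jiebaR is installed; skip quietly if it is not.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  seg <- jiebaR::worker()  # segmentation worker with default settings
  df$words <- segment_rows(df$content, function(s) jiebaR::segment(s, seg))
  stopifnot(length(df$words) == nrow(df))  # one segmented string per row
  print(df)
}
```

Because each sentence is segmented on its own and collapsed back into a single string, the new column has the same length as the data frame, so no recycling is needed. The exact word boundaries depend on the dictionary the worker uses, so the output may not match the desired df$words character for character.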
https://stackoverflow.com/questions/64525312