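I have a data frame with a column that holds JSON arrays stored as strings. I want to parse that column and convert it into a one-hot encoding, but parsing the JSON fails with an error.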
library(jsonlite)
library(dplyr)
> df <- tibble(Amenities=c("[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]", "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]", "[\"Parking\", \"Lawn\", \"Garage\"]"))
> df
# A tibble: 3 x 1
Amenities
<chr>
1 "[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]"
2 "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]"
3 "[\"Parking\", \"Lawn\", \"Garage\"]"
> df <- df %>% mutate(Amenities=fromJSON(Amenities))
Error: parse error: trailing garbage
awn", "Garage", "Frontyard"] ["Parking", "Lawn", "Garage", "
(right here) ------^
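The error message shows the arrays from different rows run together. The whole column reaches fromJSON() as one JSON text, so the parser finishes the first array and then reports the second one as "trailing garbage". The sketch below is not from the original thread. It parses each string with its own fromJSON() call and then builds the 0/1 columns in base R. The names `parsed`, `levels_seen`, `onehot` and `result` are illustrative only.

```r
library(jsonlite)

# Sample data from the question, as a plain data.frame
df <- data.frame(
  Amenities = c('["Parking", "Lawn", "Garage", "Frontyard"]',
                '["Parking", "Lawn", "Garage", "Backyard"]',
                '["Parking", "Lawn", "Garage"]'),
  stringsAsFactors = FALSE
)

# One fromJSON() call per row; each returns a character vector,
# e.g. c("Parking", "Lawn", "Garage")
parsed <- lapply(df$Amenities, fromJSON)

# Indicator columns in order of first appearance (illustrative name)
levels_seen <- unique(unlist(parsed))

# 1 if the amenity occurs in that row, else 0. vapply() returns one
# column per row, so transpose to get one row per record.
onehot <- t(vapply(parsed, function(x) as.integer(levels_seen %in% x),
                   integer(length(levels_seen))))
colnames(onehot) <- levels_seen

# Keep the original column next to the indicator columns
result <- cbind(df, onehot)
result
```

Because the JSON parser respects the quotes, a multi-word value (for example a hypothetical "Swimming pool") stays a single column. A plain `\\w+` regex would split it into two.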
Expected output:
Parking Lawn Garage Frontyard Backyard
1 1 1 1 0
1 1 1 0 1
1 1 1 0 0

Solution (this also keeps the existing data frame):

library(qdapTools)
library(stringr)
df <- cbind(df, +(mtabulate(str_extract_all(df$Amenities, "\\w+( +\\w+)*"))))

Posted on 2020-12-09 05:13:08
We can do this in a single line with mtabulate:
library(qdapTools)
library(stringr)
mtabulate(str_extract_all(df$Amenities, "\\w+"))
-output
# Backyard Frontyard Garage Lawn Parking
#1 0 1 1 1 1
#2 1 0 1 1 1
#3 0 0 1 1 1

Posted on 2020-12-08 11:08:53
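You can treat the JSON values as plain strings: strip the brackets and quotes, then expand the dataset into one indicator column per value.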
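library(dplyr)
df %>%
  # remove [, ] and double quotes, leaving e.g. "Parking, Lawn, Garage"
  mutate(Amenities = gsub('\\[|\\]|"', '', Amenities)) %>%
  # one 0/1 column per distinct value, split on a comma plus optional spaces
  splitstackshape::cSplit_e("Amenities", sep = ',\\s*',
                            type = 'character', fill = 0, fixed = FALSE) %>%
  # "Amenities_Parking" -> "Parking"
  rename_with(~sub('Amenities_', '', .))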
# Amenities Backyard Frontyard Garage Lawn Parking
#1 Parking, Lawn, Garage, Frontyard 0 1 1 1 1
#2 Parking, Lawn, Garage, Backyard 1 0 1 1 1
#3 Parking, Lawn, Garage 0 0 1 1 1

https://stackoverflow.com/questions/65192502