Restructuring with tidyr
Stack Overflow user
Asked 2017-03-25 02:56:56
2 answers, 67 views, 0 followers, 1 vote

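I have data stored as one long string per row. Fields are separated by semicolons, and within each field the column name and its value are separated by =. I am trying to do the following: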

Current structure:

Row1: "Column1 = blah1;Column2 = blah2;Column3 = blah3;Column4 = blah4"
Row2: "Column1 = blah1;Column2 = blah2;Column3 = blah3;Column4 = blah4"

Convert to:

Column1|Column2|Column3|Column4
blah1|blah2|blah3|blah4
blah1|blah2|blah3|blah4

I believe the tidyr package in R is the way to go, but I haven't figured it out yet.

This is what I have so far with tidyr, but I still get an error:

# CREATE TEST DATA
mydata <- as.data.frame(c("Column1 = blah1; Column2 =  blah2; Column3 = blah3; Column4 = blah4","Column1 = blah1; Column2 =  blah2; Column3 = blah3; Column4 = blah4","Column1 = blah1; Column2 =  blah2; Column3 = blah3; Column4 = blah4"))
names(mydata) <- "TEST"

# Create dummy vector
x <- vector(mode="numeric", length=0)

# Separate by ;
x <- separate(mydata, TEST, x, sep = ";" )

Any help is greatly appreciated.

2 Answers

Stack Overflow user

Answered 2017-03-25 04:22:01

I will show how to do this step by step with dplyr pipes, printing the output after each step so you can see how the data structure evolves.

mydata <- as.data.frame(c("Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4","Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4","Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4")) 
names(mydata) <- "TEST"

The data looks like this:

> mydata
                                                                TEST
1 Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4
2 Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4
3 Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4

Here are the steps of the transformation:

library(dplyr)
library(tidyr)

1) Separate by variable

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";")

Output:

          Column1          Column2          Column3          Column4
1 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4
2 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4
3 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4
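
As a side note on the call in the question: separate() expects `into` to be a character vector of new column names, so passing an empty numeric vector is most likely what triggered the error. The sketch below also relies on `sep` being treated as a regular expression, which lets the split consume the space after each semicolon. The two-field example data and the name `parts` are made up for illustration.

```r
library(tidyr)

# Small illustrative data set (not the asker's full data)
mydata <- data.frame(
  TEST = "Column1 = blah1; Column2 = blah2",
  stringsAsFactors = FALSE
)

# `into` must be a character vector of column names;
# ";\\s*" splits on a semicolon plus any whitespace that follows it
parts <- separate(mydata, TEST, into = c("Column1", "Column2"), sep = ";\\s*")

print(parts)
```

With this separator the second piece no longer starts with a space, unlike the output above.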

2) Add a row identifier

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";") %>% 
  mutate(row=row.names(mydata))

Output:

          Column1          Column2          Column3          Column4 row
1 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4   1
2 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4   2
3 Column1 = blah1  Column2 = blah2  Column3 = blah3  Column4 = blah4   3

3) Reshape to long format

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";") %>% 
  mutate(row=row.names(mydata)) %>% 
  gather("key", "value", -row)

Output:

   row     key            value
1    1 Column1  Column1 = blah1
2    2 Column1  Column1 = blah1
3    3 Column1  Column1 = blah1
4    1 Column2  Column2 = blah2
5    2 Column2  Column2 = blah2
6    3 Column2  Column2 = blah2
7    1 Column3  Column3 = blah3
8    2 Column3  Column3 = blah3
9    3 Column3  Column3 = blah3
10   1 Column4  Column4 = blah4
11   2 Column4  Column4 = blah4
12   3 Column4  Column4 = blah4

4) Then extract the values

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";") %>% 
  mutate(row=row.names(mydata)) %>% 
  gather("key", "value", -row) %>% 
  extract(value, into="value", regex=".* = (.*)$")

Output:

   row     key value
1    1 Column1 blah1
2    2 Column1 blah1
3    3 Column1 blah1
4    1 Column2 blah2
5    2 Column2 blah2
6    3 Column2 blah2
7    1 Column3 blah3
8    2 Column3 blah3
9    3 Column3 blah3
10   1 Column4 blah4
11   2 Column4 blah4
12   3 Column4 blah4
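
To see what the regular expression in this step does on its own, here is a minimal base R sketch using sub() with the same pattern. It is only an illustration of the capture group, not a replacement for extract(); the variable names `values` and `extracted` are made up.

```r
# Two sample cell values as they look after step 3
# (the second one keeps the leading space left by the ";" split)
values <- c("Column1 = blah1", " Column2 = blah2")

# ".* = " matches everything up to and including " = ";
# "(.*)$" captures the rest of the string, which "\\1" then returns
extracted <- sub(".* = (.*)$", "\\1", values)

print(extracted)
# [1] "blah1" "blah2"
```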

5) If needed, spread it back to wide format

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";") %>% 
  mutate(row=row.names(mydata)) %>% 
  gather("key", "value", -row) %>% 
  extract(value, into="value", regex=".* = (.*)$") %>% 
  spread(key, value)

Output:

  row Column1 Column2 Column3 Column4
1   1   blah1   blah2   blah3   blah4
2   2   blah1   blah2   blah3   blah4
3   3   blah1   blah2   blah3   blah4

6) If needed, remove the row identifier

mydata %>% 
  separate(TEST, into=paste0("Column", 1:4), sep=";") %>% 
  mutate(row=row.names(mydata)) %>% 
  gather("key", "value", -row) %>% 
  extract(value, into="value", regex=".* = (.*)$") %>% 
  spread(key, value) %>% 
  select(-row)

Output:

  Column1 Column2 Column3 Column4
1   blah1   blah2   blah3   blah4
2   blah1   blah2   blah3   blah4
3   blah1   blah2   blah3   blah4
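
The pipeline above hard-codes four columns in paste0("Column", 1:4). A possible variant, sketched here under the assumption that a tidyr version providing pivot_wider() (1.0.0 or later) is installed, splits each row into one key/value pair per line first, so the column names come from the data itself. The names `wide`, `key` and `value` are illustrative choices.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(
  TEST = rep("Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4", 3),
  stringsAsFactors = FALSE
)

wide <- mydata %>%
  mutate(row = row_number()) %>%                       # remember which input row each pair came from
  separate_rows(TEST, sep = ";\\s*") %>%               # one "name = value" pair per row
  separate(TEST, into = c("key", "value"), sep = "\\s*=\\s*") %>%  # split name from value
  pivot_wider(names_from = key, values_from = value) %>%           # back to one row per input row
  select(-row)

print(wide)
```

This is one way to avoid counting the fields by hand; it still assumes every field has the form name = value.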
Votes: 2

Stack Overflow user

Answered 2017-03-25 03:51:55

Here is an attempt in base R:

# Example data provided
data <- data.frame(
  string = c(
    "Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4",
    "Column1 = blah1; Column2 = blah2; Column3 = blah3; Column4 = blah4"),
  stringsAsFactors = FALSE)

# Helper predicates for odd and even positions
odd <- function(x) x %% 2 != 0
even <- function(x) x %% 2 == 0

# Split every string on "=" and ";", then strip all whitespace.
# s alternates name, value, name, value, ...
s <- gsub("[[:space:]]", "", unlist(strsplit(data$string, "=|;")))

# Odd positions hold the column names, even positions hold the values
keys <- unique(s[odd(seq_along(s))])
values <- s[even(seq_along(s))]

# Fill a matrix row by row (one row per input string); this assumes every
# string lists the same columns in the same order
data <- data.frame(matrix(values, ncol = length(keys), byrow = TRUE),
                   stringsAsFactors = FALSE)
colnames(data) <- keys

data
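
A per-row variant of the same base R idea, as a sketch: parsing each string separately keeps rows with different values distinct and does not depend on the number of rows. The helper name `parse_row` and the two-column sample strings are hypothetical, and every string is assumed to list the same columns in the same order.

```r
# Parse one "name = value; name = value" string into a named character vector
parse_row <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]    # "name = value" pieces
  pairs <- strsplit(fields, "=", fixed = TRUE)     # list of c(name, value)
  values <- trimws(vapply(pairs, `[`, character(1), 2))
  names(values) <- trimws(vapply(pairs, `[`, character(1), 1))
  values
}

strings <- c("Column1 = a1; Column2 = a2",
             "Column1 = b1; Column2 = b2")

# rbind() takes its column names from the names of the first vector
result <- data.frame(do.call(rbind, lapply(strings, parse_row)),
                     stringsAsFactors = FALSE)

print(result)
```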
Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/43007206
