首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >用列内固定效应信息重塑R中的数据

用列内固定效应信息重塑R中的数据
EN

Stack Overflow用户
提问于 2019-10-13 19:52:52
回答 3查看 118关注 0票数 6

我在excel中得到了一些非常尴尬的格式化数据,我需要对这些数据进行整形,以便在R中进行生存分析。

我将数据的摘录上传到Google驱动器:https://drive.google.com/open?id=1ret3bCDCYPDALQ16YBloaeopfl2-qVbp

原始数据框架有大约2100个观测值和950个变量。

以下是基本数据框架:

代码语言:javascript
复制
my.data<-data.frame(
  ID=c( "", "","C8477","C5273","C5566"),
  LR=c("2012Y","State:FL",5,6,8),
  LR=c("2012Y","State:AZ",5,8,10),
  LR=c("2011Y","State:FL",7,2,1)
)

my.data

#     ID       LR     LR.1     LR.2
# 1          2012Y    2012Y    2011Y
# 2       State:FL State:AZ State:FL
# 3 C8477        5        5        7
# 4 C5273        6        8        2
# 5 C5566        8       10        1

所有列都有相同的名称"LR“。我不知道以后会不会有问题.

第1行给出了年份,第2行给出了相应的观察结果。

作为输出,我需要一些面板数据,我在以后的生存分析工作。

代码语言:javascript
复制
my.data<-data.frame(
  ID=c("C8477","C5273","C5566"),
  Year=c("2012","2012","2011"), 
  State=c("FL","AZ","FL"),LR=c(5,8,1)
) 

my.data

#     ID Year State LR
# 1 C8477 2012    FL  5
# 2 C5273 2012    AZ  8
# 3 C5566 2011    FL  1

我玩了整形函数和seq函数,但是没有这些功能将帮助我朝着正确的方向移动,因为数据帧的排列非常奇怪。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2019-10-14 00:14:06

这是一种tidyverse方法:

代码语言:javascript
复制
my.data <- data.frame(
  ID=c( "", "","C8477","C5273","C5566"),
  LR=c("2012Y","State:FL",5,6,8),
  LR=c("2012Y","State:AZ",5,8,10),
  LR=c("2011Y","State:FL",7,2,1)
)

我的代码:

代码语言:javascript
复制
library(tidyverse)
year <- as.matrix(my.data[1, -1])
year <- str_split(year, "Y", simplify = T)[,1]
state <-as.matrix(my.data[2, -1])
both<-paste(state, year, sep = "_")
mydata1<-my.data[-c(1, 2), ]
colnames(mydata1) <-c("ID", both)
long <-pivot_longer(mydata1, 
             cols = starts_with("state"),
             names_to = "State_year",
             values_to = "LR")
long %>%
  transmute(
    ID, LR, 
    state = str_split(State_year, "_", simplify = T)[, 1],
    state = str_split(state, ":", simplify = T)[, 2], 
    year = str_split(State_year, "_", simplify = T)[, 2]
)

我们得到:

代码语言:javascript
复制
  ID    LR    state year 
1 C8477 5     FL    2012 
2 C8477 5     AZ    2012 
3 C8477 7     FL    2011 
4 C5273 6     FL    2012 
5 C5273 8     AZ    2012 
6 C5273 2     FL    2011 
7 C5566 8     FL    2012 
8 C5566 10    AZ    2012 
9 C5566 1     FL    2011  
票数 2
EN

Stack Overflow用户

发布于 2019-10-13 21:50:19

以下是实现这一目标的reshape2tidyr版本:

代码语言:javascript
复制
library(tidyr)
library(reshape2)

my.data <- data.frame(
  ID=c( "", "","C8477","C5273","C5566"),
  LR=c("2012Y","State:FL",5,6,8),
  LR=c("2012Y","State:AZ",5,8,10),
  LR=c("2011Y","State:FL",7,2,1)
)

# Combine first two rows as column names
colnames(my.data) <- paste(unlist(my.data[2, ]), unlist(my.data[1, ]), sep = "|")
# Remove first two rows from data
my.data <- my.data[-c(1:2), ] # negative index removes rows

# Melt data
my.data.long <- melt(
  my.data, 
  id.vars = 1L, # would be better to have explicit col name
  value.name = "LR" 
)
colnames(my.data.long) <- c("ID", "state_year", "LR")

# Split state_year column into two columns:
my.data.long <- separate(
  my.data.long, 
  state_year, 
  into = c("State", "Year"), 
  sep = "\\|" # note this is a regex
)

这个想法是借用这里的。

票数 2
EN

Stack Overflow用户

发布于 2019-10-14 06:53:46

使用reshape的一种可能的基本R方法

代码语言:javascript
复制
## 1) extract the State and Year rows in a separate data.frame
dat.meta <- data.frame(
    Year = unlist(dat[1, -1]),
    State = sub("State:", "", dat[2, -1]),
    LR.id = colnames(dat)[-1]
)

## 2) reshape the data without State and Year rows into long format
dat.long <- reshape(
    data = dat[-c(1, 2), ], 
    varying = 2:4,
    direction = "long",
    v.names = "LR",
    timevar = "LR.id",
    times = colnames(dat)[-1]
)

dat.long$id <- NULL  ## remove id-column   

## 3) merge the reshaped data with the extracted State and Year data
merge(dat.long, dat.meta, by = "LR.id")

#>   LR.id    ID LR  Year State
#> 1    LR C8477  5 2012Y    FL
#> 2    LR C5273  6 2012Y    FL
#> 3    LR C5566  8 2012Y    FL
#> 4  LR.1 C8477  5 2012Y    AZ
#> 5  LR.1 C5273  8 2012Y    AZ
#> 6  LR.1 C5566 10 2012Y    AZ
#> 7  LR.2 C8477  7 2011Y    FL
#> 8  LR.2 C5273  2 2011Y    FL
#> 9  LR.2 C5566  1 2011Y    FL

数据

代码语言:javascript
复制
dat <- data.frame(
    ID=c( "", "","C8477","C5273","C5566"),
    LR=c("2012Y","State:FL",5,6,8),
    LR=c("2012Y","State:AZ",5,8,10),
    LR=c("2011Y","State:FL",7,2,1),
    stringsAsFactors = FALSE
)
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/58367281

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档