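I have a dataset of neurosurgery patients, and I am creating survival curves for it. I am trying to adjust my curves to match the age-sex distribution of the 2000 US population, which is included in the R survival package. This 'uspop2' dataset is an array by age, sex, and calendar year. To start, I am only looking at ages 50 and above, so I use the same age threshold (ages above 100 are coded as 100) and create a table "tab100" of the observed age/sex proportions within each group. The new weights are pi.us/tab100.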
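Here is a minimal sketch of that weighting rule on made-up data. Each cell's weight is its reference-population proportion divided by its observed proportion, so the weighted sample reproduces the reference distribution. The names (`sample_sex`, `ref_prop`, `obs_prop`) and the 50/50 reference are hypothetical. Only one factor (sex) is used, to keep it short.

```r
# Hypothetical toy data (not the question's): 1 female and 3 male patients.
sample_sex <- factor(c("F", "M", "M", "M"), levels = c("F", "M"))
ref_prop   <- c(F = 0.5, M = 0.5)            # hypothetical reference population

# Observed proportion of each cell, in the same order as ref_prop
obs_prop <- as.vector(table(sample_sex)[names(ref_prop)]) / length(sample_sex)

# Weight for a cell = reference proportion / observed proportion
cell_wt <- ref_prop / obs_prop               # F = 2, M = 0.667

# Each patient inherits the weight of its own cell
subj_wt <- unname(cell_wt[as.character(sample_sex)])

# The weighted sample now has the reference sex distribution
weighted_prop <- tapply(subj_wt, sample_sex, sum) / sum(subj_wt)
print(weighted_prop)
```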
Below is the first piece of code I wrote (note that I am running R through rpy2, hence the %%R cell magic):
%%R
#Reweighting
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
pi.us <- refpop/sum(refpop)
age100 <- factor(ifelse(mydata$Age..yrs. >100, 100, mydata$Age..yrs.), levels=50:100)
tab100 <- with(mydata, table(age100, mydata$Sex, mydata$group))/ nrow(mydata)
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
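range(us.wt)
This gives a range from 0.006709405 to infinity! The infinite weights occur because every age-sex combination is represented in the US population but not in my neurosurgery dataset, so some cells of tab100 are zero. To get rid of these infinite weights, I tried grouping the US population into age bands: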
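This hedged toy example uses hypothetical ages and reference proportions to show where the Inf comes from. A cell that is empty in the sample has an observed proportion of 0, and dividing by 0 gives Inf. Merging adjacent ages into bands removes the empty cell, but only if the observed proportions are also tabulated on those same bands.

```r
# Hypothetical toy cohort: nobody is aged 52
ages <- c(50, 50, 51, 53, 53, 53)
ref  <- c(`50` = 0.25, `51` = 0.25, `52` = 0.25, `53` = 0.25)  # hypothetical reference

# Single-year cells: age 52 has observed proportion 0, so its weight is x / 0
obs       <- as.vector(table(factor(ages, levels = 50:53))) / length(ages)
wt_single <- ref / obs
print(wt_single)   # the "52" entry is Inf

# Coarser bands (50-51, 52-53): every band now contains patients
band     <- function(a) cut(a, breaks = c(49.5, 51.5, 53.5), labels = c("50-51", "52-53"))
ref_band <- tapply(ref, band(50:53), sum)             # reference summed per band
obs_band <- as.vector(table(band(ages))) / length(ages) # sample tabulated on the SAME bands
wt_band  <- as.vector(ref_band) / obs_band
print(wt_band)     # both weights are finite
```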
%%R
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)
print(pi.us)
tab2 <- with(mydata, table(mydata$Age..yrs., mydata$Sex, mydata$group))/nrow(mydata)
print(tab2)
us.wt <- rep(pi.us, 3)/tab2
print(range(us.wt))
index <- with(mydata, cbind(mydata$Age..yrs., mydata$Sex,
as.numeric(mydata$group)))
mydata$uswt <- us.wt[index]
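sfit3a <-survfit(Surv(Patient.LOS..days., Events) ~ group, data=mydata, weight=uswt)
Printing pi.us and tab2 tells me that I did successfully split the ages into 8 groups. However, after setting us.wt <- rep(pi.us,3)/tab2, us.wt is still essentially the same as before! It doesn't change. As the output below shows, the range has a different lower bound but still goes up to infinity. Unsurprisingly, the next line of code then gives a "subscript out of bounds" error. What is going on?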
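The grouping step can be checked in isolation. This sketch uses a made-up `refpop` of all ones as a stand-in for the uspop2 slice, together with the breaks from the question. `temp[row(refpop)]` maps each cell to its age band and `col(refpop)` maps it to its sex column, so `tapply()` collapses the 51 x 2 cells into 8 x 2.

```r
# Hypothetical stand-in for refpop: 51 single-year ages (50:100) x 2 sexes, all ones.
refpop <- matrix(1, nrow = 51, ncol = 2,
                 dimnames = list(as.character(50:100), c("female", "male")))

# Same breaks as in the question: 8 intervals covering ages 50..100
temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110) + .5))

# Sum the cells within each (age band, sex column) pair, then normalise
pi.us <- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum) / sum(refpop)
print(dim(pi.us))                  # 8 age bands x 2 sexes
print(pi.us[, 1] * sum(refpop))    # single-year ages per band: 5 5 5 5 5 5 10 11
```

This aggregation only changes pi.us. In the code above, tab2 is still tabulated on the raw Age..yrs. values.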
[1] 0.4655699 Inf
R[write to console]: Error in `[.default`(us.wt, index) : subscript out of bounds
Error in `[.default`(us.wt, index) : subscript out of bounds
By the way, my code is taken directly from the survival package vignette, starting on page 7: https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf
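Comparing shapes helps explain why the division misbehaves. The sketch below uses hypothetical arrays with the shapes that appear in this thread: an 8 x 2 pi.us and a tab2 of 48 ages x 3 sex codes x 3 groups. `rep(pi.us, 3)` has only 48 elements. R recycles it across the 432 cells of tab2 without a warning, because 432 is a multiple of 48, and the values land in the wrong cells. For the division to line up cell by cell, tab2 would need dimensions 8 x 2 x 3, in the same order as pi.us.

```r
# Hypothetical shapes mirroring the question
pi.us <- matrix(1:16, nrow = 8, ncol = 2)   # distinct values so we can track them
tab2  <- array(1, dim = c(48, 3, 3))         # all ones, so w shows the numerators

num <- rep(pi.us, 3)       # rep() drops dim: a plain vector of length 48
w   <- num / tab2          # 432 %% 48 == 0, so R recycles silently, no warning

print(length(num))         # 48
print(dim(w))              # 48 3 3: the result takes tab2's shape
# Cell [9, 1, 1] received pi.us[1, 2], a value from the second column of
# pi.us: age bands and sexes no longer line up with tab2's margins.
print(w[9, 1, 1] == pi.us[1, 2])
```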
What am I doing wrong? :( Thanks for your help!
Posted 2022-05-17 19:47:22
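This is an answer to your question, but not a solution to your problem. Look at the index and us.wt objects. Clearly, the names of the us.wt array margins do not match the values in the third column of index ("1", "3", ... versus the plunge-group labels).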
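The subscript error follows from how R indexes an array with a matrix. Here is a hedged illustration on a small hypothetical array. `cbind()` of numbers and strings produces a character matrix, as `str(index)` shows. A character index matrix is matched against the array's dimnames, so "1" fails against the plunge-group labels. A numeric index matrix selects by position instead.

```r
# Hypothetical 2 x 2 x 3 array with dimnames, laid out like us.wt
w <- array(1:12, dim = c(2, 2, 3),
           dimnames = list(c("2", "3"), c("F", "M"),
                           c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm")))

# Character index matrix: each entry is looked up in the dimnames
idx_chr <- cbind("3", "M", "1")
res <- tryCatch(w[idx_chr], error = function(e) conditionMessage(e))
print(res)                                   # "1" is not a plunge-group name

# Numeric index matrix: row 2, column 2, slice 1, by position
idx_num <- cbind(2, 2, 1)
print(w[idx_num])
# The same cell, addressed with names that do exist in the dimnames
print(w[cbind("3", "M", "Plunge <= 2 mm")])
```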
str(us.wt)
'table' num [1:48, 1:3, 1:3] Inf Inf Inf Inf Inf ...
- attr(*, "dimnames")=List of 3
..$ : chr [1:48] "2" "3" "4" "5" ...
..$ : chr [1:3] "" "F" "M"
..$ : chr [1:3] "Plunge <= 2 mm" "Plunge 2 - 4 mm" "Plunge > 4 mm"
> str(index)
chr [1:240, 1:3] "2" "7" "11" "75" "59" "3" "88" "13" "75" "80" "5" "3" "65" "66" "93" "45" ...
> head(index)
[,1] [,2] [,3]
[1,] "2" "M" "1"
[2,] "7" "M" "3"
[3,] "11" "M" "1"
[4,] "75" "M" "3"
[5,] "59" "M" "1"
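[6,] "3" "M" "3"
I also think the construction of the us.wt array is messed up. Since you haven't described the logic or goal behind its construction, I won't try to read your mind and offer suggestions. Here is why I think things are messed up: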
> Hmisc::describe(us.wt)
us.wt
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75
432 0 32 0.691 Inf NaN 4.264 7.864 16.032 Inf Inf
.90 .95
Inf Inf
lowest : 0.5586839 1.1173678 3.1599027 3.4755957 4.2639763
highest: 20.3399270 21.7412450 27.0462314 28.2128223 Inf
Warning message:
In w * sort(x - mean(x)) :
longer object length is not a multiple of shorter object length
# Notice that more than half of the values are Inf
> head(us.wt)
, , = Plunge <= 2 mm
F M
2 Inf 7.053206 7.053206
3 Inf 10.870622 10.870622
4 Inf Inf Inf
5 Inf 15.922230 15.922230
6 Inf Inf Inf
7 Inf 13.581011 13.581011
, , = Plunge 2 - 4 mm
F M
2 Inf 14.106411 14.106411
3 Inf Inf Inf
4 Inf 17.682214 17.682214
5 Inf Inf Inf
6 Inf Inf Inf
7 Inf Inf Inf
, , = Plunge > 4 mm
F M
2 Inf Inf Inf
3 Inf 10.870622 10.870622
4 Inf 17.682214 17.682214
5 Inf Inf Inf
6 Inf 15.348446 15.348446
7 Inf 13.581011 13.581011
https://stackoverflow.com/questions/72255340
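The answer above stops at diagnosis. As a sketch only, under stated assumptions, here is one way the pieces could be made to line up. Age and sex are tabulated on the same 8 bands and 2 sex levels as pi.us, and the index matrix is numeric. Everything here is hypothetical. `refpop` contains made-up counts rather than uspop2. The patients are generated with expand.grid so that every cell is populated. `agegrp` and `sex2` are new illustrative column names. Patients with a blank sex code get NA weights and would need a decision before calling survfit.

```r
# Hypothetical stand-in for refpop (made-up counts; the question uses uspop2)
refpop <- matrix(rep(c(100, 90), each = 51), nrow = 51,
                 dimnames = list(as.character(50:100), c("female", "male")))
breaks <- c(49, 54, 59, 64, 69, 74, 79, 89, 110) + .5
temp   <- as.numeric(cut(50:100, breaks))
pi.us  <- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum) / sum(refpop)

# Hypothetical patients: one row per age x sex code x plunge value.
# Column names are copied from the question; the values are invented.
mydata <- expand.grid(Age..yrs. = 50:100, Sex = c("F", "M", ""),
                      Drill.Plunge..mm. = c(1, 3, 5), stringsAsFactors = FALSE)
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4),
                       levels = 1:3,
                       labels = c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))

# 1. Band patient ages with the SAME breaks used for pi.us (8 levels)
mydata$agegrp <- cut(mydata$Age..yrs., breaks)
# 2. Sex as a factor whose level order matches pi.us columns (female, male);
#    the blank code "" becomes NA instead of a third level
mydata$sex2 <- factor(mydata$Sex, levels = c("F", "M"))

# 3. tab2 is now 8 x 2 x 3, the same layout as rep(pi.us, 3)
tab2  <- with(mydata, table(agegrp, sex2, group)) / nrow(mydata)
us.wt <- rep(pi.us, 3) / tab2

# 4. Numeric index matrix, so positions (not dimnames) are used;
#    rows containing NA (blank sex) produce NA weights
index <- with(mydata, cbind(as.numeric(agegrp), as.numeric(sex2), as.numeric(group)))
mydata$uswt <- us.wt[index]
```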