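I have a biochemistry problem that can be reduced to a two-roll dice experiment (I think...).
Suppose we have an unfair die with 10 faces, i.e. the probability of an individual face is not 1/10. We want to know these probabilities.
However, the dataset we are given is a histogram of the summed faces from rolling the (same) die twice. The observed bins therefore range from 2 to 20 (2 = 1+1; 3 = 1+2, 2+1; 4 = 2+2, 1+3, 3+1; etc.).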
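As a quick illustration of how the bins arise, the sketch below counts the ordered pairs of faces behind each sum, using base R only. The names `faces`, `pair_sums` and `n_pairs` are made up for this example.

```r
faces <- 1:10
pair_sums <- outer(faces, faces, `+`)   # 10 x 10 table of face1 + face2
n_pairs <- table(pair_sums)             # number of ordered pairs per sum (2..20)
counts <- as.vector(n_pairs)
print(n_pairs)
```

There are 19 distinct sums, and the counts rise from one pair for a sum of 2 up to ten pairs for a sum of 11, then fall again.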
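The probability of a summed face is the sum, over every pair of faces that adds up to it, of the product of the individual probabilities (s: observed probability of a summed face; p: probability of an individual face). This can be written as follows: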
s2 ~ p1^2
s3 ~ 2*p1*p2
s4 ~ 2*p1*p3 + p2^2
s5 ~ 2*p1*p4 + 2*p2*p3
s6 ~ 2*p1*p5 + 2*p2*p4 + p3^2
s7 ~ 2*p1*p6 + 2*p2*p5 + 2*p3*p4
s8 ~ 2*p1*p7 + 2*p2*p6 + 2*p3*p5 + p4^2
s9 ~ 2*p1*p8 + 2*p2*p7 + 2*p3*p6 + 2*p4*p5
s10 ~ 2*p1*p9 + 2*p2*p8 + 2*p3*p7 + 2*p4*p6 + p5^2
s11 ~ 2*p1*p10 + 2*p2*p9 + 2*p3*p8 + 2*p4*p7 + 2*p5*p6
s12 ~ 2*p2*p10 + 2*p3*p9 + 2*p4*p8 + 2*p5*p7 + p6^2
s13 ~ 2*p3*p10 + 2*p4*p9 + 2*p5*p8 + 2*p6*p7
s14 ~ 2*p4*p10 + 2*p5*p9 + 2*p6*p8 + p7^2
s15 ~ 2*p5*p10 + 2*p6*p9 + 2*p7*p8
s16 ~ 2*p6*p10 + 2*p7*p9 + p8^2
s17 ~ 2*p7*p10 + 2*p8*p9
s18 ~ 2*p8*p10 + p9^2
s19 ~ 2*p9*p10
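s20 ~ p10^2

In this case there are 20-1 = 19 known quantities and 10 unknowns, so the system is overdetermined. It is also easy to solve by hand with algebra. As far as I can tell, the quadratic terms give 2 possible solutions for each individual face. Probabilities are always positive, so in practice there should be a single solution. Is that right?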
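The algebraic route can be sketched as a back-substitution: take the positive root p1 = sqrt(s2), then solve s3, s4, ..., s11 in turn, since each new equation contains only one new unknown multiplied by 2*p1. This is a minimal sketch in base R, assuming noise-free sums and p1 > 0. The helper names `sum_probs` and `solve_by_substitution` are hypothetical and are not part of the code in this post.

```r
# Forward model: distribution of the sum of two independent rolls of one die
sum_probs <- function(p) {
  n <- length(p)
  as.vector(tapply(outer(p, p), outer(seq_len(n), seq_len(n), `+`), sum))
}

# Back-substitution using only the first n sums (assumes exact data and p1 > 0)
solve_by_substitution <- function(s) {
  n <- (length(s) + 1) / 2
  p <- numeric(n)
  p[1] <- sqrt(s[1])                      # s2 = p1^2, keep the positive root
  for (k in 2:n) {
    # s_(k+1) = 2*p1*pk + sum over i = 2..(k-1) of p_i * p_(k+1-i)
    inner <- if (k > 2) sum(p[2:(k - 1)] * p[(k - 1):2]) else 0
    p[k] <- (s[k] - inner) / (2 * p[1])
  }
  p
}

set.seed(1)
p_true <- runif(10)
p_true <- p_true / sum(p_true)
s <- sum_probs(p_true)
p_hat <- solve_by_substitution(s)
```

Only the choice of sign for p1 is free, which is why positivity picks out one solution. The sketch ignores s12 to s20 and divides repeatedly by 2*p1, so with a noisy histogram a least-squares fit over all 19 sums is likely to be more robust.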
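Is there a way to solve this system in R? I am familiar with linear inverse problems in R, but I don't know how to approach this (quadratic?) problem in R.
Here is some code that simulates the problem: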
options(stringsAsFactors = FALSE)
library(gtools)
library(dplyr)

dice <- data.frame(face = 1:10)

### functions

# Split a label such as "3|7" into its two numeric faces
split_dice_faces <- function(summed_face){
  face_face <- strsplit(x = as.character(summed_face), split = "[/_\\|]")[[1]]
  names(face_face) <- c("face1", "face2")
  as.numeric(face_face)
}

# Add up the two faces of each "face1|face2" label
sum_dice_faces <- function(face_face){
  sapply(face_face, function(face_face_i){
    face1 <- split_dice_faces(face_face_i)[1]
    face2 <- split_dice_faces(face_face_i)[2]
    sum(c(face1[1], face2[1]))
  })
}

# Enumerate all ordered pairs of faces, with the probability and sum of each pair
simulate_2_rolls <- function(dice_pool){
  dice_perm <- data.frame(permutations(n = dim(dice_pool)[1], r = 2, v = as.character(dice_pool$face), repeats.allowed = TRUE))
  dice_perm$face_face <- paste(dice_perm[[1]], "|", dice_perm[[2]], sep = "")
  dice_perm$prob <- dice_pool$prob[match(dice_perm[[1]], dice_pool$face)] * dice_pool$prob[match(dice_perm[[2]], dice_pool$face)]
  dice_perm$summed_face <- sum_dice_faces(dice_perm$face_face)
  dice_perm <- dice_perm %>% arrange(summed_face) %>% select(one_of(c("face_face", "summed_face", "prob")))
  dice_perm
}

# Collapse the pair table into one probability per summed face
summarise_2_rolls_experiment <- function(simulate_2_rolls_df){
  simulate_2_rolls_df %>% group_by(summed_face) %>% summarise(prob = sum(prob))
}

# Map a vector of face probabilities to the 19 summed-face probabilities
from_face_probs_to_summed_observations <- function(face_probs){
  face_probs %>%
    data.frame(face = dice$face, prob = .) %>%
    simulate_2_rolls() %>%
    summarise_2_rolls_experiment() %>%
    pull(prob)
}
# Build the formula strings "s2 ~ p1^2", "s3 ~ 2*p1*p2", ...
# Note: reads the global dice_sum_probs, which must exist before this is called
generate_formulas <- function() {
  output <-
    dice_sum_probs %>% group_by(summed_face) %>% group_split() %>%
    sapply(function(i){
      left_hand <- paste("s", i$summed_face[1], sep = "")
      right_hand <-
        sapply(strsplit(i$face_face, "\\|"), function(row){
          row_i <- as.numeric(row)
          row_i <- row_i[order(row_i)]
          row_i <- paste("p", row_i, sep = "")
          if(row_i[1] == row_i[2]){
            paste(row_i[1], "^2", sep = "")
          } else {
            paste(row_i, collapse = "*")
          }
        })
      right_hand <-
        paste(sapply(unique(right_hand), function(right_hand_i){
          fact <- sum(right_hand == right_hand_i)
          if(fact > 1){fact <- paste(fact, "*", sep = "")} else {fact <- ""}
          paste(fact, right_hand_i, sep = "")
        }), collapse = " + ")
      paste(left_hand, "~", right_hand)
    })
  return(output)
}

To simulate a dataset, do the following:
### random individual probabilities
dice_probs <- data.frame(face = dice$face,
                         prob = runif(n = dim(dice)[1]) %>% (function(x){x / sum(x)}))
dice_probs
### simulate infinite amount of trials, observations expressed as probabilities
dice_sum_probs <- simulate_2_rolls(dice_probs)
dice_sum_probs
### sum experiment outcomes with the same sum
dice_sum_probs_summary <- dice_sum_probs %>% group_by(summed_face) %>% summarise(prob = sum(prob))
### plot, this is the given dataset
with(data = dice_sum_probs_summary, barplot(prob, names.arg = summed_face))
### how to calculate / approach p1, p2, ..., p10?

Thanks!
Posted on 2019-11-26 23:43:19
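If we create the multiplication table of the probabilities, outer(p, p), and then use tapply to sum it over the constant values of outer(1:10, 1:10, "+"), we get the following nonlinear regression problem: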
nls(prob ~ tapply(outer(p, p), outer(1:10, 1:10, `+`), sum),
    dice_sum_probs_summary, algorithm = "port",
    start = list(p = sqrt(dice_sum_probs_summary$prob[seq(1, 19, 2)])),
    lower = numeric(10), upper = rep(1, 10))

giving:
Nonlinear regression model
  model: prob ~ tapply(outer(p, p), outer(1:10, 1:10, `+`), sum)
   data: dice_sum_probs_summary
     p1      p2      p3      p4      p5      p6      p7      p8      p9     p10
0.06514 0.04980 0.14439 0.06971 0.06234 0.19320 0.09491 0.01237 0.11936 0.18878
 residual sum-of-squares: 1.33e-30

This compares with:
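> dice_probs
   face       prob
1     1 0.06513537
2     2 0.04980455
3     3 0.14438749
4     4 0.06971313
5     5 0.06234477
6     6 0.19319613
7     7 0.09491289
8     8 0.01236557
9     9 0.11936244
10   10 0.18877766

We could alternatively express this as follows, where X is a 19 x 100 matrix of 0s and 1s in which each row corresponds to a possible outcome of rolling two dice (i.e. 2:20) and each column corresponds to a pair of indices from 1:10 and 1:10. An entry equals 1 if the column's pair sums to the outcome represented by its row, and 0 otherwise.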
g <- c(outer(1:10, 1:10, `+`))   # sum of faces for each of the 100 ordered pairs
X <- + outer(2:20, g, `==`)      # unary + turns the logical matrix into 0/1
nls(prob ~ X %*% kronecker(p, p), dice_sum_probs_summary, algorithm = "port",
    start = list(p = sqrt(dice_sum_probs_summary$prob[seq(1, 19, 2)])),
    lower = numeric(10), upper = rep(1, 10))

https://stackoverflow.com/questions/59053325
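The two formulations in the answer compute the same vector of summed-face probabilities, which is easier to see on a small case. The sketch below checks this for a hypothetical 3-faced die using base R only. The values in `p` and the names `s_tapply` and `s_matrix` are made up for the illustration.

```r
p <- c(0.2, 0.3, 0.5)                         # hypothetical 3-faced die
n <- length(p)
idx <- outer(seq_len(n), seq_len(n), `+`)     # sum of faces for every ordered pair

# Formulation 1: sum the multiplication table over cells with the same face sum
s_tapply <- as.vector(tapply(outer(p, p), idx, sum))

# Formulation 2: 0/1 indicator matrix times the vector of all pairwise products
g <- c(idx)
X <- +outer(2:(2 * n), g, `==`)               # (2n - 1) x n^2 matrix of 0s and 1s
s_matrix <- as.vector(X %*% kronecker(p, p))

print(rbind(s_tapply, s_matrix))
```

Both rows should hold the probabilities of the sums 2 to 6. The matrix form makes it explicit that the model is linear in the pairwise products p_i * p_j and nonlinear only through those products.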