创建数据
REPC <-
tibble::tribble(
~OrganisationID, ~ProjectID,
22905, 494993,
9341, 494993,
877, 504562,
9221, 504874,
867, 488182,
238989, 488182,
296, 488182,
270858, 488182,
867, 181688,
22905, 181688,
877, 181688,
867, 504387,
877, 504387,
)我目前正在努力计算组织的重复协作,为此我需要两个值。我首先打算计算唯一协作伙伴的数量和组织与每个独特协作伙伴协作的次数。然后,我要计算重复合作,它可以用数学表示为[(∏Ri )]^(1/N),其中Ri是重点公司与其i^th R&D合作伙伴的R&D联盟的数目,N是研发合作伙伴的总数。例如,一家公司与3个独特的合作伙伴合作,他与合作伙伴1,3次,合作伙伴2,4次,合作伙伴3,5次。这将导致重复协作(3*4*5)^1/3 = 3.91。
我想要实现的
REPC2 <-
tibble::tribble(
~OrganisationID, ~NoOfUniqueCollabPartners, ~NoOfCollabswith22905, ~NoOfCollabswith9341, ~NoOfCollabswith877, ~NoOfCollabswith9221, ~NoOfCollabswith867, ~NoOfCollabswith238989, ~NoOfCollabswith270858, ~NoOfCollabswith9221, ~RepeatedCollaboration,
22905, 3, NA, 1, 1, 0, 1, 0, 0, 0, 0.33,
9341, 1, 1, NA, 0, 0, 0, 0, 0, 0, 1,
877, 2, 1, 0, NA, 0, 2, 0, 0, 0, 1.41,
9221, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA,
867, 5, 1, 0, 2, 0, NA, 1, 1, 1, 1.15,
238989, 3, 0, 0, 0, 0, 1, NA, 1, 1, 0.33,
270858, 3, 0, 0, 0, 0, 1, 1, NA, 1, 0.33,
296, 3, 0, 0, 0, 0, 1, 1, 1, NA, 0.33,
)因为一家公司不能与自己合作,所以它的价值应该是NA或零。
发布于 2020-04-19 16:12:16
因为您只是要求这两个值,所以我跳过了协作矩阵。
我还假设您的tbl中没有重复的行。
library(data.table)
library(magrittr)
REPC <- as.data.table(REPC)
arr_org_ID <- unique(REPC[, OrganisationID]) # Array of unique organizations
N_org <- length(arr_org_ID) # Number of unique organizations
lst_proj <- lapply(arr_org_ID, function(id) {REPC[OrganisationID == id, unique(ProjectID)]}) # List of projects involved for each organization
lst_colab <- sapply((seq(N_org)), function(i) { # List of colaborators for each organization (including repeats)
REPC[
ProjectID %in% lst_proj[[i]] & OrganisationID != arr_org_ID[i],
OrganisationID
]
})
dt_res <- lapply(seq(N_org), function(i) {
arr_unique_colab = lst_colab[[i]] %>% unique
N_unique_colab = arr_unique_colab %>% length # Number of unique collaborators
# Calculating repeated collaboration ----
{
Rep_count = lapply(seq(N_unique_colab), function(j) {
return(length(which(lst_colab[[i]] == arr_unique_colab[j])))
}) %>% do.call(prod, .) %>% raise_to_power(1/N_unique_colab)
}
# Returning table ----
{
return(data.table(OrganisationID = arr_org_ID[i], N_unique_colab, Rep_count))
}
}) %>% rbindlist我更适合data.table语法,所以我已经转换了您的tbl。
https://stackoverflow.com/questions/61306985
复制相似问题