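I'm completely lost trying to calculate disease prevalence by a grouping variable (postal code, in my case). I've tried everything, but nothing seems to work.
I know disease prevalence is easy to calculate (number of cases divided by the total population), but it won't let me sum the cases and the population by postal code and then divide one by the other.
The column I'm trying to calculate prevalence for is called "Lyme", which is a logical variable (0 = negative, 1 = positive). The "FSA" column holds my postal codes. Please help!
Here is my code: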
Data.All.df <- data.frame(Data.All) ## Create Data Frame from Data file
Data.All.df.2008 <- subset(Data.All.df, Year=="2008") ##only use 2008
library(dplyr)
Data.All.df.2008 <- Data.All.df.2008 %>%
  group_by(FSA) %>%
  mutate_each(funs(Cases = ((Lyme=="1")/((Lyme=="0")+(Lyme=="1")))))
X.1 X Source Patient Accession Customer Year Date Country City Province Postal Name Age Gender Species Breed SNAP Apspp Ehrspp HW Lyme Coinfections dupID FSA
1710 4913 4913 Veterinary Clinic Bronson Sprartacus796575981360 7.97e+13 79657 2008 2008-01-08 Canada WINDSOR ON N8N 3T4 Bronson Sprartacus 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1711 4915 4915 Veterinary Clinic Scotty9233669481432 9.23e+13 92336 2008 2008-01-08 Canada WINDSOR ON N8R 1A5 Scotty 84 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1712 4916 4916 Veterinary Clinic Hershey9233683161435 9.23e+13 92336 2008 2008-01-08 Canada WINDSOR ON N8R 1A5 Hershey 48 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1713 4918 4918 Veterinary Clinic Brandy7965736441362 7.97e+13 79657 2008 2008-01-09 Canada WINDSOR ON N8N 3T4 Brandy 156 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1714 4919 4919 Veterinary Clinic Trish9233699481443 9.23e+13 92336 2008 2008-01-10 Canada WINDSOR ON N8R 1A5 Trish 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1715 4929 4929 Veterinary Clinic Lexie8001685020761364 8.00e+13 80016 2008 2008-01-17 Canada HALIFAX NS B3L 2C2 Lexie 29 Spayed Canine Non-Sporting 4Dx 0 0 0 0 0 TRUE B3L
1716 4937 4937 Veterinary Clinic CUBBIE79700431 7.97e+12 79700 2008 2008-01-21 Canada DARTMOUTH NS B2W 2N3 CUBBIE 118 Spayed Canine Non-Sporting 4Dx 0 0 0 0 0 TRUE B2W
1717 4945 4945 Veterinary Clinic Stevie7965765291433 7.97e+13 79657 2008 2008-01-25 Canada WINDSOR ON N8N 3T4 Stevie 36 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1718 4947 4947 Veterinary Clinic Bailey9233644191501 9.23e+13 92336 2008 2008-01-25 Canada WINDSOR ON N8R 1A5 Bailey 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1719 4948 4948 Veterinary Clinic ZAK925369448482 9.25e+12 92536 2008 2008-01-25 Canada HUNTSVILLE ON P1H 1B5 ZAK 96 Neutered Canine Hound 4Dx 0 0 0 0 0 TRUE P1H
Posted on 2020-01-28 06:16:08
Using the following minimal example data:
# Generate data.
set.seed(0934)
Data.All.df.2008 <- data.frame(FSA = sample(c("N8N", "N8R", "B3L", "P1H"), 50, TRUE),
                               Lyme = sample(0:1, 50, TRUE),
                               stringsAsFactors = FALSE)
# First 6 observations.
head(Data.All.df.2008)
# FSA Lyme
# 1 N8N 1
# 2 P1H 1
# 3 N8N 0
# 4 P1H 0
# 5 N8N 1
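# 6 N8N 1
Prevalence can be calculated as the number of positive diagnoses divided by the total number of observations, i.e. sum(Lyme)/n(). The appropriate function is summarise: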
library(dplyr)
Data.All.df.2008 %>%
group_by(FSA) %>%
summarise(Prevalence = sum(Lyme)/n())
# # A tibble: 4 x 2
# FSA Prevalence
# <chr> <dbl>
# 1 B3L 0.778
# 2 N8N 0.571
# 3 N8R 0.583
# 4 P1H 0.467
https://stackoverflow.com/questions/59941654
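Since the question asks for the cases and the population to be summed per postal code before dividing, the same summarise call can also keep both counts next to the prevalence. The following is only a minimal sketch: the `toy` data frame and the column names `Cases` and `Tested` are made up for illustration, and `Lyme == 1` is used on the assumption that the column may be stored as numeric, character, or logical.

```r
library(dplyr)

# Hypothetical toy data (not from the question): six tests in two FSAs.
toy <- data.frame(FSA  = c("N8N", "N8N", "N8N", "N8R", "N8R", "N8R"),
                  Lyme = c(1, 0, 0, 1, 1, 0),
                  stringsAsFactors = FALSE)

result <- toy %>%
  group_by(FSA) %>%
  summarise(Cases      = sum(Lyme == 1),   # positive tests per FSA
            Tested     = n(),              # all observations per FSA
            Prevalence = Cases / Tested)   # proportion positive

print(result)
```

With this toy data, N8N has 1 positive out of 3 tests and N8R has 2 out of 3. When only the proportion is needed, mean(Lyme == 1) inside summarise gives the same value as Cases / Tested.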