
Clustering weekly data to detect outbreaks while keeping some continuity between adjacent weeks

Data Science user
Asked 2016-01-16 16:50:36
2 answers, 220 views, 0 followers, score 2

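I have data from some health agencies. It contains 52 weeks of malaria case counts. The dataset has 52 columns, one per week, and about 16 rows, one per hospital; each cell is the number of cases that hospital diagnosed in that week. A sample of the dataset with 9 weeks of entries: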

SAD Lakhwar 0   0   0   0   0   1   4   3   1
Rural Health Center 2   0   0   6   0   2   0   2   2
Herbertpur Christian Hospital   1   0   1   0   2   0   1   0   1

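I have used hierarchical clustering and k-means clustering to find groups of hospitals and groups of weeks, but my real goal is to cluster in a way that lets consecutive weeks of data be used to detect an outbreak, while also finding the group of hospitals where the outbreak occurred.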

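So far, the techniques I have used put weeks that are far apart into the same cluster. For example, weeks 7 and 37 belong to the same cluster, as shown below. I understand why I am getting these results, but I need the week clusters to be contiguous in time. Any help would be appreciated.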

Result of clustering the weeks into 4 groups with k-means:

Week  Cluster
1   2
2   4
3   4
4   2
5   4
6   2
7   1
8   2
9   2
10  2
11  2
12  4
13  1
14  4
15  4
16  4
17  1
18  4
19  1
20  4
21  1
22  1
23  1
24  1
25  1
26  1
27  1
28  1
29  1
30  1
31  1
32  3
33  1
34  1
35  1
36  1
37  1
38  3
39  3
40  3
41  3
42  3
43  3
44  3
45  3
46  3
47  3
48  3
49  3
50  1
51  4
52  4
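The non-contiguity the asker describes can be quantified by counting how many runs of consecutive identical labels the k-means output contains. This short R sketch copies the labels from the table above and uses base R's `rle()`; the variable names are illustrative only.

```r
# Week-cluster labels copied from the k-means table above (weeks 1..52)
km_labels <- c(2, 4, 4, 2, 4, 2, 1, 2, 2, 2, 2, 4, 1, 4, 4, 4, 1, 4, 1, 4,
               rep(1, 11), 3, rep(1, 5), rep(3, 12), 1, 4, 4)
stopifnot(length(km_labels) == 52)

# rle() collapses consecutive equal labels into runs
runs <- rle(km_labels)
length(runs$lengths)  # 20 runs for only 4 clusters
table(runs$values)    # number of separate runs per cluster
```

A fully contiguous 4-cluster solution would have exactly 4 runs, so the gap between 4 and the observed run count is a simple measure of how far k-means is from what the asker wants.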

The full data, as output by `dput()`:

structure(list(V1 = structure(c(13L, 15L, 6L, 10L, 3L, 17L, 12L, 
1L, 2L, 11L, 4L, 14L, 8L, 9L, 7L, 5L), .Label = c("CHC Sahaspur", 
"Comb. Hosp. Premnagar", "Doon Hospital", "FRI Hospital", "Herbertpur Christian Hospital", 
"HIHT Jollygrant", "Kalindi Hospital", "MAX Hospital", "PHC Herbatpur ", 
"PHC Kalsi", "PHC Rajawala", "Rural Health Center", "SAD Lakhwar", 
"Shubharti Hospital", "SPS Rishikesh", "Total", "Vaish Nursing Home"
), class = "factor"), V2 = c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 
0, 0, 0, 0, 0), V3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), V4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), V5 = c(0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0), V6 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V7 = c(1, 0, 0, 
0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0), V8 = c(4, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), V9 = c(3, 0, 0, 0, 0, 0, 2, 
0, 0, 0, 0, 0, 1, 0, 0, 0), V10 = c(1, 0, 0, 0, 0, 1, 2, 0, 0, 
0, 0, 0, 0, 0, 0, 0), V11 = c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 
0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 
0, 0, 0), V13 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), V14 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V15 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V16 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V17 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V18 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), 
V19 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V20 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V21 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V22 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2), 
V23 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0), 
V24 = c(2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V25 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0), 
V26 = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1), 
V27 = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V28 = c(0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0), 
V29 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), 
V30 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V31 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V32 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V33 = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
V34 = c(5, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0), 
V35 = c(1, 0, 22, 0, 1, 1, 0, 0, 0, 0, 0, 38, 0, 2, 0, 0), 
V36 = c(0, 2, 4, 2, 1, 0, 0, 0, 0, 0, 0, 23, 0, 2, 0, 1), 
V37 = c(0, 0, 10, 0, 2, 0, 0, 0, 0, 0, 0, 10, 0, 2, 0, 7), 
V38 = c(1, 2, 2, 1, 2, 0, 0, 0, 0, 0, 0, 16, 2, 2, 0, 7), 
V39 = c(0, 1, 9, 0, 28, 2, 0, 0, 0, 0, 8, 12, 0, 1, 0, 2), 
V40 = c(1, 0, 2, 0, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V41 = c(0, 0, 3, 0, 10, 0, 0, 0, 1, 0, 0, 18, 0, 0, 0, 1), 
V42 = c(0, 0, 1, 0, 8, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), 
V43 = c(0, 1, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V44 = c(1, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V45 = c(0, 0, 9, 0, 6, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1), 
V46 = c(0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V47 = c(0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V48 = c(0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 
V49 = c(0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V50 = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V51 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V52 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
V53 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names =                 c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", 
"V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", 
"V30", "V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", 
"V39", "V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", 
"V48", "V49", "V50", "V51", "V52", "V53"), row.names = c(NA, 
16L), class = "data.frame")
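One way to force week clusters to be contiguous, which neither the asker nor the answerers used, is contiguity-constrained clustering: split the 52 weeks into K consecutive segments so that the total within-segment sum of squares (the same objective k-means minimises) is as small as possible. Because segments must be contiguous, the optimum can be found exactly with dynamic programming. This is a minimal sketch; the function name `segment_weeks` is hypothetical, and it assumes a numeric matrix with one row per week and one column per hospital.

```r
# Sketch: split weeks into K contiguous segments, minimising the total
# within-segment sum of squares (a contiguity-constrained analogue of k-means).
# X: numeric matrix, one row per week, one column per hospital (assumption).
segment_weeks <- function(X, K) {
  n <- nrow(X)
  # cost[i, j]: within-segment sum of squares if weeks i..j form one segment
  cost <- matrix(Inf, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      block <- X[i:j, , drop = FALSE]
      cost[i, j] <- sum(sweep(block, 2, colMeans(block))^2)
    }
  }
  # best[k, j]: minimal cost of splitting weeks 1..j into k segments
  best <- matrix(Inf, K, n)
  back <- matrix(NA_integer_, K, n)
  best[1, ] <- cost[1, ]
  if (K > 1) {
    for (k in 2:K) {
      for (j in k:n) {
        # the last segment starts at week s, for s in k..j
        cand <- best[k - 1, (k - 1):(j - 1)] + cost[k:j, j]
        idx <- which.min(cand)
        best[k, j] <- cand[idx]
        back[k, j] <- idx + k - 1  # start week of the last segment
      }
    }
  }
  # trace the segment start points back and label every week
  labels <- integer(n)
  j <- n
  for (k in K:1) {
    s <- if (k == 1) 1L else back[k, j]
    labels[s:j] <- k
    j <- s - 1
  }
  labels
}

# toy example: three flat stretches of 5 weeks each
X <- matrix(c(rep(0, 5), rep(10, 5), rep(0, 5)), ncol = 1)
segment_weeks(X, 3)
```

With the `dput` output above assigned to a data frame (called `df` here, a hypothetical name), `segment_weeks(t(as.matrix(df[, -1])), 4)` would return one label per week, each label covering a single run of consecutive weeks. Since the counts are small and skewed, transforming them (for example with `log1p`) before segmenting may be worth trying; that is a judgment call, not something the thread suggests.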

2 Answers

Data Science user

Answered 2016-01-17 11:12:16

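Do you really want to do clustering here?

Instead, I would look at segmentation and change detection.

For example, you could compute the absolute (or relative) change from one row to the next, and segment at the points where the change is largest.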
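The suggestion above can be sketched in a few lines of R, assuming weeks are rows (the transpose of the asker's layout) and that the change between two weeks is measured as the absolute change summed over all hospitals; both choices, and the function name `split_at_largest_changes`, are illustrative assumptions. A relative change could be substituted in the `change` line.

```r
# Sketch: segment weeks at the largest week-to-week changes.
# X: numeric matrix, one row per week, one column per hospital (assumption).
split_at_largest_changes <- function(X, n_segments) {
  # change[i]: absolute change from week i to week i + 1, summed over hospitals
  change <- rowSums(abs(diff(X)))
  # a break after week i for each of the (n_segments - 1) largest changes
  breaks <- sort(order(change, decreasing = TRUE)[seq_len(n_segments - 1)])
  # label each week by 1 + the number of breaks that precede it
  cumsum(c(1L, seq_len(nrow(X) - 1) %in% breaks))
}

X <- matrix(c(0, 0, 0, 9, 9, 9, 1, 1), ncol = 1)
split_at_largest_changes(X, 3)
```

Unlike the dynamic-programming approach, this greedy rule looks only at adjacent weeks, so a single noisy week can create two breaks; smoothing the counts first (as the next answer suggests) reduces that.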

Score: 0

Data Science user

Answered 2016-01-18 20:19:09

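If you are trying to find both the groups of weeks and the groups of hospitals where an outbreak occurred, you may have more success with:

1) A simple moving-average filter over the weekly counts; try a window of 3 to 5 weeks.

2) Then look for hospitals that have high moving-average rates in the same week.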
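The two steps above can be sketched in R with `stats::filter` for a trailing moving average per hospital, followed by a per-week count of hospitals above a threshold. The function name `flag_outbreaks`, the trailing (rather than centred) window, and the default threshold of 2 cases per week are all illustrative assumptions; a sensible threshold would depend on each hospital's baseline.

```r
# Sketch: smooth each hospital's weekly counts with a trailing moving average,
# then flag hospital-weeks whose smoothed count reaches a threshold.
# counts: numeric matrix, one row per hospital, one column per week (assumption).
flag_outbreaks <- function(counts, window = 3, threshold = 2) {
  # sides = 1 averages the current week and the (window - 1) previous weeks;
  # the first (window - 1) weeks are NA
  smoothed <- t(apply(counts, 1, function(x)
    as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))))
  dimnames(smoothed) <- dimnames(counts)
  flagged <- !is.na(smoothed) & smoothed >= threshold
  list(smoothed = smoothed,
       flagged = flagged,
       n_hospitals = colSums(flagged))  # number of hospitals elevated per week
}

counts <- rbind(A = c(0, 0, 3, 3, 3, 0),
                B = c(0, 0, 0, 0, 0, 0))
res <- flag_outbreaks(counts, window = 3, threshold = 1.5)
res$n_hospitals
```

On the question's data, `counts` could be built as `as.matrix(df[, -1])` with `rownames(counts) <- as.character(df$V1)` (again assuming the `dput` output is stored in `df`). Runs of consecutive weeks in which `n_hospitals` is high are candidate outbreak periods, and the rows flagged in those weeks are the corresponding hospital group.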

Score: 0
Page content provided by Data Science (Stack Exchange); translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://datascience.stackexchange.com/questions/9810
