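I have data from a number of health facilities covering 52 weeks of malaria cases. The data set has 52 columns, one per week, and about 16 rows, one per hospital; each cell is the number of cases that hospital diagnosed in that week. A sample of the data set with 9 weekly entries: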
SAD Lakhwar 0 0 0 0 0 1 4 3 1
Rural Health Center 2 0 0 6 0 2 0 2 2
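Herbertpur Christian Hospital 1 0 1 0 2 0 1 0 1

I used hierarchical clustering and k-means clustering to find clusters of hospitals and clusters of weeks. My real goal, though, is to cluster in such a way that an outbreak can be detected from data on consecutive weeks, while also finding the cluster of hospitals where the outbreak occurred.
The techniques I have used so far do give me clusters, but in some cases the weeks in a cluster are far apart. For example, week 7 and week 37 fall in the same cluster, as shown below. I understand why I am getting this result, but I need the weeks within a cluster to be contiguous. Any help would be appreciated.
Result of clustering the weeks into 4 groups with k-means: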
Week No x
1 2
2 4
3 4
4 2
5 4
6 2
7 1
8 2
9 2
10 2
11 2
12 4
13 1
14 4
15 4
16 4
17 1
18 4
19 1
20 4
21 1
22 1
23 1
24 1
25 1
26 1
27 1
28 1
29 1
30 1
31 1
32 3
33 1
34 1
35 1
36 1
37 1
38 3
39 3
40 3
41 3
42 3
43 3
44 3
45 3
46 3
47 3
48 3
49 3
50 1
51 4
52 4

dput output of the data:
structure(list(V1 = structure(c(13L, 15L, 6L, 10L, 3L, 17L, 12L,
1L, 2L, 11L, 4L, 14L, 8L, 9L, 7L, 5L), .Label = c("CHC Sahaspur",
"Comb. Hosp. Premnagar", "Doon Hospital", "FRI Hospital", "Herbertpur Christian Hospital",
"HIHT Jollygrant", "Kalindi Hospital", "MAX Hospital", "PHC Herbatpur ",
"PHC Kalsi", "PHC Rajawala", "Rural Health Center", "SAD Lakhwar",
"Shubharti Hospital", "SPS Rishikesh", "Total", "Vaish Nursing Home"
), class = "factor"), V2 = c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0,
0, 0, 0, 0, 0), V3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), V4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), V5 = c(0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0), V6 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V7 = c(1, 0, 0,
0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0), V8 = c(4, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), V9 = c(3, 0, 0, 0, 0, 0, 2,
0, 0, 0, 0, 0, 1, 0, 0, 0), V10 = c(1, 0, 0, 0, 0, 1, 2, 0, 0,
0, 0, 0, 0, 0, 0, 0), V11 = c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0,
0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0,
0, 0, 0), V13 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), V14 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V15 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V16 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V17 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V18 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1),
V19 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V20 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V21 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V22 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2),
V23 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0),
V24 = c(2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V25 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0),
V26 = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
V27 = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V28 = c(0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0),
V29 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
V30 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V31 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V32 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V33 = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
V34 = c(5, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0),
V35 = c(1, 0, 22, 0, 1, 1, 0, 0, 0, 0, 0, 38, 0, 2, 0, 0),
V36 = c(0, 2, 4, 2, 1, 0, 0, 0, 0, 0, 0, 23, 0, 2, 0, 1),
V37 = c(0, 0, 10, 0, 2, 0, 0, 0, 0, 0, 0, 10, 0, 2, 0, 7),
V38 = c(1, 2, 2, 1, 2, 0, 0, 0, 0, 0, 0, 16, 2, 2, 0, 7),
V39 = c(0, 1, 9, 0, 28, 2, 0, 0, 0, 0, 8, 12, 0, 1, 0, 2),
V40 = c(1, 0, 2, 0, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V41 = c(0, 0, 3, 0, 10, 0, 0, 0, 1, 0, 0, 18, 0, 0, 0, 1),
V42 = c(0, 0, 1, 0, 8, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
V43 = c(0, 1, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V44 = c(1, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V45 = c(0, 0, 9, 0, 6, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1),
V46 = c(0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V47 = c(0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V48 = c(0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
V49 = c(0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V50 = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V51 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V52 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
V53 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20",
"V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29",
"V30", "V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38",
"V39", "V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47",
"V48", "V49", "V50", "V51", "V52", "V53"), row.names = c(NA,
16L), class = "data.frame")

Posted on 2016-01-17 11:12:16
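Do you really want clustering here?
I would look at segmentation and change detection instead.
For example, you could compute the absolute (or relative) change from one row to the next (that is, from one week to the next) and segment at the points of largest change.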
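A minimal sketch of that idea in R, using the three-hospital, nine-week excerpt from the question as input. The function name `segment_weeks`, the choice to sum cases across hospitals before differencing, and the split into three segments are all assumptions made for illustration; the answer does not specify them.

```r
# Nine-week excerpt from the question: rows are hospitals, columns are weeks.
cases <- matrix(
  c(0, 0, 0, 0, 0, 1, 4, 3, 1,
    2, 0, 0, 6, 0, 2, 0, 2, 2,
    1, 0, 1, 0, 2, 0, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("SAD Lakhwar", "Rural Health Center", "Herbertpur Christian Hospital"),
    paste0("W", 1:9)
  )
)

# Hypothetical helper: split a weekly series into contiguous segments by
# cutting at the (n_segments - 1) largest absolute week-to-week changes.
segment_weeks <- function(totals, n_segments) {
  change <- abs(diff(totals))                 # change[i] is week i -> week i + 1
  breaks <- sort(order(change, decreasing = TRUE)[seq_len(n_segments - 1)])
  is_start <- c(TRUE, seq_along(change) %in% breaks)  # TRUE where a segment begins
  cumsum(is_start)                            # segment label for each week
}

weekly_total <- colSums(cases)                # 3 0 1 6 2 3 5 5 4
segment <- segment_weeks(weekly_total, n_segments = 3)
print(segment)                                # 1 1 1 2 3 3 3 3 3
```

Each segment is a run of adjacent weeks by construction, so week 7 and week 37 can only share a label if every week between them does too. Note that a single-week spike (week 4 here) becomes a segment of its own, so smoothing the series first may give more useful cuts.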
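Posted on 2016-01-18 20:19:09
If you are trying to find both the group of weeks and the group of hospitals in which an outbreak occurs, you may have more success with:
1) A simple moving-average filter over the weeks. Try a window of 3-5 weeks.
2) Then look for hospitals that have a high moving average in the same weeks.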
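The two steps above could be sketched in R roughly as follows, again on the nine-week excerpt from the question. The trailing 3-week window, the threshold of 1.5 cases per week, and the helper name `moving_average` are illustrative assumptions rather than recommended values.

```r
# Nine-week excerpt from the question: rows are hospitals, columns are weeks.
cases <- matrix(
  c(0, 0, 0, 0, 0, 1, 4, 3, 1,
    2, 0, 0, 6, 0, 2, 0, 2, 2,
    1, 0, 1, 0, 2, 0, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("SAD Lakhwar", "Rural Health Center", "Herbertpur Christian Hospital"),
    paste0("W", 1:9)
  )
)

# Step 1: trailing moving average (current week plus the preceding weeks).
# The first (window - 1) weeks have no full window and come back as NA.
moving_average <- function(x, window = 3) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

smoothed <- t(apply(cases, 1, moving_average, window = 3))
colnames(smoothed) <- colnames(cases)

# Step 2: flag hospital-weeks whose smoothed count exceeds a threshold
# (1.5 is a hypothetical cut-off), then count flagged hospitals per week.
threshold <- 1.5
flagged <- !is.na(smoothed) & smoothed > threshold
hospitals_per_week <- colSums(flagged)

print(round(smoothed, 2))
print(hospitals_per_week)
```

Weeks where `hospitals_per_week` is greater than one are candidates for an outbreak shared by several hospitals, and the rows of `flagged` that are TRUE in those weeks give the hospital group.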
https://datascience.stackexchange.com/questions/9810