I have several data frames containing data collected over 4 days. Each data frame looks like this (heavily simplified):
Lat Long PM
-33.9174 151.2263 8
-33.9175 151.2264 10
-33.9176 151.2265 9
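-33.9177 151.2266 8
I want to merge the data frames on matching longitude and latitude values, so that I can average all the 'PM' values for a given location. The end result would look like this (for 13th to 16th Feb):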
Lat       Long      PM.13th Feb  PM.14th Feb  PM.15th Feb  Mean
-33.9174  151.2263  8            9            11           9.33
-33.9175  151.2264  10           11           12           11
-33.9176  151.2265  9            14           13           12
-33.9177  151.2266  8            10           11           9.66
I know that merging 2 data frames is easy:
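df = merge(data1, data2, by.x = c("Lat", "Long"), by.y = c("Lat", "Long"))
But how do I merge more than two data frames on matching longitude and latitude values?
Also, is there a way to filter the data so that it matches points that are within 0.001 of each other in lat/long? (At the moment I am rounding the lat/long values to 3 decimal places, but that duplicates my data.)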
Answered on 2017-07-22 00:36:20
For the matching, maybe inner_join from dplyr?
library(dplyr)

df1 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -33.9171),
  long = c(151.2263, 151.2264, 151.2265, 151.2266, -140.54),
  PM = c(8, 10, 9, 8, 55)
)
df2 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -31),
  long = c(151.2263, 151.2264, 151.2265, 151.2266, 134),
  PM = c(12, 15, 11, 3, 18)
)

# Keep only the rows whose lat/long pair appears in both data frames
inner_join(df1, df2, by = c("lat", "long"))
       lat     long PM.x PM.y
1 -33.9174 151.2263    8   12
2 -33.9175 151.2264   10   15
3 -33.9176 151.2265    9   11
4 -33.9177 151.2266    8    3

Answered on 2017-07-24 02:02:05
This might be an answer, although it is a bit verbose and does not scale well to a large number of data frames:
library(tidyverse)

# tibble() replaces the deprecated data_frame()
feb_13 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(8, 10, 9, 8))
feb_14 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(7, 3, 4, 5))
feb_15 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(1, 4, 10, 12))
Here is the first approach. It is simple, but the way the mean is computed is ugly.
df <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
  left_join(feb_15, by = c("lat", "long")) %>%
  rename(
    pm_feb13 = pm.x,
    pm_feb14 = pm.y,
    pm_feb15 = pm
  ) %>%
  mutate(
    # One hard-coded expression per row, so this only works for exactly four rows
    mean = c((pm_feb13[1] + pm_feb14[1] + pm_feb15[1]) / 3,
             (pm_feb13[2] + pm_feb14[2] + pm_feb15[2]) / 3,
             (pm_feb13[3] + pm_feb14[3] + pm_feb15[3]) / 3,
             (pm_feb13[4] + pm_feb14[4] + pm_feb15[4]) / 3)
  )
Here is the second option. It has a lot of pipes, but it uses summarise:
df_2 <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
  left_join(feb_15, by = c("lat", "long")) %>%
  group_by(lat, long) %>%
  # One mean per lat/long pair, ignoring missing values
  summarise(
    mean = mean(c(pm.x, pm.y, pm), na.rm = TRUE)
  ) %>%
  # Join the daily values back on so they sit next to the mean
  full_join(feb_13, by = c("lat", "long")) %>%
  full_join(feb_14, by = c("lat", "long")) %>%
  full_join(feb_15, by = c("lat", "long")) %>%
  rename(
    pm_feb13 = pm.x,
    pm_feb14 = pm.y,
    pm_feb15 = pm
  ) %>%
  arrange(long)
https://stackoverflow.com/questions/45242360
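Both answers chain their joins by hand, which gets unwieldy with many days. One possible generalisation, which is not part of either answer, is to keep the daily data frames in a named list and fold merge() over it with Reduce. The sketch below is base R only. The names `days`, `snap_to_grid` and the `pm_<day>` column names are made up for illustration, and the sample values are the ones from the second answer. The optional helper covers the rounding problem from the question by averaging rows that land on the same rounded point. Note that this snaps points to a grid rather than matching everything within 0.001, so two close points on either side of a grid boundary will still not match.

```r
# Illustrative daily data frames (values from the second answer's example);
# the list name `days` and its element names are assumptions.
coords <- data.frame(lat  = c(-33.9174, -33.9175, -33.9176, -33.9177),
                     long = c(151.2263, 151.2264, 151.2265, 151.2266))
days <- list(
  feb13 = cbind(coords, pm = c(8, 10, 9, 8)),
  feb14 = cbind(coords, pm = c(7, 3, 4, 5)),
  feb15 = cbind(coords, pm = c(1, 4, 10, 12))
)

# Hypothetical helper: round the coordinates and average any rows that end up
# on the same rounded point, so rounding no longer produces duplicate keys.
# This is grid snapping, not a true "within 0.001" match.
snap_to_grid <- function(df, digits = 3) {
  df$lat  <- round(df$lat, digits)
  df$long <- round(df$long, digits)
  aggregate(pm ~ lat + long, data = df, FUN = mean)
}

# Give each pm column a day-specific name so the merged columns stay distinct.
renamed <- Map(function(df, day) {
  names(df)[names(df) == "pm"] <- paste0("pm_", day)
  df
}, days, names(days))

# Fold merge() over the list: merge the first two data frames, then merge the
# result with the third, and so on for however many days there are.
merged <- Reduce(function(a, b) merge(a, b, by = c("lat", "long")), renamed)

# Row-wise mean across every pm_* column.
pm_cols <- grep("^pm_", names(merged), value = TRUE)
merged$mean <- rowMeans(merged[pm_cols], na.rm = TRUE)

print(merged)
```

To use the rounding step, run `days <- lapply(days, snap_to_grid)` before the renaming step. merge() keeps only locations present in every data frame by default; passing `all = TRUE` inside the Reduce function would keep the rest and leave NA where a day has no reading, which `na.rm = TRUE` then skips when averaging.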