我希望合并两个数据集。第一组数据包含特定时间仪器中的CO2浓度信息。第二个数据集包含有关何时开始和结束度量的信息。
我想将这两个数据组合起来,以便数据集2中的数据集1在时间间隔内(start_date_time到end_date_time)的时间结束在同一行。这一点并没有得到很好的解释,所以也许下面的数据更容易理解。
我是新来的,所以,任何帮助都是非常感谢的。
数据看起来类似于下面所示。
Data set 1
co2 date_time
0.2 2012-05-01 10:23:20
0.2 2012-05-01 10:23:23
0.1 2012-05-01 10:23:26
0.6 2012-05-01 10:23:29
0.1 2012-05-01 10:23:32
0.1 2012-05-01 10:23:35
0.6 2012-05-01 10:23:38
0.6 2012-05-01 10:23:41
0.2 2012-05-01 10:23:44
0.2 2012-05-01 10:23:47
0.3 2012-05-01 10:23:50
0.3 2012-05-01 10:23:53
0.4 2012-05-01 10:23:56
0.4 2012-05-01 10:23:59
0.3 2012-05-01 10:24:02
0.3 2012-05-01 10:24:05
0.4 2012-05-01 10:24:08
0.4 2012-05-01 10:24:11
0.6 2012-05-01 10:24:14
0.6 2012-05-01 10:24:17
0.7 2012-05-01 10:24:20
0.7 2012-05-01 10:24:23
0.3 2012-05-01 10:24:26
0.7 2012-05-01 10:24:29
0.3 2012-05-01 10:24:32
0.4 2012-05-01 10:24:35
0.2 2012-05-01 10:24:38
0.3 2012-05-01 10:24:41
0.3 2012-05-01 10:24:44
0.3 2012-05-01 10:24:47
0.4 2012-05-01 10:24:50
0.1 2012-05-01 10:24:53
0.1 2012-05-01 10:24:56
Data set 2
jar start_date_time end_date_time
1 2012-05-01 10:23:18 2012-05-01 10:23:30
2 2012-05-01 10:23:31 2012-05-01 10:23:40
3 2012-05-01 10:23:41 2012-05-01 10:23:52
4 2012-05-01 10:23:53 2012-05-01 10:24:01
5 2012-05-01 10:24:02 2012-05-01 10:24:12
6 2012-05-01 10:24:13 2012-05-01 10:24:20
7 2012-05-01 10:24:30 2012-05-01 10:23:40
8 2012-05-01 10:24:41 2012-05-01 10:24:52
Expected data set
jar start_date_time end_date_time co2 date_time
1 05-01-12 10:23:18 05-01-12 10:23:30 0.2 05-01-12 10:23:20
1 05-01-12 10:23:18 05-01-12 10:23:30 0.2 05-01-12 10:23:23
1 05-01-12 10:23:18 05-01-12 10:23:30 0.1 05-01-12 10:23:26
1 05-01-12 10:23:18 05-01-12 10:23:30 0.6 05-01-12 10:23:29
2 05-01-12 10:23:31 05-01-12 10:23:40 0.1 05-01-12 10:23:32
2 05-01-12 10:23:31 05-01-12 10:23:40 0.1 05-01-12 10:23:35
2 05-01-12 10:23:31 05-01-12 10:23:40 0.6 05-01-12 10:23:38
3 05-01-12 10:23:41 05-01-12 10:23:52 0.6 05-01-12 10:23:41
3 05-01-12 10:23:41 05-01-12 10:23:52 0.2 05-01-12 10:23:44
3 05-01-12 10:23:41 05-01-12 10:23:52 0.2 05-01-12 10:23:47
3 05-01-12 10:23:41 05-01-12 10:23:52 0.3 05-01-12 10:23:50
4 05-01-12 10:23:53 05-01-12 10:24:01 0.3 05-01-12 10:23:53
4 05-01-12 10:23:53 05-01-12 10:24:01 0.4 05-01-12 10:23:56
4 05-01-12 10:23:53 05-01-12 10:24:01 0.4 05-01-12 10:23:59
5 05-01-12 10:24:02 05-01-12 10:24:12 0.3 05-01-12 10:24:02
5 05-01-12 10:24:02 05-01-12 10:24:12 0.3 05-01-12 10:24:05
5 05-01-12 10:24:02 05-01-12 10:24:12 0.4 05-01-12 10:24:08
5 05-01-12 10:24:02 05-01-12 10:24:12 0.4 05-01-12 10:24:11
6 05-01-12 10:24:13 05-01-12 10:24:20 0.6 05-01-12 10:24:14
6 05-01-12 10:24:13 05-01-12 10:24:20 0.6 05-01-12 10:24:17
6 05-01-12 10:24:13 05-01-12 10:24:20 0.7 05-01-12 10:24:20
7 05-01-12 10:24:30 05-01-12 10:24:40 0.3 05-01-12 10:24:32
7 05-01-12 10:24:30 05-01-12 10:24:40 0.4 05-01-12 10:24:35
7 05-01-12 10:24:30 05-01-12 10:24:40 0.2 05-01-12 10:24:38
8 05-01-12 10:24:41 05-01-12 10:24:50 0.3 05-01-12 10:24:41
8 05-01-12 10:24:41 05-01-12 10:24:50 0.3 05-01-12 10:24:44
8 05-01-12 10:24:41 05-01-12 10:24:50 0.3 05-01-12 10:24:47
8 05-01-12 10:24:41 05-01-12 10:24:50 0.4 05-01-12 10:24:50到目前为止,我已经尝试了以下几点:
sqldf("select * from df1 inner join df2 on (df1.date_time between df2.start_date_time and df2.end_date_time)")R消息: match.fun(asfn)中的错误:'c("as.hms","as.difftime")‘不是函数、字符或符号
和
sqldf("select * from df1 date_time inner join df2 start_date_time on (df1.date_time > df2.start_date_time and d1.date_time <= df2.end_date_time)")R消息:result_create中的错误(conn@ptr,语句):没有这样的列: ch4.new_date_time
和
do.call(rbind, apply(df1, 1, function(x) {
if(length(idx <- which(x["date_time"] >= df2$start_date_time & x["date_time"] <= df2$end_date_time)) > 0) {
cbind(rbind(x), df2[idx,])
}
}))正如AEF所建议的
crossing(df1, df2) %>%
filter(date_time >= start_date_time, date_time < end_date_time)R消息:1 date_time ch4_ppm co2_ppm jar_camp
6 jar campaign1年月日
11日期bvoc启动date_time bvoc结束bvoc启动时间bvoc结束时间
16 start_date_time end_date_time ch4启动时间ch4结束时间o2 start date_time
21 o2 end date_time o2开始时间o2结束时间注释
<0 rows> (或0-加长row.names)
发布于 2019-07-18 10:42:56
我认为你可以做到以下几点:
阅读您的数据:
library(tidyverse)
library(lubridate)
ds1 <-
str_replace_all(" co2_conc. date_time
0.2 2012-05-01 10:23:20
0.2 2012-05-01 10:23:23
0.1 2012-05-01 10:23:26
0.6 2012-05-01 10:23:29
0.1 2012-05-01 10:23:32
0.1 2012-05-01 10:23:35
0.6 2012-05-01 10:23:38
0.6 2012-05-01 10:23:41
0.2 2012-05-01 10:23:44
0.2 2012-05-01 10:23:47
0.3 2012-05-01 10:23:50
0.3 2012-05-01 10:23:53
0.4 2012-05-01 10:23:56
0.4 2012-05-01 10:23:59
0.3 2012-05-01 10:24:02
0.3 2012-05-01 10:24:05
0.4 2012-05-01 10:24:08
0.4 2012-05-01 10:24:11
0.6 2012-05-01 10:24:14
0.6 2012-05-01 10:24:17
0.7 2012-05-01 10:24:20
0.7 2012-05-01 10:24:23
0.3 2012-05-01 10:24:26
0.7 2012-05-01 10:24:29
0.3 2012-05-01 10:24:32
0.4 2012-05-01 10:24:35
0.2 2012-05-01 10:24:38
0.3 2012-05-01 10:24:41
0.3 2012-05-01 10:24:44
0.3 2012-05-01 10:24:47
0.4 2012-05-01 10:24:50
0.1 2012-05-01 10:24:53
0.1 2012-05-01 10:24:56", " {2,}", ";") %>%
read_delim(delim = ";", trim_ws = TRUE)
ds2 <-
str_replace_all("jar_no start_date_time end_date_time
1 2012-05-01 10:23:18 2012-05-01 10:23:30
2 2012-05-01 10:23:31 2012-05-01 10:23:40
3 2012-05-01 10:23:41 2012-05-01 10:23:52
4 2012-05-01 10:23:53 2012-05-01 10:24:01
5 2012-05-01 10:24:02 2012-05-01 10:24:12
6 2012-05-01 10:24:13 2012-05-01 10:24:20
7 2012-05-01 10:24:30 2012-05-01 10:23:40
8 2012-05-01 10:24:41 2012-05-01 10:24:52", " {2,}", ";") %>%
read_delim(delim = ";", trim_ws = TRUE)实际计算:计算行和滤波器的笛卡儿积
crossing(ds1, ds2) %>%
filter(date_time >= start_date_time,
date_time < end_date_time)然而,结果与你发布的“预期结果”不完全一样。我不确定我是否误解了,或者你的例子是否不正确。我不明白预期结果的第4行从何而来。数据集中2中似乎没有相应的开始/结束日期。
https://stackoverflow.com/questions/57092137
复制相似问题