Q: Subsetting by data in two columns

Stack Overflow user

Asked 2019-02-20 16:03:37

Answers: 2 · Views: 829 · Followers: 0 · Votes: 5

I have a dataset with the following structure:

     site    block treatment date insect1 insect2 insect3 insect4 ...
1  location1     a  chemical1 date1  0     0      10       1          
2  location1     a  chemical2 date1  1     0       2       0   
3  location1     a  chemical3 date1  0     0      23       1   
4  location1     a  chemical4 date1  0     0       5       0   
5  location1     a  chemical5 date1  0     0       9       0   
6  location1     b  chemical1 date1  0     1       5       0   
7  location1     b  chemical2 date1  1     0       5       1   
8  location1     b  chemical3 date1  0     0       4       0   
9  location1     b  chemical4 date1  0     0       5       0   
10 location1     b  chemical5 date1  3     0      12       0   
11 location1     c  chemical1 date1  0     0       2       1   
12 location1     c  chemical2 date1  0     0       0       0   
13 location1     c  chemical3 date1  0     0       4       0   
14 location1     c  chemical4 date1  0     0       2       7   
15 location1     c  chemical5 date1  2     0       5       0   
16 location1     d  chemical1 date1  0     0       8       1   
17 location1     d  chemical2 date1  0     0       3       0   
18 location1     d  chemical3 date1  0     0      10       0   
19 location1     d  chemical4 date1  0     0       2       0   
20 location1     d  chemical5 date1  0     1       7       0
       .         .     .        .    .     .       .       .   
       .         .     .        .    .     .       .       .   
       .         .     .        .    .     .       .       .

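This dataset is the result of an experiment in which I tested the effect of five different chemical treatments (chemical1-5) on the attraction of several insect species (here insect1-4) at one field site (location1). The experiment was blocked 4 times at different places in the field (blocks a, b, c, and d) and replicated 5 times on different dates (only date1 is shown). All of this information is stored in the first four columns of the dataset.

The remaining columns (I have 46 of them, but show only 4) represent the different insect species and hold the number of insects I caught with a given chemical in each treatment x block x date combination (= each row).

As part of my analysis, I want to go through this dataset and, for each insect, find the block x date combinations in which none of that insect were caught. For example, I caught no individuals of insect2 in blocks a or c on date1, so I want to remove those rows from my final dataset for analysis.

I have spent a lot of time on code for this task, but last night I found that it does not work the way I thought it did, and I am struggling to find a fix. Here is the code so far (I have listed every step so that people can see where the problem might arise, or suggest a better approach):

Create a list so that each insect (here columns 5-8) gets its own data frame.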

sticky.list = lapply(sticky[-c(1:4,50)], function(i)data.frame(site=sticky$site, 
                                                          block=sticky$block,
                                                          treatment=sticky$treatment,
                                                          date=sticky$date,
                                                          number=as.numeric(i)))

Example of part of one of the data frames created in my list:

$insect1
       site    block     treatment date     number
1  location1     a       chemical1 date1      0
2  location1     a       chemical2 date1      1
3  location1     a       chemical3 date1      0
4  location1     a       chemical4 date1      0
5  location1     a       chemical5 date1      0

Then add a new column to each data frame in the list that holds the data frame's name (i.e., the insect name).

temp.list = Map(cbind, sticky.list, morphotype = names(sticky.list))  

       site    block   treatment date     number morphotype
1  location1     a     chemical1 date1      0      insect1
2  location1     a     chemical2 date1      1      insect1      
3  location1     a     chemical3 date1      0      insect1
4  location1     a     chemical4 date1      0      insect1
5  location1     a     chemical5 date1      0      insect1

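Make a larger dataset by combining the list elements vertically and flattening each one (i.e., create one big data frame). This puts all the data from my previous list into a single data frame.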

sticky.list.combined.df <- temp.list %>% bind_rows(temp.list) %>% # make larger sample data
  mutate_if(is.list, simplify_all) %>% # flatten each list element internally 
  unnest()

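Group by date, block, and morphotype, and compute the sum of number within each group. Then add this sum column to the main large data frame we just created, sticky.list.combined.df, with a join.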

sticky.list.combined.df.sum<- sticky.list.combined.df %>%
  group_by(date, block, morphotype) %>%
  summarize(sum = sum(number))

# A tibble: 855 x 4
# Groups:   date, block [?]
   date            block morphotype    sum
   <fct>           <fct> <chr>       <dbl>
 1 date1 a     insect1     0
 2 date1 a     insect2     0
 3 date1 a     insect3     0
 4 date1 a     insect4     0
# … with 845 more rows

Then:

sticky.list.analysis<-left_join(sticky.list.combined.df,sticky.list.combined.df.sum, by=c("date"="date",
                                                                                          "morphotype"="morphotype"))

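This is sample output showing only insect1. What decides whether each block's 5 rows are kept is the last two columns, block.y and sum, which give the total number of insects caught in each block (a, b, c, and d).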

      site       block.x    treatment date    number     morphotype block.y sum
1   location1       a       chemical1 date1      0         insect1       a   2
2   location1       a       chemical1 date1      0         insect1       b   8
3   location1       a       chemical1 date1      0         insect1       c   4
4   location1       a       chemical1 date1      0         insect1       d   0
5   location1       a       chemical2 date1      0         insect1       a   2
6   location1       a       chemical2 date1      0         insect1       b   8
7   location1       a       chemical2 date1      0         insect1       c   4
8   location1       a       chemical2 date1      0         insect1       d   0
9   location1       a       chemical3 date1      0         insect1       a   2
10  location1       a       chemical3 date1      0         insect1       b   8
11  location1       a       chemical3 date1      0         insect1       c   4
12  location1       a       chemical3 date1      0         insect1       d   0
13  location1       a       chemical4 date1      0         insect1       a   2
14  location1       a       chemical4 date1      0         insect1       b   8
15  location1       a       chemical4 date1      0         insect1       c   4
16  location1       a       chemical4 date1      0         insect1       d   0
17  location1       a       chemical5 date1      0         insect1       a   2
18  location1       a       chemical5 date1      0         insect1       b   8
19  location1       a       chemical5 date1      0         insect1       c   4
20  location1       a       chemical5 date1      0         insect1       d   0

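This is where I think the problem arises.

Filter to the rows where sum > 0.

For each combination of capture date (e.g., date1) and morphotype, remove the rows for blocks that had zero captures of that morphotype. In trapping experiments it is typical (and common statistical practice in the Hanks lab) to drop or exclude dates on which none of the target insects were caught. A zero catch may be due to abiotic factors (e.g., too cold or hot, rain) or to phenological factors related to the insect. Keeping these zeros in the data reduces our chances of detecting significant effects, so we exclude them.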

sticky.list.analysis.reduced<- sticky.list.analysis %>% 
  filter(sum > 0)

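The shortened output below shows that, for insect1, we should keep blocks a, b, and c. Which blocks are kept will differ from insect to insect. What I want to do now is take this information from block.y and use it to drop the rows for the remaining blocks.

Unfortunately, this is not the output I want. R has removed rows based on the sum column, so block d has been dropped according to the block.y column. What actually needs to be removed is rows 46-60, where block.x is d.

Output: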

       site block.x treatment date number morphotype block.y sum
1    location1   a    chemical1 date1   0      insect1    a   2
2    location1   a    chemical1 date1   0      insect1    b   8
3    location1   a    chemical1 date1   0      insect1    c   4
4    location1   a    chemical2 date1   0      insect1    a   2
5    location1   a    chemical2 date1   0      insect1    b   8
6    location1   a    chemical2 date1   0      insect1    c   4
7    location1   a    chemical3 date1   0      insect1    a   2
8    location1   a    chemical3 date1   0      insect1    b   8
9    location1   a    chemical3 date1   0      insect1    c   4
10   location1   a    chemical4 date1   0      insect1    a   2
11   location1   a    chemical4 date1   0      insect1    b   8
12   location1   a    chemical4 date1   0      insect1    c   4
13   location1   a    chemical5 date1   0      insect1    a   2
14   location1   a    chemical5 date1   0      insect1    b   8
15   location1   a    chemical5 date1   0      insect1    c   4
16   location1   b    chemical1 date1   0      insect1    a   2
17   location1   b    chemical1 date1   0      insect1    b   8
18   location1   b    chemical1 date1   0      insect1    c   4
19   location1   b    chemical2 date1   0      insect1    a   2
20   location1   b    chemical2 date1   0      insect1    b   8
21   location1   b    chemical2 date1   0      insect1    c   4
22   location1   b    chemical3 date1   0      insect1    a   2
23   location1   b    chemical3 date1   0      insect1    b   8
24   location1   b    chemical3 date1   0      insect1    c   4
25   location1   b    chemical4 date1   0      insect1    a   2
26   location1   b    chemical4 date1   0      insect1    b   8
27   location1   b    chemical4 date1   0      insect1    c   4
28   location1   b    chemical5 date1   0      insect1    a   2
29   location1   b    chemical5 date1   0      insect1    b   8
30   location1   b    chemical5 date1   0      insect1    c   4
31   location1   c    chemical1 date1   0      insect1    a   2
32   location1   c    chemical1 date1   0      insect1    b   8
33   location1   c    chemical1 date1   0      insect1    c   4
34   location1   c    chemical2 date1   0      insect1    a   2
35   location1   c    chemical2 date1   0      insect1    b   8
36   location1   c    chemical2 date1   0      insect1    c   4
37   location1   c    chemical3 date1   0      insect1    a   2
38   location1   c    chemical3 date1   0      insect1    b   8
39   location1   c    chemical3 date1   0      insect1    c   4
40   location1   c    chemical4 date1   0      insect1    a   2
41   location1   c    chemical4 date1   0      insect1    b   8
42   location1   c    chemical4 date1   0      insect1    c   4
43   location1   c    chemical5 date1   0      insect1    a   2
44   location1   c    chemical5 date1   0      insect1    b   8
45   location1   c    chemical5 date1   0      insect1    c   4
46   location1   d    chemical1 date1   0      insect1    a   2
47   location1   d    chemical1 date1   0      insect1    b   8
48   location1   d    chemical1 date1   0      insect1    c   4
49   location1   d    chemical2 date1   0      insect1    a   2
50   location1   d    chemical2 date1   0      insect1    b   8
51   location1   d    chemical2 date1   0      insect1    c   4
52   location1   d    chemical3 date1   0      insect1    a   2
53   location1   d    chemical3 date1   0      insect1    b   8
54   location1   d    chemical3 date1   0      insect1    c   4
55   location1   d    chemical4 date1   0      insect1    a   2
56   location1   d    chemical4 date1   0      insect1    b   8
57   location1   d    chemical4 date1   0      insect1    c   4
58   location1   d    chemical5 date1   0      insect1    a   2
59   location1   d    chemical5 date1   0      insect1    b   8
60   location1   d    chemical5 date1   0      insect1    c   4

Desired output:

       site block.x treatment date number morphotype block.y sum
1    location1   a    chemical1 date1   0      insect1    a   2
2    location1   a    chemical1 date1   0      insect1    b   8
3    location1   a    chemical1 date1   0      insect1    c   4
4    location1   a    chemical2 date1   0      insect1    a   2
5    location1   a    chemical2 date1   0      insect1    b   8
6    location1   a    chemical2 date1   0      insect1    c   4
7    location1   a    chemical3 date1   0      insect1    a   2
8    location1   a    chemical3 date1   0      insect1    b   8
9    location1   a    chemical3 date1   0      insect1    c   4
10   location1   a    chemical4 date1   0      insect1    a   2
11   location1   a    chemical4 date1   0      insect1    b   8
12   location1   a    chemical4 date1   0      insect1    c   4
13   location1   a    chemical5 date1   0      insect1    a   2
14   location1   a    chemical5 date1   0      insect1    b   8
15   location1   a    chemical5 date1   0      insect1    c   4
16   location1   b    chemical1 date1   0      insect1    a   2
17   location1   b    chemical1 date1   0      insect1    b   8
18   location1   b    chemical1 date1   0      insect1    c   4
19   location1   b    chemical2 date1   0      insect1    a   2
20   location1   b    chemical2 date1   0      insect1    b   8
21   location1   b    chemical2 date1   0      insect1    c   4
22   location1   b    chemical3 date1   0      insect1    a   2
23   location1   b    chemical3 date1   0      insect1    b   8
24   location1   b    chemical3 date1   0      insect1    c   4
25   location1   b    chemical4 date1   0      insect1    a   2
26   location1   b    chemical4 date1   0      insect1    b   8
27   location1   b    chemical4 date1   0      insect1    c   4
28   location1   b    chemical5 date1   0      insect1    a   2
29   location1   b    chemical5 date1   0      insect1    b   8
30   location1   b    chemical5 date1   0      insect1    c   4
31   location1   c    chemical1 date1   0      insect1    a   2
32   location1   c    chemical1 date1   0      insect1    b   8
33   location1   c    chemical1 date1   0      insect1    c   4
34   location1   c    chemical2 date1   0      insect1    a   2
35   location1   c    chemical2 date1   0      insect1    b   8
36   location1   c    chemical2 date1   0      insect1    c   4
37   location1   c    chemical3 date1   0      insect1    a   2
38   location1   c    chemical3 date1   0      insect1    b   8
39   location1   c    chemical3 date1   0      insect1    c   4
40   location1   c    chemical4 date1   0      insect1    a   2
41   location1   c    chemical4 date1   0      insect1    b   8
42   location1   c    chemical4 date1   0      insect1    c   4
43   location1   c    chemical5 date1   0      insect1    a   2
44   location1   c    chemical5 date1   0      insect1    b   8
45   location1   c    chemical5 date1   0      insect1    c   4

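Once this is solved, I want to subset each insect out of its column (I know how to do that manually, but not for all insects at once; that is a separate question) and then run generalized linear mixed models to assess the effect of treatment on the capture of each insect, with date and location as random effects.

I appreciate any insight on this. If I need to edit the question to add more information, please let me know; I have done my best to make the structure of my data and my problem clear. Thanks.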

dataframe

dplyr

subset

list

2 Answers

Stack Overflow user

Answered 2019-02-20 17:00:09

Have you tried the subset function? It is part of base R.

You can do the following:

filtered.sticky.list.analysis <- subset(sticky.list.analysis, block.x == "a" | block.x == "b" | block.x == "c")

Another thing that would work is:

filtered.sticky.list.analysis <- subset(sticky.list.analysis, block.x != "d")

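The code is fairly self-explanatory. The first option selects every row where block.x equals a, b, or c. The second selects every row where block.x is different from d.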
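One caveat on the first option: subset() needs a condition with one TRUE/FALSE per row, so the element-wise `|` (or `%in%`) is the operator to use rather than the scalar `||`. Since the blocks to keep will differ per insect, it may also help to derive them from the data instead of typing them in. The following is only a sketch on invented toy data; `toy`, `block_totals`, and `keep_blocks` are hypothetical names, not objects from the question.

```r
# Toy long-format data for one insect (values invented for illustration)
toy <- data.frame(
  block  = rep(c("a", "b", "c", "d"), each = 2),
  number = c(0, 2, 3, 5, 1, 3, 0, 0)
)

# Hard-coded version: %in% is vectorised, one TRUE/FALSE per row
kept_fixed <- subset(toy, block %in% c("a", "b", "c"))

# Data-driven version: total catch per block, then keep blocks with a positive total
block_totals <- tapply(toy$number, toy$block, sum)
keep_blocks  <- names(block_totals)[block_totals > 0]
kept_derived <- subset(toy, block %in% keep_blocks)
```

With real data the totals would have to be computed per date and per insect as well, not per block alone.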

Votes: 0

Stack Overflow user

Answered 2022-09-06 20:25:46

Three thoughts:

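1. The code below builds a list with one data frame per insect species, each containing only the rows in which you observed that insect. Have I understood what you are asking? I worry that you may still want to keep the "0" rows from block/date combinations that have at least one observation. Let me know if I am on the right track! (In the code below, '1input_data.csv' is just a csv made from the sample data in the OP.)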

# Read data:
rawdata <- read.csv(file = '1input_data.csv')

# Make an empty list, which will later contain one dataset for each insect:
list_of_dfs <- list()

# Loop over all insects:
for(insect in colnames(rawdata)[5:ncol(rawdata)]) {
  solo_df <- rawdata[,c('site', 'block', 'treatment', 'date', insect)] # create a df with just one insect species' data
  list_of_dfs[[insect]] <- solo_df[solo_df[[insect]] != 0,] # Keep only the nonzero rows; [[insect]] looks up the column named by the loop variable
}
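If the "0" rows do need to stay whenever their block/date combination has at least one catch, a grouped filter is one way to express that. This is a minimal sketch, assuming dplyr and tidyr are available and that every insect column name starts with "insect"; `sticky_toy` and its values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the wide data in the question (values invented)
sticky_toy <- data.frame(
  site      = "location1",
  block     = rep(c("a", "b"), each = 2),
  treatment = rep(c("chemical1", "chemical2"), times = 2),
  date      = "date1",
  insect1   = c(0, 1, 0, 0),
  insect2   = c(0, 0, 0, 3)
)

kept <- sticky_toy %>%
  # One row per site x block x treatment x date x insect
  pivot_longer(starts_with("insect"),
               names_to = "morphotype", values_to = "number") %>%
  # Evaluate the condition once per date x block x insect group
  group_by(date, block, morphotype) %>%
  # Keep the whole group, zeros included, if anything was caught in it
  filter(sum(number) > 0) %>%
  ungroup()
```

Here insect1 keeps both rows of block a and insect2 keeps both rows of block b, while the all-zero groups are dropped. `split(kept, kept$morphotype)` would then give one data frame per insect.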

2. Also check whether the GLM function you choose has a built-in "drop zero rows" or "drop NA" option.

3. Replace all 0s in the whole dataset with NA?
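For the third idea, a rough sketch in base R could look like the following. It assumes the insect columns can be picked out by an "insect" name prefix; `toy` and its values are invented. Note that this blanks every zero, including zeros in blocks where the insect was caught under another treatment, so it is not equivalent to dropping whole block x date combinations.

```r
# Toy wide data (values invented for illustration)
toy <- data.frame(
  block   = c("a", "a", "b"),
  insect1 = c(0, 2, 0),
  insect2 = c(1, 0, 0)
)

# Pick the count columns by name so the design columns are left alone
insect_cols <- grep("^insect", names(toy), value = TRUE)

counts <- toy[insect_cols]
counts[counts == 0] <- NA   # every zero count becomes NA

toy_na <- cbind(toy["block"], counts)
```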

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54790572
