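For any combination of columns A, C, D, E, F, and G, I'm trying to find where column B is within 5% of the value in column D. Once found, I'd like to paste that value into a new column wherever that combination occurs.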
Below is a sample of the data I'm working with:
df <- structure(list(A = c(500L, 10000L, 5000L, 500L, 100L, 500L, 1000L,
10000L, 5000L, 1000L, 500L, 5000L, 100L, 5000L, 500L, 500L, 500L,
1000L, 10000L, 500L), B = c(1.53147891704226, 5.51999984066968,
1.69897000433602, 3.49996186559619, 2.8668778143375, 2.27415784926368,
2.69983772586725, 4.30000820255381, 4.28000895310819, 1.14612803567824,
3.40001963506516, 4.88000138832177, 2.3747483460101, 4, 3.03342375548695,
3.04999285692014, 2.59988307207369, 3.51666755909904, 4.40000234592796,
2.82477646247555), C = c(0.118917162666339, 32.46875, 0.00120927734375,
6.69645182291667e-06, 38.1009114583333, 0.03888505859375, 0.984812890625,
181.953125, 0.0079256796875, 0.0397203010315885, 1.693359375,
0.25630859375, 0.00419210611979167, 1.4658203125, 0.00764973958333333,
0.294973113716194, 8.8974609375, 0.0014642802734375, 67.609375,
0.00205580344395639), D = c(4.63125661725864, 34.1632795742744,
0.262987871586425, 9.53427792464916e-06, 38.7106620745277, 0.187395038620314,
0.99014163328848, 211.108639904501, 0.0108561099088211, 9.82604248822947,
1.95692192890506, 0.262987871586425, 0.00616933538501461, 2.23297962243741,
0.020686261349356, 0.53228350287947, 26.4570757028734, 0.00221508528097736,
68.1735822402243, 0.00495578134094092), E = c(2, 2, 2, 100, 2,
100, 2, 2, 100, 2, 2, 2, 2, 100, 100, 2, 2, 100, 100, 2), F = c(1e-05,
1e-06, 1e-07, 1e-08, 1e-05, 1e-06, 1e-04, 1e-05, 1e-06, 1e-05,
1e-06, 1e-07, 1e-07, 1e-07, 1e-08, 1e-06, 1e-06, 1e-06, 1e-05,
1e-08), G = c("Effective Number of Haplotypes", "Number of Polymorphic Sites",
"Gene Diversity", "Nucleotide Diversity", "Number of Heterozygotes",
"Gene Diversity", "Gene Diversity", "Number of Polymorphic Sites",
"Nucleotide Diversity", "Effective Number of Haplotypes", "Number of Haplotypes",
"Gene Diversity", "Gene Diversity", "Number of Haplotypes", "Number of Polymorphic Sites",
"Effective Number of Haplotypes", "Number of Heterozygotes",
"Nucleotide Diversity", "Number of Heterozygotes", "Effective Number of Haplotypes"
)), .Names = c("A", "B", "C", "D", "E", "F", "G"), row.names = c("11025",
"13649", "37612", "178511", "9864", "15883", "2469", "7104",
"15089", "11140", "18719", "47812", "36151", "31315", "66810",
"17609", "16501", "14975", "10860", "45318"), class = "data.frame")
My working code is:
min(df[which(df$C>=(0.05*df$D) & df$G == 'Nucleotide Diversity' & df$F==1e-6 & df$A==5000 & df$E==100),]$B)
It returns a single number for the combination of columns A, C, D, E, F, and G that I want.
Question 1: How can I paste this number into a new column H on every row where that combination of A, C, D, E, F, and G is found?
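A minimal base-R sketch for the single hard-coded combination, assuming "paste" means writing the value on every row of that combination and leaving H as NA elsewhere. It computes the number the same way the working code does. The five-row data frame is an excerpt of the sample data (rows 37612, 15089, 11140, 47812 and 14975), used only to keep the example short; `in_combo` and `h_value` are illustrative names.

```r
# Excerpt of the question's sample data (rows 37612, 15089, 11140, 47812, 14975)
df <- data.frame(
  A = c(5000L, 5000L, 1000L, 5000L, 1000L),
  B = c(1.69897000433602, 4.28000895310819, 1.14612803567824,
        4.88000138832177, 3.51666755909904),
  C = c(0.00120927734375, 0.0079256796875, 0.0397203010315885,
        0.25630859375, 0.0014642802734375),
  D = c(0.262987871586425, 0.0108561099088211, 9.82604248822947,
        0.262987871586425, 0.00221508528097736),
  E = c(2, 100, 2, 2, 100),
  F = c(1e-07, 1e-06, 1e-05, 1e-07, 1e-06),
  G = c("Gene Diversity", "Nucleotide Diversity", "Effective Number of Haplotypes",
        "Gene Diversity", "Nucleotide Diversity"),
  row.names = c("37612", "15089", "11140", "47812", "14975")
)

# Rows that belong to the combination hard-coded in the working code
in_combo <- df$G == "Nucleotide Diversity" & df$F == 1e-6 & df$A == 5000 & df$E == 100

# Same number the working code returns: smallest B in that combination with C >= 5% of D
h_value <- min(df$B[in_combo & df$C >= 0.05 * df$D])

# New column H: that number on every row of the combination, NA on all other rows
df$H <- NA_real_
df$H[in_combo] <- h_value
```

Only row 15089 belongs to this combination in the excerpt, so it is the only row that receives a value.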
Question 2: Is there an automated way to do this without having to plug in the values for df$G==, df$F==, df$A==, and df$E==?
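One way to avoid plugging in values is to let R form the groups itself. The sketch below assumes the grouping columns are the four the working code fixes (A, E, F and G); C and D vary from row to row, so here they only enter through the "C at least 5% of D" condition. It uses base R's `ave()` on the same five-row excerpt of the sample data; `qualifies` and `min_or_na` are illustrative names.

```r
# Excerpt of the question's sample data (rows 37612, 15089, 11140, 47812, 14975)
df <- data.frame(
  A = c(5000L, 5000L, 1000L, 5000L, 1000L),
  B = c(1.69897000433602, 4.28000895310819, 1.14612803567824,
        4.88000138832177, 3.51666755909904),
  C = c(0.00120927734375, 0.0079256796875, 0.0397203010315885,
        0.25630859375, 0.0014642802734375),
  D = c(0.262987871586425, 0.0108561099088211, 9.82604248822947,
        0.262987871586425, 0.00221508528097736),
  E = c(2, 100, 2, 2, 100),
  F = c(1e-07, 1e-06, 1e-05, 1e-07, 1e-06),
  G = c("Gene Diversity", "Nucleotide Diversity", "Effective Number of Haplotypes",
        "Gene Diversity", "Nucleotide Diversity"),
  row.names = c("37612", "15089", "11140", "47812", "14975")
)

# TRUE on rows where C is at least 5% of D (the condition from the working code)
qualifies <- df$C >= 0.05 * df$D

# Smallest non-missing value, or NA when a group has no qualifying row
# (min() of an empty vector would return Inf with a warning instead)
min_or_na <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)

# ave() splits the vector by the combinations of A, E, F and G, applies
# min_or_na() within each one, and returns a result aligned with the rows of df
df$H <- ave(ifelse(qualifies, df$B, NA_real_), df$A, df$E, df$F, df$G,
            FUN = min_or_na)
```

Rows 37612 and 47812 share the combination (5000, 2, 1e-07, Gene Diversity), so both get 4.880001 even though only row 47812 meets the C condition; row 11140 has no qualifying row in its combination and stays NA.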
Ideal output:
A     B         C             D             E    F      G                               H
500   1.531479  1.189172e-01  4.631257e+00  2    1e-05  Effective Number of Haplotypes
10000 5.520000  3.246875e+01  3.416328e+01  2    1e-06  Number of Polymorphic Sites
5000  1.698970  1.209277e-03  2.629879e-01  2    1e-07  Gene Diversity
5000  3.499962  6.696452e-06  9.534278e-06  100  1e-06  Nucleotide Diversity            4.280009
100   2.866878  3.810091e+01  3.871066e+01  2    1e-05  Number of Heterozygotes
500   2.274158  3.888506e-02  1.873950e-01  100  1e-06  Gene Diversity
1000  2.699838  9.848129e-01  9.901416e-01  2    1e-04  Gene Diversity
10000 4.300008  1.819531e+02  2.111086e+02  2    1e-05  Number of Polymorphic Sites
5000  4.280009  7.925680e-03  1.085611e-02  100  1e-06  Nucleotide Diversity            4.280009
...
I think there is a way to do this, but I'm not sure what terms I should be searching for.
Posted on 2020-01-22 01:34:48
Do you mean something like this?
library(dplyr)
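df %>% group_by(A, G) %>% mutate(H = min(B[C >= 0.05 * D]))
This returns, within each group, the minimum value of B among rows where C is at least 5% of D. You may want to add more variables to group_by().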
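The answer attaches the value with a grouped `mutate()`; in that version, a group where no row meets the condition gets `Inf` from `min()` (with a warning). A base-R alternative, again assuming A, E, F and G as the grouping columns, is to build one lookup row per combination with `aggregate()` and left-join it back with `merge()`. It is shown on a five-row excerpt of the sample data (rows 37612, 15089, 11140, 47812 and 14975); `lookup` and `result` are illustrative names.

```r
# Excerpt of the question's sample data (rows 37612, 15089, 11140, 47812, 14975)
df <- data.frame(
  A = c(5000L, 5000L, 1000L, 5000L, 1000L),
  B = c(1.69897000433602, 4.28000895310819, 1.14612803567824,
        4.88000138832177, 3.51666755909904),
  C = c(0.00120927734375, 0.0079256796875, 0.0397203010315885,
        0.25630859375, 0.0014642802734375),
  D = c(0.262987871586425, 0.0108561099088211, 9.82604248822947,
        0.262987871586425, 0.00221508528097736),
  E = c(2, 100, 2, 2, 100),
  F = c(1e-07, 1e-06, 1e-05, 1e-07, 1e-06),
  G = c("Gene Diversity", "Nucleotide Diversity", "Effective Number of Haplotypes",
        "Gene Diversity", "Nucleotide Diversity"),
  row.names = c("37612", "15089", "11140", "47812", "14975")
)

# One row per combination of A, E, F and G: the smallest B among rows with C >= 5% of D.
# Combinations without any qualifying row are simply absent from this table.
lookup <- aggregate(B ~ A + E + F + G, data = df[df$C >= 0.05 * df$D, ], FUN = min)
names(lookup)[names(lookup) == "B"] <- "H"

# Left join: every row of df is kept and gains H (NA where its combination had no match).
# merge() reorders the rows and puts the key columns first.
result <- merge(df, lookup, by = c("A", "E", "F", "G"), all.x = TRUE)
```

Because `merge()` does not preserve the original row order, look rows up by their key columns rather than by position.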
https://stackoverflow.com/questions/59851407