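I have data from two experiments in which participants listened to pairs of audio segments, and I am now trying to get a smaller list of pairs in which each segment appears only once. Here is a sample of my data, where each row represents one pair: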
data <- structure(c("38", "39", "48", "50", "55", "68", "143", "'00123_16_02 Firestarter_timbre.txt'",
"'00123_16_02 Firestarter_timbre.txt'", "'00123_16_02 Firestarter_timbre.txt'",
"'00123_16_02 Firestarter_timbre.txt'", "'00133_10_02 Loner_timbre.txt'",
"'00133_10_02 Loner_timbre.txt'", "'00371_17_05 - Original_timbre.txt'",
"'00133_10_02 Loner_timbre.txt'", "'00030_11_01 Get Your Snack On_timbre.txt'",
"'00845_03_11 - Flying Lotus - Parisian Goldfish_timbre.txt'",
"'01249_17_UMEK - Efortil_timbre.txt'", "'00030_11_01 Get Your Snack On_timbre.txt'",
"'01300_08_02 - Clipper_timbre.txt'", "'01300_08_02 - Clipper_timbre.txt'",
"MRHT", "MRHT", "MRHT", "MRHT", "MRHT", "MRHT", "MRHT", "12",
"9", "14", "11", "14", "15", "12", "11", "12", "14", "15", "14",
"14", "11", "2.75", "2.22222222222222", "2.21428571428571", "2.54545454545455",
"2.28571428571429", "2.53333333333333", "2.25", "2.81818181818182",
"3.25", "3.14285714285714", "2.93333333333333", "3.14285714285714",
"3.07142857142857", "2.90909090909091", "0.621581560508061",
"0.97182531580755", "1.25137287246211", "1.21355975243384", "0.994490316197694",
"0.743223352957207", "1.05528970602217", "0.873862897505303",
"0.753778361444409", "0.662993544131796", "1.03279555898864",
"0.662993544131796", "0.997248963150875", "1.04446593573419"), .Dim = c(7L,
10L), .Dimnames = list(NULL, c("pair.number", "Segment1", "Segment2",
"category", "Rhythm.n", "Timbre.n", "Rhythm.mean", "Timbre.mean",
"Rhythm.sd", "Timbre.sd")))
Is there a way to get a set of pairs in which no segment is repeated across "Segment1" and "Segment2"? Here is what it might look like:
structure(c("48", "55", "143", "'00123_16_02 Firestarter_timbre.txt'",
"'00133_10_02 Loner_timbre.txt'", "'00371_17_05 - Original_timbre.txt'",
"'00845_03_11 - Flying Lotus - Parisian Goldfish_timbre.txt'",
"'00030_11_01 Get Your Snack On_timbre.txt'", "'01300_08_02 - Clipper_timbre.txt'",
"MRHT", "MRHT", "MRHT", "14", "14", "12", "14", "14", "11", "2.21428571428571",
"2.28571428571429", "2.25", "3.14285714285714", "3.14285714285714",
"2.90909090909091", "1.25137287246211", "0.994490316197694",
"1.05528970602217", "0.662993544131796", "0.662993544131796",
"1.04446593573419"), .Dim = c(3L, 10L), .Dimnames = list(NULL,
c("pair.number", "Segment1", "Segment2", "category", "Rhythm.n",
"Timbre.n", "Rhythm.mean", "Timbre.mean", "Rhythm.sd", "Timbre.sd"
)))
Thanks!
Answered 2014-04-23 19:49:30
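Edit: the second line of code now ensures that nothing in the Segment1 column also appears in the Segment2 column. Note that this solution may return fewer rows than the maximum possible.
This ensures that the Segment1 values are unique:
data <- data[!duplicated(data[, "Segment1"]), , drop = FALSE]
You can then run this to remove duplicates in the Segment2 column; it also removes any row whose Segment2 value appears anywhere in the Segment1 column:
data <- data[!duplicated(data[, "Segment2"]) & !(data[, "Segment2"] %in% data[, "Segment1"]), , drop = FALSE]
Answered 2014-04-23 20:33:51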
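It sounds like you want what is known as a matching in a graph: your vertices are the tracks, and there is an edge between two tracks if they were listened to as a pair. You then need to find a set of edges that share no vertices (a matching), ideally the largest such set (a maximum matching).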
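To make the idea of a matching concrete, here is a minimal base R sketch that builds one greedily: it walks the rows in order and keeps a pair only if neither of its segments has been used yet. The `pairs` matrix is an abbreviated stand-in for the question's data with shortened track names, and `greedy_matching()` is a hypothetical helper, not a function from any package. A greedy pass like this gives a valid matching, but not necessarily the largest one.

```r
# Abbreviated stand-in for the question's matrix: only the columns that
# matter here, with shortened (hypothetical) track labels.
pairs <- cbind(
  pair.number = c("38", "39", "48", "50", "55", "68", "143"),
  Segment1 = c("Firestarter", "Firestarter", "Firestarter", "Firestarter",
               "Loner", "Loner", "Original"),
  Segment2 = c("Loner", "Snack", "Goldfish", "Efortil",
               "Snack", "Clipper", "Clipper")
)

# Hypothetical helper: keep a row only if both of its segments are unused.
greedy_matching <- function(m) {
  used <- character(0)        # segments already taken by a kept pair
  keep <- logical(nrow(m))    # rows that end up in the matching
  for (i in seq_len(nrow(m))) {
    segs <- m[i, c("Segment1", "Segment2")]
    if (!any(segs %in% used)) {
      keep[i] <- TRUE
      used <- c(used, segs)
    }
  }
  m[keep, , drop = FALSE]
}

greedy <- greedy_matching(pairs)
print(greedy)
```

On this table the greedy pass keeps pairs 38 and 143, one fewer than the three pairs in the question's desired output, which is why a maximum matching is the better target.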
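The igraph package for R has a function called maximum.bipartite.matching that can help with this. You need to get Segment1 and Segment2 into a graph representation before calling it. Roughly like this: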
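library(igraph)
# `data` is the character matrix from the question
seg1 <- data[, "Segment1"]
seg2 <- data[, "Segment2"]
# Map every track name to an integer vertex id shared by both columns
levs <- unique(c(seg1, seg2))
seg1 <- as.integer(factor(seg1, levels = levs))
seg2 <- as.integer(factor(seg2, levels = levs))
# Interleave as (seg1_1, seg2_1, seg1_2, seg2_2, ...) so each consecutive pair is an edge
reord <- order(c(seq_along(seg1), seq_along(seg2)))
gr <- graph(c(seg1, seg2)[reord], directed = FALSE)
# Note: this call also expects a logical `types` vector (or a `type` vertex attribute) marking the two sides of a bipartite graph
maximum.bipartite.matching(gr)
Most of this is about getting the vertices into the right format: we convert them to factors with common levels and then to integers. We interleave them as (seg1_1, seg2_1, seg1_2, seg2_2, seg1_3, seg2_3, ...) to give pairs of vertices, and then create a graph object from them. The output of the last line finds the largest number of track pairs such that none of them overlap. You will need to extract these and map them back to your original dataset.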
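For a table as small as the one in the question, a maximum matching can also be found without igraph by simply trying row subsets from largest to smallest. The sketch below is base R only and swaps in a brute-force search for the graph algorithm; `pairs` is an abbreviated stand-in for the question's matrix and `max_matching_bruteforce()` is a hypothetical helper. Its cost grows exponentially with the number of rows, so treat it as an illustration for small inputs rather than a general solution.

```r
# Abbreviated stand-in for the question's matrix (hypothetical short labels).
pairs <- cbind(
  pair.number = c("38", "39", "48", "50", "55", "68", "143"),
  Segment1 = c("Firestarter", "Firestarter", "Firestarter", "Firestarter",
               "Loner", "Loner", "Original"),
  Segment2 = c("Loner", "Snack", "Goldfish", "Efortil",
               "Snack", "Clipper", "Clipper")
)

# Hypothetical helper: return the first largest set of rows in which no
# segment occurs more than once across Segment1 and Segment2.
max_matching_bruteforce <- function(m) {
  n <- nrow(m)
  for (k in rev(seq_len(n))) {          # try the biggest subsets first
    subsets <- combn(n, k)              # each column is one set of row indices
    for (j in seq_len(ncol(subsets))) {
      rows <- subsets[, j]
      segs <- c(m[rows, "Segment1"], m[rows, "Segment2"])
      if (!anyDuplicated(segs)) {
        return(m[rows, , drop = FALSE]) # this subset is a valid matching
      }
    }
  }
  m[0, , drop = FALSE]                  # no rows at all
}

best <- max_matching_bruteforce(pairs)
print(best)
```

Because the result is returned as rows of the original matrix, there is no separate step to map vertex ids back to the data. When several maximum matchings exist, this sketch returns whichever one comes first in `combn()` order.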
https://stackoverflow.com/questions/23243042