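I have a large amount of experimental result data that I need to sort in order to eliminate subjects that are "dominated" on multiple criteria. The "toy" data below reflects the general structure, but not necessarily the dimensions, of the experimental data.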

import pandas as pd

df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
                   'A': [6, 7, 8],
                   'B': [11, 7, 12],
                   'C': [13, 6, 6],
                   'D': [5, 9, 4],
                   'E': [11, 9, 5],
                   'F': [9, 10, 3],
                   'G': [2, 6, 5],
                   'H': [8, 12, 11]})

   Subject  A   B   C  D   E   F  G   H
0    Alpha  6  11  13  5  11   9  2   8
1    Bravo  7   7   6  9   9  10  6  12
2  Charlie  8  12   6  4   5   3  5  11

How can I use pairwise "less than" comparisons to generate the following results?

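[0, 1]: w=5, l=3, d=0
[0, 2]: w=4, l=4, d=0
[1, 2]: w=2, l=5, d=1

And how can I combine them with the following pseudocode to build the subset of dominated subjects (here "Bravo") and remove it from the original data?
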
tx = 3
i = 0
subject[0] = 'Alpha'
subject[1] = 'Bravo'
if w > l and l < tx
    then y[i] = subject[0]
         z[i] = subject[1]
elseif w < l and w < tx
    then y[i] = subject[1]
         z[i] = subject[0]
i += 1

Any pointers?
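
The w/l/d counts listed in the question can be reproduced with a short loop over row pairs. This is only a minimal sketch: the helper name `pairwise_counts` and the `criteria` list are made up for illustration, and it assumes every criterion column is numeric and that "w" means the first row of the pair has the strictly smaller value.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
                   'A': [6, 7, 8], 'B': [11, 7, 12], 'C': [13, 6, 6],
                   'D': [5, 9, 4], 'E': [11, 9, 5], 'F': [9, 10, 3],
                   'G': [2, 6, 5], 'H': [8, 12, 11]})


def pairwise_counts(frame, criteria):
    """Hypothetical helper: map each row pair (i, j), i < j, to (w, l, d)."""
    values = frame[criteria].to_numpy()
    counts = {}
    for i, j in combinations(range(len(values)), 2):
        w = int((values[i] < values[j]).sum())   # criteria where row i is smaller
        l = int((values[i] > values[j]).sum())   # criteria where row i is larger
        d = int((values[i] == values[j]).sum())  # criteria where the rows tie
        counts[(i, j)] = (w, l, d)
    return counts


criteria = list('ABCDEFGH')
counts = pairwise_counts(df, criteria)
for (i, j), (w, l, d) in counts.items():
    print(f'[{i}, {j}]: w={w}, l={l}, d={d}')
```

For the toy data this prints the three lines shown in the question.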
Posted on 2022-11-15 07:36:40
The code below seems to work correctly:

import itertools

import numpy as np


def pairwise_compare(dfq, pairs, tx):
    winners = []
    losers = []
    for pair in pairs:
        w = 0
        l = 0
        x = 0
        for i in dfq['Subject']:
            for j in dfq['Subject']:
                if i == pair[0] and j == pair[1]:
                    # Criterion values (columns A..H) for the two subjects
                    alt_first = dfq.loc[dfq['Subject'] == i, 'A':'H'].values
                    alt_second = dfq.loc[dfq['Subject'] == j, 'A':'H'].values
                    diffs = (alt_first - alt_second).astype(int)
                    w = np.sum(diffs < 0)   # first subject is smaller
                    l = np.sum(diffs > 0)   # first subject is larger
                    x = np.sum(diffs == 0)  # ties
                    if w > l and l < tx:
                        winners.append(i)
                        losers.append(j)
                    elif w < l and w < tx:
                        winners.append(j)
                        losers.append(i)
    return winners, losers

pair_order_list = itertools.combinations(df['Subject'], 2)
pairs = list(pair_order_list)
print('')
tx = 3
winners, losers = pairwise_compare(df, pairs, tx)
print('')
for winner, loser in zip(winners, losers):
    # Drop each dominated subject from the original frame
    df.drop(df[df['Subject'] == loser].index, inplace=True)
    print(f'{loser} is dominated by {winner}')
# Re-index once, after all losers have been dropped
df.set_index('Subject', inplace=True)
print('')
print(df)

and produces the desired output:

Bravo is dominated by Charlie

         A   B   C  D   E  F  G   H
Subject
Alpha    6  11  13  5  11  9  2   8
Charlie  8  12   6  4   5  3  5  11

I would be grateful if a pandas expert could produce a more idiomatic version!
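
One possible, more vectorized take on the same rule uses NumPy broadcasting to compare every row with every other row at once. This is a sketch rather than a definitive answer: `drop_dominated` and its `subject_col` parameter are hypothetical names, it assumes every column other than `Subject` is a numeric criterion, and it applies the same "more wins, and fewer than `tx` losses" rule as the loop above.

```python
import pandas as pd

df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
                   'A': [6, 7, 8], 'B': [11, 7, 12], 'C': [13, 6, 6],
                   'D': [5, 9, 4], 'E': [11, 9, 5], 'F': [9, 10, 3],
                   'G': [2, 6, 5], 'H': [8, 12, 11]})


def drop_dominated(frame, tx, subject_col='Subject'):
    """Hypothetical helper: return (kept rows, list of dominated subjects)."""
    data = frame.set_index(subject_col)
    values = data.to_numpy()
    # wins[i, j] = number of criteria where row i is strictly smaller than row j
    wins = (values[:, None, :] < values[None, :, :]).sum(axis=2)
    # Row i's losses against row j are row j's wins against row i
    losses = wins.T
    # Row i beats row j if it wins more criteria and loses fewer than tx
    beats = (wins > losses) & (losses < tx)
    # A subject is dominated if at least one other subject beats it
    dominated = beats.any(axis=0)
    return data.loc[~dominated], data.index[dominated].tolist()


kept, removed = drop_dominated(df, tx=3)
print(removed)
print(kept)
```

The broadcast comparison builds an intermediate array of shape (rows, rows, criteria), so for a very large number of subjects it may need to be processed in chunks.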
https://stackoverflow.com/questions/74413380