I have 16 columns. I want to divide each count column by its corresponding dc(uid) column.
+------------------------+------------------------------+--------------------------+------------------------------------+-------------------------------------+------------------------+---------------------+--------------------------+--------------------------------+----------------------------+--------------------------------------+---------------------------------------+--------------------------+-----------------------+
| count: interaction_eis | count: interaction_eis_reply | count: interaction_match | count: interaction_single_message_ | count: interaction_single_message_1 | count: interaction_yes | count: revenue_sale | dc(uid): interaction_eis | dc(uid): interaction_eis_reply | dc(uid): interaction_match | dc(uid): interaction_single_message_ | dc(uid): interaction_single_message_1 | dc(uid): interaction_yes | dc(uid): revenue_sale |
+------------------------+------------------------------+--------------------------+------------------------------------+-------------------------------------+------------------------+---------------------+--------------------------+--------------------------------+----------------------------+--------------------------------------+---------------------------------------+--------------------------+-----------------------+
I know I can do this:
pre_purch_m['interaction_eis_rate'] = pre_purch_m['count: interaction_eis'] / pre_purch_m['dc(uid): interaction_eis']
pre_purch_m['interaction_eis_reply_rate'] = pre_purch_m['count: interaction_eis_reply'] / pre_purch_m['dc(uid): interaction_eis_reply']
But writing this out 8 times seems redundant and laborious.
Is there a pandas function or idiom that does this kind of thing more efficiently?
Posted on 2015-04-15 17:50:02
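Assuming your column names are consistent, here is one approach.
Get the columns from the DataFrame df:
cols = df.columns
Strip the count: and dc(uid): prefixes and deduplicate to get the unique metric names:
uniq_cols = list(set([x.split(': ')[1] for x in cols]))
Now loop over them, creating the new columns:
for col in uniq_cols:
    df[col + '_rate'] = df['count: ' + col] / df['dc(uid): ' + col]
This would also be much easier if uniq_cols had been stored when the data was first populated.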
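The steps above can be put together in a minimal sketch. The toy DataFrame and its values are made up for illustration, and only two of the metric pairs from the question are included. A list comprehension over the count: columns is used here instead of set(), as an assumption on my part, so that the new columns come out in a predictable order.

```python
import pandas as pd

# Hypothetical toy data: two of the count / dc(uid) pairs from the question.
df = pd.DataFrame({
    'count: interaction_eis': [10, 20],
    'count: revenue_sale': [4, 9],
    'dc(uid): interaction_eis': [5, 4],
    'dc(uid): revenue_sale': [2, 3],
})

# Metric names shared by both prefixes, kept in their original column order.
metrics = [c.split(': ', 1)[1] for c in df.columns if c.startswith('count: ')]

# One new <metric>_rate column per count / dc(uid) pair.
for metric in metrics:
    df[metric + '_rate'] = df['count: ' + metric] / df['dc(uid): ' + metric]

print(df[[m + '_rate' for m in metrics]])
```

Because the loop is driven by the column names, it should keep working if more count / dc(uid) pairs are added later, as long as the prefixes stay the same.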
Posted on 2015-04-15 19:34:22
The 16 columns are contiguous, so one way to do this is:
newdF = df.iloc[:, :8] / df.iloc[:, 8:16].values
Using .values prevents index-alignment problems.
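To see why .values matters, here is a small sketch on a made-up four-column frame (the metric names a and b are hypothetical). Dividing one DataFrame by another aligns them on column labels, and since the count: and dc(uid): labels never match, every cell of the result is NaN. Dividing by the underlying array pairs the columns by position instead.

```python
import pandas as pd

# Hypothetical toy data: two count columns followed by two dc(uid) columns.
df = pd.DataFrame({
    'count: a': [10, 20],
    'count: b': [4, 9],
    'dc(uid): a': [5, 4],
    'dc(uid): b': [2, 3],
})

counts = df.iloc[:, :2]
uniques = df.iloc[:, 2:]

# DataFrame / DataFrame aligns on column labels; none are shared here,
# so the result has all four labels and contains only NaN.
aligned = counts / uniques

# Dividing by the raw array skips alignment and matches columns by position.
positional = counts / uniques.values

print(aligned)
print(positional)
```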
Then rename the columns:
newdF.rename(columns=lambda x: x[len('count: '):] + '_rate', inplace=True)
https://stackoverflow.com/questions/29657088