
Divide multiple rows by a single group's values in a pandas DataFrame

Stack Overflow user
Asked 2020-05-18 02:57:55
Answers: 2, Views: 82, Followers: 0, Votes: 0

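I have a DataFrame similar to the one below (but with hundreds of stocks, not just A and B), and I don't know in advance how many stocks it will contain. I am trying to divide every stock's rows by the INDEX rows, matched on the Date column (stock A on 5/15/2020 divided by INDEX on 5/15/2020, stock A on 5/16/2020 divided by INDEX on 5/16/2020, and so on, then stock B on 5/15/2020 divided by INDEX on 5/15/2020, etc.). I have put the result I want in the DESIRED column, but I don't know how to compute it.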

import pandas as pd

d = {'Stock': pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'INDEX', 'INDEX', 'INDEX']),
     'Date': pd.Series(['5/15/2020', '5/16/2020', '5/17/2020', '5/15/2020',
                        '5/16/2020', '5/17/2020', '5/15/2020', '5/16/2020', '5/17/2020']),
     'Price': pd.Series([10, 20, 30, 20, 40, 60, 2, 5, 10]),
     'DESIRED': pd.Series([5, 4, 3, 10, 8, 6, 1, 1, 1])}
df = pd.DataFrame(d)
df

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-05-18 03:35:34

import pandas as pd

d = {'Stock' : pd.Series(['A', 'A', 'A','B', 'B', 'B', 'INDEX', 'INDEX', 'INDEX']), 
 'Date' : pd.Series(['5/15/2020', '5/16/2020', '5/17/2020','5/15/2020', \
                     '5/16/2020', '5/17/2020','5/15/2020','5/16/2020','5/17/2020']),
 'Price' : pd.Series([10,20,30,20,40,60,2,5,10]),
'DESIRED' : pd.Series([5,4,3,10,8,6,1,1,1])}
df = pd.DataFrame(d) 

Here is one possible solution:

#First we build a dataframe containing only index rows
df_index = df[df.Stock == 'INDEX']

#and we get rid of those rows from the original dataframe
df = df[df.Stock != 'INDEX']

#now we merge them
df = df.merge(df_index[['Date', 'Price']], on='Date', suffixes = ['', '_index'])

#and we simply create the new column
df['hooray!'] = df.Price/df.Price_index

#If you want you can delete the column
#del df['Price_index']

Output:

  Stock       Date  Price  DESIRED  Price_index  hooray!
0     A  5/15/2020     10        5            2      5.0
1     B  5/15/2020     20       10            2     10.0
2     A  5/16/2020     20        4            5      4.0
3     B  5/16/2020     40        8            5      8.0
4     A  5/17/2020     30        3           10      3.0
5     B  5/17/2020     60        6           10      6.0
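
If the INDEX rows should stay in the result (with a ratio of 1, as in the DESIRED column of the question), one alternative to the merge is to build a Date-to-price look-up from the INDEX rows and apply it with `Series.map`. This is only a sketch, not the answerer's method: the column name `Ratio` is made up for illustration, and it assumes there is exactly one INDEX row per date.

```python
import pandas as pd

df = pd.DataFrame({
    'Stock': ['A', 'A', 'A', 'B', 'B', 'B', 'INDEX', 'INDEX', 'INDEX'],
    'Date': ['5/15/2020', '5/16/2020', '5/17/2020'] * 3,
    'Price': [10, 20, 30, 20, 40, 60, 2, 5, 10],
})

# Look-up Series: Date -> INDEX price (assumes one INDEX row per date)
index_price = df.loc[df['Stock'] == 'INDEX'].set_index('Date')['Price']

# Translate each row's date into the matching INDEX price, then divide.
# 'Ratio' is a hypothetical column name chosen for this sketch.
df['Ratio'] = df['Price'] / df['Date'].map(index_price)

print(df)
```

Because nothing is filtered out or merged, the original row order is kept and no extra `Price_index` column needs to be deleted afterwards.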
Votes: 0

Stack Overflow user

Posted 2020-05-18 03:29:13

This should do the trick:

import pandas as pd

#data (NOTE: i've removed the desired column)
d = {'Stock' : pd.Series(['A', 'A', 'A','B', 'B', 'B', 'INDEX', 'INDEX', 'INDEX']), 
'Date' : pd.Series(['5/15/2020', '5/16/2020', '5/17/2020','5/15/2020', \
                 '5/16/2020', '5/17/2020','5/15/2020','5/16/2020','5/17/2020']),
'Price' : pd.Series([10,20,30,20,40,60,2,5,10])}

#create dataframe
df = pd.DataFrame(d)
#create empty DESIRED column (object dtype, since it may hold text or numbers)
df['DESIRED'] = ''

#create sub dataframes for stocks and indices
stocksDf = df.loc[df['Stock'] != 'INDEX'].reset_index(drop=True)
indexDf = df.loc[df['Stock'] == 'INDEX'].reset_index(drop=True)
#loop over stocks dataframe
for i, row in stocksDf.iterrows():
    #define needed values
    stocks = stocksDf.at[i, 'Stock']
    price = stocksDf.at[i, 'Price']
    date = stocksDf.at[i, 'Date']
    #rows of the original dataframe that correspond to this stock row
    mask = (df['Stock'] == stocks) & (df['Price'] == price) & (df['Date'] == date)
    #get index matching date of stock
    matchingIndex = indexDf.loc[indexDf['Date'] == date].reset_index(drop=True)
    #if it doesn't exist, just record that there is no matching index
    if len(matchingIndex) == 0:
        df.loc[mask, 'DESIRED'] = 'No Matching Index'
    else:
        #if it exists, calculate DESIRED as price of stock / price of index
        indexPrice = matchingIndex.at[0, 'Price']
        #one .loc call; chained df['DESIRED'].loc[...] may not write back to df
        df.loc[mask, 'DESIRED'] = price / indexPrice

#for indices just set DESIRED to 1
df.loc[df['Stock'] == 'INDEX', 'DESIRED'] = 1

print(df)

   Stock       Date  Price DESIRED
0      A  5/15/2020     10       5
1      A  5/16/2020     20       4
2      A  5/17/2020     30       3
3      B  5/15/2020     20      10
4      B  5/16/2020     40       8
5      B  5/17/2020     60       6
6  INDEX  5/15/2020      2       1
7  INDEX  5/16/2020      5       1
8  INDEX  5/17/2020     10       1
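
The row-by-row loop above can become slow with hundreds of stocks. A vectorized sketch of the same idea, including the "no matching index" case, is a left merge: dates without an INDEX row simply get `NaN`. The names `IndexPrice` and `Ratio` are hypothetical, and the sample data is trimmed so that 5/16/2020 has no INDEX row; treat this as an illustration under those assumptions rather than a drop-in replacement.

```python
import pandas as pd

# Trimmed sample: INDEX has a price for 5/15/2020 only
df = pd.DataFrame({
    'Stock': ['A', 'A', 'B', 'B', 'INDEX'],
    'Date': ['5/15/2020', '5/16/2020', '5/15/2020', '5/16/2020', '5/15/2020'],
    'Price': [10, 20, 20, 40, 2],
})

# One row per date with the INDEX price, renamed to avoid a column clash
index_prices = (
    df.loc[df['Stock'] == 'INDEX', ['Date', 'Price']]
      .rename(columns={'Price': 'IndexPrice'})
)

# how='left' keeps every original row; validate raises if a date has
# more than one INDEX row, which would silently duplicate stock rows
merged = df.merge(index_prices, on='Date', how='left', validate='many_to_one')

# Rows whose date has no INDEX price end up as NaN instead of a text marker
merged['Ratio'] = merged['Price'] / merged['IndexPrice']

print(merged)
```

Keeping the result numeric (with `NaN` for missing dates) avoids mixing strings and numbers in one column, which the `'No Matching Index'` marker in the loop version does.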
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/61857176