I have two different DataFrames derived from the same .csv file
Columns in the file:
Index(['App', 'Category', 'Rating', 'Reviews', 'Size_MBs', 'Installs', 'Type','Price', 'Content_Rating', 'Genres', 'Last_Updated','Android_Ver'],dtype='object')
The first one:
category_installs=df_apps_clean.groupby('Category').agg({'Installs':pd.Series.sum})
category_installs.sort_values('Installs', ascending=True, inplace=True)
gives the following output:
Category         Installs
VIDEO_PLAYERS    3916897200
FAMILY           4437554490
PHOTOGRAPHY      4649143130
SOCIAL           5487841475
PRODUCTIVITY     5788070180
TOOLS            8099724500
COMMUNICATION    11039241530
GAME             13858762717
The second one:
app_installs = df_apps_clean.groupby('Category').agg({'App':pd.Series.count})
app_installs.sort_values('App', ascending=False)
gives the following output:
Category           App
FAMILY             1606
GAME               910
TOOLS              719
PRODUCTIVITY       301
PERSONALIZATION    298
LIFESTYLE          297
FINANCE            296
MEDICAL            292
PHOTOGRAPHY        263
BUSINESS           262
SPORTS             260
COMMUNICATION      257
But when I merge them with pandas, like this:
cat_merged_df = pd.merge(app_installs, category_installs,on='Category', how='inner')
cat_merged_df.sort_values('Installs', ascending=False)
I get the following output:
Category       App_x  Installs     App_y
GAME           910    13858762717  Ra Ga BaMu.F.O.Brick Breaker BR211:CK
COMMUNICATION  257    11039241530  EJ messengerBest Browser BD social networkingD...
TOOLS          719    8099724500   ei CalcBM speed testCZ Kompasap,wifi testing,i...
PRODUCTIVITY   301    5788070180   ER AssistBAMMS for BM SQDL Image ManagerEB Sca...
SOCIAL         203    5487841475   CB HeroesDN BlogHum Ek Hain 2.02UP EB Bill Pay...
Why do I get three columns, with the App column split into App_x and App_y? There is no such data in the file I am working with.
Posted on 2022-02-22 12:22:47
If I understand what you want, maybe this will help.
Create df1
import pandas as pd
col1 = ['VIDEO_PLAYERS', 'FAMILY', 'PHOTOGRAPHY', 'SOCIAL', 'PRODUCTIVITY', 'TOOLS', 'COMMUNICATION', 'GAME']
col2 = [3916897200, 4437554490, 4649143130, 5487841475, 5788070180, 8099724500, 11039241530, 13858762717]
d = {'Category':col1, 'Installs':col2}
df1 = pd.DataFrame(d)
        Category     Installs
0  VIDEO_PLAYERS   3916897200
1         FAMILY   4437554490
2    PHOTOGRAPHY   4649143130
3         SOCIAL   5487841475
4   PRODUCTIVITY   5788070180
5          TOOLS   8099724500
6  COMMUNICATION  11039241530
7           GAME  13858762717
Create df2
col1 = ['FAMILY', 'GAME', 'TOOLS', 'PRODUCTIVITY', 'PERSONALIZATION', 'LIFESTYLE', 'FINANCE', 'MEDICAL', 'PHOTOGRAPHY', 'BUSINESS', 'SPORTS', 'COMMUNICATION']
col2 = [1606, 910, 719, 301, 298, 297, 296, 292, 263, 262, 260, 257]
d = {'Category':col1, 'App':col2}
df2 = pd.DataFrame(d)
           Category   App
0            FAMILY  1606
1              GAME   910
2             TOOLS   719
3      PRODUCTIVITY   301
4   PERSONALIZATION   298
5         LIFESTYLE   297
6           FINANCE   296
7           MEDICAL   292
8       PHOTOGRAPHY   263
9          BUSINESS   262
10           SPORTS   260
11    COMMUNICATION   257
Merge the two frames on Category
pd.merge(left=df1, right=df2, on='Category')
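        Category     Installs   App
0         FAMILY   4437554490  1606
1    PHOTOGRAPHY   4649143130   263
2   PRODUCTIVITY   5788070180   301
3          TOOLS   8099724500   719
4  COMMUNICATION  11039241530   257
5           GAME  13858762717   910
If this is not what you want, please show what you would like the output to look like and I will update the answer. You can change the join type with how=.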
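As for where App_x and App_y come from: pandas appends the suffixes _x and _y when both frames contain a non-key column with the same name. The App_y values in the question look like concatenated app names, so my guess is that the category_installs frame that was actually merged still carried an App column. That is an assumption, since the posted code only aggregates Installs. The sketch below uses small made-up frames (counts, installs and their values are hypothetical) to reproduce the suffixes, avoid them by selecting columns before merging, and show what how='outer' changes.

```python
import pandas as pd

# Hypothetical per-category app counts (made-up values).
counts = pd.DataFrame({
    "Category": ["GAME", "TOOLS", "FINANCE"],
    "App": [2, 1, 4],
})

# Hypothetical per-category installs that ALSO carries an 'App' column,
# here holding concatenated app names (assumed cause of App_y).
installs = pd.DataFrame({
    "Category": ["GAME", "TOOLS"],
    "Installs": [150, 10],
    "App": ["a1a2", "b1"],
})

# Both frames have a non-key column named 'App', so merge renames them.
clash = pd.merge(counts, installs, on="Category", how="inner")
print(list(clash.columns))  # ['Category', 'App_x', 'Installs', 'App_y']

# Keep only the columns you need from the right frame to avoid the clash.
clean = pd.merge(counts, installs[["Category", "Installs"]], on="Category")
print(list(clean.columns))  # ['Category', 'App', 'Installs']

# how='outer' keeps categories found in only one frame; the missing
# side is filled with NaN (FINANCE has no Installs value here).
outer = pd.merge(
    counts, installs[["Category", "Installs"]], on="Category", how="outer"
)
print(outer)
```

If you do want to keep both columns, merge also accepts a suffixes= argument, for example suffixes=('_count', '_names'), to give them more descriptive names than _x and _y.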
https://stackoverflow.com/questions/71219794