我需要获取每一行的最小值(rank),并在数据帧中的特定位置为其创建一个新列。
values_from_api_call = [
{
"rank": 5,
"keyword": "brillenreiniger ultraschallbad",
"volume": 566,
"asin": "B08LCB95V8",
"parent_asin": "B08LCD86RD",
},
{
"rank": 20,
"keyword": "brillenreiniger ultraschallbad",
"volume": 566,
"asin": "B08LCB95V8",
"parent_asin": "B08LCD86RD",
},
{
"rank": 4,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 23,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 7,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 17,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 1,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 13,
"keyword": "classic ultraschallreiniger",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 11,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 24,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 7,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 18,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 1,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 14,
"keyword": "flüssigkeit für ultraschallreiniger",
"volume": 500,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 1,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 15,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 3,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 16,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 2,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B08LCB95V8",
"parent_asin": "B08LCD86RD",
},
{
"rank": 14,
"keyword": "konzentrat für ultraschallreiniger",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 1,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 14,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B073WWXTRF",
"parent_asin": "B073WW53SR",
},
{
"rank": 5,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 13,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 3,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 16,
"keyword": "reiniger ultraschallgerät",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 2,
"keyword": "brille reinigen ultraschall",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 14,
"keyword": "brille reinigen ultraschall",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 5,
"keyword": "brillen ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 19,
"keyword": "brillen ultraschallreiniger",
"volume": None,
"asin": "B09K7F2YN6",
"parent_asin": "B09K7BW5KH",
},
{
"rank": 4,
"keyword": "brillen ultraschallreiniger",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 13,
"keyword": "brillen ultraschallreiniger",
"volume": None,
"asin": "B08L8GKY47",
"parent_asin": "B08LCD86RD",
},
{
"rank": 2,
"keyword": "schmuck reiniger",
"volume": None,
"asin": "B08LCB95V8",
"parent_asin": "B08LCD86RD",
},
{
"rank": 17,
"keyword": "schmuck reiniger",
"volume": None,
"asin": "B08LCB95V8",
"parent_asin": "B08LCD86RD",
},
]到目前为止我尝试过的是:
df_sponsored = pd.DataFrame(values_from_api_call)
df_sponsored = df_sponsored.drop_duplicates(subset=["asin", "keyword"], keep="first")
if not df_sponsored.empty:
df_sponsored[lists] = df_sponsored[lists].fillna(0)
df_sponsored = df_sponsored.assign(
rank=df_sponsored["rank"].astype(str)
).pivot_table(
index=["keyword", "volume"],
columns=["parent_asin", "asin"],
values="rank",
aggfunc="min",
fill_value=0,
margins=True,
margins_name="Best_ranking",
)
print(df_sponsored)输出(为了节省空间,我刚刚显示了结果格式):
parent_asin xxxxxx xxxxxx xxxxxx Best_ranking
keyword asin xxxxxx xxxxxx xxxxxx 2
somekeywords volume 1 3 3 1
Best_ranking 1 3 3 1 正如您在输出中看到的,边距Best_ranking出现了两次,一次出现在最后一列,一次作为值出现在第一列中。如果数据帧值较大,边距计算有时是错误的或不准确的,它不显示最小值,而是显示最大值。
我想要的产出是:
parent_asin Best_ranking xxxxxx xxxxxx xxxxxx
keyword asin xxxxxx xxxxxx xxxxxx
somekeywords volume 1 3 3 1发布于 2022-07-03 16:54:09
如果我没有弄错的话,边距会创建一个总计列,而不是您所期望的最低栏。
有一种方法可以做到:
df = (
pd.DataFrame(values_from_api_call)
.drop_duplicates(subset=["asin", "keyword"], keep="first")
.fillna(0)
.pivot_table(
index=["keyword", "volume"],
columns=["parent_asin", "asin"],
fill_value=0,
values="rank",
).assign(best_ranking=lambda df: df.apply(min, axis=1))
)print(df)
# Output

大多数最低等级为0,因为这是您选择的填充值时,您支点的表。
如果您想获得除0之外的最低级别,下面是您可以做的事情:
df = (
pd.DataFrame(values_from_api_call)
.drop_duplicates(subset=["asin", "keyword"], keep="first")
.fillna(0)
.pivot_table(
index=["keyword", "volume"],
columns=["parent_asin", "asin"],
fill_value=999_999,
values="rank",
)
.assign(best_ranking=lambda df: df.apply(min, axis=1))
.replace(999_999, 0)
)然后,您可以像这样移动新列以获得最后的dataframe:
df = df.reindex(
columns=pd.MultiIndex.from_tuples(
[("best_ranking", "")]
+ [col for col in df.columns.tolist() if col != ("best_ranking", "")],
names=["parent_asin", "asin"],
)
)print(df)
# Output

https://stackoverflow.com/questions/72780042
复制相似问题