If I have a DataFrame such as:
import pandas as pd

data = [[10, "Apple"], [4, "Banana"], [3, "Strawberry"], [15, "Chocolate"], [5, "Kiwi"], [75, "Apple"], [4, "Potato"], [6, "Apple"], [45, "Banana"], [10, "Strawberry"], [10, "Apple"]]
df = pd.DataFrame(data, columns=["Cost", "Fruit"])
which produces:
    Cost       Fruit
0     10       Apple
1      4      Banana
2      3  Strawberry
3     15   Chocolate
4      5        Kiwi
5     75       Apple
6      4      Potato
7      6       Apple
8     45      Banana
9     10  Strawberry
10    10       Apple
I also have a dictionary of shop names keyed by specific index values:
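shop_names = {
    0: "Shop 1",
    4: "Shop 2",
    7: "Shop 3",
}
which produces:
{0: 'Shop 1', 4: 'Shop 2', 7: 'Shop 3'}
I want to create a "Shop" column and assign each shop from the dictionary based on its key. So rows 0-3 = Shop 1, rows 4-6 = Shop 2, and rows 7-10 = Shop 3. The dictionary only holds the first index of each shop as its key.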
Desired output:
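    Cost       Fruit    Shop
0     10       Apple  Shop 1
1      4      Banana  Shop 1
2      3  Strawberry  Shop 1
3     15   Chocolate  Shop 1
4      5        Kiwi  Shop 2
5     75       Apple  Shop 2
6      4      Potato  Shop 2
7      6       Apple  Shop 3
8     45      Banana  Shop 3
9     10  Strawberry  Shop 3
10    10       Apple  Shop 3
This is a sample dataset. The real one has 10,000 rows, but I have a dictionary containing the shop intervals, keyed by the equivalent index positions in the original DataFrame.
I am struggling to assign values based on the index ranges in my dictionary. Any help would be appreciated.
Many thanks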
Posted on 2020-10-22 19:28:06
Create a Series with Index.map, then use ffill to forward-fill the missing values:
df['Shop'] = pd.Series(df.index.map(shop_names), index=df.index).ffill()
print(df)
    Cost       Fruit    Shop
0     10       Apple  Shop 1
1      4      Banana  Shop 1
2      3  Strawberry  Shop 1
3     15   Chocolate  Shop 1
4      5        Kiwi  Shop 2
5     75       Apple  Shop 2
6      4      Potato  Shop 2
7      6       Apple  Shop 3
8     45      Banana  Shop 3
9     10  Strawberry  Shop 3
10    10       Apple  Shop 3
Posted on 2020-10-22 19:28:21
First, use Index.map() to map the shop names onto their index positions. Then fill the column with the last non-NaN value (inspired by this answer):
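# Rows whose index is not a key in shop_names get NaN here
df["Shop"] = df.index.map(shop_names).values
# Forward-fill the NaNs; .ffill() replaces fillna(method="ffill"), which is deprecated in recent pandas
df["Shop"] = df["Shop"].ffill()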
df
Out[36]:
    Cost       Fruit    Shop
0     10       Apple  Shop 1
1      4      Banana  Shop 1
2      3  Strawberry  Shop 1
3     15   Chocolate  Shop 1
4      5        Kiwi  Shop 2
5     75       Apple  Shop 2
6      4      Potato  Shop 2
7      6       Apple  Shop 3
8     45      Banana  Shop 3
9     10  Strawberry  Shop 3
10    10       Apple  Shop 3
https://stackoverflow.com/questions/64481108
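Both answers rely on the same two steps: map the index through the dictionary, then forward-fill. As a rough, self-contained sketch of what happens in between, the snippet below keeps the intermediate result in a variable called mapped (a name introduced here purely for illustration, not taken from either answer), so the NaN gaps can be inspected before they are filled. It assumes the DataFrame has a default integer index and that the dictionary keys are the first row of each shop's block, as in the question.

```python
import pandas as pd

data = [
    [10, "Apple"], [4, "Banana"], [3, "Strawberry"], [15, "Chocolate"],
    [5, "Kiwi"], [75, "Apple"], [4, "Potato"], [6, "Apple"],
    [45, "Banana"], [10, "Strawberry"], [10, "Apple"],
]
df = pd.DataFrame(data, columns=["Cost", "Fruit"])

# Each key is the first row index of a shop's block of rows.
shop_names = {0: "Shop 1", 4: "Shop 2", 7: "Shop 3"}

# Step 1: look up every index label in the dict.
# Labels that are not keys (1, 2, 3, 5, 6, ...) come back as missing values.
mapped = pd.Series(df.index.map(shop_names), index=df.index)

# Step 2: forward-fill, so each missing value takes the closest
# shop name that appears above it.
df["Shop"] = mapped.ffill()

print(mapped.isna().sum(), "rows were missing before the fill")
print(df)
```

One thing to watch for: a forward fill only copies values downward, so if the smallest key in the dictionary is larger than the first index of the DataFrame, the rows before that key will stay NaN.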