首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何匹配字符串并相应安排数据?

如何匹配字符串并相应安排数据?
EN

Stack Overflow用户
提问于 2022-01-07 14:39:26
回答 1查看 64关注 0票数 1

输入df1和df2

df1:

代码语言:javascript
复制
Subcategory_Desc    Segment_Desc    Flow            Side        Row_no
APPLE               APPLE LOOSE     Apple Kanzi     Front       Row 1
APPLE               APPLE LOOSE     Apple Jazz      Front       Row 1
CITRUS              ORANGES LOOSE   Orange Navel    Front       Row 1
PEAR                PEARS LOOSE     Lemon           Right End   Row 1
AVOCADOS            AVOCADOS LOOSE  Avocado         Back        Row 1
TROPICAL FRUIT      KIWI FRUIT      Kiwi Gold       Back        Row 1
TROPICAL FRUIT      KIWI FRUIT      Kiwi Green      Left End    Row 1

df2:

代码语言:javascript
复制
Subcategory_Desc    Segment_Desc    Flow
TROPICAL FRUIT      KIWI FRUIT      5pk Kids Kiwi
APPLE               APPLE LOOSE     Apple GoldenDel
AVOCADOS            AVOCADOS LOOSE  Avocado Tray

场景:考虑到以下条件,应将 Dataframe df2行插入到dataframe df1:

  1. 检查df1中类似的Subcategory_Desc和Segment_Desc,并将df2行插入到该特定侧的末尾(前/后)。
  2. 也需要考虑Row_no列,因为原始数据集包含n个Row_no数,这里只为示例数据提供了第1行。

预期输出:

代码语言:javascript
复制
Subcategory_Desc    Segment_Desc    Flow            Side        Row_no
APPLE               APPLE LOOSE     Apple Kanzi     Front       Row 1
APPLE               APPLE LOOSE     Apple Jazz      Front       Row 1
CITRUS              ORANGES LOOSE   Orange Navel    Front       Row 1
APPLE               APPLE LOOSE     Apple GoldenDel Front       Row 1
PEAR                PEARS LOOSE     Lemon           Right End   Row 1
AVOCADOS            AVOCADOS LOOSE  Avocado         Back        Row 1
TROPICAL FRUIT      KIWI FRUIT      Kiwi Gold       Back        Row 1
TROPICAL FRUIT      KIWI FRUIT      5pk Kids Kiwi   Back        Row 1
AVOCADOS            AVOCADOS LOOSE  Avocado Tray    Back        Row 1
TROPICAL FRUIT      KIWI FRUIT      Kiwi Green      Left End    Row 1

不确定什么简单的逻辑可以用于此目的。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-01-08 18:03:18

因此,给定以下数据格式:

代码语言:javascript
复制
import pandas as pd

df1 = pd.DataFrame(
    {
        "Subcategory_Desc": {
            0: "APPLE",
            1: "APPLE",
            2: "CITRUS",
            3: "PEAR",
            4: "AVOCADOS",
            5: "TROPICAL FRUIT",
            6: "TROPICAL FRUIT",
        },
        "Segment_Desc": {
            0: "APPLE LOOSE",
            1: "APPLE LOOSE",
            2: "ORANGES LOOSE",
            3: "PEARS LOOSE",
            4: "AVOCADOS LOOSE",
            5: "KIWI FRUIT",
            6: "KIWI FRUIT",
        },
        "Flow": {
            0: "Apple Kanzi",
            1: "Apple Jazz",
            2: "Orange Navel",
            3: "Lemon",
            4: "Avocado",
            5: "Kiwi Gold",
            6: "Kiwi Green",
        },
        "Side": {
            0: "Front",
            1: "Front",
            2: "Front",
            3: "Right_End",
            4: "Back",
            5: "Back",
            6: "Left_End",
        },
        "Row_no": {
            0: "Row 1",
            1: "Row 1",
            2: "Row 1",
            3: "Row 1",
            4: "Row 1",
            5: "Row 1",
            6: "Row 1",
        },
    }
)

df2 = pd.DataFrame(
    {
        "Subcategory_Desc": {0: "TROPICAL FRUIT", 1: "APPLE", 2: "AVOCADOS"},
        "Segment_Desc": {0: "KIWI FRUIT", 1: "APPLE LOOSE", 2: "AVOCADOS LOOSE"},
        "Flow": {0: "5pk Kids Kiwi", 1: "Apple GoldenDel", 2: "Avocado Tray"},
    }
)

你可以试试这个:

代码语言:javascript
复制
# Initialize new column
df2["idx"] = ""

# Find indice of first match in df1
for _, row2 in df2.iterrows():
    for i, row1 in df1.iterrows():
        if i + 1 >= df1.shape[0]:
            break
        if (
            row1["Subcategory_Desc"] == row2["Subcategory_Desc"]
            and row1["Segment_Desc"] == row2["Segment_Desc"]
        ):
            row2["idx"] = i

df2 = df2.sort_values(by="idx").reset_index(drop=True)

# Starting from previous indice, find insertion indice in df1
for i, idx in enumerate(df2["idx"]):
    side_of_idx = df1.loc[idx, "Side"]
    df2.loc[i, "pos"] = df1.index[df1["Side"] == side_of_idx].to_list()[-1] + 1
positions = df2["pos"].astype("int").to_list()

# Clean up df2
df2 = df2.drop(columns=["idx", "pos"])
df2["Side"] = df2["Row_no"] = ""

# Iterate on df1 to insert new rows
for i, pos in enumerate(positions):

    # Fill missing values
    df2.loc[i, "Side"] = df1.loc[pos - 1, "Side"]
    df2.loc[i, "Row_no"] = df1.loc[pos, "Row_no"]

    # Insert row
    df1 = pd.concat(
        [df1.iloc[:pos], pd.DataFrame([df2.iloc[i]]), df1.iloc[pos:]], ignore_index=True
    ).reset_index(drop=True)

    # Increment next position since df1 has changed
    if i < len(positions) - 1:
        positions[i + 1] += 1

因此:

代码语言:javascript
复制
print(df1)
# Outputs
  Subcategory_Desc    Segment_Desc             Flow       Side Row_no
0            APPLE     APPLE LOOSE      Apple Kanzi      Front  Row 1
1            APPLE     APPLE LOOSE       Apple Jazz      Front  Row 1
2           CITRUS   ORANGES LOOSE     Orange Navel      Front  Row 1
3            APPLE     APPLE LOOSE  Apple GoldenDel      Front  Row 1
4             PEAR     PEARS LOOSE            Lemon  Right_End  Row 1
5         AVOCADOS  AVOCADOS LOOSE          Avocado       Back  Row 1
6   TROPICAL FRUIT      KIWI FRUIT        Kiwi Gold       Back  Row 1
7   TROPICAL FRUIT      KIWI FRUIT    5pk Kids Kiwi       Back  Row 1
8         AVOCADOS  AVOCADOS LOOSE     Avocado Tray       Back  Row 1
9   TROPICAL FRUIT      KIWI FRUIT       Kiwi Green   Left_End  Row 1
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/70622836

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档