I have sample data like this:
col1               col2      col3
PYTHON RD          APT 3     NaN
STACK AVE APT 2-3  APT 2-3   NaN
OVER ST 1/2        UNIT 1/2  UNIT 1/2
FLOW RD            NaN       NaN

I want to create a new column:
col1               col2      col3      COMBINED
PYTHON RD          APT 3     NaN       PYTHON RD APT 3
STACK AVE APT 2-3  APT 2-3   NaN       STACK AVE APT 2-3
OVER ST 1/2        UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
FLOW RD            NaN       NaN       FLOW RD

I tried:
columns = ["col1", "col2", "col3"]
COMBINED = ''
for col in columns:
    df[col] = df[col].fillna("")
    COMBINED = COMBINED + df[col].str.strip() + ' '
df['COMBINED'] = COMBINED.str.strip()

This does combine the columns, but the second observation comes out duplicated: STACK AVE APT 2-3 APT 2-3.
Any suggestions?
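To try the answers below, the sample frame can be rebuilt as follows. This is a sketch that assumes the blank cells are real missing values (`np.nan`) rather than the string "NaN". It also reruns the concatenation from the question (without overwriting the original columns) to show the problem: every column is appended unconditionally, so text that an earlier column already contains is repeated, in row 1 and also in row 2, where col2 and col3 are identical.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells are assumed to be np.nan.
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)

# Same concatenation as the question's loop: each column is appended
# without checking whether its text is already present.
columns = ["col1", "col2", "col3"]
combined = ""
for col in columns:
    combined = combined + df[col].fillna("").str.strip() + " "
df["COMBINED"] = combined.str.strip()
print(df["COMBINED"].tolist())
```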
Posted on 2021-04-16 21:47:51
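df["COMBINED"] = (
    df[["col1", "col2"]]
    .fillna("")
    .apply(
        # keep col1 alone when col2's text already appears in it
        lambda x: x.loc["col1"]
        if x.loc["col2"] in x.loc["col1"]
        else x.loc["col1"] + " " + x.loc["col2"],
        axis=1,
    )
)
print(df[["col1", "col2", "COMBINED"]])

Prints: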
   col1               col2      COMBINED
0  PYTHON RD          APT 3     PYTHON RD APT 3
1  STACK AVE APT 2-3  APT 2-3   STACK AVE APT 2-3
2  OVER ST 1/2        UNIT 1/2  OVER ST 1/2 UNIT 1/2
3  FLOW RD            NaN       FLOW RD

Edit: for many columns:
def combine(x):
    out = []
    for word in x:
        # skip empty cells and any text already contained in a kept piece
        if word and not any(word in w for w in out):
            out.append(word)
    return " ".join(out)

columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine, axis=1)
print(df)

Prints:
   col1               col2      col3      COMBINED
0  PYTHON RD          APT 3     NaN       PYTHON RD APT 3
1  STACK AVE APT 2-3  APT 2-3   NaN       STACK AVE APT 2-3
2  OVER ST 1/2        UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3  FLOW RD            NaN       NaN       FLOW RD

Posted on 2021-04-16 22:02:39
Not sure whether this covers all of your cases:
def combine(row):
    row = row.fillna("")
    result = row["col1"]
    for col in ["col2", "col3"]:
        # append only text that is not already part of the result
        if row[col] not in result:
            result += " " + row[col]
    return result

df["COMBINED"] = df.apply(combine, axis=1)

Posted on 2021-04-16 22:22:47
Let's try taking the unique values of each row, then joining them.
df['col4'] = df.fillna('').apply(
    # strip the leading/trailing commas left behind by empty cells
    lambda X: ",".join(X.unique()).strip(','),
    axis=1,
)
   col1               col2      col3      col4
0  PYTHON RD          APT 3     NaN       PYTHON RD,APT 3
1  STACK AVE APT 2-3  APT 2-3   NaN       STACK AVE APT 2-3,APT 2-3
2  OVER ST 1/2        UNIT 1/2  UNIT 1/2  OVER ST 1/2,UNIT 1/2
3  FLOW RD            NaN       NaN       FLOW RD

https://stackoverflow.com/questions/67132476
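The unique/join answer removes only exact repeats, which is why its row 1 still contains APT 2-3 twice: the value in col2 is a substring of col1, not an identical cell. The sketch below compares the two behaviours side by side. It assumes the sample frame from the question with real `NaN` values, joins with spaces to match the requested COMBINED column, and uses `join_skipping_contained` as a hypothetical helper name for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (blank cells assumed to be np.nan).
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
filled = df[["col1", "col2", "col3"]].fillna("")

# Exact de-duplication, as in the unique/join answer: Series.unique keeps
# first-seen order; empty strings are filtered out instead of stripped.
exact = filled.apply(lambda r: " ".join(v for v in r.unique() if v), axis=1)


def join_skipping_contained(r: pd.Series) -> str:
    """Hypothetical helper: append a cell only if its text does not
    already occur anywhere in the string built so far."""
    out = ""
    for v in r:
        if v and v not in out:
            out = f"{out} {v}".strip()
    return out


contained = filled.apply(join_skipping_contained, axis=1)
print(pd.DataFrame({"exact": exact, "contained": contained}))
```

The containment check is a plain substring test on strings, so a short value such as "3" would also be skipped whenever it already appears anywhere in the text built so far.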