Here is my data:
import pandas as pd
df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})
I have a list of strings:
s1 = 'lorem obj e'
s2 = 'lorem obj e lorem axy a'
s3 = 'lorem xyz b lorem oaw r'
s4 = 'lorem lorem oaw r'
s5 = 'lorem lorem axy a lorem obj e'
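s_all = [s1, s2, s3, s4, s5]
Now I want to take each row and check whether the values of both columns are present in any of the strings in s_all. For example, for the first row I take 'axy a' and 'obj e' and check whether both of them appear in a string of s_all. Both appear in s2 and s5.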
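For a single row, the check is just two substring tests per string. A minimal sketch for the first row, with the values of columns a and b typed in by hand (an illustration, not code from the answers):

```python
s1 = 'lorem obj e'
s2 = 'lorem obj e lorem axy a'
s3 = 'lorem xyz b lorem oaw r'
s4 = 'lorem lorem oaw r'
s5 = 'lorem lorem axy a lorem obj e'
s_all = [s1, s2, s3, s4, s5]

# Values of columns 'a' and 'b' in the first row of df (typed in for illustration)
a, b = 'axy a', 'obj e'

# Keep every string that contains both values as substrings
matches = [s for s in s_all if a in s and b in s]
print(matches)  # the two matches are s2 and s5
```

Note that `in` is a plain substring test, so 'axy a' would also match inside a longer word such as 'taxy ab'.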
The result I want looks like this:
       a      b                              c
0  axy a  obj e        lorem obj e lorem axy a
1  axy a  obj e  lorem lorem axy a lorem obj e
2  xyz b  oaw r        lorem xyz b lorem oaw r
Here is my attempt, which didn't work:
import numpy as np
l = []
for sentence in s_all:
    for i in range(len(df)):
        if df.a.values[i] in sentence and df.b.values[i] in sentence:
            l.append(sentence)
        else:
            l.append(np.nan)
I tried to append the results to a list and then use that list to create the column c I want, but it didn't work.
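Why the attempt fails: the nested loop appends one value for every (sentence, row) pair, so l ends up with len(s_all) * len(df) = 10 entries for a DataFrame that has only 2 rows, and pandas rejects a list whose length differs from the number of rows. A minimal reproduction, reusing the data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})
s_all = ['lorem obj e',
         'lorem obj e lorem axy a',
         'lorem xyz b lorem oaw r',
         'lorem lorem oaw r',
         'lorem lorem axy a lorem obj e']

# Same loop as the attempt: one appended value per (sentence, row) pair
l = []
for sentence in s_all:
    for i in range(len(df)):
        if df.a.values[i] in sentence and df.b.values[i] in sentence:
            l.append(sentence)
        else:
            l.append(np.nan)

print(len(l))  # 5 sentences * 2 rows = 10 values for a 2-row frame

failed = False
try:
    df['c'] = l  # pandas requires exactly one value per row here
except ValueError as err:
    failed = True
    print(err)
```

Even with matching lengths, a flat list cannot hold two matching strings for one row, which is why the answers build one list per row and then use explode.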
Posted on 2022-08-05 08:12:45
You can use apply, explode and concat to create a new Series and concatenate it with the DataFrame:
match_series = df.apply(lambda row: [s for s in s_all if row['a'] in s and row['b'] in s], axis=1).explode()
pd.concat([df, match_series], axis=1)
Output:
       a      b                              0
0  axy a  obj e        lorem obj e lorem axy a
0  axy a  obj e  lorem lorem axy a lorem obj e
1  xyz b  oaw r        lorem xyz b lorem oaw r
Posted on 2022-08-05 08:03:42
You can write a small helper function and apply it to df row by row:
def func(row):
    out = []
    a, b = row
    for s in s_all:
        if all([a in s, b in s]):
            out.append(s)
    return out
# If you have more than 2 columns, or don't know how many, here is a more general approach.
# Otherwise it is the same function as above.
def func(row):
    out = []
    for s in s_all:
        if all([string in s for string in row.tolist()]):
            out.append(s)
    return out
df['c'] = df.apply(func, axis=1)
Or as a one-liner with a lambda function:
df['c'] = df.apply(lambda row: [s for s in s_all if all(string in s for string in row.tolist())], axis=1)
The function returns a list of matching strings for each row. To give each list element its own row, we use explode:
df = df.explode(column='c')
print(df)
Output:
       a      b                              c
0  axy a  obj e        lorem obj e lorem axy a
0  axy a  obj e  lorem lorem axy a lorem obj e
1  xyz b  oaw r        lorem xyz b lorem oaw r
https://stackoverflow.com/questions/73246166
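Both answers keep the original row labels, so the exploded result is indexed 0, 0, 1 rather than 0, 1, 2 as in the desired output from the question. A minimal sketch, assuming pandas 1.1 or later (where DataFrame.explode accepts ignore_index); matching_strings is a hypothetical name for the same row-wise check as the general func in the second answer:

```python
import pandas as pd

df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})
s_all = ['lorem obj e',
         'lorem obj e lorem axy a',
         'lorem xyz b lorem oaw r',
         'lorem lorem oaw r',
         'lorem lorem axy a lorem obj e']


def matching_strings(row):
    """Hypothetical helper: strings in s_all that contain every value of the row."""
    return [s for s in s_all if all(value in s for value in row)]


# One list of matches per row
df['c'] = df.apply(matching_strings, axis=1)

# One row per match; ignore_index=True relabels the rows 0, 1, 2, ...
out = df.explode(column='c', ignore_index=True)
print(out)
```

On older pandas versions, calling .reset_index(drop=True) after a plain explode gives the same labels. In the first answer, calling .rename('c') on match_series before pd.concat would name the new column c instead of 0.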