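Suppose the "Tags" column is stored as shown below. How can I split the "Tags" column into multiple columns, or combine it into a single list?
Expected result: all tags merged into one list, with duplicates filtered out.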
"Tags"
['Saudi', 'law', 'Saudi Arabia', 'rules']
['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India']
['Stephen', 'Hawkins', 'Tamil', 'predictions', 'future', 'science', 'scientist', 'top 5', 'five']
['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss']
['Mary', 'real', 'story', 'Tamil', 'history']
['football', 'Tamil', 'FIFA', '2018', 'world cup', 'MG', 'top', '10', 'ten']
['India', 'Tamil', 'poor', 'rich', 'money', 'MG', 'why', 'Indians']
Posted 2021-12-22 08:54:29
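The question also asks about splitting the column into multiple columns, which neither answer covers. Below is a minimal sketch, assuming the cells hold real Python lists and using a hypothetical three-row sample and hypothetical `Tag_0`, `Tag_1`, ... column names. Passing the lists to the `pd.DataFrame` constructor creates one column per list position, and shorter lists are padded with missing values.

```python
import pandas as pd

# Hypothetical sample built from the first two rows of the question plus the
# "Bigg Boss" row; each cell holds a real Python list (not a string).
df = pd.DataFrame({"Tags": [
    ['Saudi', 'law', 'Saudi Arabia', 'rules'],
    ['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India'],
    ['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss'],
]})

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(df["Tags"].tolist(), index=df.index)

# Hypothetical column names: Tag_0, Tag_1, ...
wide.columns = [f"Tag_{i}" for i in range(wide.shape[1])]
print(wide)
```

The number of columns equals the length of the longest list (7 here), so this layout is practical only when the lists have similar lengths.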
Try:
df["Tags"].explode().unique()
Or:
import numpy as np
np.unique(df["Tags"].sum())
Edit:
Perhaps you need this, if the lists are stored as strings:
import ast
df["Tags"].apply(ast.literal_eval).explode().unique()
Posted 2021-12-22 08:54:55
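To compare the approaches in the answer above, here is a small runnable sketch on a hypothetical two-row sample. It assumes the column holds either real lists or their string representations (for example, after a round trip through CSV). Note that `explode().unique()` keeps first-appearance order, while `np.unique` returns a sorted array.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample: a small subset of the question's rows.
rows = [
    ['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss'],
    ['Mary', 'real', 'story', 'Tamil', 'history'],
]

# Case 1: cells are real Python lists.
df_lists = pd.DataFrame({"Tags": rows})
# explode() yields one row per tag; unique() keeps first-appearance order.
by_explode = df_lists["Tags"].explode().unique()
# sum() concatenates the lists; np.unique() deduplicates and sorts.
by_numpy = np.unique(df_lists["Tags"].sum())

# Case 2: cells are strings such as "['Mary', 'real', ...]",
# so each one is parsed back into a list before exploding.
df_strings = pd.DataFrame({"Tags": [str(r) for r in rows]})
by_parse = df_strings["Tags"].apply(ast.literal_eval).explode().unique()

print(list(by_explode))
print(list(by_numpy))
print(list(by_parse))
```

Calling `explode()` directly on the string column would not split anything, because each cell is a single string; that is why `ast.literal_eval` is applied first.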
If you need a list without duplicates and performance matters, use a set comprehension:
L = list({y for x in df['Tags'] for y in x})
If the lists are stored as strings, parse them first:
import ast
L = list({y for x in df['Tags'].dropna() for y in ast.literal_eval(x)})
print(L)
['FIFA', 'Mary', 'world cup', 'rich', 'story', 'Tamil', 'rules', 'neet', 'money', 'Kamal', 'Hindi', 'big', 'cbse', 'imposition', 'football', 'MG', 'history', 'predictions', 'why', 'Tamil Nadu', 'top 5', 'ten', '10', 'Bigg Boss', 'India', 'Stephen', 'top', 'poor', 'law', 'Saudi', 'real', 'Indians', 'future', 'boss', 'five', '2018', 'scientist', 'Saudi Arabia', 'science', 'Hawkins']
https://stackoverflow.com/questions/70446378
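A self-contained sketch of the set-based approach for string-valued cells, using a hypothetical sample with one missing value. The `dict.fromkeys` variant is a swapped-in alternative, not taken from the answers; it keeps the first-seen order of the tags, which a set does not guarantee (the output above is in arbitrary order for that reason).

```python
import ast

import pandas as pd

# Hypothetical sample; the None row mimics a missing value.
df = pd.DataFrame({"Tags": [
    "['football', 'Tamil', 'FIFA', '2018']",
    "['India', 'Tamil', 'poor', 'rich']",
    None,
]})

# Parse each string into a list, skipping missing cells.
parsed = [ast.literal_eval(x) for x in df["Tags"].dropna()]

# Set comprehension: removes duplicates, but the order is arbitrary.
unique_tags = {tag for tags in parsed for tag in tags}

# Alternative (not from the answers): dict keys keep insertion order
# in Python 3.7+, so this deduplicates while keeping first-seen order.
ordered_tags = list(dict.fromkeys(tag for tags in parsed for tag in tags))

print(sorted(unique_tags))
print(ordered_tags)
```

Both variants are linear in the total number of tags; choose the `dict.fromkeys` form only if a stable order matters.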