How can I add count columns for the strings present in the target column?
import pandas as pd

data = [{'target': ['Aging', 'Brain', 'Neurons', 'Genetics']},
        {'target': ['Dementia', 'Genetics']},
        {'target': ['Brain', 'Dementia', 'Genetics']}]
df = pd.DataFrame(data)

The DataFrame:
                              target
0  [Aging, Brain, Neurons, Genetics]
1               [Dementia, Genetics]
2        [Brain, Dementia, Genetics]

Unique tags:
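target = []
for sublist in df['target'].values:
    # Strip stray whitespace from each tag in the row
    tmp_list = [x.strip() for x in sublist]
    target.extend(tmp_list)
# De-duplicate; the order of a set is arbitrary
target = list(set(target))
# ['Brain', 'Neurons', 'Aging', 'Genetics', 'Dementia']

The expected output is shown below: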

Answered 2019-07-12 15:04:11
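If you need indicator columns (only 0 or 1), use MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 matrix; mlb.classes_ holds the sorted tag names
df1 = pd.DataFrame(mlb.fit_transform(df['target']), columns=mlb.classes_)
print(df1)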
   Aging  Brain  Dementia  Genetics  Neurons
0      1      1         0         1        1
1      0      0         1         1        0
2      0      1         1         1        0

Alternatively, combine Series.str.join with Series.str.get_dummies, but it is slower:

df1 = df['target'].str.join('|').str.get_dummies()

If you need to count the values in the lists:
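from collections import Counter

data = [{'target': ['Neurons', 'Brain', 'Neurons', 'Neurons']},
        {'target': ['Dementia', 'Genetics']},
        {'target': ['Brain', 'Brain', 'Genetics']}]
df = pd.DataFrame(data)

# One Counter per row; tags missing from a row become NaN, then 0
df = (pd.DataFrame([Counter(x) for x in df['target']])
        .fillna(0)
        .astype(int)
        .sort_index(axis=1))  # sort the columns alphabetically, as in the output
print(df)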
   Brain  Dementia  Genetics  Neurons
0      1         0         0        3
1      0         1         1        0
2      2         0         1        0

Answered 2019-07-12 19:26:25
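A note on the Counter-based counting in the earlier answer: that snippet overwrites df, so the original target column is lost. If the goal is to add the count columns next to target, something like the following might work. This is only a sketch, not part of the original answer; the names counts and result are introduced here for illustration.

```python
from collections import Counter

import pandas as pd

data = [{'target': ['Neurons', 'Brain', 'Neurons', 'Neurons']},
        {'target': ['Dementia', 'Genetics']},
        {'target': ['Brain', 'Brain', 'Genetics']}]
df = pd.DataFrame(data)

# Count tags per row; a tag absent from a row becomes NaN, then 0.
# `counts` is a hypothetical name, not from the original answer.
counts = (pd.DataFrame([Counter(tags) for tags in df['target']], index=df.index)
            .fillna(0)
            .astype(int)
            .sort_index(axis=1))

# Keep the original column and append one count column per tag.
result = df.join(counts)
print(result)
```

Passing index=df.index keeps the counts aligned with the original rows even when df does not use the default 0..n-1 index.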
Maybe this will help:
# Instead of building the target list,
# join each row's list of strings into one single string.
# (The output below implies that `data` here has a fourth row,
# ['Neurons', 'Brain', 'Neurons', 'Neurons'].)
list_to_str = [" ".join(tags['target']) for tags in data]
# ['Aging Brain Neurons Genetics',
#  'Dementia Genetics',
#  'Brain Dementia Genetics',
#  'Neurons Brain Neurons Neurons']
# Using CountVectorizer
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

text_data = np.array(list_to_str)
# Create the bag-of-words feature matrix
count = CountVectorizer()
bag_of_words = count.fit_transform(text_data)  # sparse matrix; converted to an array below
# Get feature names (get_feature_names() was removed in scikit-learn 1.2)
feature_names = count.get_feature_names_out()
# Create the DataFrame
df1 = pd.DataFrame(bag_of_words.toarray(), columns=feature_names)
print(df1)
## Output
   aging  brain  dementia  genetics  neurons
0      1      1         0         1        1
1      0      0         1         1        0
2      0      1         1         1        0
3      0      1         0         0        3

https://stackoverflow.com/questions/57001725
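One caveat about the CountVectorizer approach above: with its default settings it lowercases the text and splits it into word tokens, so the column names lose their original case and a multi-word tag would be split into separate columns. A pandas-only alternative that keeps each tag intact could look like the sketch below. It is not from the original answers; it assumes pandas 0.25 or later (for Series.explode), and the fourth data row is inferred from the output above. The names long and counts are illustrative.

```python
import pandas as pd

# The fourth row is inferred from the output shown in the answer above.
df = pd.DataFrame({'target': [['Aging', 'Brain', 'Neurons', 'Genetics'],
                              ['Dementia', 'Genetics'],
                              ['Brain', 'Dementia', 'Genetics'],
                              ['Neurons', 'Brain', 'Neurons', 'Neurons']]})

# One line per (row, tag) pair; reset_index turns the row label into a column.
long = df['target'].explode().reset_index()
long.columns = ['row', 'tag']

# Cross-tabulate rows against tags to get per-row counts with original casing.
counts = pd.crosstab(long['row'], long['tag'])
print(counts)
```

Because the tags are never joined into a single string, a tag containing spaces stays as one column.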