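I am working with Hugging Face transformers and have gained some familiarity with it. I am using the facebook/bart-large-cnn model to perform text summarization for my project, and so far I have been testing with the following code: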
text = """
Justin Timberlake and Jessica Biel, welcome to parenthood.
The celebrity couple announced the arrival of their son, Silas Randall Timberlake, in
statements to People."""
from transformers import pipeline
smr_bart = pipeline(task="summarization", model="facebook/bart-large-cnn")
smbart = smr_bart(text, max_length=150)
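print(smbart[0]['summary_text'])
This small piece of code gives me a very good summary of the text. My question is how to apply the same pre-trained model to a column of my dataframe. My dataframe looks like this: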
ID Lang Text
1 EN some long text here...
2 EN some long text here...
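3 EN some long text here...
... and so on for 50K rows.
Now I want to apply the pre-trained model to the Text column and generate a new column, df['summary'], from it. The resulting dataframe should look like this: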
ID Lang Text Summary
1 EN some long text here... Text summary goes here...
2 EN some long text here... Text summary goes here...
3 EN some long text here... Text summary goes here...
How can I achieve this? Any help would be much appreciated.
Posted on 2021-02-26 04:23:13
You can always use the dataframe apply function:
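import pandas as pd
# Build a 10-row test dataframe from the sample text in the question
df = pd.DataFrame([('EN',text)]*10, columns=['Lang','Text'])
# Call the pipeline once per row and keep only the summary string
df['summary'] = df.apply(lambda x: smr_bart(x['Text'], max_length=150)[0]['summary_text'] , axis=1)
df.head(3)
Output: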
Lang Text summary
0 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
1 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
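2 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
This is somewhat inefficient because the pipeline is invoked once for every row (execution time: 2 min 16 s). I therefore recommend converting the Text column to a list and passing it directly to the pipeline (execution time: 41 s):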
df = pd.DataFrame([('EN',text)]*10, columns=['Lang','Text'])
df['summary'] = [x['summary_text'] for x in smr_bart(df['Text'].tolist(), max_length=150)]
df.head(3)
Output:
Lang Text summary
0 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
1 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
2 EN \nJustin Timberlake and Jessica Biel, welcome ... The celebrity couple announced the arrival of ...
https://stackoverflow.com/questions/66372741
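For the 50K-row dataframe in the question, passing the whole Text column in one call means nothing is written back until every row has finished. One possible variation, sketched below, feeds the list to the summarizer in chunks. The helper `summarize_column`, its parameters (`text_col`, `out_col`, `chunk_size`), and the stand-in `fake_summarizer` are hypothetical names introduced here for illustration and are not part of transformers. The stub only imitates the pipeline's return shape (a list of dicts with a `summary_text` key) so the sketch runs without downloading a model.

```python
import pandas as pd


def summarize_column(df, summarizer, text_col="Text", out_col="summary",
                     chunk_size=1000, **kwargs):
    """Return a copy of df with out_col filled from summarizer, called chunk by chunk."""
    texts = df[text_col].tolist()
    summaries = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Assumption: summarizer returns one dict per input text, in input
        # order, each holding the summary under the 'summary_text' key.
        results = summarizer(chunk, **kwargs)
        summaries.extend(r["summary_text"] for r in results)
    out = df.copy()
    out[out_col] = summaries
    return out


def fake_summarizer(texts, **kwargs):
    # Hypothetical stand-in for smr_bart: "summarizes" by keeping the first sentence.
    return [{"summary_text": t.strip().split(".")[0] + "."} for t in texts]


df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Lang": ["EN"] * 3,
    "Text": ["First sentence. Second sentence."] * 3,
})
result = summarize_column(df, fake_summarizer, chunk_size=2, max_length=150)
print(result[["ID", "summary"]])
```

With the real model, `smr_bart` from the question would be passed in place of `fake_summarizer`, for example `summarize_column(df, smr_bart, max_length=150)`. Texts longer than the model's input limit may need truncating first; that case is not handled in this sketch.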