I have a list of texts like this:
inp = """Something at the beginning
References
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
Other References
1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
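Something else at the end"""
The "Other References" section appears in some texts; in others, the next section starts with "Good References". Strings that look like references can also appear anywhere in the text. The reference strings are sometimes separated by '\n' and sometimes only by spaces. In addition, '\n' can occur anywhere in the text, including in the middle of a reference string.
I need a regex to use with re.findall that returns all the strings after "References" as a list of strings like this:
['Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.', 'Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.', 'Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.']
It should return only the strings directly after "References", not anything that comes before that section or after it.
Someone suggested this regex to me: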
refs = re.findall(r'^References\s+((?:\d+\.\s*.*?\n)+)', inp, flags=re.M|re.S)
data = ''.join(refs)
output = re.findall(r'\d+\.\s*(.*?)\n', data)
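print(output)
However, this only works when the reference strings are separated by '\n', which is not the case in some texts. A '\n' can appear anywhere in the text. I don't need these '\n' characters at all, so they can be removed from the text.
An example where the suggested regex does not work: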
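Since the newlines carry no meaning here, one possible first step is to collapse all whitespace before matching. The following is a minimal sketch of that idea; the helper name `flatten_whitespace` and the sample string are illustrative and not part of the original question.

```python
import re

def flatten_whitespace(text):
    # Replace every run of whitespace (spaces, tabs, '\n') with one space,
    # then trim the ends. Hypothetical helper, named for illustration only.
    return re.sub(r"\s+", " ", text).strip()

sample = "References\n1. Ryff, C.D. (2014) Psychological\nWell-Being Revisited.\n2. Deci, E.L. (2002) Research."
print(flatten_whitespace(sample))
```

After this step, a reference broken across lines and a reference written on one line look the same, so any later pattern only has to deal with single spaces.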
inp = """Something at the beginning
References 1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia. Additional Fields. 2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions. 3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field. Other References 1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something else at the end"""
Can anyone suggest code that would help me get the list of references?
Posted on 2022-09-14 09:28:01
I think this code can solve your problem:
import re
# The lookbehind is zero-width, so it takes no quantifier.
refs = re.findall(r'(?<=References\s)((?:\d+\.\s*.*?\n)+)[^.\s]*', inp, flags=re.M|re.S)
data = ''.join(refs)
output = re.findall(r'\d+\.\s*(.*?)\n', data)
print(output)
https://stackoverflow.com/questions/73714319
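The pattern above still depends on '\n' as a separator, so it may not handle the space-separated example from the question. Below is a hedged alternative sketch, not the answerer's method: it collapses whitespace, takes the first bare "References" heading that is followed by "1.", cuts the section at the next stop heading, and then splits on the expected numbers 1, 2, 3 and so on. The function name `extract_references` and the `stop_headings` parameter are assumptions introduced for illustration, and the exact spelling of the stop headings should be adjusted to the real texts.

```python
import re

def extract_references(text, stop_headings=("Other References", "Good References")):
    # Collapse every whitespace run (including stray '\n') into one space.
    flat = re.sub(r"\s+", " ", text).strip()

    # Stop headings are listed first in the alternation, so a bare "References"
    # only matches when it is not part of e.g. "Other References".
    alternatives = "|".join(re.escape(h) for h in stop_headings)
    heading = re.compile(rf"\b(?:(?P<stop>{alternatives})|References)\s+(?=1\.\s)")

    start, end = None, len(flat)
    for m in heading.finditer(flat):
        if start is None:
            if m.group("stop") is None:
                start = m.end()      # section body begins at "1. "
        else:
            end = m.start()          # next heading closes the section
            break
    if start is None:
        return []
    section = flat[start:end].strip()

    refs, n = [], 1
    marker = re.match(r"1\.\s+", section)
    while marker:
        body_start = marker.end()
        # The current entry ends where the next expected number begins.
        marker = re.compile(rf"(?<=\s){n + 1}\.\s+").search(section, body_start)
        body_end = marker.start() if marker else len(section)
        refs.append(section[body_start:body_end].strip())
        n += 1
    return refs

demo = "Intro References 1. Ryff, C.D. (2014) Well-Being\nRevisited. 2. Deci, E.L. (2002) Research. Other References 1. Tarelli, E. (2003) Title."
print(extract_references(demo))
```

Walking the expected numbers in order, rather than matching any `\d+\.`, keeps years such as "(2005)." from being mistaken for item markers. One limitation of this sketch: if no stop heading follows the section, the last reference runs to the end of the text.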