I have a list of texts like this one:
Something at the beginning
References
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
Other References
1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
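Something else at the end
The "Other References" section is present in some texts and absent in others. Similarly, strings that look like references may appear anywhere in the text.
I need a regex to use with re.findall that returns all the strings after "References" as a list of strings like this:
['Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.', 'Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.', 'Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.']
but only the ones directly after "References", not those anywhere before or after.
I have tried this pattern:
r = 'References\s*(\d+[.].*[.])'
but it returns only the first matching string, and I need all of them.
Can anyone suggest a better regex?
Posted on 2022-09-13 09:51:03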
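You can use re.findall twice. The strategy below is to first match every References block as a separate string. We then join all of those strings together and use re.findall again to find the individual references.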
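To make the two passes easier to follow, here is a minimal sketch on a toy input. The `sample` text and the names `blocks` and `items` are made up for illustration; only the two patterns come from this answer.

```python
import re

# Hypothetical toy input: one "References" block and one "Other References" block
sample = (
    "Intro\n"
    "References\n"
    "1. First ref.\n"
    "2. Second ref.\n"
    "Other References\n"
    "1. Skipped ref.\n"
    "End\n"
)

# Pass 1: capture each run of numbered lines that directly follows a line
# starting with "References" (re.M lets ^ match at the start of every line)
blocks = re.findall(r'^References\n((?:\d+\.\s*.*?\n)+)', sample, flags=re.M)
print(blocks)  # ['1. First ref.\n2. Second ref.\n']

# Pass 2: strip the leading number from every line of the captured blocks
items = re.findall(r'\d+\.\s*(.*?)\n', ''.join(blocks))
print(items)  # ['First ref.', 'Second ref.']
```

Because "Other References" does not start its line with "References", its entries never reach the second pass. Applied to the full sample from the question, the same two steps look like this: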
import re

inp = """Something at the beginning
References
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
Other References
1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something else at the end"""
refs = re.findall(r'^References\n((?:\d+\.\s*.*?\n)+)', inp, flags=re.M)
data = ''.join(refs)
output = re.findall(r'\d+\.\s*(.*?)\n', data)
print(output)
This prints:
[
'Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia. ',
'Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions. ',
'Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.'
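]
Posted on 2022-09-14 11:04:26
This is an answer to a linked question that was closed as a duplicate, so I can't answer it there. That duplicate is really an extension of this question, since it makes the input more complicated.
Using as example input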
import re

text = """Something at the beginning
References 1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia. Additional Fields. 2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions. 3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”. 315 – 326. DOI: doi.org/10.2224/sbp.2008.36.3.315 4. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field. Other References 1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something else at the end"""
and working under the following assumptions:
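1. The word "References" (with a capital "R") marks a section heading and can be used to tell the sections apart.
2. Each reference starts with one or more digits, followed by a period, followed by whitespace, followed by non-digit characters (letters, commas, hyphens, etc.), followed by an opening parenthesis (which introduces the year).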
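As a quick check of the second assumption, the sketch below runs the reference-start pattern on a short made-up line. The `line` text and the names `start` and `starts` are hypothetical; the pattern itself is the one used in this answer.

```python
import re

# Hypothetical one-line sample holding two numbered references
line = " 1. Doe, J. (2001) A title, pp. 12-15. 2. Roe, A. & Poe, B. (2010) Another title."

# digits, period, whitespace, non-digit characters, opening parenthesis
start = r'\s*(\d+\.\s+\D+\()'

starts = re.findall(start, line)
print(starts)  # ['1. Doe, J. (', '2. Roe, A. & Poe, B. (']
```

The page range "12-15." is not mistaken for the start of a reference, because the text after it reaches another digit before any opening parenthesis. A reference whose author part contains digits would break this assumption.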
Then the following code works:
# split the sections
pattern = r'[A-Z][a-z\s]+References'
sections = re.split(pattern, text, flags=re.M)
# split the individual references, by conditions as mentioned in point 2 above
pattern = r'\s*(\d+\.\s+\D+\()'
# The first section is blank (`''`), so `sections[1]` is
# the first actual reference section, "References"
parts = re.split(pattern, sections[1], flags=re.M)
# the split includes the part before the year, and the year + rest.
# We need to concatenate those items for each reference.
# Also here, the first group is blank, so skip that
refs = [part1 + part2 for part1, part2 in zip(parts[1::2], parts[2::2])]
# Show the result
for ref in refs:
    print(ref)
yields
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia. Additional Fields.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”. 315 – 326. DOI: doi.org/10.2224/sbp.2008.36.3.315
4. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
https://stackoverflow.com/questions/73700753