Parsing a large string in Python

Stack Overflow user
Asked 2015-01-28 11:40:10
Answers: 2 | Views: 1.2K | Followers: 0 | Votes: 1

I am trying to parse a string and extract specific words from it.

{{About|the ALGOL-like programming language|the scripting language formerly named Small|Pawn (scripting language)}}

'''SMALL''', Small Machine Algol Like Language, is a [[computer programming|programming]] [[programming language|language]] developed by Dr. [[Nevil Brownlee]] of [[Auckland University]].

==History==
The aim of the language was to enable people to write [[ALGOL]]-like code that ran on a small machine.  It also included the '''string''' type for easier text manipulation.

SMALL was used extensively from about 1980 to 1985 at [[Auckland University]] as a programming teaching aid, and for some internal projects.  Originally written to run on a [[Burroughs Corporation]] B6700 [[Main frame]] in [[Fortran]] IV, subsequently rewritten in SMALL and ported to a DEC [[PDP-10]] Architecture (on the [[Operating System]] [[TOPS-10]]) and IBM S360 Architecture (on the Operating System VM/[[Conversational Monitor System|CMS]]).

About 1985, SMALL had some [[Object-oriented programming|object-oriented]] features added to handle structures (that were missing from the early language), and to formalise file manipulation operations.

==See also==
*[[ALGOL]]
*[[Lua (programming language)]]
*[[Squirrel (programming language)]]

==References==
*[http://www.caida.org/home/seniorstaff/nevil.xml Nevil Brownlee]

[[Category:Algol programming language family]]
[[Category:Systems programming languages]]
[[Category:Procedural programming languages]]
[[Category:Object-oriented programming languages]]
[[Category:Programming languages created in the 1980s]] 

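I want to extract ALGOL, Lua (programming language), and Squirrel (programming language) from the See also section (just those names, without the brackets or asterisks).

I have tried these approaches: string splitting and regular expressions. I am still getting nowhere, so any help is appreciated.

The code I used: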

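import json
import re
import urllib.request

# Fetch the current revision of the "SMALL" article as JSON.
url = "http://en.wikipedia.org/w/api.php?format=json&action=query&titles=SMALL&prop=revisions&rvprop=content"
response = urllib.request.urlopen(url)
# read() returns the whole response body and works across Python 3 versions
# (readall() is missing from newer ones).
str_response = response.read().decode('utf-8')
obj = json.loads(str_response)
# '1808130' is the page ID under which the article appears in the response.
a = str(obj['query']['pages']['1808130']['revisions'][0]['*'])
print(a)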

The string is stored in the variable a.
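The hard-coded page ID '1808130' ties the script to a single article. As a minimal sketch, assuming the parsed response has the same shape the code above indexes into (query, pages, page ID, revisions, first element, '*'), the page can be picked without knowing its ID. The helper name first_revision_text and the sample dictionary are hypothetical and only stand in for the real API result:

```python
def first_revision_text(api_result):
    """Return the wikitext of the first page in a parsed API response.

    Assumes the layout used in the question:
    api_result['query']['pages'][<page id>]['revisions'][0]['*'].
    """
    pages = api_result['query']['pages']
    # Take whichever page ID the response contains instead of hard-coding it.
    page = next(iter(pages.values()))
    return page['revisions'][0]['*']


# Hypothetical stand-in for json.loads(str_response), so no network is needed.
sample = {
    'query': {
        'pages': {
            '1808130': {'revisions': [{'*': "==See also==\n*[[ALGOL]]"}]}
        }
    }
}
a = first_revision_text(sample)
print(a)
```

With a real response, first_revision_text(obj) would replace the chained lookup, provided the query returns exactly one page.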

2 Answers

Stack Overflow user

Accepted answer

Posted on 2015-01-28 12:06:00

If I understand correctly, you want the characters between ==See also== and ==References== (excluding the characters *, [ and ]). I have named your initial string my_string.

import re

# Sliced_string will only contain the characters between '==See also==' and '==References=='
sliced_string = re.findall(r'==See also==(.*?)==References==', my_string, re.DOTALL)[-1]

# Removes stars and brackets
for unwanted_char in '[]*':
    sliced_string = sliced_string.replace(unwanted_char, '')

# Creates a list of strings (also removes empty strings)
final_list = sliced_string.split('\n')
final_list = [elem for elem in final_list if elem != '']

print(final_list)

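Edit: the result is now converted into a list.

The code works correctly provided that the ==See also== ... ==References== pair occurs only once in the given string. If it occurs more than once, the [-1] index selects the last match.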
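The slice-and-clean approach above depends on ==References== following the section. A minimal sketch of a variant that stops at whatever heading comes next (or at the end of the text) is shown here; the function name section_items and the sample string are hypothetical, and the cleanup simply strips *, [ and ] from both ends of each line:

```python
import re


def section_items(wikitext, heading):
    """Return the bullet items under '==heading==', up to the next heading or the end."""
    pattern = (r'^==\s*' + re.escape(heading) + r'\s*==[ \t]*\n'
               r'(.*?)(?=^==|\Z)')
    match = re.search(pattern, wikitext, re.DOTALL | re.MULTILINE)
    if match is None:
        return []  # the section is not present
    items = []
    for line in match.group(1).splitlines():
        # Drop the bullet star, the link brackets and surrounding whitespace.
        cleaned = line.strip('*[] \t')
        if cleaned:
            items.append(cleaned)
    return items


sample = ("Intro text\n"
          "==See also==\n"
          "*[[ALGOL]]\n"
          "*[[Lua (programming language)]]\n"
          "\n"
          "==External links==\n"
          "*[[Other]]\n")
print(section_items(sample, 'See also'))
# -> ['ALGOL', 'Lua (programming language)']
```

Because the lookahead matches any line starting with ==, the same function works when the following section is not References, and it returns an empty list instead of raising when the heading is absent.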

Votes: 1

Stack Overflow user

Posted on 2015-01-28 12:32:15

import re
print(re.findall(r"\*\[\[([^\]]*)\]\]", re.findall(r"==See also==((?:\s+\*\[\[(?:[^\]]*)\]\])+)", x)[0]))

Apply this directly to the string stored in x.
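The nested call is easier to follow when split into its two steps. The sketch below uses the same two patterns; the function name see_also_links and the sample string are hypothetical, and the guard for a missing section is an addition that the one-liner does not have:

```python
import re


def see_also_links(x):
    """Two-step version of the one-liner: isolate the bullet block, then pull out link targets."""
    # Step 1: the run of '*[[...]]' bullets that directly follows '==See also=='.
    block = re.search(r"==See also==((?:\s+\*\[\[[^\]]*\]\])+)", x)
    if block is None:
        # No such section; the one-liner's [0] would raise IndexError here.
        return []
    # Step 2: the text inside each '*[[...]]' of that block.
    return re.findall(r"\*\[\[([^\]]*)\]\]", block.group(1))


sample = ("==See also==\n"
          "*[[ALGOL]]\n"
          "*[[Lua (programming language)]]\n"
          "*[[Squirrel (programming language)]]\n"
          "\n"
          "==References==")
print(see_also_links(sample))
# -> ['ALGOL', 'Lua (programming language)', 'Squirrel (programming language)']
```

Since the first pattern only consumes consecutive bullet lines, it does not depend on which heading follows the section.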

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/28191367
