我对编程很陌生。我用Python编写了以下基于控制台的应用程序。这是一款英语词典应用程序。如果用户输入他/她想要搜索的单词,那么这个应用程序应该能够显示搜索单词的定义、示例(用于句子)、同义词、词源、语音和词类。它有点像PyDictionary,它是一个包;另一方面,它是一个功能齐全的(到目前为止.)应用程序和它使用不同的在线资源。这个应用程序依赖于BeautifulSoup和请求包。我希望有人能指点我,使它更好,更短,如果在代码中有任何错误。
import requests as r
from bs4 import BeautifulSoup as BS
def server_one(url): # Oxford online dictionary
""" Server_one scrapping algorithm to get essential information regarding the searched word """
soup = get_soup(url)
try:
definition = soup.select_one('.ind').get_text(strip=True) # Scraps first definition
except AttributeError:
definition = 'NOT FOUNT'
try:
example = soup.select_one('.exg').get_text(strip=True) # Scraps first example
except AttributeError:
example = 'NOT FOUNT'
try:
parts_of_speech = soup.select_one('.pos').get_text(strip=True).capitalize() # Gets the POS
except AttributeError:
parts_of_speech = 'NOT FOUNT'
try:
synonyms = soup.select_one('.exs').get_text(strip=True).split(', ') # Collects all the synonyms
except AttributeError:
synonyms = 'NOT FOUNT'
if 'NOT FOUNT' in synonyms:
synonyms_tobe_sent = ''
else:
synonyms_tobe_sent = [cap.capitalize() for cap in synonyms] # Capitalizes all the elements in the list
try:
origins = []
for ori in soup.select('.senseInnerWrapper'): # Scraps all the origins in a list
if len(ori.text) < 400:
origins.append(ori.text)
except AttributeError:
origins = 'NOT FOUNT'
try:
phonetics = soup.select_one('.phoneticspelling').get_text(strip=True) # Gets the phonetics
except AttributeError:
phonetics = 'NOT FOUNT'
try:
next_definitions = []
for tag in soup.select('.ind'): # Gathers all possible definitions of the searched word
next_definitions.append(tag.text)
try:
next_definitions.pop(0) # Removes the first definition since it's already used before
except IndexError:
pass
except AttributeError:
next_definitions = 'NOT FOUNT'
if example != 'NOT FOUNT':
example_tobe_sent = example[1:-1].capitalize() + '.' # Removes colons(') from the string and adds a (.)
else:
example_tobe_sent = 'NOT FOUNT'
return definition, example_tobe_sent, next_definitions, parts_of_speech, synonyms_tobe_sent, origins, phonetics
def search_word():
""" This function returns the searched word in lower format """
word = input('Word that you wish to search\n>>> ').lower()
return word
def url_server_one():
""" A function to get correct URL for the server_one """
w = search_word()
new_w = w.replace(' ', '_') # If a word has space then it gets _ instead, e.g. 'look up' turns into 'look_up'
if ' ' in w:
url = 'https://en.oxforddictionaries.com/definition/%s' % new_w # URL with underscore(_)
else:
url = 'https://en.oxforddictionaries.com/definition/%s' % w # URL without underscore
data = (url, w) # To make two variables as a tuple
return data
def get_soup(url):
""" This returns the bs4 soup object that will be used for scrapping """
source_code = r.get(url)
plain_text = source_code.text
soup_data = BS(plain_text, 'html.parser')
return soup_data
def display_dict():
""" A function to display all the collected information in a desired format, currently supports only server_one """
url, title = url_server_one()
def_, exam, next_defs, part_sp, synos, ori, phonet = server_one(url)
print('\nDefinition of %s:\n%s\n' % (title.capitalize(), def_))
print('Parts of Speech:\n%s\n' % part_sp)
print('Phonetics of %s:\n%s\n' % (title.capitalize(), phonet))
print('Example of %s as follows:\n%s\n' % (title.capitalize(), exam))
print('Synonyms:')
if synos != '' or synos != []:
print(', '.join(synos))
print()
if len(synos) < 1:
print('NOT FOUND\n')
print('Origins:')
if ori != '' or ori != []:
for i in ori:
if i != '' or i != 'NOT FOUNT':
print(i, sep=' ')
print()
if len(ori) < 1:
print('NOT FOUND\n')
print('Some other definition(s) of %s:' % title.capitalize())
if next_defs != '' or next_defs != []:
for i in next_defs:
if i != '' or i != 'NOT FOUNT':
print('* ' + i.capitalize() + '\n')
if len(next_defs) < 1:
print('NOT FOUND\n')
def try_():
""" Part of repeat() function that returns integer value """
while True:
try:
x = int(input('Press 1 to search again\n'
'Press 2 to quit'))
if x in {1, 2}:
return x
print('Either press 1 or 2')
except ValueError:
print('Enter an integer')
def repeat():
""" A function to repeat the process of searching over again and quit from the app """
_try = try_()
if _try == 1:
display_dict()
else:
quit()
def main():
print('Welcome to PyEngDict V1.0 by AJ\n')
display_dict() # Initially launches the app's search option
while True:
repeat()
if __name__ == '__main__':
main()
发布于 2017-09-18 00:17:14
下面是我要做的一些事情:
i (注意,它不是用作索引变量,而是作为下一个来源或定义)或ori,part_sp不够清楚--理解它们用于什么需要时间。当变量影响可读性时,缩短变量并不是正确的动机。if ori != '' or ori != []:可以用if ori:代替,if next_defs != '' or next_defs != []:可以用if next_defs:代替w和new_w在url_server_one()函数中--对于每个搜索字符串: def url_server_one():“”一个函数来获得server_one的正确URL,您可以用一个undescore替换一个空格。“word = search_word().replace(‘','_') url = 'https://en.oxforddictionaries.com/definition/’+单词返回(url,word)AttributeError是没有意义的--它不会被.select()或.text抛出,因为如果没有找到匹配的标记,.select()将返回一个空列表,以及一个所有都具有text属性的Tag实例列表。换句话说,AttributeError不会被抛出:soup.select(‘.senseInnerWrapper’)中ori的: try: ori.text= [] (‘.senseInnerWrapper’):#在len(ori.text) < 400: origins.append(ori.text)列表中删除所有的起源,除了AttributeError: origins.append= 'NOT‘(收集next_definitions的块也是如此)。origins列表:源= ori.text for ori in soup.select('.senseInnerWrapper') if len(ori.text) < 400print调用。还有这些try和except AttributeError:重复块。有多种方法可以解决这一问题。一种是切换到基于字典的方法,首先定义字段名和CSS选择器之间的映射,如下所示:
selectors = {
'definition': '.ind',
'example': '.exg',
'parts_of_speech': '.pos',
'phoneticspelling': '.phoneticspelling',
# ...
}
field_values = {}
for field, selector in selectors.items():
tag = soup.select_one(selector)
field_values[field] = tag.get_text() if tag is not None else 'NOT FOUND'https://codereview.stackexchange.com/questions/175883
复制相似问题