This assignment is taken from Berkeley's CS61A page, here.

In this project, you will develop a geographic visualization of Twitter data across the United States. You will use dictionaries, lists, and data abstraction techniques to create a modular program. The map shown above depicts how people in different states feel about Texas; it was generated by:

python trends.py -m texas
The assignment below is the third phase of this project. Phase one can be found here. Phase two can be found here.

Phase 3: The Mood of the Nation
us_states is bound to a dictionary containing the shape of each U.S. state, keyed by two-letter postal code. You can use the keys of this dictionary to iterate over all the U.S. states. In this phase, you will write functions to determine which state a tweet comes from, group tweets by state, and calculate the average positive or negative sentiment of all the tweets associated with a state.

Problem 8 (1 pt). Implement find_closest_state, which returns the two-letter postal code of the state closest to a tweet's location. Use the geo_distance function (provided in geo.py) to compute the shortest distance in miles between two positions. When you complete this problem, the doctest for find_closest_state should pass:

python3 trends.py -t find_closest_state

Problem 9 (1 pt). Implement group_tweets_by_state, which takes a list of tweets and returns a dictionary. The keys of the returned dictionary are state names (two-letter postal codes), and the values are lists of tweets that are closer to that state's center than to any other. When you complete this problem, the doctest for group_tweets_by_state should pass:

python3 trends.py -t group_tweets_by_state

Problem 10 (1 pt). As an exercise, implement most_talkative_state, which returns the state with the most tweets containing a given term. When you complete this problem, the doctest for most_talkative_state should pass:

python3 trends.py -t most_talkative_state

Problem 11 (2 pt). Implement average_sentiments. This function takes the dictionary returned by group_tweets_by_state and returns a dictionary. The keys of the returned dictionary are state names (two-letter postal codes), and the values are the average sentiment value of all the tweets in that state. If a state has no tweets with sentiment values, leave it out of the returned dictionary entirely. Do not include sentiment-less states with a sentiment value of zero: zero represents neutral sentiment, not unknown sentiment. States with unknown sentiment will appear gray, while states with neutral sentiment will appear white.

You should now be able to draw maps colored by the sentiment of tweets containing a given term:

python3 trends.py -m sandwich
python3 trends.py -m
python3 trends.py -m python3
python3 trends.py -m my life

If you downloaded the small version of the project, you can only map these four terms. To map any term, you need to download the full Twitter data file and place it in the project's data directory.
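The grouping step in Problem 9 is an instance of the standard "dictionary of lists" pattern. As a minimal, generic sketch (the `group_by` helper and the example data here are illustrative, not part of the project):

```python
from collections import defaultdict

def group_by(items, key_fn):
    """Group items into lists keyed by key_fn(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)

# Example: grouping numbers by parity
print(group_by([1, 2, 3, 4, 5], lambda n: 'even' if n % 2 == 0 else 'odd'))
# → {'odd': [1, 3, 5], 'even': [2, 4]}
```

In the project, the key function would be the state-assignment step (nearest state center) and the items would be tweets.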
The Phase 3 solution:
from data import word_sentiments, load_tweets
from datetime import datetime
from doctest import run_docstring_examples
from geo import us_states, geo_distance, make_position, longitude, latitude
from maps import draw_state, draw_name, draw_dot, wait, message
from string import ascii_letters
from ucb import main, trace, interact, log_current_line
# Phase 3: The Mood of the Nation
def find_closest_state(tweet, state_centers):
    """Return the name of the state closest to the given tweet's location.

    Use the geo_distance function (already provided) to calculate distance
    in miles between two latitude-longitude positions.

    Arguments:
    tweet -- a tweet abstract data type
    state_centers -- a dictionary from state names to positions.

    >>> us_centers = {n: find_center(s) for n, s in us_states.items()}
    >>> sf = make_tweet("Welcome to San Francisco", None, 38, -122)
    >>> ny = make_tweet("Welcome to New York", None, 41, -74)
    >>> find_closest_state(sf, us_centers)
    'CA'
    >>> find_closest_state(ny, us_centers)
    'NJ'
    """
    best_distance = None
    closest_state = None
    for state, centre_position_of_state in state_centers.items():
        if best_distance == None:
            best_distance = geo_distance(centre_position_of_state, tweet_location(tweet))
            closest_state = state
            continue
        else:
            distance = geo_distance(centre_position_of_state, tweet_location(tweet))
            if distance < best_distance:
                best_distance = distance
                closest_state = state
    return closest_state
def group_tweets_by_state(tweets):
    """Return a dictionary that aggregates tweets by their nearest state center.

    The keys of the returned dictionary are state names, and the values are
    lists of tweets that appear closer to that state center than any other.

    tweets -- a sequence of tweet abstract data types

    >>> sf = make_tweet("Welcome to San Francisco", None, 38, -122)
    >>> ny = make_tweet("Welcome to New York", None, 41, -74)
    >>> ca_tweets = group_tweets_by_state([sf, ny])['CA']
    >>> tweet_string(ca_tweets[0])
    '"Welcome to San Francisco" @ (38, -122)'
    """
    tweets_by_state = {}
    USA_states_center_position = {n: find_center(s) for n, s in us_states.items()}
    for tweet in tweets:
        state_name_key = find_closest_state(tweet, USA_states_center_position)
        tweets_by_state.setdefault(state_name_key, []).append(tweet)
    return tweets_by_state
def most_talkative_state(term):
    """Return the state that has the largest number of tweets containing term.

    >>> most_talkative_state('texas')
    'TX'
    >>> most_talkative_state('sandwich')
    'NJ'
    """
    tweets = load_tweets(make_tweet, term)  # A list of tweets containing term
    tweets_by_state = group_tweets_by_state(tweets)
    talkative_states = us_states.fromkeys(us_states, 0)
    for state, tweet_list in tweets_by_state.items():
        for tweet in tweet_list:
            for word in tweet_words(tweet):
                if word == term:
                    talkative_states[state] += 1
    best_count = None
    most_talkative_state = None
    for state, count_term in talkative_states.items():
        if best_count == None:
            best_count = count_term
            most_talkative_state = state
            continue
        else:
            if count_term > best_count:
                best_count = count_term
                most_talkative_state = state
    return most_talkative_state
def average_sentiments(tweets_by_state):
    """Calculate the average sentiment of the states by averaging over all
    the tweets from each state. Return the result as a dictionary from state
    names to average sentiment values (numbers).

    If a state has no tweets with sentiment values, leave it out of the
    dictionary entirely. Do NOT include states with no tweets, or with tweets
    that have no sentiment, as 0. 0 represents neutral sentiment, not unknown
    sentiment.

    tweets_by_state -- A dictionary from state names to lists of tweets
    """
    averaged_state_sentiments = {}
    total_sentiment_for_state = 0
    count_sentiment = 0
    for state, tweet_list in tweets_by_state.items():
        for tweet in tweet_list:
            sentiment = analyze_tweet_sentiment(tweet)
            if not ((sentiment == 0) or (sentiment == None)):
                total_sentiment_for_state += sentiment
                count_sentiment += 1
        if total_sentiment_for_state != 0:
            averaged_state_sentiments[state] = (total_sentiment_for_state / count_sentiment)
        total_sentiment_for_state = 0
        count_sentiment = 0
    return averaged_state_sentiments

The solution above was tested with the commands given in the Phase 3 problem statements.
for loops). Is this due to the choice of list and dict data models? Posted on 2015-05-22 08:38:18.

As in other similar questions, I find the pre-existing code a bit odd, but I will comment only on the code you wrote.
You call tweet_location(tweet) inside the loop; in better-designed code, find_closest_state would probably take a position rather than a tweet as its first argument. There is no need for the continue, since nothing else happens in that iteration anyway. geo_distance is called in two places, and comparisons with None should use `is`, per PEP 8. The function can be written more concisely as:
def find_closest_state(tweet, state_centers):
    best_distance = None
    closest_state = None
    location = tweet_location(tweet)
    for state, centre_position_of_state in state_centers.items():
        distance = geo_distance(centre_position_of_state, location)
        if best_distance is None or distance < best_distance:
            best_distance = distance
            closest_state = state
    return closest_state

Alternatively, this could (and probably should) be written using the min built-in.
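A min-based version might look like the following sketch. The stand-ins for geo_distance and tweet_location are simplified assumptions so the example is self-contained; the project's real geo_distance computes great-circle distance in miles:

```python
from math import sqrt

# Simplified stand-ins for the project's helpers (assumptions for this
# sketch): a position is a (lat, lon) tuple and distance is Euclidean.
def geo_distance(p1, p2):
    return sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

def tweet_location(tweet):
    return tweet['loc']

def find_closest_state(tweet, state_centers):
    """Return the state whose center is nearest to the tweet's location."""
    location = tweet_location(tweet)
    return min(state_centers,
               key=lambda state: geo_distance(state_centers[state], location))

centers = {'CA': (37, -120), 'NJ': (40, -74)}
print(find_closest_state({'loc': (38, -122)}, centers))  # → CA
```

min over a dict iterates its keys, so the key function looks each state's center up in the dictionary; this removes the manual best-so-far bookkeeping entirely.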
Many of the comments above also apply to most_talkative_state, which could easily be rewritten using max.
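A max-based rewrite could look like this sketch. Taking the already-grouped tweets and a `tweet_words` accessor as parameters, instead of loading data inside the function, is a simplification of the project's version for illustration:

```python
def most_talkative_state(tweets_by_state, tweet_words, term):
    """Return the state whose tweets contain term the most times."""
    def count(state):
        return sum(tweet_words(t).count(term)
                   for t in tweets_by_state[state])
    return max(tweets_by_state, key=count, default=None)

# Example with tweets represented directly as word lists.
by_state = {'TX': [['i', 'love', 'texas'], ['texas', 'texas']],
            'CA': [['texas', 'weather']]}
print(most_talkative_state(by_state, lambda t: t, 'texas'))  # → TX
```

As with the min version, max over the dict iterates state codes, and the key function reduces each state to its term count, replacing the three nested loops and best-so-far bookkeeping.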
https://codereview.stackexchange.com/questions/91383