首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >国家情绪-推特潮流

国家情绪-推特潮流
EN

Code Review用户
提问于 2015-05-21 09:01:42
回答 1查看 527关注 0票数 -1

此作业摘自伯克利的CS61A页面这里

在这个项目中,您将开发一个美国各地twitter数据的地理可视化。您需要使用字典、列表和数据抽象技术来创建一个模块化的程序。上面显示的地图描述了不同州的人们对德克萨斯的感受。python trends.py -m texas。此图像是由下列人员生成的:

  • 收集带有地理位置标记的公共推特帖子(Tweet),并过滤包含“德州”查询词的帖子,
  • 根据每条推文所包含的所有词语,为每条推文分配一种情绪(积极或消极),
  • 用最近的地理中心聚合推特,最后
  • 根据其推文的总体情绪对每个州进行着色。红色代表积极情绪,蓝色代表消极情绪。

下面的作业是这个项目的第三阶段。第一阶段可以找到这里。第二阶段可以找到这里

第三阶段:国家的心情--名字us_states --被绑定到一本包含每个美国州形状的字典中,该词典由两个字母的邮政编码组成。您可以使用本词典的键来遍历所有的美国州。在此阶段,您将编写函数来确定tweet来自的状态,按状态对tweet进行分组,并计算与状态相关的所有tweet中的平均正或负面感觉。问题8(1pt)。实现find_closest_state,它返回最接近推特位置的州的两个字母邮政编码。使用geo_distance函数(在geo.py中提供)计算两个位置之间的最短距离(以英里为单位)。当您完成此问题时,find_closest_state的doctest应该会通过。python3 trends.py -t find_closest_state问题9(1pt)。实现group_tweets_by_state,它获取一个tweet列表并返回一个字典。返回的字典的键是州名(两个字母的邮政编码),值是比任何其他值都更靠近州中心的tweet列表。当您完成此问题时,group_tweets_by_state的doctest应该会通过。python3 trends.py -t group_tweets_by_state问题10 (1pt)。作为练习,实现most_talkative_state,它返回包含给定术语的最多tweet的状态。当您完成此问题时,most_talkative_state的doctest应该会通过。python3 trends.py -t most_talkative_state问题11 (2 pt)。实现average_sentiments。此函数接受group_tweets_by_state返回的字典,并返回字典。返回的字典的键是状态名称(两个字母的邮政编码),值是状态中所有tweet的平均情感值。如果一个州没有包含情绪值的tweet,那么就把它完全排除在返回的字典中。不要使用零情感值包括没有情感的状态。零代表中立的情绪,而不是未知的情绪。情绪未知的国家将出现灰色,而中立情绪的国家将显示为白色。现在,您应该能够绘制出与包含给定术语的tweet相对应的情绪颜色的地图。python3 trends.py -m三明治python3 trends.py -m python3 trends.py -m python3 python3 trends.py -m my life如果你下载了这个小版本的项目,你只能映射这四个术语。如果要映射任何术语,则需要下载此Twitter数据文件并将其放在项目的数据目录中。

第三阶段的解决办法:

代码语言:javascript
复制
from data import word_sentiments, load_tweets
from datetime import datetime
from doctest import run_docstring_examples
from geo import us_states, geo_distance, make_position, longitude, latitude
from maps import draw_state, draw_name, draw_dot, wait, message
from string import ascii_letters
from ucb import main, trace, interact, log_current_line

# Phase 3: The Mood of the Nation

def find_closest_state(tweet, state_centers):
    """Return the name of the state closest to the given tweet's location.

    Use the geo_distance function (already provided) to calculate distance
    in miles between two latitude-longitude positions.

    Arguments:
    tweet -- a tweet abstract data type
    state_centers -- a dictionary from state names to positions.

    >>> us_centers = {n: find_center(s) for n, s in us_states.items()}
    >>> sf = make_tweet("Welcome to San Francisco", None, 38, -122)
    >>> ny = make_tweet("Welcome to New York", None, 41, -74)
    >>> find_closest_state(sf, us_centers)
    'CA'
    >>> find_closest_state(ny, us_centers)
    'NJ'
    """
    best_distance = None
    closest_state = None    
    for state, centre_position_of_state in state_centers.items():
        if best_distance == None:
            best_distance =  geo_distance(centre_position_of_state, tweet_location(tweet))
            closest_state = state           
            continue
        else:
            distance = geo_distance(centre_position_of_state, tweet_location(tweet)) 
            if  distance < best_distance:
                best_distance = distance
                closest_state = state           
    return closest_state

def group_tweets_by_state(tweets):
    """Return a dictionary that aggregates tweets by their nearest state center.

    The keys of the returned dictionary are state names, and the values are
    lists of tweets that appear closer to that state center than any other.

    tweets -- a sequence of tweet abstract data types

    >>> sf = make_tweet("Welcome to San Francisco", None, 38, -122)
    >>> ny = make_tweet("Welcome to New York", None, 41, -74)
    >>> ca_tweets = group_tweets_by_state([sf, ny])['CA']
    >>> tweet_string(ca_tweets[0])
    '"Welcome to San Francisco" @ (38, -122)'
    """
    tweets_by_state = {}
    USA_states_center_position = {n: find_center(s) for n, s in us_states.items()}
    for tweet in tweets:
        state_name_key = find_closest_state(tweet, USA_states_center_position)
        tweets_by_state.setdefault(state_name_key, []).append(tweet)
    return tweets_by_state

def most_talkative_state(term):
    """Return the state that has the largest number of tweets containing term.

    >>> most_talkative_state('texas')
    'TX'
    >>> most_talkative_state('sandwich')
    'NJ'
    """
    tweets = load_tweets(make_tweet, term)  # A list of tweets containing term
    tweets_by_state = group_tweets_by_state(tweets)
    talkative_states = us_states.fromkeys(us_states, 0)
    for state, tweet_list in tweets_by_state.items():
        for tweet in tweet_list:
            for word in tweet_words(tweet):
                if word == term:
                    talkative_states[state] += 1

    best_count = None
    most_talkative_state = None
    for state, count_term in talkative_states.items():
        if best_count == None:
            best_count = count_term
            most_talkative_state = state
            continue
        else:
            if count_term > best_count:
                best_count = count_term
                most_talkative_state = state
    return most_talkative_state

def average_sentiments(tweets_by_state):
    """Calculate the average sentiment of the states by averaging over all
    the tweets from each state. Return the result as a dictionary from state
    names to average sentiment values (numbers).

    If a state has no tweets with sentiment values, leave it out of the
    dictionary entirely.  Do NOT include states with no tweets, or with tweets
    that have no sentiment, as 0.  0 represents neutral sentiment, not unknown
    sentiment.

    tweets_by_state -- A dictionary from state names to lists of tweets
    """
    averaged_state_sentiments = {}
    total_sentiment_for_state = 0
    count_sentiment = 0
    for state, tweet_list in tweets_by_state.items():
        for tweet in tweet_list:
            sentiment = analyze_tweet_sentiment(tweet)
            if not ((sentiment == 0) or (sentiment == None)):
                total_sentiment_for_state += sentiment
                count_sentiment += 1
        if total_sentiment_for_state != 0:
            averaged_state_sentiments[state] = (total_sentiment_for_state / count_sentiment)
        total_sentiment_for_state = 0
        count_sentiment = 0
    return averaged_state_sentiments

根据第三阶段问题中给出的测试指令,对该解决方案进行了相应的测试。

  1. 从性能角度看,上述解决方案中函数的时间复杂度为多项式(嵌套的for循环)。这是由于选择了listdict类型的数据模型吗?
  2. 如何改进命名约定?
EN

回答 1

Code Review用户

发布于 2015-05-22 08:38:18

就像在其他类似的问题中一样,我发现已经存在的代码有点奇怪,但我只对您编写的代码进行评论。

find_closest_state

  • 您不需要多次调用tweet_location(tweet)。在设计得更好的代码中,find_closest_state可能会将位置而不是tweet作为第一个参数。
  • 您不需要那个continue,因为不管怎么说,在这个迭代中不会发生任何其他事情。
  • 您可以在一个地方调用geo_distance
  • 您应该使用isNone进行比较,按PEP8进行比较。
  • 考虑到以前的注释之后,如果geo_distance为None: best_distance = best_distance closest_state = state距离< best_distance: best_distance =距离closest_state = state,则为:best_distance=best_distance(centre_position_of_state,location)

它可以以更简洁的方式编写,代码如下:

代码语言:javascript
复制
def find_closest_state(tweet, state_centers):
    best_distance = None
    closest_state = None
    location = tweet_location(tweet)
    for state, centre_position_of_state in state_centers.items():
        distance = geo_distance(centre_position_of_state, location)
        if best_distance is None or distance < best_distance:
            best_distance = distance
            closest_state = state
    return closest_state

或者,这可以(而且很可能应该)使用min内置编写。

most_talkative_state

上面的许多注释都适用于most_talkative_state,它可以很容易地用max重写。

票数 1
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/91383

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档