Finding all restaurants at a given lat/long with Python

Code Review user
Asked 2015-12-11 20:37:57
2 answers, 2.2K views, 0 followers, score 4

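This is a program I wrote to scrape restaurants and bars from Google, Yelp, and Foursquare. It then ranks them more meaningfully by rating, number of ratings, and number of data sources, using a Bayesian average. I'm guessing the main method could be broken up into more functions. I'm also guessing I'm missing some handy list comprehension tricks. Any suggestions?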

main.py

import csv
import time

import foursquare
import yelp
import google


def bayesian(R, v, m, C):
    """
    Computes the Bayesian average for the given parameters

    :param R: Average rating for this business
    :param v: Number of ratings for this business
    :param m: Minimum ratings required
    :param C: Mean rating across the entire list
    :returns: Bayesian average
    """

    # Convert to floating point numbers
    R = float(R)
    v = float(v)
    m = float(m)
    C = float(C)

    return ((v / (v + m)) * R + (m / (v + m)) * C)


def remove_duplicate_names(full_list):
    """
    Fixes issue with multiple API calls returning the same businesses

    :param R: The entire unfiltered list
    :returns: Filtered list
    """

    names = set()
    filtered_list = []
    for business in full_list:
        if business.name not in names:
            filtered_list.append(business)
            names.add(business.name)

    return filtered_list


def main():
    """
    Finds all the bars/restaurants in the given area. Use different
    lat/long points to cover entire town since API calls have length limits.
    """

    input_value = ''
    locations = []

    distance = input('Search Radius (meters): ')
    while input_value is not 'n':
        lat = input('Lat: ')
        lng = input('Long: ')
        locations.append((lat, lng))
        input_value = raw_input('Would you like more points? (y/n) ')

    venues, businesses, places = [], [], []

    for lat,lng in locations:

        # Retrieve all businesses for all sources
        print 'Searching lat: {} long: {} ...'.format(lat, lng)
        venues.extend(foursquare.search(lat, lng, distance))
        businesses.extend(yelp.search(lat, lng, distance))
        places.extend(google.search(lat, lng, distance))

        # Rate-limit API calls
        time.sleep(1.0)

    # Remove duplicates from API call overlap
    venues = remove_duplicate_names(venues)
    businesses = remove_duplicate_names(businesses)
    places = remove_duplicate_names(places)

    # Calculate low threshold and average ratings
    fs_low = min(venue.rating_count for venue in venues)
    fs_avg = sum(venue.rating for venue in venues) / len(venues)

    yp_low = min(business.rating_count for business in businesses)
    yp_avg = sum(business.rating for business in businesses) / len(businesses)

    gp_low = min(place.rating_count for place in places)
    gp_avg = sum(place.rating for place in places) / len(places)

    # Add bayesian estimates to business objects
    for v in venues:
        v.bayesian = bayesian(v.rating, v.rating_count, fs_low, fs_avg)
    for b in businesses:
        b.bayesian = bayesian(b.rating * 2, b.rating_count, yp_low, yp_avg * 2)
    for p in places:
        p.bayesian = bayesian(p.rating * 2, p.rating_count, gp_low, gp_avg * 2)

    # Combine all lists into one
    full_list = []
    full_list.extend(venues)
    full_list.extend(businesses)
    full_list.extend(places)
    print 'Found {} total businesses!'.format(len(full_list))
    
    # Combine ratings of duplicates
    seen_addresses = set()
    filtered_list = []
    for business in full_list:
        if business.address not in seen_addresses:
            filtered_list.append(business)
            seen_addresses.add(business.address)
        else: 
            # Find duplicate in list
            for b in filtered_list:
                if b.address == business.address:
                    # Average bayesian ratings and update source count
                    new_rating = (b.bayesian + business.bayesian) / 2.0
                    b.bayesian = new_rating
                    b.source_count = b.source_count + 1
                 
    # Sort by Bayesian rating
    filtered_list.sort(key=lambda x: x.bayesian, reverse=True)

    # Write to .csv file
    with open('data.csv', 'w') as csvfile:

        categories = ['Name', 'Rating', 'Number of Ratings', 'Checkins', 'Sources']
        writer = csv.DictWriter(csvfile, fieldnames=categories)

        writer.writeheader()
        for venue in filtered_list:
            writer.writerow({'Name': venue.name.encode('utf-8'),
                             'Rating': '{0:.2f}'.format(venue.bayesian),
                             'Number of Ratings': venue.rating_count,
                             'Checkins': venue.checkin_count,
                             'Sources': venue.source_count})


if __name__ == '__main__':
    main()

2 Answers

Code Review user

Accepted answer

Posted 2015-12-11 21:38:12

Descriptive names

The function signature is:

bayesian(R, v, m, C)

But you then spend a good part of the docstring describing these single-letter parameters:

:param R: Average rating for this business
:param v: Number of ratings for this business
:param m: Minimum ratings required
:param C: Mean rating across the entire list

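In most cases descriptive code is preferable to descriptive comments or docstrings, for a simple reason: maintaining two things (code and comments) instead of one (code) doubles the maintenance work, and once code and comments drift out of sync the code becomes very confusing.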
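
As a rough illustration of this point, the same function could be written with descriptive parameter names. The names below are only suggestions and do not come from the original code; the arithmetic is unchanged.

```python
def bayesian_average(rating, rating_count, min_ratings, mean_rating):
    """Weight a business's own rating against the list-wide mean."""
    # Convert the counts to float so the divisions below are never
    # integer divisions, whatever Python version runs this
    rating_count = float(rating_count)
    min_ratings = float(min_ratings)
    total = rating_count + min_ratings
    return (rating_count / total) * rating + (min_ratings / total) * mean_rating


# A business rated 9.0 from 30 ratings, a threshold of 10, a list mean of 7.0
print(bayesian_average(9.0, 30, 10, 7.0))  # prints 8.5
```

With names like these the `:param` lines become largely redundant, and a one-line docstring is enough.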

Built-ins

names = set()
filtered_list = []
for business in full_list:
    if business.name not in names:
        filtered_list.append(business)
        names.add(business.name)

return filtered_list

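becomes:

return list(set(full_list))

As far as I can tell the code does not care about the order of the restaurants, so the fact that set changes the order should not be a problem. Note that set() compares whole objects, so this only filters by name if the business class defines __eq__ and __hash__ in terms of name.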
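
If the business objects do not define name-based equality, a dict keyed on the name gives the same by-name filtering in a few lines. This is a minimal sketch; the `Business` class is a hypothetical stand-in for whatever the search wrappers actually return.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by the search wrappers
Business = namedtuple('Business', ['name', 'rating'])


def remove_duplicate_names(full_list):
    """Keep the first business seen for each name."""
    unique = {}
    for business in full_list:
        # setdefault stores the business only if the name is new
        unique.setdefault(business.name, business)
    return list(unique.values())


sample = [Business('Cafe A', 4.0), Business('Bar B', 3.5), Business('Cafe A', 4.5)]
print([b.name for b in remove_duplicate_names(sample)])
```

On Python 3.7 and later, dicts preserve insertion order, so the first-seen order survives as well.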

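Input function

Getting user input is a detail we do not care about when reading the main structure of the program in main, so move it into a function of its own.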

while input_value is not 'n':
    lat = input('Lat: ')
    lng = input('Long: ')
    locations.append((lat, lng))
    input_value = raw_input('Would you like more points? (y/n) ')

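No input in Python 2

In Python 2, input() automatically evaluates whatever the user types. Executing arbitrary user input is dangerous and widely considered bad practice. Use raw_input() with an explicit conversion instead, such as int(raw_input(x)), or float(raw_input(x)) for latitude and longitude.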
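
Putting the two points above together, the input loop might be pulled into its own function along these lines. This is a sketch written for Python 3, where input() returns a plain string just as raw_input() does in Python 2; the function name and the `read` parameter are illustrative. It also compares the answer with `!=` rather than `is not`, because `is` tests object identity, not string equality.

```python
def get_location_list(read=input):
    """Prompt for lat/long pairs until the user answers 'n'."""
    locations = []
    answer = ''
    while answer != 'n':
        # float() parses the text without evaluating it as code
        lat = float(read('Lat: '))
        lng = float(read('Long: '))
        locations.append((lat, lng))
        answer = read('Would you like more points? (y/n) ').strip().lower()
    return locations
```

Passing a different `read` callable, for example one that returns canned answers, lets the function be exercised without a terminal.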

Overloading

+ has many meanings in Python, one of which is concatenating lists:

full_list = []
full_list.extend(venues)
full_list.extend(businesses)
full_list.extend(places)

becomes:

full_list = venues + businesses + places

which noticeably improves clarity.

Score: 6

Code Review user

Posted 2015-12-11 22:30:24

In addition to Caridorc's good comments, I have a few remarks:

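  • In bayesian() you convert to float, but you may already have used ints before that. When you build the arguments for this function you do some arithmetic, which may or may not be integer arithmetic (in Python 2, sum(...) / len(...) truncates if every rating is an int). You may want to enforce floats at an earlier level.
  • Change to a list of search engines. Instead of repeating your logic three times, I would store the results in a list and use a list of providers holding the address, search method, provider name, and so on. This could simplify your logic and make it easier to extend to new providers.
  • No input validation. What input format do you expect for latitude and longitude? I know of at least three or four variants. Which of them are accepted by all of these search engines?
  • Split into some more functions. I like the way you call main(), but I would split it into more functions so that it reads something like this:

        def main():
            locations = get_location_list()
            restaurants = execute_search(locations, SEARCH_ENGINES)
            rated_restaurants = calculate_restaurant_rating(restaurants)
            write_restaurants("data.csv", rated_restaurants)

            # Or the equally ugly version...
            write_restaurants("data.csv",
                calculate_restaurant_rating(
                    execute_search(
                        get_location_list(),
                        SEARCH_ENGINES)))

    Defining functions like this lets the logical parts of your script be used as a module, so you can collect and manipulate the data for different needs. You can still run it as a script to do a single search.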
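
The search-engine list suggested above could be sketched roughly as follows. The `Provider` record, its `scale` field, and the extra `distance` argument are assumptions made for illustration, and `fake_search` stands in for the real foursquare.search, yelp.search, and google.search calls.

```python
from collections import namedtuple

# Hypothetical provider record: a display name, a search callable, and the
# factor that brings this provider's ratings onto a common scale (the
# question multiplies Yelp and Google ratings by 2); scale is carried along
# for a later rating step and is not used by execute_search itself
Provider = namedtuple('Provider', ['name', 'search', 'scale'])


def execute_search(locations, providers, distance):
    """Run every provider over every location; return results per provider."""
    results = {provider.name: [] for provider in providers}
    for lat, lng in locations:
        for provider in providers:
            results[provider.name].extend(provider.search(lat, lng, distance))
    return results


def fake_search(lat, lng, distance):
    # Stand-in for a real API wrapper; returns one dummy result per call
    return ['place near ({}, {})'.format(lat, lng)]


providers = [Provider('foursquare', fake_search, 1), Provider('yelp', fake_search, 2)]
found = execute_search([(40.7, -74.0), (41.0, -73.5)], providers, 500)
print(sorted(found))
```

Adding a new source then means appending one `Provider` entry rather than copying another block of code. The one-second pause between calls from the original loop is left out here to keep the sketch short.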
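Score: 1

Original page content provided by Code Review. Translation support for the source page provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://codereview.stackexchange.com/questions/113644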
