Question: Query houses similar to a given house

Code Review user

Asked 2015-07-12 04:28:42

Answers: 1, views: 119, followers: 0, score: 8

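I was given this task as an interview coding challenge, and I'd like to know whether the code is well structured and follows Python guidelines. I chose to sort the houses by a similarity metric and then return the first N. I'd love to know whether there is a better way to approach this.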
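Sorting every house by its score and then slicing off the first N costs O(n log n) for n houses. When only N results are needed, a partial selection gives the same answer. A minimal stdlib sketch using heapq.nsmallest; n_most_similar and the sample prices are illustrative and not part of the original code:

```python
import heapq


def n_most_similar(items, score, n):
    """Return the n items with the lowest score, most similar first."""
    return heapq.nsmallest(n, items, key=score)


prices = [310, 250, 400, 275, 330]
target = 300

# Score each price by its absolute difference from the target price.
print(n_most_similar(prices, lambda p: abs(p - target), 3))  # [310, 275, 330]

# Full sort followed by a slice gives the same result.
print(sorted(prices, key=lambda p: abs(p - target))[:3])     # [310, 275, 330]
```

The heapq documentation describes nsmallest(n, iterable, key=...) as equivalent to sorted(iterable, key=...)[:n]; on the pandas side, Series.nsmallest does the same job on a Series of scores.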

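I decided to use pandas to store and query the data. I wrote a small Model class that encapsulates all of the pandas integration, so that it can be swapped out for calls to a relational database if needed.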
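As an illustration of what the relational-database swap might look like, here is a sketch with the standard sqlite3 module: the Python scoring callback is registered with Connection.create_function, and the database does the ordering and the limit. The houses table, its columns, sort_by_sqlite, and the degree-offset score are all hypothetical stand-ins, not part of the original design:

```python
import sqlite3


def sort_by_sqlite(conn, score, limit=10):
    """Return up to `limit` rows of houses, lowest score(lat, lon) first."""
    # Expose the Python callback to SQL so the database does the ordering.
    conn.create_function('score', 2, score)
    cur = conn.execute(
        'SELECT id, lat, lon FROM houses ORDER BY score(lat, lon) LIMIT ?',
        (limit,))
    return cur.fetchall()


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE houses (id INTEGER PRIMARY KEY, lat REAL, lon REAL)')
conn.executemany('INSERT INTO houses (lat, lon) VALUES (?, ?)',
                 [(33.40, -111.90), (33.60, -112.10), (33.45, -111.95)])

target_lat, target_lon = 33.45, -111.95


def squared_offset(lat, lon):
    # Stand-in score: squared offset in degrees from the target house.
    return (lat - target_lat) ** 2 + (lon - target_lon) ** 2


rows = sort_by_sqlite(conn, squared_offset, limit=2)
print([row[0] for row in rows])  # [3, 1]
```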

__author__ = 'Franklyn'

from pandas import Series, DataFrame
import pandas as pd

class Model(Series):
    """Base Class of Models, Used to represent the structure of data and define behaviours.

    This base class handles tasks that are common to all models such as serialization and saving objects in a collection

    Attributes:
        Objects: Store of model objects created from their respective class.
    """
    # Defined here so that save() can test it before any object is stored.
    Objects = None

    def __init__(self, data, fields):
        Series.__init__(self, data, index=fields)

    @classmethod
    def read_serialized_object(cls, path):
        try:
            cls.Objects = pd.read_pickle(path)
        except Exception as e:
            print "Could not read serialized objects: {0}".format(e.message)
        return cls

    @classmethod
    def write_serialized_object(cls, path):
        try:
            cls.Objects.to_pickle(path)
        except Exception as e:
            print "Could not write serialized objects: {0}".format(e.message)

    @classmethod
    def sort_by(cls, sort_callback, limit=10):
        """Sorts the collection using the callback function and return limit number of elements"""
        return cls.Objects.ix[cls.Objects.apply(lambda x: sort_callback(x), axis=1).argsort()[:limit]]

    def save(self):
        if self.__class__.Objects is None:
            self.__class__.Objects = DataFrame(columns=self.index)
        self.__class__.Objects = self.__class__.Objects.append(self, ignore_index=True)

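Using __class__ here to assign and initialize a static member feels a bit dirty. I'd love feedback on how to handle initializing static members that are going to be inherited.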
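One way to handle per-class storage without reaching through __class__ is to declare the attribute once on the base class and create each subclass's store lazily in a classmethod. A minimal pure-Python sketch, with a list standing in for the DataFrame; the lowercase objects attribute and the _store helper are hypothetical names:

```python
class Model(object):
    # Declared once on the base class, so the attribute always exists.
    objects = None

    @classmethod
    def _store(cls):
        # Look only at this class's own namespace, so a subclass never
        # appends to a store that was created for one of its parents.
        if cls.__dict__.get('objects') is None:
            cls.objects = []
        return cls.objects

    def save(self):
        type(self)._store().append(self)


class House(Model):
    pass


class Condo(Model):
    pass


House().save()
House().save()
Condo().save()
print('{} {} {}'.format(len(House.objects), len(Condo.objects),
                        Model.objects))  # 2 1 None
```

Checking cls.__dict__ rather than cls.objects matters once the hierarchy gets deeper: a subclass of House would otherwise find House's list through inheritance and append to it.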

The House class then inherits from this model:

__author__ = 'Franklyn'

from pandas import DataFrame
from collections import namedtuple
from Model import Model
import math

class House(Model):
    """House model defines the attributes and behaviours of a House
    """
    DWELLING_TYPES = {'single-family', 'townhouse', 'apartment', 'patio', 'loft'}

    POOL_TYPES = {'private', 'community', 'none'}

    # Weighting coefficient for the dwelling type similarity
    DWELLING_COEFFICIENT = 100

    Listing = namedtuple('Listing',
                    ['num_bedrooms', 'num_bathrooms', 'living_area', 'lat', 'lon',
                     'exterior_stories', 'pool', 'dwelling_type',
                     'list_date', 'list_price', 'close_date', 'close_price'])

    def __default_similarity_callback(self, house_to):
        """Default similarity metric used by the class if no similarity callback is provided.
        Computes similarity between house1 and house2, similarity is based on the distance between them and a weighted
        cost of the similarity of dwelling type.
        :param house_to:  Series object of second house to compare to
        :return: similarity error between the two houses, lower numbers are more similar
        """
        similarity = self.distance(house_to)
        similarity -= House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type)
        return similarity

    def __init__(self, listing):
        Model.__init__(self, listing, House.Listing._fields)

    def get_similar(self, num_listings, similarity_callback=None):
        """Returns the n most similar houses to this house.
        :param num_listings:   Number of houses to return
        :param similarity_callback: A function that takes the other house (a row Series)
        and returns a number where smaller values are more similar.
        :return:    DataFrame of similar houses.
        """
        if similarity_callback is None:
            similarity_callback = self.__default_similarity_callback
        return House.sort_by(similarity_callback, num_listings)

    def distance(self, to_house):
        """Computes the distance from this house to another house using the equirectangular approximation.
        reference: http://www.movable-type.co.uk/scripts/latlong.html
        :param to_house: The house to compute distance to
        :return: distance in kilometers
        """
        lat1 = self['lat']
        lon1 = self['lon']
        lat2 = to_house['lat']
        lon2 = to_house['lon']

        earth_radius = 6371
        x_coordinate = (lon2 - lon1) * math.cos(0.5 * (lat2 + lat1))
        y_coordinate = (lat2 - lat1)
        distance_km = earth_radius * math.sqrt(x_coordinate*x_coordinate + y_coordinate*y_coordinate)

        return distance_km
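The equirectangular formula on the referenced page works with angles in radians, while distance() above passes lat and lon straight to math.cos. If the listings store coordinates in decimal degrees (an assumption, since generate_random_house isn't shown), they need converting first. A standalone sketch; equirectangular_km is a hypothetical name:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the original code


def equirectangular_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km between two points given in degrees."""
    # Convert degrees to radians; the approximation assumes radians.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    x = d_lambda * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return EARTH_RADIUS_KM * math.hypot(x, y)


# One degree of latitude along a meridian is roughly 111.2 km.
print(round(equirectangular_km(0.0, 0.0, 1.0, 0.0), 1))  # 111.2
```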

I'm quite happy with the House class, but I'd like to know whether there are any obvious errors in it.

Here is how the classes are used:

try:
    House.read_serialized_object("../static/data/house_listings")
except Exception as e:
    print "No test data available, Generating dataset: {0}".format(e.message)
    for k in range(0, NUM_LISTINGS):
        house = generate_random_house()
        house.save()
    House.write_serialized_object("../static/data/house_listings")

house = generate_random_house(dwelling_type='single-family')
print house.get_similar(10)
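In the snippet above, read_serialized_object already catches every exception itself and only prints a message, so it looks like the outer except branch would not run when the file is missing, and the dataset would not be generated. A minimal sketch of a load-or-generate step in which a failed load does reach the code that handles it, using the standard pickle module rather than pandas' to_pickle/read_pickle; load_or_generate is a hypothetical helper:

```python
import os
import pickle
import tempfile


def load_or_generate(path, generate):
    """Load pickled data from path, or build it with generate() and save it."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (IOError, OSError, EOFError, pickle.UnpicklingError):
        # Missing or unreadable cache: build the data and write it back.
        data = generate()
        with open(path, 'wb') as f:
            pickle.dump(data, f)
        return data


path = os.path.join(tempfile.mkdtemp(), 'house_listings.pkl')
first = load_or_generate(path, lambda: [1, 2, 3])  # generated and saved
second = load_or_generate(path, lambda: [])        # loaded from disk
print(first == second)  # True
```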

Tags: python-2.x, geospatial, pandas, python

1 answer

Code Review user

Accepted answer

Answered 2015-07-13 20:49:19

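"I'd like to know whether the code is well structured and follows Python guidelines."

I can help with that. However, I can't help with your other concerns: I haven't done enough static-class programming to give them a proper review.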

PEP8

Lines should be limited to 79 characters. The exception is comments and docstrings, which should be limited to 72.
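Some people really don't like __ names (here, __default_similarity_callback). From PEP 8, under Method Names and Instance Variables: "Note: there is some controversy about the use of __names (see below)." And under Designing for Inheritance: "If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. ... Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers."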
Operators should have a space on either side: 2 + 2. The exception is when showing precedence: 2*2 + 2. So House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type) should have spaces around the *.
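Non-class names should be snake_case, not CamelCase. This avoids confusion over whether a name is a class (for example, is Objects a class?). Listing = namedtuple(...), by contrast, does create a class.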
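PEP 8 is very strict about continuation-line indentation, even for vertical alignment: "Aligned with opening delimiter." Listing should be defined as:

Listing = namedtuple('Listing',
                     ['num_bedrooms', 'num_bathrooms', 'living_area',
                      'lat', 'lon', 'exterior_stories', 'pool',
                      'dwelling_type', 'list_date', 'list_price',
                      'close_date', 'close_price'])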
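It's normal not to add blank lines unless PEP 8 tells you to. In some places you have too many, such as between the constant declarations in House, and in others too few, such as around the classes.
- Surround top-level function and class definitions with two blank lines.
- Use blank lines in functions, sparingly, to indicate logical sections.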

PEP257

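Your one-line docstrings are fine, but they need to be entirely on one line (the House class docstring, for example): "The closing quotes are on the same line as the opening quotes. This looks better for one-liners."
Most of your multi-line docstrings don't have a summary line followed by a blank line.
:param house_to: is not how PEP 257 documents arguments. Following the style of its example, it would be:
Positional arguments:
house_to -- Series object of the second house to compare to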
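A sketch of a multi-line docstring laid out the way PEP 257 describes (summary line, blank line, longer description, then an argument list in the style of its example), applied to a hypothetical stand-alone version of the default similarity metric; default_similarity and its parameters are illustrative:

```python
def default_similarity(distance_km, same_dwelling_type, coefficient=100):
    """Return a similarity error; lower values mean more similar houses.

    The error is the distance between two houses minus a weighted
    bonus when both houses have the same dwelling type.

    Positional arguments:
    distance_km -- distance between the two houses in kilometres
    same_dwelling_type -- True if both houses share a dwelling type

    Keyword arguments:
    coefficient -- weight of a matching dwelling type (default 100)
    """
    return distance_km - coefficient * int(same_dwelling_type)


print(default_similarity(30.0, True))   # -70.0
print(default_similarity(30.0, False))  # 30.0
```

With the default weight of 100, a house of the same dwelling type ranks ahead of a different-type house unless it is more than 100 km farther away.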

Overall, apart from the line-length limit, your code is good. Your docstrings need work, and you need more of them, but it's very good.

Non-style

Using super is better than calling the parent class's __init__ directly:

super(Model, self).__init__(data, index=fields)
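A pandas-free sketch of the same constructor chain, with a hypothetical Record class standing in for Series so that it runs without pandas; both Model and House delegate through super instead of naming their parent:

```python
class Record(object):
    """Stand-in for pandas.Series: stores values keyed by field name."""

    def __init__(self, data, index):
        self.values = dict(zip(index, data))


class Model(Record):
    def __init__(self, data, fields):
        # Delegate to the next class in the MRO instead of naming Record.
        super(Model, self).__init__(data, index=fields)


class House(Model):
    FIELDS = ('lat', 'lon')

    def __init__(self, listing):
        super(House, self).__init__(listing, House.FIELDS)


h = House((51.5, -0.12))
print(h.values['lat'])  # 51.5
```

On Python 3 the arguments can be omitted: super().__init__(data, index=fields).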

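"Using __class__ here to assign and initialize a static member feels a bit dirty. I'd love feedback on how to handle initializing static members that are going to be inherited."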

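I don't have much experience here, so I'm only guessing. But Objects seems to be a variable, not a class, and you want to predefine it as DataFrame(columns=self.index), but only for the specific class.

For the simple case where you don't need to reference self, this Stack Overflow answer should help.

Since your case is more complicated, we can look at the documentation to see what __class__ is. After a quick read, it seems you want to use type(self) rather than self.__class__:

"type(x) is typically the same as x.__class__ (although this is not guaranteed: a new-style class instance is permitted to override the value returned for x.__class__)."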

def save(self):
    if type(self).Objects is None:
        type(self).Objects = DataFrame(columns=self.index)
    type(self).Objects = type(self).Objects.append(self, ignore_index=True)
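The caveat in that quote can be seen with a hypothetical Disguised class that overrides __class__ with a property; type() still reports the object's real class:

```python
class Disguised(object):
    """An instance that reports a different __class__ than its real type."""

    def _fake_class(self):
        return int

    # A property named __class__ shadows the attribute normally
    # provided by object, so obj.__class__ now returns int.
    __class__ = property(_fake_class)


obj = Disguised()
print(type(obj).__name__)      # Disguised
print(obj.__class__.__name__)  # int
```

Model and House don't override __class__, so in save() both spellings find the same class; type(self) simply doesn't depend on that.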

Score: 2

Original page content provided by Code Review. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/96642
