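I was given this task as an interview coding challenge, and I would like to know whether the code is well structured and follows Python guidelines. I chose to sort the houses by a similarity metric and then return the first N. I would love to know whether there is a better way to handle this.
I decided to use pandas to store and query the data. I wrote a small Model class to encapsulate all the integration with pandas, so that it can be swapped for calls to a relational database if needed.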
__author__ = 'Franklyn'

from pandas import Series, DataFrame
import pandas as pd


class Model(Series):
    """Base Class of Models, Used to represent the structure of data and define behaviours.

    This base class handles tasks that are common to all models such as serialization and saving objects in a collection

    Attributes:
        Objects: Store of model objects created from their respective class.
    """

    def __init__(self, data, fields):
        Series.__init__(self, data, index=fields)

    @classmethod
    def read_serialized_object(cls, path):
        try:
            cls.Objects = pd.read_pickle(path)
        except Exception as e:
            print "Could not read serialized objects: {0}".format(e.message)
        return cls

    @classmethod
    def write_serialized_object(cls, path):
        try:
            cls.Objects.to_pickle(path)
        except Exception as e:
            print "Could not write serialized objects: {0}".format(e.message)

    @classmethod
    def sort_by(cls, sort_callback, limit=10):
        """Sorts the collection using the callback function and return limit number of elements"""
        return cls.Objects.ix[cls.Objects.apply(lambda x: sort_callback(x), axis=1).argsort()[:limit]]

    def save(self):
        if self.__class__.Objects is None:
            self.__class__.Objects = DataFrame(columns=self.index)
        self.__class__.Objects = self.__class__.Objects.append(self, ignore_index=True)

I feel that using __class__ here to assign and initialize a static member is a bit dirty, and I would love feedback on how to handle the initialization of static members that will be inherited.
The House class then inherits from this model:
__author__ = 'Franklyn'

from pandas import DataFrame
from collections import namedtuple
from Model import Model
import math


class House(Model):
    """House model defines the attributes and behaviours of a House
    """

    DWELLING_TYPES = {'single-family', 'townhouse', 'apartment', 'patio', 'loft'}
    POOL_TYPES = {'private', 'community', 'none'}

    # Weighting coefficient for the dwelling type similarity
    DWELLING_COEFFICIENT = 100

    Listing = namedtuple('Listing',
                         ['num_bedrooms', 'num_bathrooms', 'living_area', 'lat', 'lon',
                          'exterior_stories', 'pool', 'dwelling_type',
                          'list_date', 'list_price', 'close_date', 'close_price'])

    def __default_similarity_callback(self, house_to):
        """Default similarity metric used by the class if no similarity callback is provided.

        Computes similarity between house1 and house2, similarity is based on the distance between them and a weighted
        cost of the similarity of dwelling type.

        :param house_to: Series object of second house to compare to
        :return: similarity error between the two houses, lower numbers are more similar
        """
        similarity = self.distance(house_to)
        similarity -= House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type)
        return similarity

    def __init__(self, listing):
        Model.__init__(self, listing, House.Listing._fields)

    def get_similar(self, num_listings, similarity_callback=None):
        """Returns the n most similar houses to this house.

        :param num_listings: Number of houses to return
        :param similarity_callback: A function that compares the similarity between two houses, must take in two parameters
        and return a number where smaller values are more similar.
        :return: DataFrame of similar houses.
        """
        if similarity_callback is None:
            similarity_callback = self.__default_similarity_callback
        return House.sort_by(similarity_callback, num_listings)

    def distance(self, to_house):
        """Computes the distance from this house to another house using the equirectangular approximation.

        reference: http://www.movable-type.co.uk/scripts/latlong.html

        :param to_house: The house to compute distance to
        :return: distance in kilometers
        """
        lat1 = self['lat']
        lon1 = self['lon']
        lat2 = to_house['lat']
        lon2 = to_house['lon']
        earth_radius = 6371
        x_coordinate = (lon2 - lon1) * math.cos(0.5 * (lat2 + lat1))
        y_coordinate = (lat2 - lat1)
        distance_km = earth_radius * math.sqrt(x_coordinate*x_coordinate + y_coordinate*y_coordinate)
        return distance_km

I am quite happy with the House class, but I would love to know whether there are any obvious mistakes here.
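One thing worth checking in distance: math.cos works in radians, and the equirectangular formula multiplies an angle in radians by the Earth's radius. If lat and lon are stored in degrees (an assumption, since the listing data is not shown here), the values would need converting first. A minimal standalone sketch in Python 3, where equirectangular_km is a hypothetical helper name and not part of the classes above:

```python
import math

EARTH_RADIUS_KM = 6371.0


def equirectangular_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km between two points given in degrees."""
    # Assumption: inputs are in degrees, so convert before any trigonometry.
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    x = (lon2 - lon1) * math.cos(0.5 * (lat1 + lat2))
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)
```

If the coordinates are already in radians, the conversion line should be dropped and the original arithmetic applies unchanged.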
Here is how the classes are used:
try:
    House.read_serialized_object("../static/data/house_listings")
except Exception as e:
    print "No test data available, Generating dataset: {0}".format(e.message)
    for k in range(0, NUM_LISTINGS):
        house = generate_random_house()
        house.save()
    House.write_serialized_object("../static/data/house_listings")

house = generate_random_house(dwelling_type='single-family')
print house.get_similar(10)

Posted on 2015-07-13 20:49:19
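"I would like to know whether the code is well structured and follows Python guidelines."
I can help with that. However, I cannot help with the other concerns. I have not done enough static class programming to give a well-rounded review.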
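- On __ names, such as __default_similarity_callback, PEP 8 says under "Method Names and Instance Variables": "Note: there is some controversy about the use of __names (see below)." Under "Designing for Inheritance" it says: "If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. ..." and "Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers."
- Operators need a space on either side of them: 2 + 2. The exception to this is to show precedence: 2*2 + 2. (See House.DWELLING_COEFFICIENT*int(self.dwelling_type == house_to.dwelling_type).)
- Variable names should be snake_case, not CamelCase. This avoids confusion about whether something is a class (for example, Objects reads like a class name).
- Listing = namedtuple(...): the definition of Listing would be laid out as follows:
      Listing = namedtuple('Listing',
                           # ...
                            'list_date', 'list_price', 'close_date', 'close_price'])
- There is too much spacing around the constant declarations in House, and too little around the classes.
- :param house_to: is not how PEP 257 documents parameters. It would be: "Positional arguments: house_to -- Series object of second house to compare to".
Overall, apart from the character limit, your code is good. Your docstrings need work, and you need more of them. But it is pretty good.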
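To make the name-mangling point concrete, here is a small Python 3 sketch. Base, Child, and __helper are made-up names, not taken from the reviewed code. A double leading underscore is rewritten to include the class name, so a subclass method with the same spelling does not override the parent's:

```python
class Base:
    def __helper(self):          # stored on the class as _Base__helper
        return "base"

    def run(self):
        return self.__helper()   # compiled as self._Base__helper()


class Child(Base):
    def __helper(self):          # stored as _Child__helper, so no override
        return "child"


child = Child()
result = child.run()
mangled = sorted(name for name in dir(child) if name.endswith("__helper"))
```

Here result is "base", and mangled lists both _Base__helper and _Child__helper. If subclasses or callers are meant to be able to replace the method, a single leading underscore is the more usual choice.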
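It is better to use super than to call the class's __init__ directly:
super(Model, self).__init__(data, index=fields)
"I feel that using __class__ here to assign and initialize a static member is a bit dirty, and I would love feedback on how to handle the initialization of static members that will be inherited."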
I have little experience here, so I am only guessing. But Objects seems to be a variable, not a class, and you want it predefined as DataFrame(columns=self.index), but only for the specific class.
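As a guess at what that could look like, here is a minimal Python 3 sketch in which each subclass receives its own store at the moment it is defined, so save never has to check for None. It relies on __init_subclass__ (Python 3.6+), uses plain lists instead of a DataFrame to stay self-contained, and Record, Flat, and Boat are hypothetical names:

```python
class Record:
    """Base class; every subclass gets its own `objects` list."""

    objects = None  # the base class itself holds no store

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.objects = []  # fresh store per subclass, never shared

    def __init__(self, **fields):
        self.fields = fields

    def save(self):
        # type(self) resolves to the concrete subclass of the instance.
        type(self).objects.append(self)


class Flat(Record):
    pass


class Boat(Record):
    pass


Flat(rooms=2).save()
Flat(rooms=3).save()
Boat(length=9).save()
```

After this runs, Flat.objects holds two items and Boat.objects holds one, while Record.objects stays None. Whether this fits depends on the Python version available; the code under review is Python 2, where this hook does not exist.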
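For simple cases where you do not want to reference self, this Stack Overflow answer should help.
But since this is more complex, we can look at the documentation to see what __class__ is. After a quick read, it seems you want to use type(self) rather than self.__class__.
"type(x) is typically the same as x.__class__ (although this is not guaranteed: a new-style class instance is permitted to override the value returned for x.__class__)."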
def save(self):
    if type(self).Objects is None:
        type(self).Objects = DataFrame(columns=self.index)
    type(self).Objects = type(self).Objects.append(self, ignore_index=True)

https://codereview.stackexchange.com/questions/96642