
Python - Calculating distances between different data types in Python

Stack Overflow user
Asked on 2014-04-05 09:45:29
2 answers · 1.4K views · 0 followers · 0 votes

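I have a data set with 11 attributes, and I want to calculate the distance for each attribute. For example, the attributes are (x1, x2, ..., x11): x1 and x2 are nominal, x3, x4, ..., x10 are ordinal, and x11 is binary. How can I read the attributes in Python? How do I distinguish between these attribute types in Python so that I can calculate the distance? Can someone tell me how I should do this? Thanks.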

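Sample data:
x1 (forestry, plantation, other, forestry)
x2 (plantation, plantation, shrubs, forest)
x3 (high, high, medium, low)
x4 (low, medium, high, high)
x5 (high, low, medium, high)
x6 (medium, low, high, medium)
x7 (3, 1, 0, 4)
x8 (low, low, high, medium)
x9 (297, 298, 299, 297)
x10 (1, 2, 0, 4)
x11 (t, f)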


2 Answers

Stack Overflow user

Answered on 2014-04-05 10:34:16

You can do something like this:

def distance(x, y):
    # Fraction of attributes whose values differ between the two records
    p = len(x)                                  # total number of attributes
    m = sum(1 for a, b in zip(x, y) if a == b)  # number of matching attributes
    return float(p - m) / p

Example:

x1 = ("forestry", "plantation", "high", "low", "high", "medium", 3, "low", 297, 1, True)
x2 = ("plantation", "plantation", "high", "medium", "low", "low", 1, "low", 298, 2, True)

print(distance(x1, x2))  # result: 0.6363636363636364 = (11 - 4) / 11
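
Note that a matching-based distance like the one above treats every attribute as nominal: "high" vs "low" counts exactly the same as "high" vs "medium". As a rough sketch of how it could be applied to every pair of records at once, here is a self-contained version. The name `simple_matching_distance` and the shortened four-attribute records are my own assumptions for illustration, not part of the original data:

```python
from itertools import combinations

def simple_matching_distance(x, y):
    """Fraction of positions at which two equal-length records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)

# Hypothetical records (first four attributes only), for illustration
records = {
    "a": ("forestry", "plantation", "high", "low"),
    "b": ("plantation", "plantation", "high", "medium"),
    "c": ("other", "shrubs", "medium", "high"),
}

# Distance for every unordered pair of record labels
pairwise = {
    (i, j): simple_matching_distance(records[i], records[j])
    for i, j in combinations(records, 2)
}

for pair, d in pairwise.items():
    print(pair, d)
```

Records "a" and "b" differ in two of the four attributes, so their distance is 0.5; the other two pairs share no values at all.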
Votes: 0

Stack Overflow user

Answered on 2014-04-06 03:21:50

I would restate it as follows:

First, I create a Nominal type factory:

class BaseNominalType:
    name_values = {}   # <= subclass must override this

    def __init__(self, name):
        self.name = name
        self.value = self.name_values[name]

    def __str__(self):
        return self.name

    def __sub__(self, other):
        assert type(self) == type(other), "Incompatible types, subtraction is undefined"
        return self.value - other.value

# class factory function
def make_nominal_type(name_values):
    try:
        nv = dict(name_values)
    except ValueError:
        nv = {item:i for i,item in enumerate(name_values)}

    # make custom type
    class MyNominalType(BaseNominalType):
        name_values = nv
    return MyNominalType

Now I can define your nominal types,

Forest = make_nominal_type(["shrubs", "plantation", "forestry", "other"])
Level  = make_nominal_type(["low", "medium", "high"])
Bool   = make_nominal_type({"f":False, "t":True})
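
For the ordered `Level` values specifically, a similar "name maps to rank" behaviour can also be sketched with the standard library's `enum.IntEnum`. This is an alternative I am suggesting for comparison, not the factory used in this answer. Unlike the factory, it does not check that both operands have the same type before subtracting:

```python
from enum import IntEnum

# Functional API: members are numbered in the order given, starting at 0
Level = IntEnum("Level", ["low", "medium", "high"], start=0)

high = Level["high"]   # look a member up by its name
low = Level["low"]

# IntEnum members behave like ints, so subtraction gives the rank gap
print(high - low)
print(abs(Level["medium"] - Level["high"]))
```

The first difference is 2 ranks and the second is 1, which is the same kind of value that `__sub__` returns for the factory-made types.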

Then I create a MixedVector type factory:

# base class
class BaseMixedVectorType:
    types = []          # <= subclass must
    distance_fn = None  # <=   override these

    def __init__(self, values):
        self.values = [type_(value) for type_,value in zip(self.types, values)]

    def dist(self, other):
        return self.distance_fn([abs(s - o) for s,o in zip(self.values, other.values)])

# class factory function
def make_mixed_vector_type(types, distance_fn):
    tl = list(types)
    df = distance_fn

    class MyVectorType(BaseMixedVectorType):
        types = tl
        distance_fn = df
    return MyVectorType
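
One detail worth spelling out: because `distance_fn` is stored as a class attribute, Python binds it like a method when it is looked up through `self`, so the function receives the instance as its first argument. A minimal sketch of that behaviour, using hypothetical names (`Holder`, `StaticHolder`, `total`, `plain_total`) that are not part of the code above:

```python
def total(_, vector):
    # The first parameter receives the instance and is deliberately ignored
    return sum(vector)

class Holder:
    distance_fn = total  # plain function stored as a class attribute

h = Holder()
# Looked up through the instance, so this is called as total(h, [1, 2, 3])
print(h.distance_fn([1, 2, 3]))

def plain_total(vector):
    return sum(vector)

class StaticHolder:
    # staticmethod() prevents binding, so no instance is passed
    distance_fn = staticmethod(plain_total)

print(StaticHolder().distance_fn([1, 2, 3]))
```

Both calls return 6. Wrapping the function in `staticmethod` would be one way to avoid the ignored first parameter, at the cost of changing the factory.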

Then create your data type,

# your mixed-vector type
DataItem = make_mixed_vector_type(
    [Forest, Forest, Level, Level, Level, Level, int, Level, int, int, Bool],
    ??? # have to define an appropriate distance function!
)

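... but wait, we haven't defined a distance function yet! I wrote the class so that you can plug in any distance function you like, in the following form: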

def manhattan_dist(_, vector):
    return sum(vector)

def euclidean_dist(_, vector):
    return sum(v*v for v in vector) ** 0.5

# the distance function per your description:
# the fraction of attributes whose values differ (non-zero difference)
def fractional_match_distance(_, vector):
    return float(sum(bool(v) for v in vector)) / len(vector)

That completes the definition:

# your mixed-vector type
DataItem = make_mixed_vector_type(
    [Forest, Forest, Level, Level, Level, Level, int, Level, int, int, Bool],
    fractional_match_distance
)

and we can test it with

def main():
    raw_data = [
        ('forestry', 'plantation', 'high', 'low', 'high', 'medium', 3, 'low', 297, 1, 't'),
        ('plantation', 'plantation', 'high', 'medium', 'low', 'low', 1, 'low', 298, 2, 't'),
        ('other', 'shrubs', 'medium', 'high', 'medium', 'high', 0, 'high', 299, 0, 't'),
        ('forestry', 'forestry', 'low', 'high', 'high', 'medium', 4, 'medium', 297, 4, 'f')
    ]

    a, b, c, d = [DataItem(row) for row in raw_data]

    print("a to b, dist = {}".format(a.dist(b)))
    print("b to c, dist = {}".format(b.dist(c)))
    print("c to d, dist = {}".format(c.dist(d)))

if __name__=="__main__":
    main()

which gives us

a to b, dist = 0.6363636363636364
b to c, dist = 0.9090909090909091
c to d, dist = 0.9090909090909091
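
If you want the ordinal attributes to contribute by how far apart their ranks are, rather than only whether they match, one option is a per-attribute distance scaled to [0, 1] and then averaged, in the spirit of Gower's distance. The sketch below is only an illustration under my own assumptions: the schema format, the `LEVELS` mapping, the function names and the numeric `span` (the assumed range of a numeric attribute) are all hypothetical and are not taken from the question:

```python
# Hypothetical rank mapping for the ordinal levels
LEVELS = {"low": 0, "medium": 1, "high": 2}

def attribute_distance(kind, a, b, span=None):
    """Distance in [0, 1] for a single attribute of the given kind."""
    if kind in ("nominal", "binary"):
        return 0.0 if a == b else 1.0
    if kind == "ordinal":
        # Rank gap divided by the largest possible rank gap
        return abs(LEVELS[a] - LEVELS[b]) / (len(LEVELS) - 1)
    if kind == "numeric":
        # span is the assumed range (max - min) of the attribute
        return abs(a - b) / span if span else 0.0
    raise ValueError("unknown attribute kind: {}".format(kind))

def gower_like_distance(schema, x, y):
    """Average of the per-attribute distances."""
    parts = [
        attribute_distance(kind, a, b, span)
        for (kind, span), a, b in zip(schema, x, y)
    ]
    return sum(parts) / len(parts)

# Hypothetical four-attribute schema: (kind, span) per attribute
schema = [("nominal", None), ("ordinal", None), ("numeric", 4), ("binary", None)]
x = ("forestry", "high", 3, "t")
y = ("plantation", "medium", 1, "t")

print(gower_like_distance(schema, x, y))
```

Here the per-attribute distances are 1.0, 0.5, 0.5 and 0.0, so the result is 0.5. With plain matching the same pair would score 0.75, because "high" vs "medium" would count as a full mismatch.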
Votes: 0
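The original content of this page is provided by Stack Overflow; translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: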

https://stackoverflow.com/questions/22875632
