假设我有两个集合A和B,我知道它们的基数然后有三个不相交的子集A\ B,B和A∩B。它们的基数是
,
,
如果我有三个集A,B和C,那么相应的表达式是
,
,,
但是,对于任意数量的集合A,B,C,……,我找不到一个很好的算法来计算相同的事物。
所以我的问题是,有什么有用的算法来做这个吗?
下面是一些代码,这些代码生成了一组可以用于尝试的联合示例集。
import numpy as np
from string import ascii_uppercase
from itertools import combinations
from collections import defaultdict
def build_example(k):
'''Produces an example collection of unions with and the sought after disjoint subsets as key.'''
# Produce data in the disjoint subsets that will later be used as key
n_disjoint_subsets = 2**k - 1
set_ids = np.arange(n_disjoint_subsets).reshape(-1, 1) + 1
in_set = np.unpackbits(set_ids.astype(np.uint8), axis=1)[:, -k:] == 1
disjoint_counts = np.random.randint(0, 10, n_disjoint_subsets)
set_names = np.array([
set_name for set_name, _ in zip(ascii_uppercase, range(k))
], dtype=object)
key = np.hstack([in_set.astype(int), disjoint_counts.reshape(-1, 1)])
# Produce the unions
unions = defaultdict(int)
for r in range(1, k + 1):
for indices in combinations(range(k), r):
indices = np.array(indices)
mask = in_set[:, indices].any(axis=1)
name = " U ".join(set_names[indices])
unions[name] = disjoint_counts[mask].sum()
return unions, key发布于 2021-05-18 13:43:40
我意识到这个问题可以写成一个线性方程组。
标识每个不相交子集,其中基集(A,B,C,.)它包含在这样一种情况下,即用布尔值(1,0,1)和基组组成的每一个子集的并集来标识x(A,A∩C) \B\B,因此也用(1,0,1)来标识A_(1,0,1)。
然后建立方程系统,实现由(b1,b2,.)是U= c1 D1 + c2 D2 +.其中Di是ID(Di)的子集的计数,ci为1(如果有的话)(ID(Di)& ID(U)),否则为0。
下面是解决这个问题的代码
def calculate_subsets(unions):
set_names = np.unique(np.hstack(
[union_name.split(" U ") for union_name in unions]
))
k = len(set_names)
n_disjoint_subsets = 2**k - 1
set_ids = np.arange(n_disjoint_subsets).reshape(-1, 1) + 1
in_set = np.unpackbits(np.fliplr(set_ids.view(np.uint8)), axis=1)[:, -k:] == 1
# Build system of linear equations
system = np.zeros(2 * [n_disjoint_subsets], dtype=int)
for i_row, union_name in enumerate(unions):
union_id = np.array([name in union_name for name in set_names])
system[i_row, :] = (union_id & in_set).any(axis=1)
union_values = np.array(list(unions.values()))
disjoint_counts = np.linalg.solve(system, union_values).astype(int)
disjoint_subsets = np.hstack([in_set, disjoint_counts.reshape(-1, 1)])
return disjoint_subsetshttps://stackoverflow.com/questions/67571890
复制相似问题