I need to count the frequency of each "Id" in a tuple, like the following:
('{44371-zwart,40793,41878,44747,44371-wit}',),
('{46022,47917,48267,48343,48221}',),
('{43566,43834,31726,23503,4488}',),
('{21896,9391,32171,30984-wit-3942,27211}',),
('{35306,16901,24027,44222,38597}',),
('{40867,40872,41437,31421,35570-grijs}',),
('{32481,35728,36463,32473,43719}',)
This is only a small part of the data (about 0.5%).
My current code is:
cur.execute('SELECT similars FROM profiles')
data = cur.fetchall()
c = Counter(elem[0] for elem in data)
It returns the following:
{'{45110,46709,45109,45115,46462}': 1,
'{38535,38529,38532,38527,38546}': 1,
'{20062,17013,20634,21691,20622}': 1,
'{21141,43588,39649,45900,17126}': 1,
'{43552,41475,41478,32848,41477}': 1,
'{42265,42266,43570,26203,28862}': 1,
'{47874,47873,47878,47802-bruin,33101-avengers}': 1,
'{26234,2401,30414,5655,16605}': 1,
'{43405,43575,39649,21141,43195}': 1,
'{35420,35422,35367,35418,35417}': 1,
'{43195,47323,39649,43575,44454}': 1,
'{9760,43572,9764,9768,9816}': 1
The result I expect/want is:
{'12392': 2, '7862': 1, '12313': 41}
Posted on 2021-04-06 10:38:40
Since you are getting this:
'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1
Convert this output into a dictionary, so that your first-level output looks like this:
dct = {'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1
}
Now create an empty list, id_corpus, then get all the keys of that dictionary with dct.keys() and start a loop over those keys.
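Inside the loop, use the str method replace() to remove the opening and closing braces, then use split() to break the remaining string into a list. Add this newly formed list to id_corpus. Remember not to append it, because append would add the whole list as a single nested element; add it with the + operator instead.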
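As a minimal sketch of this step, using one key copied from the output above (the names key, ids, nested, and flat are only illustrative), the difference between append and + looks like this:

```python
# One key copied from the Counter output shown in the question.
key = '{47874,47873,47878,47802-bruin,33101-avengers}'

# Remove the surrounding braces, then split on commas.
ids = key.replace('{', '').replace('}', '').split(',')

nested = []
nested.append(ids)   # adds the whole list as one element

flat = []
flat += ids          # extends the list with the individual ids

print(len(nested))   # 1
print(len(flat))     # 5
```

With append, the later counting loop would see lists rather than id strings, which is why the + form is used here.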
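Finally, create an empty corpus dictionary and iterate over the elements of id_corpus: if the element is already in the dictionary, increase its value by 1; otherwise set its value to 1.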
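The same counting can probably be done with collections.Counter, which the question already uses. The sketch below rests on some assumptions: rows stands in for the result of cur.fetchall(), each row is a one-element tuple holding a string such as '{a,b,c}', and count_ids is a hypothetical helper name. It counts the ids straight from the rows, so a row that occurs more than once still contributes each time, whereas looping over the keys of a dictionary sees every distinct string only once.

```python
from collections import Counter

def count_ids(rows):
    """Count each id across rows shaped like ('{a,b,c}',)."""
    counts = Counter()
    for (similars,) in rows:
        # Drop the surrounding braces and split into individual ids.
        counts.update(similars.strip('{}').split(','))
    return counts

# Stand-in for cur.fetchall(); both rows are copied from the question.
rows = [
    ('{43405,43575,39649,21141,43195}',),
    ('{43195,47323,39649,43575,44454}',),
]
print(count_ids(rows)['39649'])  # 2
```

If the database driver returns the column as a Python list rather than a string, the strip and split calls would not be needed.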
Here is the final solution:
# Since I don't know what your data looks like
# or in what format you are getting it from MySQL,
# I am working from the output in your question.
# A more optimized approach could be developed
# if I knew more about the problem.
dct = {'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1}
lst = []
for ky in dct.keys():
    # Strip the surrounding braces, then split into individual ids
    ky = ky.replace('{', '')
    ky = ky.replace('}', '')
    ky = ky.split(',')
    # Extend the flat list (append would nest the list instead)
    lst += ky
sol = dict()
for id_ in lst:
    # Increment ids already seen; start new ids at 1
    if id_ in sol:
        sol[id_] += 1
    else:
        sol[id_] = 1
print(sol)
Output:
{'16605': 1, '44454': 1, '45900': 1, '20634': 1, '46462': 1, '35422': 1, '35420': 1, '17013': 1, '38532': 1, '47323': 1, '21141': 2, '43405': 1, '38527': 1, '17126': 1, '9816': 1, '38529': 1, '35418': 1, '45109': 1, '2401': 1, '41477': 1, '41478': 1, '41475': 1, '47802-bruin': 1, '26234': 1, '32848': 1, '35367': 1, '43195': 2, '20622': 1, '43588': 1, '35417': 1, '9760': 1, '38546': 1, '9764': 1, '28862': 1, '26203': 1, '9768': 1, '5655': 1, '39649': 3, '47874': 1, '43552': 1, '47873': 1, '38535': 1, '21691': 1, '30414': 1, '20062': 1, '43570': 1, '42266': 1, '42265': 1, '43575': 2, '46709': 1, '43572': 1, '47878': 1, '45110': 1, '33101-avengers': 1, '45115': 1}
https://stackoverflow.com/questions/66966735