I have data in a MongoDB collection in the following format:
[{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 11:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 12:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 22:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe3","base-url":"www.example3.com","date":"2022-06-22 02:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 06:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 14:20", "ml_pred":"Invalid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 10:20", "ml_pred":"Invalid","hum_pred":"invalid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 01:20", "ml_pred":"Invalid","hum_pred":"null"}]
I am trying to get the unique base-url and name combinations as the response. For this I used pymongo as follows:
filter_stuff = {'base-url': 1, 'name':1,'_id': 0}
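data = list(crawlcol.find({},filter_stuff).distinct("base-url"))
This only gives me a list of base URLs, since distinct() returns the distinct values of a single field. But I want output like this: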
[{"name":"axe1","base-url":"www.example1.com"},
{"name":"axe2","base-url":"www.example2.com"},
{"name":"axe3","base-url":"www.example3.com"}]
How can I get this?
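To pin down what "unique pairs" means here, below is a minimal pure-Python sketch that involves no MongoDB at all. It keeps the first occurrence of each (base-url, name) pair from documents shaped like the sample above. The function name `unique_pairs` and the shortened `sample` list are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, List


def unique_pairs(docs: Iterable[Dict]) -> List[Dict]:
    """Return one {"name", "base-url"} dict per distinct pair, in first-seen order."""
    seen = set()
    result = []
    for doc in docs:
        key = (doc["base-url"], doc["name"])  # the pair that defines uniqueness
        if key not in seen:
            seen.add(key)
            result.append({"name": doc["name"], "base-url": doc["base-url"]})
    return result


# Hypothetical, shortened version of the sample documents above.
sample = [
    {"name": "axe1", "base-url": "www.example1.com", "ml_pred": "Invalid"},
    {"name": "axe2", "base-url": "www.example2.com", "ml_pred": "Valid"},
    {"name": "axe1", "base-url": "www.example1.com", "ml_pred": "Invalid"},
    {"name": "axe3", "base-url": "www.example3.com", "ml_pred": "Valid"},
]
print(unique_pairs(sample))
```

This approach deduplicates on the client, so every document has to be transferred first. For a real collection, doing the grouping on the server is usually preferable.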
Posted on 2022-07-01 20:23:54
Grouping on both fields gives one result per unique base-url/name pair:
result = list(crawlcol.aggregate(
[
{"$group": { "_id": { "base-url": "$base-url", "name": "$name" } } }
]
))
https://stackoverflow.com/questions/72833338
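The `$group` stage returns each pair nested under `_id`, for example `{"_id": {"base-url": "www.example1.com", "name": "axe1"}}`, and it does not guarantee any output order. If you need exactly the flat shape shown in the question, one option is to append a `$project` stage that lifts the fields out of `_id`, followed by a `$sort` stage for a deterministic order. The names `PIPELINE` and `flatten_group_ids` below are hypothetical. The helper is a client-side alternative for when you keep the answer's pipeline unchanged.

```python
from typing import Dict, Iterable, List

# Server-side sketch: group on the pair, reshape to flat fields, then sort.
PIPELINE = [
    {"$group": {"_id": {"base-url": "$base-url", "name": "$name"}}},
    {"$project": {"_id": 0, "name": "$_id.name", "base-url": "$_id.base-url"}},
    {"$sort": {"name": 1, "base-url": 1}},
]
# Usage (needs a live MongoDB connection and the question's crawlcol collection):
#     result = list(crawlcol.aggregate(PIPELINE))


def flatten_group_ids(docs: Iterable[Dict]) -> List[Dict]:
    """Client-side alternative: flatten the answer's {"_id": {...}} documents."""
    return [{"name": d["_id"]["name"], "base-url": d["_id"]["base-url"]} for d in docs]


# Hypothetical documents in the shape the original $group-only pipeline returns.
grouped = [
    {"_id": {"base-url": "www.example2.com", "name": "axe2"}},
    {"_id": {"base-url": "www.example1.com", "name": "axe1"}},
]
print(flatten_group_ids(grouped))
```

`$project` may exclude `_id` while also computing new fields, because `_id` is the one field that can be excluded in an otherwise inclusion-style projection.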