Question: Max similarity from ManyToManyField

Asked by a Stack Overflow user on 2018-02-03 23:34:59

Answers: 2, Views: 6K, Followers: 0, Votes: 10

I have to implement a search function that is tolerant of typos.

Currently, I have the following setup:

Models:

from django.db import models

class Tag(models.Model):
    name = models.CharField(max_length=255)

class Illustration(models.Model):
    name = models.CharField(max_length=255)
    tags = models.ManyToManyField(Tag)

Query:

from django.contrib.postgres.search import TrigramSimilarity

queryset.annotate(similarity=TrigramSimilarity('name', fulltext) + TrigramSimilarity('tags__name', fulltext))

Sample data:

Illustration:

ID |  Name  |        Tags       |
---|--------|-------------------|
 1 | "Dog"  | "Animal", "Brown" |
 2 | "Cat"  | "Animals"         |

Illustration_has_Tag (join table):

ID_Illustration | ID_Tag |
----------------|--------|
       1        |    1   |
       1        |    2   |
       2        |    3   |

Tag:

ID_Tag |   Name   |
-------|----------|
   1   |  Animal  |
   2   |  Brown   |
   3   |  Animals |

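When I run the query with "Animal", the similarity for "Dog" should be higher than for "Cat", because it is a perfect match.

Unfortunately, both tags are somehow considered together.

Currently, it looks as if the tags are concatenated into a single string and the similarity is checked against that: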

TrigramSimilarity("Animal Brown", "Animal") => X

However, I would like to change it so that I get the maximum similarity between the Illustration instance's name and each of its tags:

Max([
    TrigramSimilarity('Name', "Animal"), 
    TrigramSimilarity("Tag_1", "Animal"), 
    TrigramSimilarity("Tag_2", "Animal"),
]) => X

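Edit 1: I am trying to query all illustrations where the similarity of the name, or of one of the tags, is greater than X.

Edit 2: Another example:

fulltext = "Animal"

TrigramSimilarity('Animal Brown', fulltext) => x
TrigramSimilarity('Animals', fulltext) => y

where x < y

But what I want is:

TrigramSimilarity(Max(['Animal', 'Brown']), fulltext) => x (the similarity to Animal)
TrigramSimilarity('Animals', fulltext) => y

where x > y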
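To show why the concatenated string scores lower, here is a rough pure-Python approximation of how PostgreSQL's pg_trgm computes `similarity()`, the function behind `TrigramSimilarity`. Each word is lowercased, padded with two leading spaces and one trailing space, and cut into three-character trigrams. The score is the number of shared trigrams divided by the number of distinct trigrams. The helper names `trigrams` and `similarity` are hypothetical, and real pg_trgm has its own rules for which characters count as word characters, which this sketch only approximates.

```python
from __future__ import annotations

import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram set (hypothetical helper)."""
    grams: set[str] = set()
    # Treat runs of letters/digits as words, lowercased.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


fulltext = "Animal"
x_concat = similarity("Animal Brown", fulltext)  # tags joined into one string
x_max = max(similarity(tag, fulltext) for tag in ["Animal", "Brown"])  # best single tag
y = similarity("Animals", fulltext)
print(f"{x_concat:.3f} {x_max:.3f} {y:.3f}")
```

This prints `0.538 1.000 0.667`. Joined into one string, "Animal Brown" scores below "Animals" because the "Brown" trigrams inflate the denominator. Taking the best single tag gives 1.0, which is the ordering the question asks for. The 0.667 also matches the `Cat` value reported in the second answer.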

Tags: python, django, postgresql, django-queryset, trigram

2 Answers

Stack Overflow user

Accepted answer

Posted on 2018-02-08 14:03:21

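You can't split up `tags__name` (at least, I don't know of a way to).

From your examples, I can see two possible solutions (the first one does not strictly use Django):

1. Not everything has to go strictly through Django

We have the power of Python, so let's use it. First, compose the query:

from difflib import SequenceMatcher

from django.db.models import Q

from illustrations.models import Illustration, Tag  # adjust to your app

THRESHOLD = 0.6  # acceptable similarity, see the note below


def create_query(fulltext):
    illustration_names = Illustration.objects.values_list('name', flat=True)
    tag_names = Tag.objects.values_list('name', flat=True)
    query = []

    for name in illustration_names:
        score = SequenceMatcher(None, name, fulltext).ratio()
        if score == 1:
            # Perfect match for an illustration name: use only this filter
            return [Q(name=name)]
        if score >= THRESHOLD:
            query.append(Q(name=name))

    for name in tag_names:
        score = SequenceMatcher(None, name, fulltext).ratio()
        if score == 1:
            # Perfect match for a tag name: use only this filter
            return [Q(tags__name=name)]
        if score >= THRESHOLD:
            query.append(Q(tags__name=name))

    return query

Then create the queryset:

from functools import reduce  # needed only in Python 3
from operator import or_

query = create_query(fulltext)
if query:
    # .distinct() removes duplicate rows introduced by the tags join
    queryset = Illustration.objects.filter(reduce(or_, query)).distinct()
else:
    queryset = Illustration.objects.none()  # nothing passed the threshold

Decoding the above:

We check every Illustration and Tag name against our fulltext and compose a query from every name whose similarity passes the THRESHOLD.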

- [`SequenceMatcher`](https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher) compares two sequences. Its `ratio()` method returns a float with `0 <= ratio <= 1`, where 0 indicates **No-Match** and 1 indicates **Perfect-Match**. Check this answer for another usage example: [Find the similarity percent between two strings](https://stackoverflow.com/questions/17388213/find-the-similarity-percent-between-two-strings) (_Note:_ there are other string comparison modules as well, so find one that suits you.)
- [`Q()`](https://docs.djangoproject.com/en/2.0/ref/models/querysets/#django.db.models.Q) objects let you build complex queries (more in the linked docs).
- With `or_` from the [`operator`](https://docs.python.org/3/library/operator.html#module-operator) module and [`reduce`](https://docs.python.org/3.0/library/functools.html#functools.reduce), we turn a list of `Q()` objects into a single OR-separated query argument:

Q(name=name_1) | Q(name=name_2) | ... | Q(tags__name=tag_name_1) | ...

Note: you need to define an acceptable `THRESHOLD`.

As you can imagine, this will be somewhat slow, but that is to be expected when you need to do a "fuzzy" search.
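The flow of the first solution can be tried without a database. In this sketch, each `Q()` filter is replaced by the set of matching illustration ids, so `reduce(or_, ...)` takes the union of those sets in the same way it ORs `Q` objects in the real queryset. `ILLUSTRATIONS`, `ids_with_name`, `ids_with_tag`, `search` and the threshold value are hypothetical stand-ins for the models and the database.

```python
from difflib import SequenceMatcher
from functools import reduce
from operator import or_

# Hypothetical in-memory stand-in for the sample data: id -> (name, tag names).
ILLUSTRATIONS = {1: ("Dog", ["Animal", "Brown"]), 2: ("Cat", ["Animals"])}
THRESHOLD = 0.6  # hypothetical value; tune it for your data


def ids_with_name(name):
    """Plays the role of Q(name=name): ids of illustrations with that name."""
    return {pk for pk, (n, _) in ILLUSTRATIONS.items() if n == name}


def ids_with_tag(tag):
    """Plays the role of Q(tags__name=tag): ids of illustrations with that tag."""
    return {pk for pk, (_, tags) in ILLUSTRATIONS.items() if tag in tags}


def create_filters(fulltext):
    names = sorted({n for n, _ in ILLUSTRATIONS.values()})
    tag_names = sorted({t for _, tags in ILLUSTRATIONS.values() for t in tags})
    filters = []
    for name in names:
        score = SequenceMatcher(None, name, fulltext).ratio()
        if score == 1:  # perfect match: use only this filter
            return [ids_with_name(name)]
        if score >= THRESHOLD:
            filters.append(ids_with_name(name))
    for tag in tag_names:
        score = SequenceMatcher(None, tag, fulltext).ratio()
        if score == 1:
            return [ids_with_tag(tag)]
        if score >= THRESHOLD:
            filters.append(ids_with_tag(tag))
    return filters


def search(fulltext):
    filters = create_filters(fulltext)
    # reduce(or_, ...) unions the sets, just as it ORs Q objects together.
    return reduce(or_, filters) if filters else set()


print(search("Animal"))  # exact tag match
print(search("Animl"))   # typo: both "Animal" and "Animals" pass
```

The first call prints `{1}` because the perfect tag match short-circuits the loop. The misspelled query prints `{1, 2}` because both tags clear the threshold ("Animal" scores about 0.91 and "Animals" about 0.83). Sorting the names keeps the short-circuit deterministic here. A `values_list()` queryset without `order_by()` gives no such ordering guarantee.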

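2. The Django way

Use a query with a high similarity threshold and order the queryset by that similarity:

from django.contrib.postgres.search import TrigramSimilarity
from django.db.models.functions import Greatest

queryset.annotate(
    similarity=Greatest(
        TrigramSimilarity('name', fulltext),
        TrigramSimilarity('tags__name', fulltext)
    )
).filter(similarity__gte=threshold).order_by('-similarity')

Decoding the above: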

- [`Greatest()`](https://docs.djangoproject.com/en/2.0/ref/models/database-functions/#greatest) accepts two or more expressions or model field names and returns the largest value. It compares values within a single row, so it is not an aggregate (and is not to be confused with Django's `aggregate` method).
- `TrigramSimilarity(word, search)` returns a rate between 0 and 1. The closer the rate is to 1, the more similar the `word` is to `search`.
- `.filter(similarity__gte=threshold)` filters out similarities lower than `threshold`.
- `0 < threshold < 1`. You can set the threshold to `0.6`, which is pretty high (for comparison, the default `pg_trgm.similarity_threshold` used by the `%` operator is `0.3`). **You can experiment with this value to tune the search.**
- Finally, order the queryset by the `similarity` rate in a descending order.

Votes: 11

Stack Overflow user

Posted on 2018-02-09 15:10:47

I solved the problem using only `TrigramSimilarity`, `Max` and `Greatest`.

I populated the database with the data from your question:

from illustrations.models import Illustration, Tag
Tag.objects.bulk_create([Tag(name=t) for t in ['Animal', 'Brown', 'Animals']])
Illustration.objects.bulk_create([Illustration(name=t) for t in ['Dog', 'Cat']])
dog=Illustration.objects.get(name='Dog')
cat=Illustration.objects.get(name='Cat')
animal=Tag.objects.get(name='Animal')
brown=Tag.objects.get(name='Brown')
animals=Tag.objects.get(name='Animals')
dog.tags.add(animal, brown)
cat.tags.add(animals)

I imported all the necessary functions and initialized `fulltext`:

from illustrations.models import Illustration
from django.contrib.postgres.search import TrigramSimilarity
from django.db.models.functions import Greatest
from django.db.models import Max
fulltext = 'Animal'

Then I executed the query:

Illustration.objects.annotate(
    max_similarity=Greatest(
        Max(TrigramSimilarity('tags__name', fulltext)),
        TrigramSimilarity('name', fulltext)
    )
).values('name', 'max_similarity')

with the following result:

<QuerySet [{'name': 'Dog', 'max_similarity': 1.0}, {'name': 'Cat', 'max_similarity': 0.666667}]>

This is the SQL query as executed by PostgreSQL:

SELECT "illustrations_illustration"."name", GREATEST(MAX(SIMILARITY("illustrations_tag"."name", 'Animal')), SIMILARITY("illustrations_illustration"."name", 'Animal')) AS "max_similarity"
FROM "illustrations_illustration"
LEFT OUTER JOIN "illustrations_illustration_tags" ON ("illustrations_illustration"."id" = "illustrations_illustration_tags"."illustration_id")
LEFT OUTER JOIN "illustrations_tag" ON ("illustrations_illustration_tags"."tag_id" = "illustrations_tag"."id")
GROUP BY "illustrations_illustration"."id", SIMILARITY("illustrations_illustration"."name", 'Animal')
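To show what this SQL computes, here is a pure-Python walk-through on the sample data. The two LEFT OUTER JOINs produce one row per (illustration, tag) pair. `GROUP BY` collects each illustration's rows, `MAX` keeps the best tag score, and `GREATEST` compares that with the name score. The `similarity` helper is a rough approximation of pg_trgm, not the real function. The untagged third illustration is hypothetical, added to show that PostgreSQL's `GREATEST` ignores the NULL that `MAX` returns when an illustration has no tags.

```python
from __future__ import annotations

import re


def trigrams(text: str) -> set[str]:
    """Rough approximation of pg_trgm's trigram extraction (hypothetical)."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def greatest(*values):
    """PostgreSQL GREATEST: ignores NULLs, NULL only if every value is NULL."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Rows produced by the two LEFT OUTER JOINs: (illustration id, name, tag name or None).
# Illustration 3 ("Animals", no tags) is a hypothetical extra row.
ROWS = [
    (1, "Dog", "Animal"),
    (1, "Dog", "Brown"),
    (2, "Cat", "Animals"),
    (3, "Animals", None),
]


def max_similarity(rows, fulltext):
    groups = {}
    for pk, name, tag in rows:  # GROUP BY illustration id
        _, tag_scores = groups.setdefault(pk, (name, []))
        if tag is not None:  # MAX() skips the NULL from untagged illustrations
            tag_scores.append(similarity(tag, fulltext))
    return {
        name: greatest(max(scores) if scores else None, similarity(name, fulltext))
        for name, scores in groups.values()
    }


print({name: round(score, 3) for name, score in max_similarity(ROWS, "Animal").items()})
```

This prints `{'Dog': 1.0, 'Cat': 0.667, 'Animals': 0.667}`. Because the aggregation collapses each illustration into one row, every illustration appears exactly once. Without `Max`, a per-row `Greatest` yields one row per tag.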

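You can use the `max_similarity` annotation to filter or order the results.

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: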

https://stackoverflow.com/questions/48603190
