I need to implement a fault-tolerant (fuzzy) search.
Currently, my setup is as follows:
Models:

```python
from django.db import models

class Tag(models.Model):
    name = models.CharField(max_length=255)

class Illustration(models.Model):
    name = models.CharField(max_length=255)
    tags = models.ManyToManyField(Tag)
```

Query:

```python
from django.contrib.postgres.search import TrigramSimilarity
queryset.annotate(similarity=TrigramSimilarity('name', fulltext) + TrigramSimilarity('tags__name', fulltext))
```

Sample data:
Illustrations:

| ID | Name  | Tags              |
|----|-------|-------------------|
| 1  | "Dog" | "Animal", "Brown" |
| 2  | "Cat" | "Animals"         |

Illustration tags (the many-to-many join table):

| ID_Illustration | ID_Tag |
|-----------------|--------|
| 1               | 1      |
| 1               | 2      |
| 2               | 3      |

Tags:

| ID_Tag | Name    |
|--------|---------|
| 1      | Animal  |
| 2      | Brown   |
| 3      | Animals |

When I run the query with "Animal", "Dog" should get a higher similarity than "Cat", because it is a perfect match.
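Unfortunately, both tags seem to be taken into account together. It looks as if the tags are concatenated into one string and the similarity is computed against that:

```
TrigramSimilarity("Animal Brown", "Animal") => X
```

Instead, I want the maximum similarity across the Illustration's name and each of its tags:

```
Max([
    TrigramSimilarity('name', "Animal"),
    TrigramSimilarity("Tag_1", "Animal"),
    TrigramSimilarity("Tag_2", "Animal"),
]) => X
```

Edit 1: I am trying to query all illustrations where the similarity of the title or of one of the tags is greater than X.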
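Edit 2: Another example:

```
fulltext = 'Animal'

TrigramSimilarity('Animal Brown', fulltext) => x
TrigramSimilarity('Animals', fulltext) => y
```

where x < y. But what I actually want is:

```
TrigramSimilarity(Max(['Animal', 'Brown']), fulltext) => x  (similarity to 'Animal')
TrigramSimilarity('Animals', fulltext) => y
```

where x > y.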
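To check the numbers in Edit 2, here is a rough pure-Python approximation of pg_trgm's `similarity()`, the SQL function that `TrigramSimilarity` compiles to. It assumes pg_trgm's documented behaviour (lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, then divide the shared trigrams by all distinct trigrams) and only handles ASCII letters and digits; `trigrams()` and `similarity()` are hypothetical helpers, not part of Django or PostgreSQL.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into ASCII alphanumeric words,
    and pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard ratio)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


fulltext = "Animal"

# Dog's tags treated as one concatenated string vs. Cat's single tag.
x_concat = similarity("Animal Brown", fulltext)
y = similarity("Animals", fulltext)

# Dog's best single tag instead.
x_max = max(similarity(tag, fulltext) for tag in ["Animal", "Brown"])

print(f"concatenated: x={x_concat:.3f}, y={y:.3f}")  # concatenated: x=0.538, y=0.667
print(f"per-tag max:  x={x_max:.3f}, y={y:.3f}")     # per-tag max:  x=1.000, y=0.667
```

Under these assumptions the concatenated string scores 7/13 (about 0.538) against 6/9 (about 0.667) for "Animals", so x < y; taking the best single tag gives 1.0, so x > y, which is the ordering the question asks for.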
Answer posted 2018-02-08 14:03:21
You cannot split `tags__name` apart (at least I don't know of a way).
From your examples, I can see two possible solutions (the first one does not strictly use Django):
Solution 1 (not strictly Django): check every `Illustration` and `Tag` name against `fulltext`, and compose a query from every name whose similarity passes a `THRESHOLD`.

- [`SequenceMatcher`](https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher) compares sequences and returns a ratio between 0 and 1, where 0 indicates **no match** and 1 indicates a **perfect match**. See this answer for another usage example: [Find the similarity percent between two strings](https://stackoverflow.com/questions/17388213/find-the-similarity-percent-between-two-strings). (_Note:_ there are other string-comparison modules as well; find one that suits you.)
- Django [`Q()`](https://docs.djangoproject.com/en/2.0/ref/models/querysets/#django.db.models.Q) objects allow the creation of complex queries (more in the linked docs).
- With [`operator`](https://docs.python.org/3/library/operator.html#module-operator) and [`reduce`](https://docs.python.org/3.0/library/functools.html#functools.reduce), we turn a list of `Q()` objects into a single OR-separated query argument: `Q(name=name_1) | Q(name=name_2) | ... | Q(tags__name=tag_name_1) | ...`
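Notes: you need to define an acceptable `THRESHOLD`.
As you can imagine, this will be somewhat slow, since every name is loaded into Python and compared one by one, but that is to be expected when you need to do a "fuzzy" search.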
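The name-selection step described above can be sketched with `difflib.SequenceMatcher` roughly as follows. The helper `matching_names()` and the `THRESHOLD` value of 0.6 are hypothetical choices for illustration; the Django part (building `Q` objects and OR-ing them with `reduce`) is shown only as comments, since it needs a configured project and database.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; pick a value that suits your data.
THRESHOLD = 0.6


def matching_names(names, fulltext, threshold=THRESHOLD):
    """Return the names whose SequenceMatcher ratio against fulltext is >= threshold."""
    return [
        name
        for name in names
        if SequenceMatcher(None, name, fulltext).ratio() >= threshold
    ]


# Values taken from the question's sample data.
illustration_names = ["Dog", "Cat"]
tag_names = ["Animal", "Brown", "Animals"]

name_matches = matching_names(illustration_names, "Animal")
tag_matches = matching_names(tag_names, "Animal")
print(name_matches, tag_matches)  # [] ['Animal', 'Animals']

# In Django, the matches would then become Q objects OR-ed together, e.g.:
#   from functools import reduce
#   from operator import or_
#   from django.db.models import Q
#   conditions = [Q(name=n) for n in name_matches] + [Q(tags__name=t) for t in tag_matches]
#   queryset = (Illustration.objects.filter(reduce(or_, conditions))
#               if conditions else Illustration.objects.none())
# reduce() raises TypeError on an empty list, hence the guard.
```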
Solution 2 (PostgreSQL): combine `TrigramSimilarity` with `Greatest()`.

- [`Greatest()`](https://docs.djangoproject.com/en/2.0/ref/models/database-functions/#greatest) accepts two or more expressions or model fields and returns the largest value for each row (it is a row-level function, not to be confused with Django's `aggregate()` method).
- `TrigramSimilarity(word, search)` returns a score between 0 and 1. The closer the score is to 1, the more similar `word` is to `search`.
- `.filter(similarity__gte=threshold)` filters out every row whose similarity is below `threshold`.
- `0 < threshold < 1`. You can set the threshold to `0.6`, which is fairly strict (for comparison, pg_trgm's default similarity threshold is `0.3`). **Experiment with this value to tune your results.**
- Finally, order the queryset by `similarity` in descending order.
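As a rough illustration of what these steps compute, the sketch below mimics `annotate(similarity=Greatest(...))`, `.filter(similarity__gte=threshold)` and `.order_by('-similarity')` in plain Python on the question's sample data. It is not Django code: `similarity()` is a hypothetical ASCII-only approximation of pg_trgm, `search()` is a hypothetical helper, and it assumes one row per (illustration, tag) pair, because `tags__name` goes through the many-to-many join when no aggregate is involved.

```python
import re


def trigrams(text: str) -> set[str]:
    # Approximation of pg_trgm (ASCII only): lowercase, split into words,
    # pad each word with two leading spaces and one trailing space.
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Sample data from the question: illustration name -> tag names.
ILLUSTRATIONS = {"Dog": ["Animal", "Brown"], "Cat": ["Animals"]}


def search(fulltext: str, threshold: float = 0.6):
    # annotate(): one row per (illustration, tag) pair, scored with the
    # greater of the name similarity and the tag similarity.
    rows = [
        (name, tag, max(similarity(name, fulltext), similarity(tag, fulltext)))
        for name, tags in ILLUSTRATIONS.items()
        for tag in tags
    ]
    # .filter(similarity__gte=threshold)
    rows = [row for row in rows if row[2] >= threshold]
    # .order_by('-similarity')
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, tag, score in search("Animal"):
    print(f"{name:<4} via {tag!r:<10} {score:.3f}")
```

With `threshold=0.6` this keeps Dog (through the tag Animal, score 1.0) and Cat (through Animals, about 0.667) and drops the Dog/Brown row; in this row-per-pair model, an illustration that matched through several tags would appear once per matching tag.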
Answer posted 2018-02-09 15:10:47
I solved this using only `TrigramSimilarity`, `Max` and `Greatest`.
I populated some data from your question:
```python
from illustrations.models import Illustration, Tag

Tag.objects.bulk_create([Tag(name=t) for t in ['Animal', 'Brown', 'Animals']])
Illustration.objects.bulk_create([Illustration(name=t) for t in ['Dog', 'Cat']])
dog = Illustration.objects.get(name='Dog')
cat = Illustration.objects.get(name='Cat')
animal = Tag.objects.get(name='Animal')
brown = Tag.objects.get(name='Brown')
animals = Tag.objects.get(name='Animals')
dog.tags.add(animal, brown)
cat.tags.add(animals)
```

I imported all the necessary functions and initialized `fulltext`:
```python
from illustrations.models import Illustration
from django.contrib.postgres.search import TrigramSimilarity
from django.db.models.functions import Greatest
from django.db.models import Max

fulltext = 'Animal'
```

Then I ran the query:
```python
Illustration.objects.annotate(
    max_similarity=Greatest(
        Max(TrigramSimilarity('tags__name', fulltext)),
        TrigramSimilarity('name', fulltext)
    )
).values('name', 'max_similarity')
```

with the following result:
```
<QuerySet [{'name': 'Dog', 'max_similarity': 1.0}, {'name': 'Cat', 'max_similarity': 0.666667}]>
```

This is the SQL query that was executed by PostgreSQL:
```sql
SELECT "illustrations_illustration"."name", GREATEST(MAX(SIMILARITY("illustrations_tag"."name", 'Animal')), SIMILARITY("illustrations_illustration"."name", 'Animal')) AS "max_similarity"
FROM "illustrations_illustration"
LEFT OUTER JOIN "illustrations_illustration_tags" ON ("illustrations_illustration"."id" = "illustrations_illustration_tags"."illustration_id")
LEFT OUTER JOIN "illustrations_tag" ON ("illustrations_illustration_tags"."tag_id" = "illustrations_tag"."id")
GROUP BY "illustrations_illustration"."id", SIMILARITY("illustrations_illustration"."name", 'Animal')
```

You can use the `max_similarity` annotation to filter or order the results.
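The SQL above can be mirrored in plain Python to see how the `LEFT OUTER JOIN`, `GROUP BY`, `MAX` and `GREATEST` interact. This is a sketch, not Django code: it keeps the question's three tables as in-memory dicts, and its `similarity()` helper is a hypothetical ASCII-only approximation of pg_trgm. Under those assumptions it yields the same scores as the queryset output in this answer (0.666667 is the same value at PostgreSQL's `real` precision).

```python
import re


def trigrams(text: str) -> set[str]:
    # Approximation of pg_trgm (ASCII only): lowercase, split into words,
    # pad each word with two leading spaces and one trailing space.
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# The three tables from the question.
ILLUSTRATIONS = {1: "Dog", 2: "Cat"}            # id -> name
ILLUSTRATION_TAGS = [(1, 1), (1, 2), (2, 3)]    # (illustration_id, tag_id)
TAGS = {1: "Animal", 2: "Brown", 3: "Animals"}  # id -> name


def max_similarity(fulltext, illustrations, links, tags):
    """Emulate GREATEST(MAX(SIMILARITY(tag.name, q)), SIMILARITY(name, q))
    over LEFT OUTER JOINs, grouped by illustration id."""
    result = {}
    for ill_id, name in illustrations.items():
        # The joined tag rows for this illustration (empty if it has no tags).
        tag_scores = [similarity(tags[tag_id], fulltext)
                      for link_ill, tag_id in links if link_ill == ill_id]
        scores = [similarity(name, fulltext)]
        # MAX() over only NULLs is NULL, and PostgreSQL's GREATEST() skips NULLs,
        # so an untagged illustration falls back to its own name's similarity.
        if tag_scores:
            scores.append(max(tag_scores))
        result[name] = max(scores)
    return result


print(max_similarity("Animal", ILLUSTRATIONS, ILLUSTRATION_TAGS, TAGS))
# {'Dog': 1.0, 'Cat': 0.6666666666666666}
```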
https://stackoverflow.com/questions/48603190