How to store real estate attributes

Stack Overflow user
Asked 2015-10-24 07:31:05
Answers: 1 · Views: 434 · Followers: 0 · Votes: 2

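I am very new to Elasticsearch. I have some documents that can have attributes like these:

  1. No. of bathrooms
  2. Bedrooms
  3. Zip
  4. Address

I want to store these attributes in a single field so that users can search with "3 beds at 97778(zip)".

I tried a single array field, e.g. [3 beds, 2 baths, 97778] and [7 beds, 3 baths, 97778], with a stop-word analyzer so that words such as "at" are filtered out, but this does not seem to be the right approach, because the second document scores higher than the first.

In addition, I have a synonym analyzer, because if a user searches with "3BD", it should return "3 beds".

Now my question is: what is the best way to store the attributes? Here are some of my fake documents.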

{
    "Beds" : 3,
    "Bath" : 2,
    "Zip" : 97778,
    "Attributes" : ["3 beds","2 baths", "97778"]
},
{
    "Beds" : 7,
    "Bath" : 3,
    "Zip" : 97778,
    "Attributes" : ["7 beds", "3 baths", "97778"]
}

Should I change this schema to

{
    "Beds" : 7,
    "Bath" : 3,
    "Zip" : 97778,
    "Attributes" : {"bed" : "7", "bath" : "3", "zip" : "97778"}
}

If so, how would I use the synonym analyzer?


1 Answer

Stack Overflow user

Answered 2015-10-24 08:19:35

The first structure looks better to me. Using Marvel on my local machine, I created a simple index with these attributes:

PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "my_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        },
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "bd => bed",
            "bt, baths, bth => bath"]
        },
        "my_shingle": {
          "type" : "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": false,
          "output_unigrams_if_no_shingles": true
        }
      },
      "analyzer": {
        "my_english": {
          "tokenizer":  "standard",
          "filter": [
            "my_possessive_stemmer",
            "lowercase",
            "my_stop",
            "my_synonym",
            "kstem",
            "my_shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "documents": {
      "properties": {
        "Beds": {
          "type": "integer"
        },
        "Bath": {
          "type": "integer"
        },
        "Zip": {
          "type": "integer"
        },
        "Attributes": {
          "type": "string",
          "analyzer": "my_english"
        }
      }
    }
  }
}

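This is the standard English analyzer (I only excluded the stemmer filter, which in my opinion is too aggressive, and replaced it with kstem), plus of course your synonyms. I also added a shingle filter, which produces combinations of tokens, and that is exactly what we are looking for!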
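
To make the filter chain more concrete, here is a rough Python sketch of what the stages before the shingle filter do to a value. It is not how Elasticsearch implements them: the regex tokenizer, the tiny stop-word subset, and `naive_stem` are hypothetical stand-ins for the standard tokenizer, `_english_` stop words, and kstem, and the possessive stemmer is left out. The synonym table mirrors the `my_synonym` rules above.

```python
import re

# Hypothetical stand-ins for the filters configured in the index settings.
STOP_WORDS = {"at", "and", "in", "the", "a", "of"}  # small subset of _english_
SYNONYMS = {"bd": "bed", "bt": "bath", "baths": "bath", "bth": "bath"}


def naive_stem(token):
    """Crude plural stripping; a stand-in for kstem, not its real algorithm."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(text):
    """Apply lowercase -> stop -> synonym -> stem, in the analyzer's filter order."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)  # rough tokenizer approximation
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [naive_stem(t) for t in tokens]


print(normalize("3 BD"))
print(normalize("3 beds"))
print(normalize("3 beds at 97778(zip)"))
```

Under these assumptions, "3 BD" and "3 beds" end up as the same token list, which is why a search for one can find a document that stores the other.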

I extended your test data. Note that I doubled the keyword zip, in case a user searches for "zip 97778" or "97778 zip", both of which are plausible.

PUT /test/documents/1
{
  "Beds": 3,
  "Bath": 2,
  "Zip": 97778,
  "Attributes": ["3 beds", "2 baths", "zip 97778 zip"]
}

PUT /test/documents/2
{
  "Beds": 7,
  "Bath": 3,
  "Zip": 97778,
  "Attributes": ["7 beds", "3 baths", "zip 97778 zip"]
}

POST /test/documents/3
{
  "Attributes" : ["8310 prairie rose place", "md", "baltimore", "21208", "us", "3 bd", "3 bth", "1 pbh", "1 hbh", "cooktop", "dishwasher", "dryer", "garbage disposer", "ice maker", "microwave", "oven", "oven - double", "refrigerator", "washer", "appliances", "contemporary architecture", "ceiling fan(s)", "colling system", "brick", "basement", "forced air", "heating system", "3 floors", "2 parkings", "garage", "asphalt roof"]
}

POST /test/documents/4
{
  "Attributes" : ["8 winners circle", "md", "owings mills", "21117", "us", "2 bd", "1 bth", "dishwasher", "dryer", "garbage disposer", "microwave", "range", "refrigerator", "washer", "appliances", "traditional architecture", "new traditional architecture", "central a/c", "colling system", "vinyl siding", "heat pump", "heating system", "1 floors", "assigned", "unassigned", "unknown roof"]
}
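
The reason for writing the zip value as "zip 97778 zip" can be illustrated with a small Python sketch of shingling. The `shingles` helper is hypothetical and only approximates the `my_shingle` settings (sizes 2 to 3, unigrams emitted only when no shingle can be formed). It also ignores the filler tokens that the real filter inserts where a stop word was removed.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams of min_size..max_size tokens, joined by spaces.

    Approximates output_unigrams=false with output_unigrams_if_no_shingles=true:
    single tokens are returned only when no shingle can be formed.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out if out else list(tokens)


# Shingles indexed for the doubled value.
doc = set(shingles(["zip", "97778", "zip"]))
print(sorted(doc))

# Either word order in the query shares a shingle with the document.
for query in (["zip", "97778"], ["97778", "zip"]):
    print(query, "->", set(shingles(query)) & doc)
```

With a single "zip 97778" value, only the first word order would share a shingle with the document in this sketch.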

Here is a simple match query:

POST /test/documents/_search
{
  "query": {
    "match": {
      "Attributes": {
        "query": "3 beds at 97778(zip)"
      }
    }
  }
}

It returns the expected documents:

{
  "_index" : "test",
  "_type" : "documents",
  "_id" : "1",
  "_score" : 0.020668881,
  "_source" : {
    "Beds" : 3,
    "Bath" : 2,
    "Zip" : 97778,
    "Attributes" : [
      "3 beds",
      "2 baths",
      "zip 97778 zip"
    ]
  }
},
{
  "_index" : "test",
  "_type" : "documents",
  "_id" : "2",
  "_score" : 0.004767749,
  "_source" : {
    "Beds" : 7,
    "Bath" : 3,
    "Zip" : 97778,
    "Attributes" : [
      "7 beds",
      "3 baths",
      "zip 97778 zip"
    ]
  }
},
{
  "_index" : "test",
  "_type" : "documents",
  "_id" : "3",
  "_score" : 0.0014899216,
  "_source" : {
    "Attributes" : [
      "8310 prairie rose place",
      "md",
      "baltimore",
      "21208",
      "us",
      "3 bd",
      "3 bth",
      "1 pbh",
      "1 hbh",
      "cooktop",
      "dishwasher",
      "dryer",
      "garbage disposer",
      "ice maker",
      "microwave",
      "oven",
      "oven - double",
      "refrigerator",
      "washer",
      "appliances",
      "contemporary architecture",
      "ceiling fan(s)",
      "colling system",
      "brick",
      "basement",
      "forced air",
      "heating system",
      "3 floors",
      "2 parkings",
      "garage",
      "asphalt roof"
    ]
  }
}

Now when I run this query:

POST /test/documents/_search
{
  "query": {
    "match": {
      "Attributes": {
        "query": "2 bd and 1 bth at md"
      }
    }
  }
}

It returns this result, which is correct:

{
  "_index" : "test",
  "_type" : "documents",
  "_id" : "4",
  "_score" : 0.0032357208,
  "_source" : {
    "Attributes" : [
      "8 winners circle",
      "md",
      "owings mills",
      "21117",
      "us",
      "2 bd",
      "1 bth",
      "dishwasher",
      "dryer",
      "garbage disposer",
      "microwave",
      "range",
      "refrigerator",
      "washer",
      "appliances",
      "traditional architecture",
      "new traditional architecture",
      "central a/c",
      "colling system",
      "vinyl siding",
      "heat pump",
      "heating system",
      "1 floors",
      "assigned",
      "unassigned",
      "unknown roof"
    ]
  }
}

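You said your results always come back with a score of 1, which suggests that your query is not running correctly. My guess is that the query runs against the attributes field instead of Attributes; unfortunately, field names in Elasticsearch are case-sensitive.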

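From the comments, you said you are using a term query. That is the wrong choice for text data, because it always looks for an exact term match. Always use a match query when searching text data.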
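
The difference can be sketched with a toy inverted index in Python. Everything here is a hypothetical simplification: `analyze` only lowercases and splits on whitespace, and `term_query` and `match_query` merely imitate the lookup behavior described above, not Elasticsearch's actual scoring or query parsing.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (hypothetical stand-in)."""
    return text.lower().split()


# Index time: text is analyzed, so only lowercase tokens are stored.
docs = {1: "3 Beds", 2: "7 Beds"}
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in analyze(text):
        index[token].add(doc_id)


def term_query(value):
    """Look up the value exactly as given, with no analysis."""
    return set(index.get(value, set()))


def match_query(text):
    """Analyze the query text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


print(term_query("3 Beds"))   # no stored token equals the whole string
print(match_query("3 Beds"))  # analyzed into "3" and "beds" before lookup
```

In this sketch the term lookup finds nothing for the raw user input, while the match lookup finds documents because the query goes through the same analysis as the indexed text.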

Let me know if this helps.

Votes: 3
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/33315690
