
What is the correct mapping to connect my Elasticsearch repository to the INCEpTION NER annotator?

Stack Overflow user
Asked 2022-01-22 07:06:36

I have set up a local environment for text annotation and want to use the INCEpTION application developed here: https://github.com/inception-project/inception/blob/main/CONTRIBUTORS.txt

When connecting to my repository, I can connect and find documents using the example from here: https://inception-project.github.io/releases/22.1/docs/user-guide.html#sect_external-search-repos

However, when I try to connect to a repository that was created and indexed with FSCrawler, I cannot get the search to work.

The mapping of their example is as follows:

{
  "mappings": {
    "_doc": {
      "properties": {
        "doc": {
          "properties": {
            "text": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "metadata": {
          "properties": {
            "language": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "source": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "timestamp": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "uri": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

The mapping of my index is:

{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "raw_as_text": {
            "path_match": "meta.raw.*",
            "mapping": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            }
          }
        }
      ],
      "properties": {
        "attachment": {
          "type": "binary"
        },
        "attributes": {
          "properties": {
            "group": {
              "type": "keyword"
            },
            "owner": {
              "type": "keyword"
            }
          }
        },
        "content": {
          "type": "text"
        },
        "file": {
          "properties": {
            "checksum": {
              "type": "keyword"
            },
            "content_type": {
              "type": "keyword"
            },
            "created": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "extension": {
              "type": "keyword"
            },
            "filename": {
              "type": "keyword",
              "store": true
            },
            "filesize": {
              "type": "long"
            },
            "indexed_chars": {
              "type": "long"
            },
            "indexing_date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "last_accessed": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "last_modified": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "url": {
              "type": "keyword",
              "index": false
            }
          }
        },
        "meta": {
          "properties": {
            "altitude": {
              "type": "text"
            },
            "author": {
              "type": "text"
            },
            "comments": {
              "type": "text"
            },
            "contributor": {
              "type": "text"
            },
            "coverage": {
              "type": "text"
            },
            "created": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "creator_tool": {
              "type": "keyword"
            },
            "date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "description": {
              "type": "text"
            },
            "format": {
              "type": "text"
            },
            "identifier": {
              "type": "text"
            },
            "keywords": {
              "type": "text"
            },
            "language": {
              "type": "keyword"
            },
            "latitude": {
              "type": "text"
            },
            "longitude": {
              "type": "text"
            },
            "metadata_date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "modifier": {
              "type": "text"
            },
            "print_date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "publisher": {
              "type": "text"
            },
            "rating": {
              "type": "byte"
            },
            "relation": {
              "type": "text"
            },
            "rights": {
              "type": "text"
            },
            "source": {
              "type": "text"
            },
            "title": {
              "type": "text"
            },
            "type": {
              "type": "text"
            }
          }
        },
        "path": {
          "properties": {
            "real": {
              "type": "keyword",
              "fields": {
                "fulltext": {
                  "type": "text"
                },
                "tree": {
                  "type": "text",
                  "analyzer": "fscrawler_path",
                  "fielddata": true
                }
              }
            },
            "root": {
              "type": "keyword"
            },
            "virtual": {
              "type": "keyword",
              "fields": {
                "fulltext": {
                  "type": "text"
                },
                "tree": {
                  "type": "text",
                  "analyzer": "fscrawler_path",
                  "fielddata": true
                }
              }
            }
          }
        }
      }
    }
  }
}

I can search both repositories without problems from anywhere else using the standard _search endpoint and match on the "content" object.

{
  "metadata": {
    "language": "en",
    "source": "My favourite document collection",
    "timestamp": "2011/11/11 11:11",
    "uri": "http://the.internet.com/my/document/collection/document1.txt",
    "title": "Cool Document Title"
  },
  "doc": {
      "text": "This is a test Document"
  }
}

The query also works for the example when the text is moved one level up:

{
  "metadata": {
    "language": "en",
    "source": "My favourite document collection",
    "timestamp": "2011/11/11 11:11",
    "uri": "http://the.internet.com/my/document/collection/document1.txt",
    "title": "Cool Document Title"
  },
  "doc": "This is a test Document"
}
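For reference, the standard `_search` requests mentioned above can be sketched as plain request bodies. This is a minimal sketch, assuming a simple `match` query; the helper name `build_match_query` and the search term are hypothetical, and only the field names `content` (FSCrawler index) and `doc.text` (INCEpTION example index) come from the mappings shown above.

```python
import json


def build_match_query(field: str, text: str, size: int = 10) -> dict:
    """Build the JSON body of an Elasticsearch `_search` match query.

    `field` is the (possibly dotted) path of the text field to search.
    """
    return {"size": size, "query": {"match": {field: text}}}


# FSCrawler index: the extracted text is in the top-level "content" field.
fscrawler_query = build_match_query("content", "test")

# INCEpTION example index: the text is nested under "doc.text".
example_query = build_match_query("doc.text", "test")

# These bodies would be sent as JSON to <index>/_search.
print(json.dumps(fscrawler_query, indent=2))
print(json.dumps(example_query, indent=2))
```

The two bodies differ only in the field path, which is the same difference the question is about: a client that hard-codes `doc.text` will not find text stored under `content`.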

Which object do I need to specify in order to access the "content" object of my mapping?

Chris


1 Answer

Stack Overflow user

Answered 2022-01-23 08:46:15

Unfortunately, this is a limitation of the feature itself.

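The mapping has to be very specific for this to work. Remapping (changing the FSCrawler mapping) does not help unless the resulting document mapping is exactly the one that is expected.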

Simply changing the fields and types does not work.

https://github.com/inception-project/inception/issues/2516
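To illustrate what "very specific" means here, the following is a minimal sketch of reshaping an FSCrawler document into the structure of the INCEpTION example document (`doc.text` plus `metadata.*`) before indexing it into a separate index. It is an assumption that a reshaped copy like this is acceptable to INCEpTION; the function name `to_inception_doc`, the choice of source fields, and the sample values are hypothetical and not taken from either project.

```python
def to_inception_doc(fs_doc: dict, source: str = "FSCrawler") -> dict:
    """Reshape one FSCrawler document into the example document layout.

    Field choices are assumptions: FSCrawler has no direct equivalent
    for every metadata field, so the closest available value is used.
    """
    meta = fs_doc.get("meta", {})
    file_info = fs_doc.get("file", {})
    path = fs_doc.get("path", {})
    return {
        "metadata": {
            "language": meta.get("language", ""),
            "source": source,
            # The example uses a free-text timestamp; the FSCrawler
            # date string is passed through unchanged.
            "timestamp": file_info.get("indexing_date", ""),
            "uri": file_info.get("url") or path.get("virtual", ""),
            "title": meta.get("title") or file_info.get("filename", ""),
        },
        "doc": {"text": fs_doc.get("content", "")},
    }


# Hypothetical FSCrawler document, using field names from the mapping above.
sample = {
    "content": "This is a test Document",
    "meta": {"language": "en"},
    "file": {"filename": "document1.txt", "indexing_date": "2022-01-22T07:06:36"},
    "path": {"virtual": "/document1.txt"},
}

reshaped = to_inception_doc(sample)
print(reshaped)
```

The reshaped documents would need to be written to their own index whose mapping matches the example mapping; the linked issue is the place to check whether that is sufficient.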

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70810725
