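I want to provide full-text search in my application, so I am trying to configure Azure Search with its Cognitive Search capability to index both image and non-image documents stored in Azure Blob Storage. However, when I configure Azure Search from Java code through the Azure Search REST API, the OCR capability is not applied and the image documents are not indexed. I seem to be missing some configuration detail when configuring Azure Search from Java code (using the Azure Search REST API).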
Case 1: From the Azure Portal, I can [...]
Case 2: From Java code using the Azure REST API, I can [...]
The Azure Search REST endpoints I call from the Java code:
1. https://%s.search.windows.net/datasources?api-version=%s
2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
3. https://%s.search.windows.net/indexes/%s?api-version=%s
4. https://%s.search.windows.net/indexers?api-version=%s
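For readers who want to reproduce the setup, the four calls above can be sketched with the JDK 11 `java.net.http` API roughly as follows. This is a minimal sketch, not the asker's actual code: the service name, the API version `2019-05-06`, the admin key, and the `{}` request bodies are placeholder assumptions, and the class and method names are hypothetical. It relies on the usual REST convention of `POST` to a collection URL and `PUT` to a named resource URL, with the admin key sent in the `api-key` header.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SearchRequests {
    // Hypothetical placeholders; substitute your own service name and API version.
    static final String SERVICE = "my-search-service";
    static final String API_VERSION = "2019-05-06";

    /** Fills one of the %s endpoint templates listed in the question. */
    static String url(String template, Object... args) {
        return String.format(template, args);
    }

    /** Builds (but does not send) a JSON request that carries the admin key in the api-key header. */
    static HttpRequest request(String method, String url, String apiKey, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("api-key", apiKey)
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String key = "<admin-api-key>"; // placeholder, not a real key
        // "{}" stands in for the datasource, skillset, index, and indexer JSON documents.
        HttpRequest[] calls = {
            request("POST", url("https://%s.search.windows.net/datasources?api-version=%s", SERVICE, API_VERSION), key, "{}"),
            request("PUT", url("https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s", SERVICE, API_VERSION), key, "{}"),
            request("PUT", url("https://%s.search.windows.net/indexes/%s?api-version=%s", SERVICE, "azureblob-indexing", API_VERSION), key, "{}"),
            request("POST", url("https://%s.search.windows.net/indexers?api-version=%s", SERVICE, API_VERSION), key, "{}"),
        };
        for (HttpRequest call : calls) {
            System.out.println(call.method() + " " + call.uri());
        }
    }
}
```

To actually send a request, each one can be passed to `HttpClient.newHttpClient().send(call, HttpResponse.BodyHandlers.ofString())`; logging the response body is useful because it typically explains why a definition was rejected.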
Configuration JSONs:
1. datasource.json
{
  "name": "csstoragetest",
  "type": "azureblob",
  "credentials": { "connectionString": "connectionString" },
  "container": { "name": "csblob" }
}
2. skillset.json
{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills": [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "null",
      "detectOrientation": true,
      "inputs": [
        { "name": "image", "source": "/document/normalized_images/*" }
      ],
      "outputs": [
        { "name": "text", "targetName": "myText" },
        { "name": "layoutText", "targetName": "myLayoutText" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        { "name": "text", "source": "/document/content" },
        { "name": "itemsToInsert", "source": "/document/normalized_images/*/text" },
        { "name": "offsets", "source": "/document/normalized_images/*/contentOffset" }
      ],
      "outputs": [
        { "name": "mergedText", "targetName": "merged_text" }
      ]
    }
  ]
}
3. index.json
{
  "name": "azureblob-indexing",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
4. indexer.json
{
  "name": "azureblob-indexing1",
  "dataSourceName": "csstoragetest",
  "targetIndexName": "azureblob-indexing",
  "schedule": { "interval": "PT2H" },
  "skillsetName": "cog-search-demo-ss",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": -1,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "imageAction": "generateNormalizedImages",
      "parsingMode": "default",
      "firstLineContainsHeaders": false,
      "delimitedTextDelimiter": ","
    }
  }
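}
After configuring Azure Search through the Java code, the image documents should be indexed in Azure Search, and I should be able to search for them by the text they contain.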
Posted on 2019-11-05 12:24:33
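I worked out the required configuration myself. All parameters had to match between Case 1 and Case 2 described above (in the question), after which I updated the configuration JSONs accordingly.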
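One way to do that kind of matching is to export the portal-created definition and compare it with the JSON sent from Java. The following is a rough sketch under stated assumptions, not the answerer's method: `ConfigDiff` and its methods are hypothetical names, the comparison is line-based and so only works when both documents are pretty-printed with one property per line, and the sample values in `main` are invented for illustration. A real JSON library would be more robust.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ConfigDiff {
    /** Strips whitespace and trailing commas so that formatting differences are ignored. */
    static Set<String> normalizedLines(String json) {
        Set<String> out = new LinkedHashSet<>();
        for (String line : json.split("\\R")) {
            String s = line.replaceAll("\\s+", "");
            if (s.endsWith(",")) {
                s = s.substring(0, s.length() - 1);
            }
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Returns the normalized lines that appear in the first document but not in the second. */
    static Set<String> onlyInFirst(String first, String second) {
        Set<String> diff = normalizedLines(first);
        diff.removeAll(normalizedLines(second));
        return diff;
    }

    public static void main(String[] args) {
        // Invented sample fragments; replace with the portal export and your own JSON.
        String portal = "{\n  \"imageAction\": \"generateNormalizedImages\",\n  \"parsingMode\": \"default\"\n}";
        String fromJava = "{\n  \"parsingMode\": \"default\"\n}";
        System.out.println("Only in portal definition: " + onlyInFirst(portal, fromJava));
    }
}
```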
Posted on 2019-11-01 02:06:32
Try setting the default language code to null, without the quotes, in skillset.json:
"defaultLanguageCode": null
https://stackoverflow.com/questions/58642521
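The distinction in the answer about `defaultLanguageCode` is between the JSON string `"null"` (four characters, which is not a language code) and the JSON literal `null`. If the skillset JSON is assembled by hand in Java, a small helper can keep the two apart. This is a minimal sketch; `OcrSkillJson` and its methods are hypothetical names and not part of any Azure SDK.

```java
class OcrSkillJson {
    /** Renders a Java string as a JSON string literal, or as the bare JSON null when the value is null. */
    static String jsonString(String value) {
        if (value == null) {
            return "null"; // JSON null literal, deliberately unquoted
        }
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the defaultLanguageCode property of the OCR skill definition. */
    static String defaultLanguageProperty(String languageCode) {
        return "\"defaultLanguageCode\": " + jsonString(languageCode);
    }

    public static void main(String[] args) {
        System.out.println(defaultLanguageProperty(null)); // unquoted null
        System.out.println(defaultLanguageProperty("en")); // quoted language code
    }
}
```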