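I am trying to efficiently store and retrieve an array of floats in Elasticsearch 6.7. Numeric doc values are sorted, which means I cannot use them directly.
At first I used the field's source value, but performance on large queries is not very good.
I then tried encoding the float array as binary and decoding it in a script. Unfortunately, I am having trouble converting a byte[4] array to a float in Painless.
In Java, it would look something like this:
Float.intBitsToFloat((vector_bytes[3] << 24) | ((vector_bytes[2] & 0xff) << 16) | ((vector_bytes[1] & 0xff) << 8) | (vector_bytes[0] & 0xff));
However, discarding the sign with & 0xff throws an "Illegal tree structure." error in Painless.
Do you know what to do?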
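For context, here is a small plain-Java sketch of why the mask matters. Java bytes are signed, so any byte with its high bit set is sign-extended when it is promoted to int, and the OR then overwrites the higher-order bits. The class and method names are made up for illustration, and the four bytes are the little-endian encoding of 1.1f.

```java
class SignExtensionDemo {
    // Combine four little-endian bytes starting at offset n.
    // Bytes 0-2 are masked so that sign extension cannot leak into higher bits.
    static float decodeMasked(byte[] b, int n) {
        int bits = (b[n + 3] << 24)
                 | ((b[n + 2] & 0xff) << 16)
                 | ((b[n + 1] & 0xff) << 8)
                 | (b[n] & 0xff);
        return Float.intBitsToFloat(bits);
    }

    // Same combination without the masks: negative bytes are sign-extended
    // to 32 bits and set every bit above their own position.
    static float decodeUnmasked(byte[] b, int n) {
        int bits = (b[n + 3] << 24) | (b[n + 2] << 16) | (b[n + 1] << 8) | b[n];
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // 1.1f has the bit pattern 0x3F8CCCCD; stored little-endian.
        byte[] onePointOne = {(byte) 0xCD, (byte) 0xCC, (byte) 0x8C, 0x3F};
        System.out.println(decodeMasked(onePointOne, 0));   // prints 1.1
        System.out.println(decodeUnmasked(onePointOne, 0)); // prints NaN
    }
}
```

This would explain why the unmasked variant runs but returns wrong values for anything whose low three bytes have the high bit set.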
Minimal example:
Set up the index
# Minimal example binary array
# Create the index
PUT binary_array
{
"mappings" : {
"_doc" : {
"properties" : {
"vector_bin": { "type" : "binary", "doc_values": true },
"vector": { "type" : "float" }
}
}
}
}
# Put two documents
PUT binary_array/_doc/1
{
"vector": [1.0, 1.1, 1.2],
"vector_bin": "AACAP83MjD+amZk/"
}
PUT binary_array/_doc/2
{
"vector": [3.0, 2.1, 1.2],
"vector_bin": "AABAQGZmBkCamZk/"
}
Example search to convert the binary array back to an array
GET binary_array/_search
{
"script_fields": {
"vector_parsed": {
"script": {
"source": """
def vector_bytes = doc["vector_bin"].value.bytes;
def vector = new float[vector_bytes.length/4];
for (int i = 0; i < vector.length; ++i) {
def n = i*4;
// This would be the Java way, discarding the sign of bytes 0-2, but it raises an "Illegal tree structure." error in Painless
//def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 0xff) << 16) | ((vector_bytes[n+1] & 0xff) << 8) | (vector_bytes[n] & 0xff);
// This runs but gives incorrect results
def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] ) << 16) | ((vector_bytes[n+1] ) << 8) | (vector_bytes[n] );
vector[i] = Float.intBitsToFloat( intBits );
}
return vector;
"""
}
},
"vector_src": {
"script": """params._source["vector"]"""
}
}
}
Posted on 2019-07-26 10:27:06
After further research, I realized that bitwise AND does work in Painless, but 0xff does not.
This solved my problem:
Float.intBitsToFloat( (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 255) << 16) | ((vector_bytes[n+1] & 255) << 8) | (vector_bytes[n] & 255) )
https://stackoverflow.com/questions/57211362
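To sanity-check the decoding outside Elasticsearch, a rough plain-Java round trip like the one below can help. It assumes the vector_bin values are Base64 over little-endian 32-bit floats, which is what the two sample documents appear to use. VectorCodec and its method names are hypothetical, not part of any Elasticsearch API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

class VectorCodec {
    // Pack floats as little-endian 32-bit values and Base64-encode them.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * 4)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Decode with the same masked shift-and-OR expression as the script,
    // using the decimal literal 255 instead of 0xff.
    static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        float[] vector = new float[bytes.length / 4];
        for (int i = 0; i < vector.length; ++i) {
            int n = i * 4;
            int bits = (bytes[n + 3] << 24)
                     | ((bytes[n + 2] & 255) << 16)
                     | ((bytes[n + 1] & 255) << 8)
                     | (bytes[n] & 255);
            vector[i] = Float.intBitsToFloat(bits);
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(encode(new float[] {1.0f, 1.1f, 1.2f}));      // prints AACAP83MjD+amZk/
        System.out.println(Arrays.toString(decode("AABAQGZmBkCamZk/"))); // prints [3.0, 2.1, 1.2]
    }
}
```

If the encoded string matches what was indexed and the decoded values match the vector field, the byte order and masking in the script should be consistent with how the data was written.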