
Q: How to convert binary data back to a float array in Elasticsearch/Painless

Stack Overflow user

Asked 2019-07-26 07:35:52

Answers: 1 · Views: 378 · Followers: 0 · Votes: 0

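I'm trying to store and retrieve float arrays efficiently in Elasticsearch 6.7. Numeric doc values are sorted, which means I can't use them directly: the original order of the array is lost.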

At first I read the field from _source, but performance on large queries wasn't great.

So I tried encoding the float array as binary and decoding it in a script. Unfortunately, I'm having trouble converting a byte[4] array to a float in Painless.

In Java, it would look like this:

Float.intBitsToFloat((vector_bytes[3] << 24) | ((vector_bytes[2] & 0xff) << 16) |  ((vector_bytes[1] & 0xff) << 8) |  (vector_bytes[0] & 0xff));

But masking off the sign extension with & 0xff makes Painless throw an "Illegal tree structure." error.
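To see why the mask matters, here is a plain Java sketch of the same arithmetic (Painless uses the same signed byte type, which is what causes the problem in the question). A byte such as 0x80 or 0xCD is negative, so when it is promoted to int it is sign-extended to 0xFFFFFFxx, and without the mask those high bits overwrite the other bytes. The class and method names are hypothetical; the input bytes are 1.0f and 1.1f in little-endian order.

```java
public class SignExtensionDemo {
    // Combine four little-endian bytes into an int, masking bytes 0-2 so that
    // sign extension of negative bytes cannot leak into the upper bits.
    static int maskedBits(byte[] b, int n) {
        return (b[n + 3] << 24)
                | ((b[n + 2] & 0xff) << 16)
                | ((b[n + 1] & 0xff) << 8)
                | (b[n] & 0xff);
    }

    // Same combination without masking: a byte >= 0x80 becomes 0xFFFFFFxx
    // before shifting, which sets every higher bit of the result.
    static int unmaskedBits(byte[] b, int n) {
        return (b[n + 3] << 24)
                | (b[n + 2] << 16)
                | (b[n + 1] << 8)
                | b[n];
    }

    public static void main(String[] args) {
        // 1.0f = 0x3F800000 and 1.1f = 0x3F8CCCCD, stored little-endian.
        byte[] bytes = {
            0x00, 0x00, (byte) 0x80, 0x3F,
            (byte) 0xCD, (byte) 0xCC, (byte) 0x8C, 0x3F
        };
        for (int n = 0; n < bytes.length; n += 4) {
            System.out.println(Float.intBitsToFloat(maskedBits(bytes, n))
                    + " vs " + Float.intBitsToFloat(unmaskedBits(bytes, n)));
        }
    }
}
```

Running it prints "1.0 vs -Infinity" and "1.1 vs NaN": the unmasked version sets the sign and exponent bits, which is the kind of incorrect result the question describes.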

Any idea how to do this?

Minimal example:

Set up the index:

# Minimal example binary array
# Create the index
PUT binary_array 
{
  "mappings" : {
      "_doc" : {
          "properties" : {
              "vector_bin": { "type" : "binary", "doc_values": true },
              "vector": { "type" : "float" }
          }
      }
  }
}
# Put two documents
PUT binary_array/_doc/1
{
  "vector": [1.0, 1.1, 1.2],
  "vector_bin": "AACAP83MjD+amZk/"
}
PUT binary_array/_doc/2
{
  "vector": [3.0, 2.1, 1.2],
  "vector_bin": "AABAQGZmBkCamZk/"
}
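The question doesn't show how the vector_bin strings were produced. Both sample values are consistent with each float being written as 4 little-endian IEEE 754 bytes and the result Base64-encoded (the binary field type takes a Base64 string). A minimal Java sketch under that assumption; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

public class VectorEncoder {
    // Pack each float as 4 little-endian IEEE 754 bytes, then Base64-encode
    // the buffer so it can be sent as the value of a "binary" field.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : vector) {
            buf.putFloat(f);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(encode(new float[] {1.0f, 1.1f, 1.2f})); // AACAP83MjD+amZk/
        System.out.println(encode(new float[] {3.0f, 2.1f, 1.2f})); // AABAQGZmBkCamZk/
    }
}
```

Little-endian order is the choice that makes byte 0 the least significant byte, which is what the decoding expression in the question assumes.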

Example search that converts the binary array back into a float array:

GET binary_array/_search
{
  "script_fields": {
    "vector_parsed": {
      "script": {
        "source": """
        def vector_bytes = doc["vector_bin"].value.bytes;
        def vector = new float[vector_bytes.length/4];
        for (int i = 0; i < vector.length; ++i) {
          def n = i*4;
          // This would be the Java way, discarding the sign of bytes 0-2, but it raises an "Illegal tree structure." error in Painless
          //def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 0xff) << 16) |  ((vector_bytes[n+1] & 0xff) << 8) |  (vector_bytes[n] & 0xff);
          // This runs but gives incorrect results
          def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] ) << 16) |  ((vector_bytes[n+1] ) << 8) |  (vector_bytes[n] );
          vector[i] = Float.intBitsToFloat( intBits );
        }
        return vector;
        """
      }
    },
    "vector_src": {
      "script": """params._source["vector"]"""
    }
  }
}

Tags: elasticsearch, elasticsearch-6, elasticsearch-painless

1 answer

Stack Overflow user

Answered 2019-07-26 10:27:06

After further research, I realized that bitwise AND does work in Painless, but the hex literal 0xff does not.

Using the decimal literal 255 instead solved my problem:

Float.intBitsToFloat( (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 255) << 16) |  ((vector_bytes[n+1] & 255) << 8) |  (vector_bytes[n] & 255) )
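For checking the decoding outside Elasticsearch, here is a Java sketch that mirrors the fixed Painless loop and cross-checks it against ByteBuffer. The offset and length parameters are an addition not in the answer: a Lucene BytesRef carries bytes, offset and length fields, and its backing array is not guaranteed to hold exactly the value, so a more defensive script might index from offset and use length rather than bytes.length. Whether that matters for binary doc values in 6.7 is an assumption to verify; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

public class VectorDecoder {
    // Mirror of the fixed Painless loop: rebuild each little-endian int with
    // "& 255" masks and reinterpret its bits as a float.
    static float[] decode(byte[] bytes, int offset, int length) {
        float[] vector = new float[length / 4];
        for (int i = 0; i < vector.length; ++i) {
            int n = offset + i * 4;
            int intBits = (bytes[n + 3] << 24)
                    | ((bytes[n + 2] & 255) << 16)
                    | ((bytes[n + 1] & 255) << 8)
                    | (bytes[n] & 255);
            vector[i] = Float.intBitsToFloat(intBits);
        }
        return vector;
    }

    // Reference decoding via ByteBuffer, used only to cross-check decode().
    static float[] decodeWithByteBuffer(byte[] bytes, int offset, int length) {
        float[] vector = new float[length / 4];
        ByteBuffer.wrap(bytes, offset, length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer()
                .get(vector);
        return vector;
    }

    public static void main(String[] args) {
        byte[] bytes = Base64.getDecoder().decode("AABAQGZmBkCamZk/");
        System.out.println(Arrays.toString(decode(bytes, 0, bytes.length))); // [3.0, 2.1, 1.2]
    }
}
```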

Votes: 0

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/57211362
