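I've been trying out Google's entity analyzer, and it looks really good!
But I've been scratching my head over this for a while: I'm trying to replicate the image below (as seen on Google's Natural Language API page).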

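This is the format of the entity data I get back from the request.
The data is in no particular order, it just lists whatever entities were found. Looping over every word and checking it against every entity therefore seems very slow, and since each entity has multiple mentions it could get a bit complicated.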
[
{
"mentions": [
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0.30000001192092896, "score":0.30000001192092896 }
},
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0.30000001192092896, "score":-0.30000001192092896 }
},
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "group", "beginOffset": -1 },
"type": "COMMON",
"sentiment": { "magnitude": 0, "score": 0 }
}
],
"metadata": {},
"name": "group",
"type": "ORGANIZATION",
"salience": 0.34768930077552795,
"sentiment": { "magnitude": 1.100000023841858, "score": 0 }
},
{
"mentions": [
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0.10000000149011612, "score":-0.10000000149011612 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0.20000000298023224, "score": -0.20000000298023224 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth of Nations", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
},
{
"text": { "content": "Commonwealth\r\nOne", "beginOffset": -1 },
"type": "PROPER",
"sentiment": { "magnitude": 0, "score": 0 }
}
],
"metadata": {
"mid": "/m/0j7v_",
"wikipedia_url": "https://en.wikipedia.org/wiki/Commonwealth_of_Nations"
},
"name": "Commonwealth of Nations",
"type": "LOCATION",
"salience": 0.28001657128334045,
"sentiment": { "magnitude": 1.7000000476837158, "score": 0 }
},
...
]

Is there a simple way to do this that I'm completely overlooking? Thanks for any insights/ideas.
Oli
Posted on 2018-02-23 00:42:56
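I believe beginOffset is what you need:
beginOffset indicates the (zero-based) character offset of the beginning of the sentence within the given text. Note that this offset is calculated using the encodingType passed in.
If you specify EncodingType in your request, it should work.
If EncodingType is not specified, encoding-dependent information (such as beginOffset) is set to -1, which is why every mention in your output shows "beginOffset": -1.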
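
To make that concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the REST request body takes a top-level "encodingType" field next to "document", and that UTF32 is the right choice for Python because its offsets count Unicode code points, which is how Python indexes strings. The helper names build_request and annotate and the sample sentence and offsets are hypothetical, made up for illustration. Once the offsets are real, all mentions can be sorted once and the text sliced in a single pass, with no per-word lookups.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_request(text: str) -> Dict[str, Any]:
    """Build a request body that asks the API for real offsets (hypothetical helper)."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        # Assumption: UTF32 offsets count code points, matching Python str indexing.
        "encodingType": "UTF32",
    }


def annotate(text: str, entities: List[Dict[str, Any]]) -> List[Tuple[str, Optional[str]]]:
    """Split text into (segment, entity_name) pairs; entity_name is None for plain text."""
    spans = []
    for entity in entities:
        for mention in entity["mentions"]:
            start = mention["text"]["beginOffset"]
            if start < 0:
                # -1 means no encodingType was sent, so the position is unknown.
                continue
            end = start + len(mention["text"]["content"])
            spans.append((start, end, entity["name"]))

    spans.sort()  # the response is unordered, so order mentions by position

    segments: List[Tuple[str, Optional[str]]] = []
    cursor = 0
    for start, end, name in spans:
        if start < cursor:
            # Skip a mention that overlaps one already emitted.
            continue
        if start > cursor:
            segments.append((text[cursor:start], None))
        segments.append((text[start:end], name))
        cursor = end
    if cursor < len(text):
        segments.append((text[cursor:], None))
    return segments


# Hypothetical sample data in the same shape as the response above.
sample_text = "The group left the Commonwealth."
sample_entities = [
    {"name": "Commonwealth of Nations",
     "mentions": [{"text": {"content": "Commonwealth", "beginOffset": 19}}]},
    {"name": "group",
     "mentions": [{"text": {"content": "group", "beginOffset": 4}}]},
]
sample_segments = annotate(sample_text, sample_entities)
```

Each pair in sample_segments can then be rendered however you like, for example by wrapping the segments that carry an entity name in a coloured span to mimic the demo page.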
https://stackoverflow.com/questions/48766977