我在一个单元日志表中存储了一个XML (如下面所示)。
<user>
<uid>1424324325</uid>
<attribs>
<field>
...
</field>
<field>
<name>first</name>
<value>Joh,n</value>
</field>
<field>
...
</field>
<field>
<name>last</name>
<value>D,oe</value>
</field>
<field>
...
</field>
</attribs>
</user>hive表中的每一行都有关于不同用户的信息,我希望提取uid、name和name的值(从名称中删除任何逗号)。
1424324325 John Doe
1424435463 Jane Smith我能够从XML中提取值。
SELECT uid, fn, ln
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;但是,我很难从名字&姓氏中删除不必要的逗号(如果有的话)。
当我试图使用下面所示的任何方法提取名字时,结果是空的。
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/replace(text(),",","")')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/translate(text(),",","")')) fns as fn当我尝试如下所示时,替换有关无效函数的抱怨,而translate则在不删除额外逗号的情况下提取数据。
LATERAL VIEW explode(xpath(logs['users_updates'], replace('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], translate('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn如何在没有名称值逗号的情况下提取信息?
1424324325 John Doe
1424435463 Jane Smith最终解决方案:这里是根据Jens的建议进行的最终工作查询
SELECT uid, regexp_replace(fn,","," ") as fname, regexp_replace(ln,","," ") as lname
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;发布于 2014-02-28 09:16:30
在Hive中不支持XPath 2.0。这对你的问题有两次影响:
//value/translate(text(), ',', '') (为每个<value/>元素调用转换)是有效的XPath 2.0,但在XPath 1.0中不能这样做。另一方面,translate(//value, ',', '')返回作为单个字符串连接的所有<value/>项中的所有文本节点。replace 1.0中没有XPath函数。在Hive中传递包含逗号的值并执行字符串操作可能更容易一些。
另外请注意,因为您还没有获得XPath 2.0:translate只需要一个字符串作为第一个参数。您需要在此之前进行string-join。
https://stackoverflow.com/questions/22084184
复制相似问题