
Hive - XPATH

Stack Overflow user
Asked 2017-05-16 13:36:32
Answers: 3 · Views: 1K · Followers: 0 · Votes: 1

I have the following XML:

<qr>
    <Trade xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ss="http://www.mycomp.com/mycall/schema/1/durables/ss" xmlns:ss-raw="http://www.mycomp.com/api.dsl/tm/2/ss-raw/v1.0">
        <TradeId>
            <ss:SYSTEMID>1466413528</ss:SYSTEMID>
        </TradeId>
        <InstrumentId xsi:nil="true">test_instrument</InstrumentId>
        <TraderSourceSystemName xsi:nil="true">akjsdfklas</TraderSourceSystemName>
    </Trade>    
</qr>

I am trying to read it with the following table definition:

CREATE EXTERNAL TABLE sample(TradeId STRING,
    InstrumentId STRING,
    TraderSourceSystemName STRING
    )
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
    "column.xpath.TradeId"="Trade/TradeId",
    "column.xpath.InstrumentId"="Trade/InstrumentId/text()",
      "column.xpath.TraderSourceSystemName"="Trade/TraderSourceSystemName/text()"
    )
    STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION "hdfs://server:port/user/sl/sample/"
    TBLPROPERTIES (

  "xmlinput.start"="<Trade xmlns",
  "xmlinput.end"="</Trade>"
      );

When I run SELECT * on the table, the first field looks like this:

<TradeId><ss:SYSTEMID xmlns:ss="...namespace...">1466413528</ss:SYSTEMID></TradeId>

How can I get the TradeId as just the value, like 1466413528?


3 Answers

Stack Overflow user

Accepted answer

Posted 2017-05-16 14:41:27

"column.xpath.TradeId" = "Trade/TradeId/*[local-name(.)='SYSTEMID']/text()"
Full table definition:
create external table sample
(
    tradeid                 string
   ,instrumentid            string
   ,tradersourcesystemname  string
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'

with serdeproperties 
(
    "column.xpath.TradeId"                  = "Trade/TradeId/*[local-name(.)='SYSTEMID']/text()"
   ,"column.xpath.InstrumentId"             = "Trade/InstrumentId/text()"
   ,"column.xpath.TraderSourceSystemName"   = "Trade/TraderSourceSystemName/text()"
)

stored as
inputformat     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
outputformat    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'

tblproperties 
(
    "xmlinput.start"    = "<Trade xmlns"
   ,"xmlinput.end"      = "</Trade>"
)
;
select * from sample
;
Result:
+------------+-----------------+------------------------+
|  tradeid   |  instrumentid   | tradersourcesystemname |
+------------+-----------------+------------------------+
| 1466413528 | test_instrument | akjsdfklas             |
+------------+-----------------+------------------------+

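Limitations ... Only the XPath 1.0 specification is currently supported. The local part of the qualified names for the elements and attributes is used when dealing with Hive field names. The namespace prefixes are ignored. https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources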
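
To see why matching on the local name helps, the same idea can be tried outside Hive. The sketch below uses Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath and has no `local-name()` function; its `{*}` wildcard (Python 3.8+) is used here as a stand-in for "this element name in any namespace". The record is a trimmed copy of the XML from the question. This illustrates the concept only and is not how the SerDe evaluates XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record from the question (unused xmlns declarations
# and the xsi:nil attributes are dropped for brevity).
RECORD = """
<Trade xmlns:ss="http://www.mycomp.com/mycall/schema/1/durables/ss">
    <TradeId>
        <ss:SYSTEMID>1466413528</ss:SYSTEMID>
    </TradeId>
    <InstrumentId>test_instrument</InstrumentId>
</Trade>
"""

trade = ET.fromstring(RECORD)

# A plain name only matches elements in no namespace, so this finds nothing:
# SYSTEMID lives in the namespace bound to the "ss" prefix.
unqualified = trade.find("TradeId/SYSTEMID")

# "{*}SYSTEMID" matches SYSTEMID in any namespace, similar in spirit to
# *[local-name()='SYSTEMID'] in full XPath 1.0.
any_namespace = trade.find("TradeId/{*}SYSTEMID")

trade_id = any_namespace.text
print(unqualified, trade_id)
```

Selecting the element's text, rather than the element itself, is what turns the serialized `<TradeId>...</TradeId>` fragment from the question into the bare value.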

Votes: 0

Stack Overflow user

Posted 2017-05-16 14:28:56

Use the XPath //Trade/TradeId/ss:SYSTEMID/text() for column.xpath.TradeId.
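
A prefixed step such as `ss:SYSTEMID` only resolves if the XPath engine knows which namespace URI the `ss` prefix stands for. This answer does not say whether the SerDe supplies that binding, and the limitation quoted in the accepted answer states that namespace prefixes are ignored. The Python sketch below (standard-library `xml.etree.ElementTree`, not the SerDe) shows the general behavior as an assumption-laden illustration: the prefixed path fails without a mapping and works once one is passed in.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the XML in the question.
SS_URI = "http://www.mycomp.com/mycall/schema/1/durables/ss"
RECORD = f"""
<Trade xmlns:ss="{SS_URI}">
    <TradeId>
        <ss:SYSTEMID>1466413528</ss:SYSTEMID>
    </TradeId>
</Trade>
"""

trade = ET.fromstring(RECORD)

# Without a prefix-to-URI mapping, ElementTree cannot resolve "ss"
# and raises SyntaxError while parsing the path.
try:
    trade.find("TradeId/ss:SYSTEMID")
    resolved_without_mapping = True
except SyntaxError:
    resolved_without_mapping = False

# With an explicit mapping, the same prefixed path resolves.
element = trade.find("TradeId/ss:SYSTEMID", namespaces={"ss": SS_URI})
system_id = element.text
print(resolved_without_mapping, system_id)
```

If the engine in use offers no way to register prefixes, a namespace-agnostic match on the local name avoids the problem altogether.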

Votes: 0

Stack Overflow user

Posted 2018-03-08 17:20:33

How can I create a table on top of this XML?

<root>
<root1>
<id>4545482361</id>
<joiningdate>1/3/2010</joiningdate>
<Segments>
<Segment xse:type="manager">
<cityworked>Hyd</cityworked>
<reports>john</reports>
<salary>150000</salary>
<datestarted>1/3/2012</datestarted>
</Segment>
<Segment xse:type="manager">
<cityworked>Hyd</cityworked>
<reports>mike</reports>
<salary>225000</salary>
<datestarted>1/9/2014</datestarted>
</Segment>
<Segment xse:type="VP">
<cityworked>mumbai</cityworked>
<datestarted>1/9/2014</datestarted>
<subemployees>
<Fname>ram</Fname>
<Lname>Achanta</Lname>
<Desgination>Director of IT</Desgination>
</subemployees>
</Segment>
<Segment xse:type="SVP">
<status>currentposition</status>
<numberofemployees>10</numberofemployees>
</Segment>
</Segments>
</root1>
</root>
Votes: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/44003069
