
Parsing XML data from BGG with R

Asked by a Stack Overflow user on 2022-09-25 22:59:52

This question is a follow-up to: How to parse xml lists and tables in R for BGG API

I want to generate a data frame from this table:

<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
    <notes>Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Ritter/ 5-6 Spieler / extensions The Seafarers/ Cities and Knights/ 5-6 players); 3 x gespielt (Neuwertig; lediglich alle Bestandteile in EINER der Originalboxen verstaut) / 3 times played (like new; only all items in ONE original box stored); Abgabe nur komplett / selling only all together; KEIN Festpreis (nur um überhaupt etwas einzugeben) – erwarte Angebot! / no fixed price (just to complete the entries)– make an offer; Versand weltweit zu Lasten Käufer / shipping worldwide, paid by buyer</notes>
    <link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
  </listing>
  <listing>
    <listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
    <price currency="USD">34.95</price>
    <condition>new</condition>
    <notes>Brand New Sealed Board Game. Released from MayFair Games. Price is in USD. If you wish to pay in CAD...then we will convert at market rate. Shipping is $10.95 USD. We also carry the 5-6 Player Expansion that goes with this for $24.95 USD. We have sold thousands of board games across Canada. Please feel free to buy with confidence.</notes>
    <link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>
  </listing>

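I don't know how to proceed from here. The game has about 100 listings, and I want to turn them into a data frame. Where do I start? The code below does not work: it returns an empty result.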

listings_df <- do.call(rbind,lapply(
  getNodeSet(xmltop, '//marketplacelistings'),
  function(x) data.frame(
    XML:::xmlAttrsToDataFrame(xmlChildren(x)),
    row.names = NULL
  )))

The complete file for this question is here: https://boardgamegeek.com/xmlapi/boardgame/13&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&comments=1

Edit: this is my final solution. It may not be elegant, but it works.

marketplace_df_func <- function(xmltop) {
  # One column per child element of <listing>; currency comes from the
  # currency attribute of <price>.
  marketplace_df <- data.frame(
    listdate  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//listdate"), xmlValue),
    currency  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price[@currency]"), xmlAttrs),
    price     = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price"), xmlValue),
    condition = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//condition"), xmlValue)
  )

  # Keep the first 25 characters, which drops the trailing " +0000" offset.
  marketplace_df$listdate <- substr(marketplace_df$listdate, 1, 25)

  return(marketplace_df)
}
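
If the listdate strings need to become real date-times rather than trimmed text, one possible approach is sketched below. It is not part of the original post: parse_bgg_date is a hypothetical helper, and it assumes every date follows the pattern seen in the sample ("Thu, 19 Jan 2006 22:08:15 +0000"). It maps the month abbreviation through R's built-in month.abb so the result does not depend on the session locale, and it applies the numeric offset to return UTC times.

```r
# Hypothetical helper (not from the original post): convert strings such as
# "Thu, 19 Jan 2006 22:08:15 +0000" to POSIXct in UTC.
parse_bgg_date <- function(x) {
  pattern <- paste0(
    "([0-9]{1,2}) ([A-Za-z]{3}) ([0-9]{4}) ",      # day, month abbreviation, year
    "([0-9]{2}):([0-9]{2}):([0-9]{2}) ",           # hours, minutes, seconds
    "([+-])([0-9]{2})([0-9]{2})"                   # sign and hhmm of the offset
  )
  parts <- regmatches(x, regexec(pattern, x))

  secs <- vapply(parts, function(p) {
    # Strings that do not match the assumed pattern become NA.
    if (length(p) == 0) return(NA_real_)
    wall_clock <- ISOdatetime(
      year = as.integer(p[4]), month = match(p[3], month.abb), day = as.integer(p[2]),
      hour = as.integer(p[5]), min = as.integer(p[6]), sec = as.integer(p[7]),
      tz = "UTC"
    )
    sign <- if (p[8] == "-") -1 else 1
    offset <- sign * (as.integer(p[9]) * 3600 + as.integer(p[10]) * 60)
    # The offset says how far the wall-clock time is ahead of UTC.
    as.numeric(wall_clock) - offset
  }, numeric(1))

  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

dates <- parse_bgg_date(c("Thu, 19 Jan 2006 22:08:15 +0000",
                          "Mon, 29 Sep 2008 15:25:32 +0000"))
```

Applied to the untrimmed listdate column, this would give a POSIXct vector that can be sorted or differenced directly; the substr() step is then unnecessary.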

1 Answer

Answered by a Stack Overflow user on 2022-09-25 23:14:55

Because this XML now keeps most of its data in elements rather than in attributes, simply run the convenience function xmlToDataFrame, with no lapply loop:

library(XML) 

url <- "..."
doc <- xmlParse(readLines(url))

listings_df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing"))
str(listings_df)
# 'data.frame': 103 obs. of  5 variables:
#  $ listdate : chr  "Thu, 19 Jan 2006 22:08:15 +0000" "Mon, 29 Sep 2008 15:25:32 +0000" "Sat, 18 Jul 2009 20:42:03 +0000" "Fri, 04 Dec 2009 14:25:25 +0000" ...
#  $ price    : chr  "90.00" "34.95" "49.00" "40.00" ...
#  $ condition: chr  "likenew" "new" "verygood" "new" ...
#  $ notes    : chr  "Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Rit"| __truncated__ "Brand New Sealed Board Game. Released from MayFair Games.  Price is in USD.  If you wish to pay in CAD...then w"| __truncated__ "inlcudes 5/6 player expansion" "" ...
#  $ link     : chr  "" "" "" "" ...
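
The element-versus-attribute point can be checked offline on a small inline document. The sketch below is not from the original answer: it uses two listings cut down from the question's sample (the notes text is shortened), and the names xml_text, doc_small, small_df, and listing_has_attrs are made up for illustration. It shows that the listing nodes carry no attributes of their own, so an attribute-based approach has nothing to collect from them, while xmlToDataFrame picks up the child element text.

```r
library(XML)

# Two listings trimmed from the question's sample; notes shortened for brevity.
xml_text <- '
<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
    <notes>Settlers of Catan set</notes>
    <link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
  </listing>
  <listing>
    <listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
    <price currency="USD">34.95</price>
    <condition>new</condition>
    <notes>Brand New Sealed Board Game.</notes>
    <link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>
  </listing>
</marketplacelistings>'

doc_small <- xmlParse(xml_text, asText = TRUE)
listing_nodes <- getNodeSet(doc_small, "//listing")

# The <listing> elements themselves have no attributes; the data sits in
# their child elements.
listing_has_attrs <- vapply(listing_nodes,
                            function(node) length(xmlAttrs(node)) > 0,
                            logical(1))

# One row per <listing>, one column per child element (text content only).
small_df <- xmlToDataFrame(doc_small, nodes = listing_nodes,
                           stringsAsFactors = FALSE)
str(small_df)
```

The link column comes out empty here for the same reason it does in the output above: a link element has attributes but no text content.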

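To bind the underlying attributes as well, use the following special method (xmlAttrsToDataFrame is not exported by the XML package, hence the triple colon):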

listings_df <- data.frame(
    xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing")),
    XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/price")),
    XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/link")),
    row.names = NULL
)
str(listings_df)
# 'data.frame': 103 obs. of  8 variables:
#  $ listdate : chr  "Thu, 19 Jan 2006 22:08:15 +0000" "Mon, 29 Sep 2008 15:25:32 +0000" "Sat, 18 Jul 2009 20:42:03 +0000" "Fri, 04 Dec 2009 14:25:25 +0000" ...
#  $ price    : chr  "90.00" "34.95" "49.00" "40.00" ...
#  $ condition: chr  "likenew" "new" "verygood" "new" ...
#  $ notes    : chr  "Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Rit"| __truncated__ "Brand New Sealed Board Game. Released from MayFair Games.  Price is in USD.  If you wish to pay in CAD...then w"| __truncated__ "inlcudes 5/6 player expansion" "" ...
#  $ link     : chr  "" "" "" "" ...
#  $ currency : chr  "EUR" "USD" "EUR" "EUR" ...
#  $ href     : chr  "https://boardgamegeek.com/market/product/40605" "https://boardgamegeek.com/market/product/116347" "https://boardgamegeek.com/market/product/158433" "https://boardgamegeek.com/market/product/181379" ...
#  $ title    : chr  "marketlisting" "marketlisting" "marketlisting" "marketlisting" ...
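
Because XML:::xmlAttrsToDataFrame is an unexported internal, a variant that relies only on exported functions may be preferable in some settings. The following is a minimal sketch rather than the answer's method: it pulls each attribute by name with xpathSApply and xmlGetAttr, then converts price to numeric. The inline document and the names xml_text, doc_small, and listings are illustrative, and the code assumes every listing has exactly one price and one link element, so the attribute vectors line up with the rows.

```r
library(XML)

# Small stand-in document; the real data would come from the BGG URL.
xml_text <- '
<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
    <link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
  </listing>
  <listing>
    <listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
    <price currency="USD">34.95</price>
    <condition>new</condition>
    <link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>
  </listing>
</marketplacelistings>'

doc_small <- xmlParse(xml_text, asText = TRUE)

# Element text, one row per <listing>.
listings <- xmlToDataFrame(doc_small, nodes = getNodeSet(doc_small, "//listing"),
                           stringsAsFactors = FALSE)

# Attributes fetched by name with exported functions only.
# Assumption: exactly one <price> and one <link> per listing.
listings$currency <- xpathSApply(doc_small, "//listing/price", xmlGetAttr, "currency")
listings$href     <- xpathSApply(doc_small, "//listing/link", xmlGetAttr, "href")

# The text of <link> is always empty, so drop it; store prices as numbers.
listings$link  <- NULL
listings$price <- as.numeric(listings$price)

str(listings)
```

If some listings could lack a price or link, the document-wide XPath queries would return shorter vectors than the number of rows, so a per-listing loop would be the safer design in that case.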
Page content originally from Stack Overflow.
Original question: https://stackoverflow.com/questions/73848343
