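I'm trying to build a data frame by extracting outage data from an XML file and associating each outage with a specific meter. A simplified example of the data looks like this: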
<MeterReadings Irn="311" Source="Remote">
<Meter MeterIrn="311" IsActive="true" />
<ConsumptionData>
</ConsumptionData>
<IntervalData>
<Reading TimeStamp="2016-10-13" />
</IntervalData>
<EventData>
<EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
<Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
</Event>
</EventData>
</MeterReadings>
What I want is a data frame with the meter number in the first column and the time of each outage in the second column.
I have tried the following expressions:
outage.inv <- data.frame(xpathSApply(doc, '//Event[contains(@EventInfo, "Outage detected from Interval Data")]/ancestor::MeterReadings', xmlGetAttr, "Irn"))
outage.df <- data.frame(xpathSApply(doc, '//MeterReadings/EventData/EventSpec[@Type="Outage Detected from Interval Data"]/following-sibling::Event', xmlGetAttr, "TimeStamp"))
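outage.inv <- cbind(outage.inv, outage.df)
But the first expression pulls the meter number only once, so the lengths don't match: in this case, 1 meter number and 5 outage times. Is there a way to extract the ancestor's attribute once for every occurrence of the descendant's attribute?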
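The mismatch comes from how XPath returns results: a location path yields a node-set, and a node-set contains each node at most once. All five events share the same MeterReadings ancestor, so `.../ancestor::MeterReadings` returns a single node. A minimal sketch of one workaround with the XML package: select the events first, then look up the ancestor from each event node individually, so there is exactly one meter per event. The two-meter `txt` below is a hypothetical sample trimmed from the question's data, and the wrapping `<Root>` element is an assumption added only so the sample is a well-formed document.

```r
library(XML)

# Hypothetical two-meter sample, trimmed from the question's data;
# <Root> is added only so the two MeterReadings form one document
txt <- '<Root>
  <MeterReadings Irn="311" Source="Remote">
    <EventData>
      <Event TimeStamp="2014-10-31 14:17:40" EventInfo="Outage detected from Interval Data."/>
      <Event TimeStamp="2014-10-31 14:16:20" EventInfo="Outage detected from Interval Data."/>
    </EventData>
  </MeterReadings>
  <MeterReadings Irn="312" Source="Remote">
    <EventData>
      <Event TimeStamp="2014-11-02 08:00:00" EventInfo="Outage detected from Interval Data."/>
    </EventData>
  </MeterReadings>
</Root>'

doc <- xmlParse(txt, asText = TRUE)

# One node per outage event, in document order
events <- getNodeSet(doc, '//Event[contains(@EventInfo, "Outage detected from Interval Data")]')

# For each event, evaluate a relative XPath from that event up to its own
# MeterReadings ancestor, so every event yields its own meter number
meter <- vapply(events, function(e) {
  xmlGetAttr(getNodeSet(e, "ancestor::MeterReadings")[[1]], "Irn")
}, character(1), USE.NAMES = FALSE)

timestamp <- vapply(events, xmlGetAttr, character(1), "TimeStamp",
                    USE.NAMES = FALSE)

outage.df <- data.frame(meter = meter, timestamp = timestamp,
                        stringsAsFactors = FALSE)
print(outage.df)
```

Because both columns are computed from the same `events` list, they always have the same length and line up row by row.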
I've checked the following answers but haven't figured it out yet.
XPath to select element based on childs child value
R: How to get parent attributes and node values at the site time?
Any help would be greatly appreciated.
Posted on 2016-10-14 03:32:30
Another approach.
Here is the data:
txt <- ' <MeterReadings Irn="311" Source="Remote">
<Meter MeterIrn="311" IsActive="true" />
<ConsumptionData>
</ConsumptionData>
<IntervalData>
<Reading TimeStamp="2016-10-13" />
</IntervalData>
<EventData>
<EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
<Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
</Event>
</EventData>
</MeterReadings>'
We can process the records a different way:
library(xml2)
library(purrr)
library(dplyr)  # also re-exports tibble()

doc <- read_xml(txt)

xml_find_all(doc, "//MeterReadings") %>%
  map_df(function(x) {
    meter <- xml_attr(x, "Irn")
    # ".//" keeps the search inside the current MeterReadings node;
    # a bare "//" would search the whole document from every meter
    xml_find_all(x, ".//Event[contains(@EventInfo, 'Outage')]") %>%
      map_df(function(y) {
        tibble(
          meter = meter,
          timestamp = xml_attr(y, "TimeStamp"),
          discovered_at = xml_attr(y, "DiscoveredAt")
        )
      })
  })
This produces:
## # A tibble: 5 × 3
## meter timestamp discovered_at
## <chr> <chr> <chr>
## 1 311 2014-10-31 14:17:40 2014-11-01 12:05:28
## 2 311 2014-10-31 14:16:20 2014-11-01 12:05:28
## 3 311 2014-10-31 14:16:16 2014-11-01 12:05:28
## 4 311 2014-10-31 14:15:12 2014-11-01 12:05:28
## 5 311 2014-10-31 14:12:00 2014-11-01 12:05:28
Posted on 2016-10-14 21:24:26
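Modified answer: filter the events by meter, so that each meter gets only its own timestamps instead of every timestamp being repeated for every meter: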
outage.df <- xml_find_all(doc, "//MeterReadings[EventData/Event[contains(@EventInfo, 'Outage')]]") %>%
  map_df(function(x) {
    meter <- xml_attr(x, "Irn")
    # Quote the Irn value so XPath compares it as a string
    # (unquoted, it only works for purely numeric meter IDs)
    path <- paste0("//MeterReadings[@Irn='", meter,
                   "']/EventData/Event[contains(@EventInfo, 'Outage')]")
    xml_find_all(x, path) %>%
      map_df(function(y) {
        tibble(
          meter = meter,
          timestamp = xml_attr(y, "TimeStamp"),
          discovered_at = xml_attr(y, "DiscoveredAt")
        )
      })
  })
https://stackoverflow.com/questions/40028173
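A further option, sketched here assuming a reasonably recent xml2: `xml_find_first()` applied to a node set returns one result per input node, so looking up `ancestor::MeterReadings` from each outage event gives one meter per event without the nested `map_df()` calls. The two-meter sample and its `<Root>` wrapper are hypothetical, added only to show that each event is matched to its own meter.

```r
library(xml2)

# Hypothetical two-meter sample; <Root> is added only for well-formedness
txt3 <- '<Root>
  <MeterReadings Irn="311">
    <EventData>
      <Event TimeStamp="2014-10-31 14:17:40" EventInfo="Outage detected from Interval Data."/>
      <Event TimeStamp="2014-10-31 14:16:20" EventInfo="Outage detected from Interval Data."/>
    </EventData>
  </MeterReadings>
  <MeterReadings Irn="312">
    <EventData>
      <Event TimeStamp="2014-11-02 08:00:00" EventInfo="Outage detected from Interval Data."/>
    </EventData>
  </MeterReadings>
</Root>'

doc3 <- read_xml(txt3)

# One node per outage event
events <- xml_find_all(doc3, "//Event[contains(@EventInfo, 'Outage')]")

# On a node set, xml_find_first() returns a node set of the same length,
# one ancestor per event, so nothing is de-duplicated
meters <- xml_find_first(events, "ancestor::MeterReadings")

outage.df <- data.frame(
  meter = xml_attr(meters, "Irn"),
  timestamp = xml_attr(events, "TimeStamp"),
  discovered_at = xml_attr(events, "DiscoveredAt"),
  stringsAsFactors = FALSE
)
outage.df
```

Since `meters` and `events` have the same length, the columns line up row by row; an event without a `MeterReadings` ancestor would get `NA` in `meter` rather than shifting the rows. `discovered_at` is `NA` here only because the trimmed sample omits that attribute.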