I have XML data generated by a medical device that I'm trying to parse, but I can't for the life of me figure it out. Here is a sample of the XML data:
<?xml version="1.0" encoding="UTF-8"?>
<Report Type="CprReport" Version="1.3" Generated="2020-05-06T09:08:47.94976" IncidentID="F0000001" PatientID="DE-IDENTIFIED" SoftwareVersion="11.0.0.1036">
<DeviceID>BLANK</DeviceID>
<DeviceType>LP15</DeviceType>
<CPRAnnotationEdited>false</CPRAnnotationEdited>
<PowerOn>2020-04-18T20:57:55.00000</PowerOn>
<DeviceConfiguration>2L355RRB0200UR</DeviceConfiguration>
<TimeAdjustment>PT1S</TimeAdjustment>
<CPRStatistics SACVersion="10" CPRPauseThreshold="10" CompressionPauseThreshold="3" MinimumTimeInterval="30" MinimumCompressions="5">
<OverallStatistic>
<AverageCompressionRatePerMin>96.3298058454348</AverageCompressionRatePerMin>
<AverageVentilationRatePerMin>9.34405124270701</AverageVentilationRatePerMin>
<MedianCompressionRatePerMin>99.8261838893409</MedianCompressionRatePerMin>
<MedianVentilationRatePerMin>6.51749943203332</MedianVentilationRatePerMin>
<TotalTimeCompressionsDuringPromptedCprSec>0</TotalTimeCompressionsDuringPromptedCprSec>
<TotalTimeCompressionsDuringValidSec>1277.72862806776</TotalTimeCompressionsDuringValidSec>
<TotalTimeCprDuringPromptedCprSec>0</TotalTimeCprDuringPromptedCprSec>
<TotalTimeCprDuringValidSec>1298.04384154134</TotalTimeCprDuringValidSec>
<TotalTimePromptedCprSec>0</TotalTimePromptedCprSec>
<TotalTimeValidSec>1339.14938235198</TotalTimeValidSec>
<TotalTimeValidSecEx>1339.14938235198</TotalTimeValidSecEx>
</OverallStatistic>
<IntervalStatistics>
<CPRStatisticsItem Interval="1" ResponsibleForCPR="" CPRDurationSec="0" PauseDurationSec="88.355" ReasonForPause="" IntervalComments="" AverageCompressionRatePerMin="-1" AverageVentilationRatePerMin="-1" MedianCompressionRatePerMin="-1" MedianVentilationRatePerMin="-1" TotalTimeCompressionsDuringPromptedCprSec="0" TotalTimeCompressionsDuringValidSec="0.016383236672237" TotalTimeCprDuringPromptedCprSec="0" TotalTimeCprDuringValidSec="0" TotalTimePromptedCprSec="0" TotalTimeValidSec="0.016383236672237" TotalTimeValidSecEx="0.016383236672237"/>
<CPRStatisticsItem Interval="2" ResponsibleForCPR="" CPRStartTime="2020-04-18T20:59:23.37100" CPREndTime="2020-04-18T21:00:53.69200" CPRDurationSec="90.321" PauseDurationSec="12.337" ReasonForPause="" IntervalComments="" AverageCompressionRatePerMin="95.8528439195659" AverageVentilationRatePerMin="0" MedianCompressionRatePerMin="127.518878980892" MedianVentilationRatePerMin="0" TotalTimeCompressionsDuringPromptedCprSec="0" TotalTimeCompressionsDuringValidSec="80.0812608538943" TotalTimeCprDuringPromptedCprSec="0" TotalTimeCprDuringValidSec="90.3207837740424" TotalTimePromptedCprSec="0" TotalTimeValidSec="102.673744224909" TotalTimeValidSecEx="102.673744224909"/>
<CPRStatisticsItem Interval="3" ResponsibleForCPR="" CPRStartTime="2020-04-18T21:01:06.04500" CPREndTime="2020-04-18T21:05:05.83000" CPRDurationSec="239.785" PauseDurationSec="18.89" ReasonForPause="" IntervalComments="" AverageCompressionRatePerMin="91.8527379821395" AverageVentilationRatePerMin="0" MedianCompressionRatePerMin="99.4940232081911" MedianVentilationRatePerMin="0" TotalTimeCompressionsDuringPromptedCprSec="0" TotalTimeCompressionsDuringValidSec="239.78505193486" TotalTimeCprDuringPromptedCprSec="0" TotalTimeCprDuringValidSec="239.78505193486" TotalTimePromptedCprSec="0" TotalTimeValidSec="258.691307054622" TotalTimeValidSecEx="258.691307054622"/>
<CPRStatisticsItem Interval="4" ResponsibleForCPR="" CPRStartTime="2020-04-18T21:05:24.73600" CPREndTime="2020-04-18T21:18:09.50600" CPRDurationSec="764.77" PauseDurationSec="9.322" ReasonForPause="" IntervalComments="" AverageCompressionRatePerMin="97.1202954559885" AverageVentilationRatePerMin="8.60363351605325" MedianCompressionRatePerMin="99.6300586675938" MedianVentilationRatePerMin="8.55159640102828" TotalTimeCompressionsDuringPromptedCprSec="0" TotalTimeCompressionsDuringValidSec="754.693797306596" TotalTimeCprDuringPromptedCprSec="0" TotalTimeCprDuringValidSec="764.769487860022" TotalTimePromptedCprSec="0" TotalTimeValidSec="774.107932763197" TotalTimeValidSecEx="774.107932763197"/>
<CPRStatisticsItem Interval="5" ResponsibleForCPR="" CPRStartTime="2020-04-18T21:18:18.84400" CPREndTime="2020-04-18T21:21:42.01300" CPRDurationSec="203.169" PauseDurationSec="0.491000000000014" ReasonForPause="" IntervalComments="" AverageCompressionRatePerMin="99.283111575899" AverageVentilationRatePerMin="9.42747647011503" MedianCompressionRatePerMin="99.4707780795654" MedianVentilationRatePerMin="9.45028304169019" TotalTimeCompressionsDuringPromptedCprSec="0" TotalTimeCompressionsDuringValidSec="203.152134735738" TotalTimeCprDuringPromptedCprSec="0" TotalTimeCprDuringValidSec="203.168517972411" TotalTimePromptedCprSec="0" TotalTimeValidSec="203.660015072578" TotalTimeValidSecEx="203.660015072578"/>
</IntervalStatistics>
</CPRStatistics>
<CPRShockPauseStatistics/>
</Report>
I'd like to produce two data frames: one from <OverallStatistic> and one from <IntervalStatistics>.
I'm parsing the data with the XML package. Here is what I've done so far:
library(XML)
df <- xmlParse(file = "file.xml", useInternalNodes = TRUE)
df_1 <- xmlToDataFrame(df, nodes = getNodeSet(df, "//CPRStatistics"))
This is where I'm stuck.
Posted 2020-05-06 23:22:55
In general, the xml2 package is easier to work with. The XML package will at least try to build a data frame where that makes sense.
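For comparison, here is a minimal sketch of both extractions with xml2. It assumes xml2 is installed, and it runs on `txt`, an abbreviated stand-in for file.xml that keeps only a few fields. The other names (`ov`, `items`, `cols`) are also illustrative. `xml_attr()` returns NA for an attribute a node does not have. That means the first <CPRStatisticsItem>, which has no CPRStartTime or CPREndTime, can stay in the table:

```r
library(xml2)

# Abbreviated stand-in for file.xml (hypothetical): same structure, only a few fields kept
txt <- '<Report Type="CprReport">
  <CPRStatistics>
    <OverallStatistic>
      <AverageCompressionRatePerMin>96.3298058454348</AverageCompressionRatePerMin>
      <TotalTimeValidSec>1339.14938235198</TotalTimeValidSec>
    </OverallStatistic>
    <IntervalStatistics>
      <CPRStatisticsItem Interval="1" CPRDurationSec="0" PauseDurationSec="88.355"/>
      <CPRStatisticsItem Interval="2" CPRStartTime="2020-04-18T20:59:23.37100" CPREndTime="2020-04-18T21:00:53.69200" CPRDurationSec="90.321" PauseDurationSec="12.337"/>
    </IntervalStatistics>
  </CPRStatistics>
</Report>'

doc <- read_xml(txt)  # for the real data: read_xml("file.xml")

# <OverallStatistic>: child element names become columns, their text becomes one row
ov <- xml_children(xml_find_first(doc, "//OverallStatistic"))
overall <- as.data.frame(as.list(setNames(xml_text(ov), xml_name(ov))),
                         stringsAsFactors = FALSE)

# <CPRStatisticsItem>: one row per node, one column per attribute name seen on any node
items <- xml_find_all(doc, "//CPRStatisticsItem")
cols <- unique(unlist(lapply(xml_attrs(items), names)))
# xml_attr() is vectorised over the node set and gives NA where an attribute is absent
intervals <- as.data.frame(
  setNames(lapply(cols, function(nm) xml_attr(items, nm)), cols),
  stringsAsFactors = FALSE
)
```

All values come back as character strings, just as they do with the XML package.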
The node set from <OverallStatistic> is the easier one to turn into a data frame:
library(XML)
df <- xmlParse(file = "file.xml", useInternalNodes = TRUE)
overall <- xmlToDataFrame(df, nodes = getNodeSet(df, "//OverallStatistic"))
This data frame has only one row:
overall
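#>   AverageCompressionRatePerMin AverageVentilationRatePerMin
#> 1             96.3298058454348             9.34405124270701
#>   MedianCompressionRatePerMin MedianVentilationRatePerMin
#> 1            99.8261838893409            6.51749943203332
#>   TotalTimeCompressionsDuringPromptedCprSec TotalTimeCompressionsDuringValidSec
#> 1                                         0                    1277.72862806776
#>   TotalTimeCprDuringPromptedCprSec TotalTimeCprDuringValidSec TotalTimePromptedCprSec
#> 1                                0           1298.04384154134                       0
#>   TotalTimeValidSec TotalTimeValidSecEx
#> 1  1339.14938235198    1339.14938235198
The second node, <IntervalStatistics>, is harder to parse because the values are stored in attributes rather than as text nodes. You need to find all the <CPRStatisticsItem> nodes, pull their attributes out into a list, and then rbind them together into a data frame. Note the [-1], which drops the first item. That item has no CPRStartTime or CPREndTime attributes, so its attribute vector is two elements shorter than the others and would not line up column by column under rbind. Because the result has so many columns, I've converted it to a tibble rather than a data frame so that it prints more readably: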
dplyr::as_tibble(do.call(rbind, lapply(getNodeSet(df, "//CPRStatisticsItem")[-1], xmlAttrs)))
#> # A tibble: 4 x 19
#>   Interval ResponsibleForC~ CPRStartTime CPREndTime CPRDurationSec PauseDurationSec
#>   <chr>    <chr>            <chr>        <chr>      <chr>          <chr>
#> 1 2        ""               2020-04-18T~ 2020-04-1~ 90.321         12.337
#> 2 3        ""               2020-04-18T~ 2020-04-1~ 239.785        18.89
#> 3 4        ""               2020-04-18T~ 2020-04-1~ 764.77         9.322
#> 4 5        ""               2020-04-18T~ 2020-04-1~ 203.169        0.4910000000000~
#> # ... with 13 more variables: ReasonForPause <chr>, IntervalComments <chr>,
#> # AverageCompressionRatePerMin <chr>, AverageVentilationRatePerMin <chr>,
#> # MedianCompressionRatePerMin <chr>, MedianVentilationRatePerMin <chr>,
#> # TotalTimeCompressionsDuringPromptedCprSec <chr>,
#> # TotalTimeCompressionsDuringValidSec <chr>,
#> # TotalTimeCprDuringPromptedCprSec <chr>, TotalTimeCprDuringValidSec <chr>,
#> #   TotalTimePromptedCprSec <chr>, TotalTimeValidSec <chr>, TotalTimeValidSecEx <chr>
https://stackoverflow.com/questions/61638148
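If you also want to keep Interval 1 and work with numbers rather than the <chr> columns shown above, one option with the XML package is a two-step sketch. First, index every attribute vector by the full set of attribute names before rbind-ing; a name a vector lacks yields NA. Then run `type.convert()` over the columns. This is not the answer's own code. It runs on `txt`, an abbreviated stand-in for file.xml, and the names `attr_list`, `cols` and `rows` are illustrative:

```r
library(XML)

# Abbreviated stand-in for file.xml (hypothetical): two interval items, a few attributes each
txt <- '<Report>
  <CPRStatistics>
    <IntervalStatistics>
      <CPRStatisticsItem Interval="1" CPRDurationSec="0" PauseDurationSec="88.355"/>
      <CPRStatisticsItem Interval="2" CPRStartTime="2020-04-18T20:59:23.37100" CPREndTime="2020-04-18T21:00:53.69200" CPRDurationSec="90.321" PauseDurationSec="12.337"/>
    </IntervalStatistics>
  </CPRStatistics>
</Report>'

doc <- xmlParse(txt, asText = TRUE)  # for the real data: xmlParse("file.xml")

attr_list <- lapply(getNodeSet(doc, "//CPRStatisticsItem"), xmlAttrs)

# Every attribute name seen on any item, in order of first appearance
cols <- unique(unlist(lapply(attr_list, names)))

# Indexing a named vector by a name it lacks returns NA, so every row has the same length
rows <- lapply(attr_list, function(a) unname(a[cols]))
intervals <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(intervals) <- cols

# Convert numeric-looking character columns to integer/double; the rest stay character
intervals[] <- lapply(intervals, type.convert, as.is = TRUE)
```

With this version the [-1] is no longer needed: Interval 1 simply gets NA in CPRStartTime and CPREndTime. The same `type.convert()` line can also be applied to the one-row `overall` data frame.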