I want to trim the text between <title> and </title>, and between <p> and </p>. These tags can also repeat many times in the data.
<title> DTC Descriptor </title>
<p>This diagnostic procedure supports the following DTC:</p>
<title> Conditions for Running the DTC </title>
<p>This is good</p>
Desired output:
<title>DTC Descriptor</title>
<p>This diagnostic procedure supports the following DTC:</p>
<title>Conditions for Running the DTC</title>
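<p>This is good</p>
I have found the trim function, but I only need to apply it to the text between the tags.
Thanks!
Posted on 2015-07-02 11:07:55
Here are two options, assuming html contains the sample text: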
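The answer does not show how html was built. One plausible definition, given here as an assumption so the code can be run as-is, is a single string holding the four sample lines joined by newlines:

```r
# Assumed construction of `html`; the original answer does not show it.
html <- paste(
  "<title> DTC Descriptor </title>",
  "<p>This diagnostic procedure supports the following DTC:</p>",
  "<title> Conditions for Running the DTC </title>",
  "<p>This is good</p>",
  sep = "\n"
)
```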
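# Option 1: parse the markup and trim every text node
library(XML)
doc <- htmlParse(html, asText = TRUE)
# Assign the trimmed value back to each text node; invisible() hides lapply's return value
invisible(lapply(getNodeSet(doc, "//text()"), function(txt) xmlValue(txt) <- xmlValue(txt, trim = TRUE)))
doc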
# <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# <html>
# <head><title>DTC Descriptor</title></head>
# <body>
# <p>This diagnostic procedure supports the following DTC:</p><title>Conditions for Running the DTC</title>
# <p>This is good</p>
# </body>
# </html>
# Option 2: a regex that drops whitespace just inside an opening/closing tag pair
cat(gsub("(<[^>]+>)\\s*(.*?)\\s*(</[^>]+>)", "\\1\\2\\3", html))
# <title>DTC Descriptor</title>
# <p>This diagnostic procedure supports the following DTC:</p>
# <title>Conditions for Running the DTC</title>
# <p>This is good</p>
https://stackoverflow.com/questions/31182225
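The regex in the second option applies to any tag pair. If the trimming should be limited to <title> and <p>, as the question asks, a variant along these lines may work. This is only a sketch: trim_tags and its tags argument are hypothetical names, and it assumes the tags carry no attributes and that each element sits on one line.

```r
# Hypothetical helper: trim whitespace only inside the listed tag pairs.
# The backreference \\1 forces the closing tag to match the opening tag.
trim_tags <- function(x, tags = c("title", "p")) {
  pattern <- sprintf("<(%s)>\\s*(.*?)\\s*</\\1>", paste(tags, collapse = "|"))
  gsub(pattern, "<\\1>\\2</\\1>", x, perl = TRUE)
}

# <b> is not in `tags`, so its inner whitespace is left alone.
x <- "<title> DTC Descriptor </title>\n<b> keep </b>"
cat(trim_tags(x))
# <title>DTC Descriptor</title>
# <b> keep </b>
```

With perl = TRUE the dot does not match a newline, so a match cannot run from one line's opening tag to a closing tag on a later line.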