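I want to traverse the DOM and collect the nodes that have a data-table attribute, extract that attribute's value, then find their child nodes via data-field (or another attribute) and extract those values as well, saving everything in a list.
In the HTML example below, I have placed DOM attributes as anchors in the DOM tree; after traversing and extracting them, they are to be converted into a model structure.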
<body>
<div class="wrap" data-table="page"> Sample Text <p data-field="heading" class="format" >Welcome to this page</p>
<div class="flex-grid generic-card">
<h1 class="card " data-field="intro">Text </h1>
<div class="card " data-field="body"></div>
</div>
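</div>
I expect the final result to be a flat list of the form (page . ("title" "intro" "body")).
With the code below I can traverse the nodes and extract data-table, but I cannot extract the data-field values attached to a data-table. I tried a recursive approach, without success, by repeating the dom-struct and dom-search calls shown in the example. What I noticed is that, after parsing, libxml-parse-html-region returns empty strings and newline strings alongside the DOM nodes, which produces an error.
The purpose of this code is to extract nodes from the tree recursively: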
(require 'dom)
(defun dom-struct (x)
  (print (dom-attr x 'data-table)) ; extract the data-table attribute
  (print (dom-tag (dom-node x))) ; extract dom-tag
  (print (dom-children (dom-node x))) ; extract dom-children attached to a node but don't know how to extract data-field attribute
  (print (dom-search (dom-children (dom-node x)) (lambda (node) (assq 'data-attribute (cadr node)))))
  (mapconcat #'dom-struct (dom-children (dom-node x)) ""))
(defun macro-structify (tag-entries)
  (with-temp-buffer
    (insert tag-entries)
    (let* ((mytags (libxml-parse-html-region (point-min) (point-max))))
      (dom-struct (car (dom-by-tag mytags 'body))))))
(let ((myskel "<html>
<head>
<title>Demo: Gradient Slide</title>
</head>
<link href=\"https://fonts.googleapis.com/css?family=Nunito+Sans\" rel=\"stylesheet\">
<link rel=\"stylesheet\" href=\"dist/build.css\">
<body data-table=\"layout\">
<header data-field=\"title\">
<h1>Skeleton Screen</h1>
</header>
<div class=\"wrap\" data-table=\"page\"> Sample Text <p data-field=\"heading\" class=\"format\" data-attribute=\"somethingsomething\">Welcome to this page</p>
<div class=\"flex-grid generic-card\">
<div class=\"card loading\" data-field=\"intro\">Text </div>
<div class=\"card loading\" data-field=\"body\"></div>
</div>
</div>
</body>
</html>"))
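  (macro-structify myskel))
Answered 2022-04-01 15:37:45
Here is a solution using esxml-query from the esxml package. It finds every descendant with a data-field attribute of the div node that has a data-table attribute, then collects their attribute values into a list.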
(require 'dom)
(require 'esxml-query)
(let* ((myskel "<html>
<head>
<title>Demo: Gradient Slide</title>
</head>
<link href=\"https://fonts.googleapis.com/css?family=Nunito+Sans\" rel=\"stylesheet\">
<link rel=\"stylesheet\" href=\"dist/build.css\">
<body data-table=\"layout\">
<header data-field=\"title\">
<h1>Skeleton Screen</h1>
</header>
<div class=\"wrap\" data-table=\"page\"> Sample Text <p data-field=\"heading\" class=\"format\" data-attribute=\"somethingsomething\">Welcome to this page</p>
<div class=\"flex-grid generic-card\">
<div class=\"card loading\" data-field=\"intro\">Text </div>
<div class=\"card loading\" data-field=\"body\"></div>
</div>
</div>
</body>
</html>")
       (dom (with-temp-buffer
              (insert myskel)
              (libxml-parse-html-region (point-min) (point-max))))
       (table-node (esxml-query "div[data-table]" dom))
       (model-nodes (esxml-query-all "[data-field]" table-node))
       (model-data-table (dom-attr table-node 'data-table))
       (model-data-fields (mapcar (lambda (node) (dom-attr node 'data-field)) model-nodes)))
  (cons model-data-table model-data-fields))
;; => ("page" "heading" "intro" "body")
The result differs from the one you specified, for the following reasons:
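- Your code example contains a body tag with a data-table attribute before the div tag with a data-table attribute, but your HTML snippet looks at the latter, so I changed the code to look for a div tag with a data-table attribute.
- The header tag whose data-field attribute is set to "title" (the expected field) belongs to the body tag whose data-table attribute is set to "layout", not to the div tag whose data-table attribute is set to "page" (the actual field).
- (foo . (bar baz)) is identical to (foo bar baz) and is usually printed in the latter form.
https://stackoverflow.com/questions/71705129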
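To illustrate the point about dotted notation, here is a minimal sketch showing that the Lisp reader treats the dotted form and the flat form as the same list. The helper name my-table-entry is hypothetical and only serves the illustration; it is not part of the original answer.

```elisp
(defun my-table-entry (table fields)
  "Pair TABLE with the list FIELDS (hypothetical helper for illustration)."
  (cons table fields))

;; The dotted spelling and the flat spelling read as the same object.
(equal '(page . ("title" "intro" "body"))
       '(page "title" "intro" "body"))
;; => t

;; Consing a head onto a list therefore already yields the flat form.
(my-table-entry "page" '("heading" "intro" "body"))
;; => ("page" "heading" "intro" "body")
```

So there is no need to build the dotted form explicitly; a plain cons of the table name onto the field list prints as the flat list.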
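If you would rather not depend on esxml, the same idea can probably be sketched with dom.el alone. The version below is an assumption-laden sketch, not the method from the answer above: the names my-table-fields and my-collect-tables are hypothetical, it assumes an Emacs built with libxml2, and it assigns each data-field to its nearest enclosing data-table node. Text children, which libxml-parse-html-region returns as plain strings, are skipped explicitly, since calling dom-attr on them is the likely source of the error described in the question.

```elisp
(require 'dom)

(defun my-table-fields (table-node)
  "Return the data-field values owned by TABLE-NODE, in document order.
Text children (strings) are skipped, and nested data-table nodes are
not entered, so each field is attributed to its nearest table only."
  (let (fields)
    (dolist (child (dom-children table-node))
      (when (and (consp child)                      ; skip text nodes
                 (not (dom-attr child 'data-table))) ; skip nested tables
        (let ((field (dom-attr child 'data-field)))
          (when field (push field fields)))
        ;; Recurse into the child and keep its fields in order.
        (dolist (nested (my-table-fields child))
          (push nested fields))))
    (nreverse fields)))

(defun my-collect-tables (html)
  "Parse the string HTML and return a list of (TABLE FIELD...) entries."
  (let ((dom (with-temp-buffer
               (insert html)
               (libxml-parse-html-region (point-min) (point-max)))))
    (mapcar (lambda (table-node)
              (cons (dom-attr table-node 'data-table)
                    (my-table-fields table-node)))
            ;; dom-search only ever passes element nodes to the predicate.
            (dom-search dom (lambda (node) (dom-attr node 'data-table))))))
```

Called on the myskel string from the question, (my-collect-tables myskel) should give one entry per data-table node, with "title" listed under "layout" and "heading", "intro" and "body" listed under "page".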