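I can convert JSON to HTML using the JsontoHtml library. Now I need to convert the current HTML to JSON, as shown on this site. Looking at the code, I found the following script: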
<script>
$(function(){
    //HTML to JSON
    $('#btn-render-json').click(function() {
        //Set html output
        $('#html-output').html( $('#html-input').val() );
        //Process to JSON and format it for consumption
        $('#html-json').html( FormatJSON(toTransform($('#html-output').children())) );
    });
});

//Convert obj or array to transform
function toTransform(obj) {
    var json;
    if( obj.length > 1 )
    {
        json = [];
        for(var i = 0; i < obj.length; i++)
            json[json.length++] = ObjToTransform(obj[i]);
    } else
        json = ObjToTransform(obj);
    return(json);
}

//Convert obj to transform
function ObjToTransform(obj)
{
    //Get the DOM element
    var el = $(obj).get(0);
    //Add the tag element
    var json = {'tag':el.nodeName.toLowerCase()};
    for (var attr, i=0, attrs=el.attributes, l=attrs.length; i<l; i++){
        attr = attrs[i];
        json[attr.nodeName] = attr.value;
    }
    var children = $(obj).children();
    if( children.length > 0 ) json['children'] = [];
    else json['html'] = $(obj).text();
    //Add the children
    for(var c = 0; c < children.length; c++)
        json['children'][json['children'].length++] = toTransform(children[c]);
    return(json);
}

//Format JSON (with indents)
function FormatJSON(oData, sIndent) {
    if (arguments.length < 2) {
        var sIndent = "";
    }
    var sIndentStyle = " ";
    var sDataType = RealTypeOf(oData);
    // open object
    if (sDataType == "array") {
        if (oData.length == 0) {
            return "[]";
        }
        var sHTML = "[";
    } else {
        var iCount = 0;
        $.each(oData, function() {
            iCount++;
            return;
        });
        if (iCount == 0) { // object is empty
            return "{}";
        }
        var sHTML = "{";
    }
    // loop through items
    var iCount = 0;
    $.each(oData, function(sKey, vValue) {
        if (iCount > 0) {
            sHTML += ",";
        }
        if (sDataType == "array") {
            sHTML += ("\n" + sIndent + sIndentStyle);
        } else {
            sHTML += ("\"" + sKey + "\"" + ":");
        }
        // display relevant data type
        switch (RealTypeOf(vValue)) {
            case "array":
            case "object":
                sHTML += FormatJSON(vValue, (sIndent + sIndentStyle));
                break;
            case "boolean":
            case "number":
                sHTML += vValue.toString();
                break;
            case "null":
                sHTML += "null";
                break;
            case "string":
                sHTML += ("\"" + vValue + "\"");
                break;
            default:
                sHTML += ("TYPEOF: " + typeof(vValue));
        }
        // loop
        iCount++;
    });
    // close object
    if (sDataType == "array") {
        sHTML += ("\n" + sIndent + "]");
    } else {
        sHTML += ("}");
    }
    // return
    return sHTML;
}

//Get the type of the obj (can replace by jquery type)
function RealTypeOf(v) {
    if (typeof(v) == "object") {
        if (v === null) return "null";
        if (v.constructor == (new Array).constructor) return "array";
        if (v.constructor == (new Date).constructor) return "date";
        if (v.constructor == (new RegExp).constructor) return "regex";
        return "object";
    }
    return typeof(v);
}
</script>

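Now I need to use these functions in PHP. I can already get the HTML data, so all I need is to convert the JavaScript functions to PHP functions. Is this possible? My main doubt is the following:
The main input to toTransform() is an object. Is it possible to convert HTML to an object in PHP? Please give me an idea.
When I tried to convert a script tag to JSON following the given answer, I got an error. When I try it on the json2html site, it shows the following:

How can I achieve the same result?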
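Answered 2014-05-02 11:49:45
If you can get a DOMDocument object representing your HTML, you only need to traverse it recursively and build the data structure you want.
Converting an HTML document to a DOMDocument should be as simple as this: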
function html_to_obj($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return element_to_obj($dom->documentElement);
}
Then a simple traversal of $dom->documentElement gives the kind of structure you described:
function element_to_obj($element) {
    $obj = array( "tag" => $element->tagName );
    // Copy every attribute onto the object as a key/value pair.
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType == XML_TEXT_NODE) {
            // Text goes under "html"; a later text node overwrites an earlier one.
            $obj["html"] = $subElement->wholeText;
        }
        else {
            // Anything else is treated as a child element and converted recursively.
            $obj["children"][] = element_to_obj($subElement);
        }
    }
    return $obj;
}
Test case
$html = <<<EOF
<!DOCTYPE html>
<html lang="en">
<head>
<title> This is a test </title>
</head>
<body>
<h1> Is this working? </h1>
<ul>
<li> Yes </li>
<li> No </li>
</ul>
</body>
</html>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
    "tag": "html",
    "lang": "en",
    "children": [
        {
            "tag": "head",
            "children": [
                {
                    "tag": "title",
                    "html": " This is a test "
                }
            ]
        },
        {
            "tag": "body",
            "html": " \n ",
            "children": [
                {
                    "tag": "h1",
                    "html": " Is this working? "
                },
                {
                    "tag": "ul",
                    "children": [
                        {
                            "tag": "li",
                            "html": " Yes "
                        },
                        {
                            "tag": "li",
                            "html": " No "
                        }
                    ],
                    "html": "\n "
                }
            ]
        }
    ]
}
Answer to the updated question
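The solution proposed above does not work with the <script> element, because its content is parsed as a CDATA section node (XML_CDATA_SECTION_NODE) rather than as a plain text node (XML_TEXT_NODE). This is because the DOM extension in PHP is based on libxml2, which parses your HTML as HTML 4.0, and in HTML 4.0 the content of <script> is of type CDATA, not #PCDATA.
You have two solutions for this problem.
1. Add the LIBXML_NOCDATA flag to the loadHTML() call. (I am not entirely sure whether this works for the HTML parser.)
2. Add an additional test where $subElement->nodeType is checked. The recursive function then becomes:
function element_to_obj($element) {
    $obj = array( "tag" => $element->tagName );
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType == XML_TEXT_NODE) {
            $obj["html"] = $subElement->wholeText;
        }
        elseif ($subElement->nodeType == XML_CDATA_SECTION_NODE) {
            // <script> and <style> content arrives as a CDATA section.
            $obj["html"] = $subElement->data;
        }
        else {
            $obj["children"][] = element_to_obj($subElement);
        }
    }
    return $obj;
}
If you hit another bug of this kind, the first thing to do is check the type of the node $subElement, because there are many other node types that my short example function does not handle.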
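As a minimal sketch of that node-type check, the variant below switches on nodeType instead of assuming that everything non-text is an element. The function name element_to_obj_safe and its policy (concatenate text pieces, skip whitespace-only text, ignore comments and other node types) are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical variant of element_to_obj(); the name and the policy choices
// (join text pieces, drop whitespace-only text, ignore comments) are assumptions.
function element_to_obj_safe(DOMElement $element): array {
    $obj = array("tag" => $element->tagName);
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $node) {
        switch ($node->nodeType) {
            case XML_ELEMENT_NODE:
                $obj["children"][] = element_to_obj_safe($node);
                break;
            case XML_TEXT_NODE:
            case XML_CDATA_SECTION_NODE:
                // Skip indentation-only text; append real text instead of overwriting it.
                if (trim($node->nodeValue) !== "") {
                    $obj["html"] = ($obj["html"] ?? "") . $node->nodeValue;
                }
                break;
            default:
                // Comments, processing instructions, etc. are ignored here.
                break;
        }
    }
    return $obj;
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="a"><!-- note --><p>Hi</p> <script>alert(1);</script></div>');
$div = $dom->getElementsByTagName('div')->item(0);
$result = element_to_obj_safe($div);
echo json_encode($result, JSON_PRETTY_PRINT), "\n";
```

With this policy the comment inside the div no longer reaches the recursive call, and whitespace between tags no longer shows up as an "html" value on the parent.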
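You will also notice that libxml2 has to fix mistakes in your HTML in order to build a DOM for it. This is why <html> and <head> elements appear even if you did not specify them. You can avoid that by using the LIBXML_HTML_NOIMPLIED flag.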
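A short sketch of that flag in use, assuming a fragment with a single root element (the $fragment markup is made up for illustration). LIBXML_HTML_NODEFDTD is added here as well so that no default doctype is inserted; both constants need a reasonably recent PHP and libxml2.

```php
<?php
// Sketch: parse a fragment without the implied <html>/<body> wrappers.
// The fragment below is illustrative only.
$fragment = '<ul class="menu"><li>Yes</li><li>No</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// With a single root element, documentElement is the fragment's own root.
$root = $dom->documentElement;
echo $root->tagName, "\n";
echo $dom->saveHTML($root), "\n";
```

Fragments with several top-level elements are reported to be handled inconsistently with this flag, so wrapping the input in a single container element is probably the safer assumption.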
Test case for a script
$html = <<<EOF
<script type="text/javascript">
alert('hi');
</script>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
    "tag": "html",
    "children": [
        {
            "tag": "head",
            "children": [
                {
                    "tag": "script",
                    "type": "text\/javascript",
                    "html": "\n alert('hi');\n "
                }
            ]
        }
    ]
}
Answered 2018-05-30 17:17:35
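I assume the HTML string is stored in the $html variable. Then you can do:
$dom = new DOMDocument();
$dom->loadHTML($html);
$result = [];
foreach($dom->getElementsByTagName('*') as $el){
    $result[] = ["type" => $el->tagName, "value" => $el->nodeValue];
}
$json = json_encode($result, JSON_UNESCAPED_UNICODE);
Note: this algorithm does not preserve parent-child nesting. It treats every tag as a top-level element and lists all tags in document order. Of course, you can add nesting by studying the features of the DOMDocument class.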
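One way to keep the flat loop while still recording where each element sits is to store its node path. The sketch below is only an illustration: the "path" key and the sample markup are assumptions, and DOMNode::getNodePath() is used to obtain an XPath-style location.

```php
<?php
// Sketch: keep the flat loop but record where each element sits in the tree.
// The "path" key and the sample markup are illustrative additions.
$html = '<p>Hi <b>there</b></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$result = [];
foreach ($dom->getElementsByTagName('*') as $el) {
    $result[] = [
        "type"  => $el->tagName,
        "path"  => $el->getNodePath(),   // XPath-style location of the element
        "value" => $el->nodeValue,
    ];
}
echo json_encode($result, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), "\n";
```

A consumer can rebuild the hierarchy from the paths if needed, although a recursive traversal is usually simpler when a nested structure is the goal.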
Answered 2022-10-22 02:59:07
I wrote this to convert HTML form tags to a JSON object. You may be able to build on it.
class HtmlToJson {
    public $dom;
    public $jsonObj;
    public $filter;

    function __construct($html, $filter) {
        $this->dom = new DOMDocument();
        $this->dom->loadHTML( $html );
        $this->jsonObj = array('form_tag_attrs'=>array(), 'form_values'=>array());
        $this->filter = $filter;
    }

    function recursivePair($element, $tagName) {
        if ( isset( $element->attributes ) ) {
            // Any element with a name attribute contributes a name => value pair.
            $nameAttr = $element->getAttribute('name');
            if ($nameAttr) {
                $this->jsonObj['form_values'][$nameAttr] = $element->getAttribute('value');
            }
            // The filtered tag itself (e.g. <form>) contributes its own attributes.
            if ($element->nodeName === $tagName) {
                foreach ( $element->attributes as $attribute ) {
                    $this->jsonObj['form_tag_attrs'][ $attribute->name ] = $attribute->value;
                }
            }
        }
        if ( isset( $element->childNodes ) ) {
            foreach ( $element->childNodes as $subElement ) {
                $this->recursivePair( $subElement, $tagName );
            }
        }
    }

    function json() {
        $element = ($this->filter ? $this->dom->getElementsByTagName($this->filter)->item(0) : $this->dom->documentElement);
        $this->recursivePair($element, $this->filter);
        return $this->jsonObj;
    }
}

// $curlResult is expected to hold the HTML fetched earlier.
$formJson = new HtmlToJson($curlResult, 'form');
echo json_encode($formJson->json());
https://stackoverflow.com/questions/23062537