Q: Get any user-specified HTML tag from the live URL of an HTML document using a PHP regular expression

Stack Overflow user

Asked 2011-06-24 14:18:00

Answers: 3, Views: 190, Followers: 0, Votes: 1

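I want to fetch any meta, title, script, or link tag from an HTML page. This is the program I wrote (it is not correct, but it should give the experts the idea).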

<?php
function get_tag($tag_name, $url)
{
    $content = file_get_contents($url);

    // this is not correct : regular expression please //
    preg_match_all($tag_name, $content, $matches);

    return $matches;
}

print_r(get_tag('title', 'http://stackoverflow.com'));

?>

The output should look like this:

Array
(
    [0] => title
    [1] => Stack Overflow
)

Thanks!!

php

html

3 Answers

Stack Overflow user

Accepted answer

Posted 2011-06-24 15:09:56

function get_tags($tag, $url) {
    // Collect libxml errors internally so improperly formatted HTML does not emit warnings
    libxml_use_internal_errors(true);

    // Instantiate the DOMDocument class to parse the HTML DOM
    $xml = new DOMDocument();

    // Load the page at $url into the DOMDocument object
    $xml->loadHTMLFile($url);

    // Empty array to hold all matching tags to return
    $tags = array();

    // Loop through all tags of the given type and store their details in the array
    foreach ($xml->getElementsByTagName($tag) as $tag_found) {
        if ($tag_found->tagName == "meta") {
            $tags[] = array(
                "meta_name"  => $tag_found->getAttribute("name"),
                "meta_value" => $tag_found->getAttribute("content"),
            );
        } else {
            $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
        }
    }

    // Return the collected tags
    return $tags;
}

This answer actually gives the name of the tag as the first array value instead of "array", and it also stops the warnings.
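To see the shape of the returned array without hitting a live URL, here is a minimal sketch of the same approach that parses an HTML string with `loadHTML` instead of `loadHTMLFile`. The helper name `get_tags_from_string` and the sample markup are assumptions made up for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: same logic as get_tags(), but it takes an HTML string
// so the example runs without network access.
function get_tags_from_string(string $tag, string $html): array
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        if ($node->tagName === 'meta') {
            $tags[] = [
                'meta_name'  => $node->getAttribute('name'),
                'meta_value' => $node->getAttribute('content'),
            ];
        } else {
            $tags[] = ['tag' => $node->tagName, 'text' => $node->nodeValue];
        }
    }
    return $tags;
}

// Made-up sample page used only for this illustration.
$html = '<html><head><title>Example page</title>'
      . '<meta name="description" content="A sample page">'
      . '</head><body><p>Hello</p></body></html>';

print_r(get_tags_from_string('title', $html));
print_r(get_tags_from_string('meta', $html));
```

For `title` each entry carries the `tag` and `text` keys, while for `meta` each entry carries `meta_name` and `meta_value`, so callers need to know which tag type they asked for.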

Votes: 1

Stack Overflow user

Posted 2011-06-24 14:37:32

Before you parse HTML with a regular expression, you should read the first answer to this question.

Try using DOMDocument instead, like this:

<?php

function get_tags($tags, $url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all matching tags to return
    $tags_found = array();

    //Loop through each <$tags> tag in the dom and add it to the $tags_found array
    foreach($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    // Return the collected tags
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>
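Since the question asks about several tag types (meta, title, script, link), one possible extension is to loop over a list of tag names against a single parsed document. This is only a sketch under stated assumptions: the helper name `get_many_tags`, the use of an HTML string rather than a URL, and the sample markup are all hypothetical.

```php
<?php
// Hypothetical helper: gather several tag types from one parsed document.
function get_many_tags(array $tag_names, string $html): array
{
    // Keep malformed-HTML warnings out of the output while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($tag_names as $name) {
        // Results are grouped in the order the names were requested.
        foreach ($doc->getElementsByTagName($name) as $node) {
            $found[] = ['tag' => $node->tagName, 'text' => $node->nodeValue];
        }
    }
    return $found;
}

// Made-up sample markup used only for this illustration.
$html = '<html><head><title>Demo</title>'
      . '<link rel="stylesheet" href="style.css">'
      . '<script>var x = 1;</script></head><body></body></html>';

print_r(get_many_tags(['title', 'script', 'link'], $html));
```

Parsing once and querying several times avoids fetching and parsing the page again for every tag name.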

Votes: 1

Stack Overflow user

Posted 2011-06-24 14:56:44

Since these tags cannot be nested, no parsing is needed.

#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

If you use it in your function, you will have to write $tag_name in place of "meta|title|script|link".
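A minimal sketch of that substitution might look like the following. The function name `get_tag_regex`, the return shape, and the `preg_quote` call are assumptions added for illustration. Because `preg_quote` escapes `|`, this version matches one literal tag name per call rather than an alternation.

```php
<?php
// Hypothetical wrapper around the pattern above; not part of the original answer.
function get_tag_regex(string $tag_name, string $content): array
{
    // Escape the name so it is matched literally; '#' is the pattern delimiter.
    $name = preg_quote($tag_name, '#');

    // Same pattern as above with the tag name spliced in.
    $pattern = '#<(' . $name . ')(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is';

    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // $m[1] is the tag name, $m[2] the inner text (missing for <tag ... />).
        $result[] = [$m[1], $m[2] ?? ''];
    }
    return $result;
}

// Made-up sample markup used only for this illustration.
$html = '<head><title>Stack Overflow</title></head>';
print_r(get_tag_regex('title', $html));
```

Note that the pattern expects either `/>` or a matching closing tag, so an unclosed tag such as `<meta name="x">` may fail to match or may match more than intended.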

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/6464160
