
Loading and parsing an HTML string

Stack Overflow user
Asked 2015-07-26 09:14:58
Answers: 1 · Views: 234 · Followers: 0 · Votes: 0

When I try to parse Google's search results, I get an error.

Code

$html = file_get_contents('http://www.google.dk/search?q='.urlencode($query).'&start=0&num=100', false, $context);
                
$doc = new DOMDocument();
$doc->loadHTML($html);

Error

PHP Warning:  DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132

Warning: DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning:  DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning:  DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132

1 Answer

Stack Overflow user

Accepted answer

Posted 2015-07-26 09:42:44

libxml has some built-in error handling that should help here:

$query = 'php rocks';

$data = file_get_contents('http://www.google.co.uk/search?q=' . urlencode($query) . '&start=0&num=100');

// Buffer libxml parse errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$html = new DOMDocument('1.0', 'utf-8');
$html->validateOnParse = false;
$html->standalone = true;
$html->preserveWhiteSpace = true;
$html->strictErrorChecking = false;
$html->substituteEntities = false;
$html->recover = true;
$html->formatOutput = true;
$html->loadHTML($data);

// Keep the last parse error (false if there was none), then clear the buffer.
$parse_errs = serialize(libxml_get_last_error());
libxml_clear_errors();

// Query the result links below the element with id "ires".
$xpath = new DOMXPath($html);
$div = $html->getElementById('ires');
$col = $xpath->query("ol/li/h3/a", $div);

foreach ($col as $node) {
    echo $node->getAttribute('href') . '<br />';
}

$html = null;
$xpath = null;
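
For experimenting without a live request, the same idea can be sketched against an in-memory string. This is only an illustration, not part of the original answer: the helper name `parseHtmlCollectingErrors` and the sample markup are assumptions made up for the example. It uses `libxml_get_errors()` to read every buffered message rather than only the last one. The first link contains an unescaped `&`, the kind of input behind the `htmlParseEntityRef: expecting ';'` warnings in the question; whether a message is actually reported for it may depend on the libxml2 version.

```php
<?php
// Hypothetical helper (not from the original answer): parse an HTML string
// while collecting libxml's parse messages instead of emitting PHP warnings.
function parseHtmlCollectingErrors(string $markup): array
{
    // Route libxml errors into an internal buffer; remember the old setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    // Copy every buffered message, then empty the buffer.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();

    // Restore the caller's error-handling mode.
    libxml_use_internal_errors($previous);

    return [$doc, $messages];
}

// Made-up markup that loosely imitates the structure queried above.
$sample = '<html><body><div id="ires"><ol>'
        . '<li><h3><a href="/url?q=http://example.com/a&start=0">A</a></h3></li>'
        . '<li><h3><a href="/url?q=http://example.com/b">B</a></h3></li>'
        . '</ol></div></body></html>';

[$doc, $messages] = parseHtmlCollectingErrors($sample);

// Collect the href of every result link.
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//div[@id="ires"]/ol/li/h3/a') as $node) {
    $hrefs[] = $node->getAttribute('href');
}

print_r($hrefs);
print_r($messages);
```

Returning the messages to the caller keeps the decision about logging or ignoring them out of the parsing step, and restoring the previous `libxml_use_internal_errors()` setting avoids changing behaviour for unrelated code.
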
Votes: 1
The original content of this page is provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/31635377
