I currently have an HTML page containing data for the top-ranking websites on Google, and I need to be able to break it down.
<li class="g"><!--m--><div class="rc" data-hveid="74"><span class="altcts"></span><h3 class="r"><a href="http://airconditioning-london.co.uk/" onMouseDown="return rwt(this,'','','','2','AFQjCNH1BqTrwsbjky2ajPKpf01lUuU_JA','','0CEsQFjAB','','',event)"><em>Air Conditioning London</em> | Installation | Repairs | Maintenance |</a></h3><div class="s"><div><div class="f kv" style="white-space:nowrap"><cite class="vurls"><b>airconditioning</b>-<b>london</b>.co.uk/</cite><div class="action-menu ab_ctl"><a class="clickable-dropdown-arrow ab_button" href="#" id="am-b1" aria-label="Result details" jsaction="ab.tdd;keydown:ab.hbke;keypress:ab.mskpe" aria-expanded="false" aria-haspopup="true" role="button" data-ved="0CEwQ7B0wAQ"><span class="mn-dwn-arw"></span></a><div class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke;mouseover:ab.hdhne;mouseout:ab.hdhue" role="menu" tabindex="-1" data-ved="0CE0QqR8wAQ"><ul><li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="http://webcache.googleusercontent.com/search?q=cache:4BhUc7PZJMgJ:airconditioning-london.co.uk/+&cd=2&hl=en&ct=clnk&gl=uk" onMouseDown="return rwt(this,'','','','2','AFQjCNHtODEWSJL7iUlNPyYez6IpTq8vUQ','','0CE4QIDAB','','',event)">Cached</a></li><li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="/search?pws=1&igu=1&gl=GB&gll=53.41058,-2.97794&near=london&q=related:airconditioning-london.co.uk/+air+conditioning+london&tbo=1&sa=X&ei=jjj6UvCmBoyHrAe18oDwAQ&ved=0CE8QHzAB">Similar</a></li></ul></div></div></div><div class="f slp"></div><span class="st"><em>Air Conditioning London</em>, We are London's best Cooling contractor. A specialist in Installation, Repairs, Service, Maintenance. Residential & Commercial.</span></div></div></div><!--n--></li>
I need to be able to get the following information, and I believe preg_match would be the best way to do this:
Tag text - I need the text between this tag
<h3> tag link - I need to be able to get the link wrapped around the h3 tag text
H3 - I need to be able to get the text that shows in the H3 tag
I hope someone can help.
Thanks in advance.
Posted on 2015-06-24 23:58:56
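Text inside the <h3>:
# preg_match captures the inner content in $m[1]; ~ is the delimiter so the / in </h3> needs no escaping
preg_match('~<h3\b[^>]*>(.*?)</h3>~s', $source1, $m);
$h3content = isset($m[1]) ? $m[1] : '';
# link around the H3 <span class=st> ???
# what do you mean?
# <span class="st"> text that shows in the span tag
preg_match('~<span\s+class="st"[^>]*>(.*?)</span>~s', $source1, $m);
$spanContent = isset($m[1]) ? $m[1] : '';
https://stackoverflow.com/questions/21706678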
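For the three pieces the question asks about (the link around the h3 text, the h3 text itself, and the snippet text), a minimal sketch along the same regex lines might look like the following. The function name `extractResult` and the shortened sample string are hypothetical, not part of the original answer, and the patterns assume the markup shape shown in the question: the `<a>` sits directly inside the `<h3>`, and the `<span class="st">` contains no nested `<span>`. If Google changes its markup, the patterns would need adjusting.

```php
<?php
// Hypothetical helper (an assumption, not the answer author's code):
// extracts link, title and snippet from one result block with preg_match.
function extractResult($html)
{
    $result = array('link' => null, 'title' => null, 'snippet' => null);

    // <h3 ...><a href="...">title</a></h3>: capture the href and the anchor content.
    $h3Pattern = '~<h3\b[^>]*>\s*<a\b[^>]*?\bhref="([^"]*)"[^>]*>(.*?)</a>\s*</h3>~is';
    if (preg_match($h3Pattern, $html, $m)) {
        $result['link'] = html_entity_decode($m[1]);
        // strip_tags drops the inner <em> markup so only the visible text remains.
        $result['title'] = trim(html_entity_decode(strip_tags($m[2])));
    }

    // <span class="st">snippet</span>: assumes no nested <span> inside the snippet.
    if (preg_match('~<span\s+class="st"[^>]*>(.*?)</span>~is', $html, $m)) {
        $result['snippet'] = trim(html_entity_decode(strip_tags($m[1])));
    }

    return $result;
}

// Shortened sample modelled on the question's markup (illustrative only).
$source1 = '<li class="g"><h3 class="r"><a href="http://airconditioning-london.co.uk/" onMouseDown="x">'
    . '<em>Air Conditioning London</em> | Installation |</a></h3>'
    . '<span class="st"><em>Air Conditioning London</em>, We are London\'s best Cooling contractor.</span></li>';

$info = extractResult($source1);
echo $info['link'], "\n";
echo $info['title'], "\n";
echo $info['snippet'], "\n";
```

For a page with several results, `preg_match_all` with the same patterns would collect every match instead of only the first. Regex works for markup this regular, but an HTML parser such as `DOMDocument` tends to be less brittle when the page structure varies.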