I have a question. I'm parsing a web page, and here is my current code:
<?php
// Include the library
include('simple_html_dom.php');
// Retrieve the DOM from a given URL
$html = file_get_html('siteone.htm');
// Dates
echo 'Dates:<br />';
foreach ($html->find('div.collectionLog td') as $e) {
    $text = $e->innertext;
    // Strip any parenthesised text from the cell
    $string = preg_replace("/\([^)]+\)/", "", $text);
    echo $string . '<br>';
}
?>
Here is the HTML:
<div class="data-container collectionLog">
<h3>Collection Log</h3>
<div id="lcLoanPerf2">
<table id="lcLoanPerfTable2" class="plain-table">
<tbody>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
</tbody>
</table>
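</div>
What I want to do now is insert this information into a database. I know how to do the insert itself, but I don't know how to make it put the td with class "time" and the td with no class/id into the same row.
So basically I want:
<td class="time">**/**/**</td>
<td>***********</td>
to go into one row, so that each <tr class=""> ends up in its own MySQL row.
Sorry if I haven't been descriptive enough; this is my first time and it's hard to explain.
If you don't understand, please let me know. I just want to make this work.
Thanks,
Gamemann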
Posted on 2014-02-25 00:16:10
Please use something like PHP Simple HTML DOM or Symfony DOM Crawler (with Symfony CSS Selector) to parse the site, rather than regular expressions.
With Symfony DOM Crawler, see this working example:
<?php
include 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$crawler = new Crawler('<div class="data-container collectionLog">
<h3>Collection Log</h3>
<div id="lcLoanPerf2">
<table id="lcLoanPerfTable2" class="plain-table">
<tbody>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="odd">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
<tr class="">
<td class="time">**/**/**</td>
<td>***********</td>
</tr>
</tbody>
</table>
</div>');
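// Select every cell of the table; each $list is one <td> DOM node
$tfList = $crawler->filter('#lcLoanPerfTable2 tr td');
foreach ($tfList as $list) {
    // Dump the text node(s) inside the cell
    foreach ($list->childNodes as $node) {
        var_dump($node->wholeText);
    }
}
This gives you the list of td elements and, inside the foreach, the text of each one; in this markup there are two per row (the time, then the entry). You can save these to a database or use them however you like.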
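To get each time/entry pair into a database row of its own, one option is to walk the `<tr>` elements instead of the individual cells. The following is only a rough sketch: it swaps in PHP's built-in `DOMDocument` and `DOMXPath` in place of the Symfony crawler so that it runs without Composer, and the table name `collection_log` and the columns `log_time` and `entry` are hypothetical names invented for illustration. The two sample rows are placeholders, not real data from the page.

```php
<?php
// Sketch only: the table name `collection_log` and the columns `log_time`
// and `entry` are hypothetical; rename them to match your schema.

/**
 * Return one [time, entry] pair per <tr> of the collection log table.
 */
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table[@id="lcLoanPerfTable2"]//tr') as $tr) {
        $cells = $xpath->query('./td', $tr);
        if ($cells->length < 2) {
            continue; // skip rows that do not have both cells
        }
        $rows[] = [
            trim($cells->item(0)->textContent), // <td class="time">
            trim($cells->item(1)->textContent), // the <td> without class/id
        ];
    }
    return $rows;
}

/**
 * Insert each pair as its own database row using a prepared statement.
 */
function saveRows(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO collection_log (log_time, entry) VALUES (?, ?)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
    return count($rows);
}

// Usage with two placeholder rows standing in for the real page.
$sample = '<table id="lcLoanPerfTable2"><tbody>
<tr class="odd"><td class="time">01/02/14</td><td>first entry</td></tr>
<tr class=""><td class="time">01/03/14</td><td>second entry</td></tr>
</tbody></table>';

foreach (extractRows($sample) as [$time, $entry]) {
    echo $time, ' | ', $entry, PHP_EOL;
}
```

Given a PDO connection in `$pdo` (the DSN and credentials are yours to supply), calling `saveRows($pdo, extractRows($html))` should then write one record per `<tr>`. The prepared statement keeps the scraped text out of the SQL string itself, which matters because the cell contents come from a page you do not control.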
https://stackoverflow.com/questions/21896968