Consider the following PHP code, which scrapes a client's old static website for his customers' email addresses.
$urls = explode(PHP_EOL, file_get_contents('urls.txt'));
print '<pre>'; print_r($urls); print '</pre>';
print '<strong>Results:</strong><br>';

function get_emails($url) {
    $html = file_get_contents($url);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if (strpos($href, 'mailto') !== false) {
            return str_replace("mailto:", "", $href) . '<br>';
        }
    }
}

foreach ($urls as $key => $url) {
    print get_emails($url);
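}

I am reading a list of URLs from urls.txt, but the output only contains the result for the last URL in the file. All the others are ignored. I had hoped it would return a tidy list of all of his customers' email addresses so that we could import them into the new website.

Can anyone help diagnose the problem?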
Posted on 2017-07-21 09:07:21
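This is because of:

return str_replace("mailto:","",$href) . '<br>';

The return statement exits the function, and with it the loop, as soon as the first mailto link is found, so at most one address is reported per page.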
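To see the effect of an early return in isolation, here is a minimal sketch that needs no network access. The function names `first_match` and `all_matches` and the sample links are made up for illustration; they are not part of the original script.

```php
<?php
// Hypothetical sample data standing in for the href values found on one page.
$hrefs = array('index.html', 'mailto:a@example.com', 'mailto:b@example.com');

// Returns on the first mailto link, like the original get_emails().
function first_match(array $hrefs) {
    foreach ($hrefs as $href) {
        if (strpos($href, 'mailto:') === 0) {
            // return leaves the function, so the loop never reaches the second link
            return substr($href, strlen('mailto:'));
        }
    }
    return null; // no mailto link at all
}

// Collects every mailto link and returns only after the loop has finished.
function all_matches(array $hrefs) {
    $found = array();
    foreach ($hrefs as $href) {
        if (strpos($href, 'mailto:') === 0) {
            $found[] = substr($href, strlen('mailto:'));
        }
    }
    return $found;
}

var_dump(first_match($hrefs)); // only a@example.com
print_r(all_matches($hrefs));  // both addresses
```

The two options that follow apply the same idea: either print inside the loop, or gather the values and return them once the loop is done.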
1. Either do:
$urls = explode(PHP_EOL, file_get_contents('urls.txt'));
print '<pre>'; print_r($urls); print '</pre>';
print '<strong>Results:</strong><br>';

function get_emails($url) {
    $html = file_get_contents($url);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        // keep the mailto check so that ordinary links are not printed
        if (strpos($href, 'mailto') !== false) {
            // echo instead of return, so the loop continues with the next link
            echo str_replace("mailto:", "", $href) . '<br>';
        }
    }
}

foreach ($urls as $key => $url) {
    get_emails($url);
}

2. Or as follows:
$urls = explode(PHP_EOL, file_get_contents('urls.txt'));
print '<pre>'; print_r($urls); print '</pre>';
print '<strong>Results:</strong><br>';

function get_emails($url) {
    $html = file_get_contents($url);
    $data = array(); // define array
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        // keep the mailto check so that ordinary links are not collected
        if (strpos($href, 'mailto') !== false) {
            $data[] = str_replace("mailto:", "", $href) . '<br>'; // assign each value to the array
        }
    }
    return $data; // return once, after every link has been examined
}

foreach ($urls as $key => $url) {
    print_r(get_emails($url));
}

https://stackoverflow.com/questions/45233502
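One more thing may be worth checking. This is an assumption, since the contents of urls.txt are not shown. Getting results for the last URL only is a symptom that can appear when the file uses Windows line endings ("\r\n") while PHP_EOL is "\n". In that case explode(PHP_EOL, ...) leaves a trailing "\r" on every URL except the last, and those requests may fail. The sketch below trims each line and collects every address. The helper names `read_url_list` and `extract_emails` are hypothetical, not part of the original script.

```php
<?php
// Hypothetical helper: read one URL per line, tolerating "\n" and "\r\n" endings.
function read_url_list($path) {
    if (!is_readable($path)) {
        return array(); // missing or unreadable file: nothing to process
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }
    // trim() removes a stray "\r" or surrounding spaces; empty lines are dropped.
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Hypothetical helper: return every distinct mailto address in an HTML string.
function extract_emails($html) {
    $emails = array();
    if (!is_string($html) || $html === '') {
        return $emails; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // collect markup warnings quietly
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if (stripos($href, 'mailto:') === 0) {
            // Drop the scheme and any "?subject=..." part.
            $parts = explode('?', substr($href, strlen('mailto:')), 2);
            if ($parts[0] !== '') {
                $emails[] = $parts[0];
            }
        }
    }
    return array_values(array_unique($emails));
}

foreach (read_url_list('urls.txt') as $url) {
    $html = @file_get_contents($url); // false when the page cannot be fetched
    if ($html === false) {
        print 'Could not fetch ' . htmlspecialchars($url) . '<br>';
        continue;
    }
    foreach (extract_emails($html) as $email) {
        print htmlspecialchars($email) . '<br>';
    }
}
```

Reporting the URLs that could not be fetched makes it easier to tell a line-ending or network problem apart from a page that simply contains no mailto links.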