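I have written a very simple PHP crawler and want to integrate it into a Laravel project, but I am not sure where it belongs. I want to launch the script and keep it running while the application is running.
I know it should not go in a controller or in a cron schedule, so where would you suggest putting it?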
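    $homepage = 'https://example.com';
    $already_crawled = [$homepage];
    $crawling = [$homepage];

    function follow_links($url)
    {
        global $already_crawled;
        global $crawling;

        // Skip pages that cannot be fetched or that come back empty.
        $html = @file_get_contents($url);
        if ($html === false || $html === '') {
            return;
        }

        $doc = new DOMDocument();
        // Real-world HTML is rarely valid, so suppress parser warnings.
        @$doc->loadHTML($html);

        $linklist = $doc->getElementsByTagName('a');
        foreach ($linklist as $link) {
            $l = $link->getAttribute('href');
            // Assumes every href is a root-relative path such as "/about".
            $full_link = 'https://example.com' . $l;
            if (!in_array($full_link, $already_crawled)) {
                $already_crawled[] = $full_link;
                $crawling[] = $full_link;
                echo $full_link . PHP_EOL;
                // Insert data in the DB
            }
        }
    }

    // Work through the queue iteratively. Recursing over the shared
    // global queue would visit the same pages again and again.
    while ($crawling) {
        follow_links(array_shift($crawling));
    }

Posted on 2018-11-20 20:26:51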
https://stackoverflow.com/questions/53399629