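I am using HtmlUnit to scrape images from a web page. I am a beginner with HtmlUnit. I have written some code, but I don't know how to get the images. My code is below.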
import java.io.*;
import java.net.URL;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class urlscrap {
    public static void main(String[] args) throws Exception
    {
        //WebClient webClient = new WebClient(Opera);
        WebClient webClient = new WebClient();
        HtmlPage currentPage = (HtmlPage) webClient.getPage(new URL("http://www.google.com"));
        System.out.println(currentPage.asText());
        //webClient.closeAllWindows();
    }
}
Posted on 2012-04-11 11:51:34
Does this work for you?
import java.net.URL;
import java.util.List;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class urlscrap {
    public static void main(String[] args) throws Exception
    {
        //WebClient webClient = new WebClient(Opera);
        WebClient webClient = new WebClient();
        HtmlPage currentPage = (HtmlPage) webClient.getPage(new URL("http://www.google.com"));
        // get a list of all img elements on the page
        final List<?> images = currentPage.getByXPath("//img");
        for (Object imageObject : images) {
            HtmlImage image = (HtmlImage) imageObject;
            // print the src attribute exactly as written in the page (it may be a relative URL)
            System.out.println(image.getSrcAttribute());
        }
        //webClient.closeAllWindows();
    }
}
Posted on 2012-04-11 11:11:02
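It looks like you are getting the text of the page, which is indeed the first step. What exactly is your question? Are you having trouble finding all the images referenced in the page? I suggest looking up how to do DOM parsing in Java and using it to extract all the img tags from the page.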
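As a rough illustration of the DOM approach, here is a minimal sketch that uses only the JDK's built-in XML DOM parser. It assumes the markup is well-formed XHTML, which most real pages are not (a tolerant parser such as HtmlUnit's own is needed for those). The class name `ImgSrcExtractor`, the method `extractImageUrls`, and the sample markup are made up for this example. It collects the `src` of every `img` element and resolves relative values against the page address.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of HtmlUnit.
public class ImgSrcExtractor {

    // Returns the absolute URL of every <img src="..."> in the given markup.
    // Assumption: xhtml is well-formed; malformed HTML makes parse() throw.
    // baseUrl is the address the markup was loaded from.
    public static List<String> extractImageUrls(String xhtml, String baseUrl) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();

        // Tag names are case-sensitive here, so this matches <img> but not <IMG>.
        NodeList images = doc.getElementsByTagName("img");
        for (int i = 0; i < images.getLength(); i++) {
            Element img = (Element) images.item(i);
            String src = img.getAttribute("src"); // empty string when the attribute is absent
            if (!src.isEmpty()) {
                // Relative values such as "/images/logo.png" become absolute URLs.
                urls.add(base.resolve(src).toString());
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for the demonstration.
        String page = "<html><body>"
                + "<img src=\"/images/logo.png\"/>"
                + "<img src=\"http://example.com/a.gif\"/>"
                + "</body></html>";
        for (String url : extractImageUrls(page, "http://www.google.com/")) {
            System.out.println(url);
        }
    }
}
```

The same resolve step applies to the values printed by `image.getSrcAttribute()` in the HtmlUnit answer, since those can also be relative. Once a URL is absolute, the image bytes can be fetched like any other resource.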
Posted on 2012-04-11 11:26:24
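If you don't mind switching languages, I recommend Python's Scrapy. It is by far the best framework I have used for scraping web content, including images (it can even create thumbnails for you automatically). Personally, I would not use Java for these tasks.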
https://stackoverflow.com/questions/10099269