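I'm having trouble understanding how to download only part of an HTML page. I tried the traditional approach, using URL::openStream and a BufferedReader, but I'm not sure whether that forces me to download the whole page. The problem: I have a fairly large HTML page and need to parse two numbers from it that update at least once per second. With the approach above I can detect a change only once every 2-3 seconds, and I'd like to know whether there is a way to make it faster. So I was wondering whether fetching only part of the page could help.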
Posted on 2018-11-20 11:14:39
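I think you should find out how the page gets its data (SSE or WebSocket) and try to subscribe to that service directly. If that isn't possible, try a more efficient XML parser. I suggest https://vtd-xml.sourceforge.io/, which should be about 10 times faster than the DOM parser that ships with the JDK.
Also be careful with BufferedReader.readLine(), because it has hidden allocation costs (this is fairly advanced territory, since you have to think about CPU memory bandwidth, L1 cache misses, and so on). It creates a String for every line, and you don't need those strings.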
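To illustrate the point about avoiding per-line strings, here is a minimal sketch that scans the raw bytes with one reusable buffer and returns as soon as the number has been read, so the rest of the page is never consumed. The class name EarlyExitScanner, the method readNumberAfter, and the id="price" marker are made up for this example; it assumes the number is a run of ASCII digits directly after a known, non-empty marker.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class EarlyExitScanner {

    /**
     * Returns the run of ASCII digits that directly follows the first
     * occurrence of marker, or -1 if the marker or the digits are missing.
     * Assumes marker is non-empty. Stops scanning once the number ends.
     */
    static long readNumberAfter(InputStream in, byte[] marker) throws IOException {
        int[] fail = failureTable(marker);
        byte[] chunk = new byte[8192]; // reused for every read, no per-line allocation
        int matched = 0;
        boolean markerSeen = false;
        boolean sawDigit = false;
        long value = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = chunk[i];
                if (markerSeen) {
                    if (b >= '0' && b <= '9') {
                        value = value * 10 + (b - '0');
                        sawDigit = true;
                    } else {
                        return sawDigit ? value : -1; // early exit
                    }
                } else {
                    // Knuth-Morris-Pratt matching, so a marker split across chunks still matches
                    while (matched > 0 && b != marker[matched]) {
                        matched = fail[matched - 1];
                    }
                    if (b == marker[matched]) {
                        matched++;
                    }
                    if (matched == marker.length) {
                        markerSeen = true;
                    }
                }
            }
        }
        return sawDigit ? value : -1;
    }

    // Standard KMP failure table for the marker bytes.
    private static int[] failureTable(byte[] p) {
        int[] fail = new int[p.length];
        for (int i = 1, k = 0; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) {
                k = fail[k - 1];
            }
            if (p[i] == p[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<span id=\"price\">1234</span>".getBytes(StandardCharsets.US_ASCII);
        byte[] marker = "id=\"price\">".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readNumberAfter(new ByteArrayInputStream(page), marker)); // prints 1234
    }
}
```

With a real page you would pass the stream from url.openStream() and close it right after the call. Whether that saves much time depends on how far into the page the numbers sit and on how the server sends the response.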
An example using the library I mentioned:
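import java.nio.charset.StandardCharsets;
import com.ximpleware.*;

byte[] pageInBytes = readAllBytesFromTheURL(); // placeholder for your own download code
VTDGen vg = new VTDGen();
vg.setDoc(pageInBytes);
vg.parse(false); // false = not namespace-aware
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
// Jump to the section that we want to process
ap.selectXPath("/html/body/div");
if (ap.evalXPath() != -1) {
    // getElementFragment() packs the byte offset (low 32 bits) and the length (high 32 bits) into one long
    long fragment = vn.getElementFragment();
    // Assumes the page is UTF-8 encoded
    String fileId = new String(pageInBytes, (int) fragment, (int) (fragment >>> 32), StandardCharsets.UTF_8);
}

Posted on 2018-11-21 12:05:58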
Write a helper that reads the URL contents, and keep the parser for the elements in a separate class.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

public class HTMLReaderHelper {
    private final URL currentURL;

    HTMLReaderHelper(URL url) {
        currentURL = url;
    }

    // Opens a new stream on the URL; returns null if it cannot be opened.
    public CharIterator charIterator() {
        try {
            return new CharIterator();
        } catch (IOException ex) {
            return null;
        }
    }

    public StringIterator stringIterator() {
        return new StringIterator();
    }

    // Iterates over the response byte by byte, treating each byte as one char.
    // That is fine for ASCII content such as digits; multi-byte encodings are not decoded.
    class CharIterator implements Iterator<Character> {
        private final InputStream urlStream;
        private final Queue<Character> buffer;
        private boolean isValid;

        private CharIterator() throws IOException {
            urlStream = currentURL.openStream();
            isValid = true;
            buffer = new ArrayDeque<>();
        }

        @Override
        public boolean hasNext() {
            if (!buffer.isEmpty()) {
                return true;
            }
            if (!isValid) {
                return false;
            }
            try {
                // Keep the int so that end of stream (-1) is checked before casting to char
                int b = urlStream.read();
                if (b == -1) {
                    markInvalid();
                    return false;
                }
                buffer.add((char) b);
                return true;
            } catch (IOException ex) {
                markInvalid();
                return false;
            }
        }

        // Returns null (instead of throwing) once the stream is exhausted or has failed.
        @Override
        public Character next() {
            return hasNext() ? buffer.remove() : null;
        }

        private void markInvalid() {
            isValid = false;
            try {
                urlStream.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }
    }

    // Iterates over the non-empty lines of the response.
    class StringIterator implements Iterator<String> {
        private final CharIterator charPointer;
        private final Queue<String> buffer;
        private boolean isValid;

        private StringIterator() {
            charPointer = charIterator();
            isValid = charPointer != null; // the stream may have failed to open
            buffer = new ArrayDeque<>();
        }

        @Override
        public boolean hasNext() {
            if (!buffer.isEmpty()) {
                return true;
            }
            String value = readLine();
            if (value == null) {
                markInvalid();
                return false;
            }
            buffer.add(value);
            return true;
        }

        // Returns null (instead of throwing) when there are no more lines.
        @Override
        public String next() {
            return hasNext() ? buffer.remove() : null;
        }

        private String readLine() {
            if (!isValid) {
                return null;
            }
            Character currentChar = charPointer.next();
            // Skip line terminators left over from the previous line, and blank lines
            while (currentChar != null && isLineBreak(currentChar)) {
                currentChar = charPointer.next();
            }
            if (currentChar == null) {
                return null;
            }
            StringBuilder sb = new StringBuilder();
            // Stop at a line break or at end of stream (null), so the last line is still returned
            while (currentChar != null && !isLineBreak(currentChar)) {
                sb.append(currentChar.charValue());
                currentChar = charPointer.next();
            }
            return sb.toString();
        }

        private boolean isLineBreak(char c) {
            return c == '\n' || c == '\r';
        }

        private void markInvalid() {
            isValid = false;
        }
    }
}

https://stackoverflow.com/questions/53390833
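The second answer mentions a separate parser class but does not show it. As a rough sketch only, such a parser could pull lines from any Iterator<String> (for example the StringIterator above) and stop as soon as it has both numbers. The class name TwoNumberParser, the method firstTwoNumbers, and the class="value" pattern used in main are assumptions made for illustration; the pattern has to be adapted to the real page.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, kept separate from the class that reads the URL.
final class TwoNumberParser {

    /**
     * Returns the first two integers captured by group 1 of the pattern,
     * or null if fewer than two are found. No more lines are pulled from
     * the iterator once both numbers have been seen.
     */
    static long[] firstTwoNumbers(Iterator<String> lines, Pattern pattern) {
        long[] found = new long[2];
        int count = 0;
        while (count < 2 && lines.hasNext()) {
            String line = lines.next();
            if (line == null) {
                break; // some iterators signal end of input with null
            }
            Matcher m = pattern.matcher(line);
            while (count < 2 && m.find()) {
                found[count++] = Long.parseLong(m.group(1));
            }
        }
        return count == 2 ? found : null;
    }

    public static void main(String[] args) {
        // Assumed markup; adjust the pattern to match the real page
        Pattern value = Pattern.compile("class=\"value\">(\\d+)<");
        Iterator<String> lines = Arrays.asList(
                "<div>",
                "<span class=\"value\">42</span>",
                "<span class=\"value\">7</span>",
                "<p>never read</p>").iterator();
        System.out.println(Arrays.toString(firstTwoNumbers(lines, value))); // prints [42, 7]
    }
}
```

Because the loop stops pulling lines after the second match, the remainder of the page does not have to be read, provided the caller closes the underlying stream afterwards.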