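I wasn't sure how to phrase this question or its title, so here goes. I'm using jsoup to parse a web page (http://champion.gg/statistics/) and trying to pull the statistics out of its table with the following code.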
public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}
It produces this result:
<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])">
<td class="rank">{{indexNumber($index, filteredChampions.length)}}</td>
<td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}">
<div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}">
<div class="matchup-champion {{champion.key}}"></div>
<span class="stat-champ-title">{{champion.title}}</span>
</div> </a> </td>
<td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td>
<td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td>
<td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td>
<td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td>
<td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td>
<td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td>
<td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td>
<td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td>
<td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td>
<td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td>
<td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td>
<td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td>
<td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td>
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td>
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td>
<td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td>
<td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td>
<td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td>
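</tr>
Instead of giving me a value such as a particular champion's average kills, the result literally says champion.general.kills. How can I parse this page so that, in place of champion.general.kills, I get an actual value such as 8?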
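Posted on 2016-11-13 14:14:14
When it comes to extracting data from a web page, you have to go where the data lives. In this case the data is still inside the page itself, which is good. You need to get the script tag that contains the data and parse it. Note that this sample code assumes it is the script tag at index 11.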
public static void main(String[] args)
{
    try
    {
        Document doc = Jsoup
                .connect("http://champion.gg/statistics/")
                .userAgent(
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                .get();
        System.out.println(doc.toString());
        // The data sits in a <script> tag; this assumes it is the one at index 11.
        Elements table = doc.select("script");
        Element script = table.get(11);
        parseText(script);
    }
    catch (IOException exception)
    {
        // Report the failure instead of swallowing it silently.
        exception.printStackTrace();
    }
}
public static void parseText(Element script)
{
    String text = ((DataNode) script.childNode(0)).toString().trim();
    int index = text.indexOf("_id");
    while (index > 0)
    {
        index += 6;// Beginning of value
        int endQuote = text.indexOf("\"", index);
        String id = text.substring(index, endQuote);
        // The next two substrings start at the label itself,
        // so "key":" and "kills": show up in the printed output.
        index = text.indexOf("\"key\":\"", endQuote);
        endQuote = text.indexOf("\"", index + 8);
        String key = text.substring(index, endQuote);
        index = text.indexOf("\"kills\":", endQuote);
        endQuote = text.indexOf(",", index);
        String kills = text.substring(index, endQuote);
        // Drop the part already scanned. Caution: index is still an offset
        // into the old string here, so the next search may skip records.
        text = text.substring(endQuote);
        index = text.indexOf("_id", index);
        System.out.println(id + key + kills);
    }
}
Output:
5812965753fa9743395ee93a"key":"Urgot"kills":6.47
5812965753fa9743395ee93b"key":"Aatrox"kills":5.8
5812965753fa9743395ee93d"key":"Galio"kills":4.58
5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
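The scanning loop above keeps the labels in its output (for example `"key":"Urgot`) and reuses an offset from the old string after truncating `text`, which can skip records. Here is a minimal sketch of the same indexOf-based idea that cuts out only the values and visits every record. It uses only the standard library; the class name `ChampionScan`, the helper names, and the JSON layout of the sample string (with `kills` nested under `general`, as the page template suggests) are assumptions for illustration, not something taken from the site.

```java
import java.util.ArrayList;
import java.util.List;

final class ChampionScan {
    private static final String ID_LABEL = "\"_id\":\"";
    private static final String KEY_LABEL = "\"key\":\"";
    private static final String KILLS_LABEL = "\"kills\":";

    /** Returns one "id key kills" string per record found in the text. */
    static List<String> extract(String text) {
        List<String> rows = new ArrayList<>();
        int pos = text.indexOf(ID_LABEL);
        while (pos >= 0) {
            // Each value starts right after its label, so the label is not included.
            int idStart = pos + ID_LABEL.length();
            int idEnd = text.indexOf('"', idStart);
            if (idEnd < 0) break;
            int keyAt = text.indexOf(KEY_LABEL, idEnd);
            if (keyAt < 0) break;
            int keyStart = keyAt + KEY_LABEL.length();
            int keyEnd = text.indexOf('"', keyStart);
            if (keyEnd < 0) break;
            int killsAt = text.indexOf(KILLS_LABEL, keyEnd);
            if (killsAt < 0) break;
            int killsStart = killsAt + KILLS_LABEL.length();
            int killsEnd = endOfNumber(text, killsStart);
            rows.add(text.substring(idStart, idEnd) + " "
                    + text.substring(keyStart, keyEnd) + " "
                    + text.substring(killsStart, killsEnd));
            // Continue in the same string, so no record is skipped.
            pos = text.indexOf(ID_LABEL, killsEnd);
        }
        return rows;
    }

    /** Advances past digits, '.', and '-' and returns the first index after the number. */
    private static int endOfNumber(String text, int start) {
        int i = start;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!Character.isDigit(c) && c != '.' && c != '-') break;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Hand-written sample; the real script text is much longer.
        String sample = "[{\"_id\":\"5812965753fa9743395ee93a\",\"key\":\"Urgot\",\"general\":{\"kills\":6.47,\"deaths\":5.1}},"
                + "{\"_id\":\"5812965753fa9743395ee93b\",\"key\":\"Aatrox\",\"general\":{\"kills\":5.8}}]";
        extract(sample).forEach(System.out::println);
        // prints:
        // 5812965753fa9743395ee93a Urgot 6.47
        // 5812965753fa9743395ee93b Aatrox 5.8
    }
}
```

This still relies on the fields appearing in the order `_id`, `key`, `kills` within each record, so a real JSON parser is the more robust choice once the array has been isolated.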
Posted on 2016-11-14 01:13:48
With ProgrammersBlock's help I found the answer. After retrieving the script data, I converted it from JSON into full Java objects!
package com.databot.web.parser;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import com.databot.Program;
import com.databot.Settings;
import com.databot.champions.ChampionStats;
import com.databot.champions.Champion;
import com.google.gson.stream.JsonReader;
public class WebParser {
    public void connect(String url) {
        try {
            Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
            Elements table = doc.select("script");
            Element script = table.get(11);
            parseText(script);
        } catch(IOException exception) {
            if(Settings.DEBUG) {
                Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
            }
            Program.alert("Error loading webpage!");
        }
    }
    public void parseText(Element script)
    {
        // Skip the first 22 characters, assumed to be the JavaScript prefix before the JSON array.
        String text = ((DataNode) script.childNode(0)).toString().substring(22).trim();
        System.out.println(text);
        List<Champion> champions = new ArrayList<>();
        try {
            JsonReader reader = new JsonReader(new StringReader(text));
            reader.setLenient(true);
            reader.beginArray();
            // One JSON object per champion/role entry.
            while(reader.hasNext()) {
                reader.beginObject();
                String id = "", key = "", role = "", title = "";
                ChampionStats stats = new ChampionStats(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0);
                while(reader.hasNext()) {
                    String name = reader.nextName();
                    if(name.equalsIgnoreCase("_id")) {
                        id = reader.nextString();
                    } else if(name.equalsIgnoreCase("key")) {
                        key = reader.nextString();
                    } else if(name.equalsIgnoreCase("role")) {
                        role = reader.nextString();
                    } else if(name.equalsIgnoreCase("title")) {
                        title = reader.nextString();
                    } else if(name.equalsIgnoreCase("general")) {
                        // The nested "general" object holds the per-champion statistics.
                        double winPercent = 0, playPercent = 0, banRate = 0, experience = 0, kills = 0, deaths = 0, assists = 0, totalDamageDealtToChampions = 0, totalDamageTaken = 0, totalHeal = 0, largestKillingSpree = 0, minionsKilled = 0, neutralMinionsKilledTeamJungle = 0, neutralMinionsKilledEnemyJungle = 0, goldEarned = 0;
                        int overallPosition = 0, overallPositionChange = 0;
                        reader.beginObject();
                        while(reader.hasNext()) {
                            String gName = reader.nextName();
                            if(gName.equalsIgnoreCase("winPercent")) {
                                winPercent = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("playPercent")) {
                                playPercent = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("banRate")) {
                                banRate = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("experience")) {
                                experience = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("kills")) {
                                kills = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("deaths")) {
                                deaths = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("assists")) {
                                assists = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("totalDamageDealtToChampions")) {
                                totalDamageDealtToChampions = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("totalDamageTaken")) {
                                totalDamageTaken = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("totalHeal")) {
                                totalHeal = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("largestKillingSpree")) {
                                largestKillingSpree = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("minionsKilled")) {
                                minionsKilled = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("neutralMinionsKilledTeamJungle")) {
                                neutralMinionsKilledTeamJungle = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("neutralMinionsKilledEnemyJungle")) {
                                neutralMinionsKilledEnemyJungle = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("goldEarned")) {
                                goldEarned = reader.nextDouble();
                            } else if(gName.equalsIgnoreCase("overallPosition")) {
                                overallPosition = reader.nextInt();
                            } else if(gName.equalsIgnoreCase("overallPositionChange")) {
                                overallPositionChange = reader.nextInt();
                            } else {
                                reader.skipValue();
                            }
                        }
                        reader.endObject();
                        stats = new ChampionStats(winPercent, playPercent, banRate, experience, kills, deaths, assists, totalDamageDealtToChampions, totalDamageTaken, totalHeal, largestKillingSpree, minionsKilled, neutralMinionsKilledTeamJungle, neutralMinionsKilledEnemyJungle, goldEarned, overallPosition, overallPositionChange);
                    } else {
                        // Ignore any field we do not map.
                        reader.skipValue();
                    }
                }
                reader.endObject();
                champions.add(new Champion(id, key, role, title, stats));
            }
            reader.endArray();
            reader.close();
        } catch (Exception e) {
            Program.alert("Error reading JSON data!");
            e.printStackTrace();
        }
        champions.forEach(champion -> {
            System.out.println(champion.toString());
        });
    }
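}
In case anyone is interested, this is my complete WebParser class. I'm sure there are better or more efficient ways to write it, but this is what works for me right now!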
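One fragile spot in the class above is the hard-coded `substring(22)`, which breaks as soon as the text in front of the JSON changes length. A possible alternative, sketched here with the standard library only, is to cut from the first `[` to the last `]`. The class name `ScriptJson`, the method name `extractArray`, and the variable name `matchupData.stats` in the sample are made up for illustration; the sketch assumes the script holds exactly one top-level array and no other square brackets before or after it.

```java
final class ScriptJson {

    /**
     * Returns the text from the first '[' through the last ']' (inclusive),
     * or null when the script contains no such pair.
     */
    static String extractArray(String scriptText) {
        int start = scriptText.indexOf('[');
        int end = scriptText.lastIndexOf(']');
        if (start < 0 || end < start) {
            return null;
        }
        return scriptText.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Hypothetical script body; the real prefix on the page may differ.
        String script = " matchupData.stats = [{\"_id\":\"a1\",\"key\":\"Urgot\"}];";
        System.out.println(extractArray(script));
        // prints: [{"_id":"a1","key":"Urgot"}]
    }
}
```

The returned string could then be handed to the `StringReader` in `parseText` in place of the `substring(22)` result.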
https://stackoverflow.com/questions/40570505