I'm trying to parse an RDF document recursively with Apache Jena. It consists of datasets like the following:
<dcat:dataset>
  <dcat:Dataset rdf:about="http://url/" >
    <dct:description xml:lang="ca">Description</dct:description>
    <dct:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
    <dcat:keyword xml:lang="ca">Keyword1</dcat:keyword>
    <dcat:distribution>
      <dcat:Download>
        <dcat:accessURL>http:/url/</dcat:accessURL>
        <dct:format>
          <dct:IMT>
            <rdf:value>application/pdf</rdf:value>
            <rdfs:label>pdf</rdfs:label>
          </dct:IMT>
        </dct:format>
        <dct:modified rdf:datatype="http://www.w3.or/2001/XMLSchema#date">2012-11-09T16:23:22</dct:modified>
      </dcat:Download>
    </dcat:distribution>
    <dct:publisher>
      <foaf:Organization>
        <dct:title xml:lang="en">Company</dct:title>
        <foaf:homepage rdf:resource="http://url/"/>
      </foaf:Organization>
    </dct:publisher>
  </dcat:Dataset>
</dcat:dataset>
So far I can get every statement directly under dcat:Dataset (Iterate over specific resource in RDF file with Jena), but I want to find every triple at every level. My output should look like this:
description: Description
license: http://creativecommons.org/licenses/by/3.0/
keyword: Keyword1
distribution -> Download -> accessurl: http:/url/
distribution -> Download -> format -> IMT -> value: application/pdf
distribution -> Download -> format -> IMT -> label: pdf
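...
I tried a recursive function that iterates over the statements and, whenever a statement's object is not a literal, follows that object to the next node, like this: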
private String recursiveQuery(Statement stmt) {
    Resource subject = stmt.getSubject();
    Property predicate = stmt.getPredicate();
    RDFNode object = stmt.getObject();
    if (object.isLiteral()) {
        out.println("LIT: " + predicate.getLocalName());
        return object.toString();
    } else {
        out.println(predicate.getLocalName());
        Resource r = stmt.getResource();
        StmtIterator stmts = r.listProperties();
        while (stmts.hasNext()) {
            Statement s = stmts.next();
            out.println(s.getPredicate().getLocalName());
            return recursiveQuery(s);
        }
    }
    return null;
}
But somehow I'm not getting anywhere with this approach. I'd appreciate any insight.
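A likely reason this gets nowhere: `return recursiveQuery(s);` sits inside the `while` loop, so each call follows only the first statement that `listProperties()` happens to yield (Jena does not promise any particular order) and never visits its siblings, while a resource with no properties falls through to `return null`. A single `String` return value also cannot carry results from several branches. The sketch below is Jena-free so it runs on its own; the `EarlyReturnDemo` class and its string-based graph are hypothetical stand-ins for a model and `listProperties()`. It contrasts the early-return shape with a version that recurses on every edge and collects all leaves, and it assumes an acyclic graph (there is no visited set).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Jena-free stand-in for a model: each node maps to its outgoing
// (predicate, object) edges, roughly what Resource.listProperties() yields.
class EarlyReturnDemo {
    static final class Edge {
        final String predicate;
        final String object;

        Edge(String predicate, String object) {
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    void add(String subject, String predicate, String object) {
        edges.computeIfAbsent(subject, k -> new ArrayList<>()).add(new Edge(predicate, object));
    }

    // Same shape as recursiveQuery: the return inside the loop ends the
    // call after the first edge, so sibling edges are never visited.
    String firstBranchOnly(String node) {
        List<Edge> out = edges.get(node);
        if (out == null) {
            return node; // no outgoing edges: treat it like a literal
        }
        for (Edge e : out) {
            return firstBranchOnly(e.object);
        }
        return null;
    }

    // Fixed shape: recurse on every edge and collect all leaves in a list.
    List<String> allLeaves(String node) {
        List<String> leaves = new ArrayList<>();
        collect(node, leaves);
        return leaves;
    }

    private void collect(String node, List<String> leaves) {
        List<Edge> out = edges.get(node);
        if (out == null) {
            leaves.add(node);
            return;
        }
        for (Edge e : out) {
            collect(e.object, leaves); // no early return: siblings are visited too
        }
    }
}
```

For a dataset whose first property is `description`, `firstBranchOnly` returns just that literal, while `allLeaves` also reaches the `accessURL`, `value`, and `label` leaves under `distribution`.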
Posted 2013-06-10 22:48:32
Based on the question you linked to earlier, I filled out your data so that we have some working data to use. Here is the complete data:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dctypes="http://purl.org/dc/dcmitype/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <dcat:Catalog rdf:about="http://uri/">
    <dcat:dataset>
      <dcat:Dataset rdf:about="http://url/" >
        <dct:description xml:lang="ca">Description</dct:description>
        <dct:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
        <dcat:keyword xml:lang="ca">Keyword1</dcat:keyword>
        <dcat:distribution>
          <dcat:Download>
            <dcat:accessURL>http:/url/</dcat:accessURL>
            <dct:format>
              <dct:IMT>
                <rdf:value>application/pdf</rdf:value>
                <rdfs:label>pdf</rdfs:label>
              </dct:IMT>
            </dct:format>
            <dct:modified rdf:datatype="http://www.w3.or/2001/XMLSchema#date">2012-11-09T16:23:22</dct:modified>
          </dcat:Download>
        </dcat:distribution>
        <dct:publisher>
          <foaf:Organization>
            <dct:title xml:lang="en">Company</dct:title>
            <foaf:homepage rdf:resource="http://url/"/>
          </foaf:Organization>
        </dct:publisher>
      </dcat:Dataset>
    </dcat:dataset>
  </dcat:Catalog>
</rdf:RDF>
It sounds like you just want to run a depth-first search from each resource of type dcat:Dataset. That's easy to do: select every resource whose rdf:type is dcat:Dataset, then start a depth-first search from that RDFNode.
import java.util.HashSet;
import java.util.Set;

// Jena 2.x package names; since Jena 3 these classes live under org.apache.jena.*
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.vocabulary.RDF;

public class DFSinRDFwithJena {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read( "rdfdfs.rdf" );
        // Start a DFS from the subject of every triple "?s rdf:type dcat:Dataset".
        StmtIterator stmts = model.listStatements( null, RDF.type, model.getResource( "http://www.w3.org/ns/dcat#" + "Dataset" ));
        while ( stmts.hasNext() ) {
            rdfDFS( stmts.next().getSubject(), new HashSet<RDFNode>(), "" );
        }
        model.write( System.out, "N3" );
    }

    public static void rdfDFS( RDFNode node, Set<RDFNode> visited, String prefix ) {
        // Skip nodes already seen; this also keeps cycles in the graph from recursing forever.
        if ( visited.contains( node )) {
            return;
        }
        else {
            visited.add( node );
            System.out.println( prefix + node );
            // Only resources (IRIs and blank nodes) have outgoing properties; literals are leaves.
            if ( node.isResource() ) {
                StmtIterator stmts = node.asResource().listProperties();
                while ( stmts.hasNext() ) {
                    Statement stmt = stmts.next();
                    rdfDFS( stmt.getObject(), visited, prefix + node + " =[" + stmt.getPredicate() + "]=> " );
                }
            }
        }
    }
}
This produces the following output:
http://url/
http://url/ =[http://purl.org/dc/terms/publisher]=> -f6d9b42:13f2e8dc5fb:-7ffd
http://url/ =[http://purl.org/dc/terms/publisher]=> -f6d9b42:13f2e8dc5fb:-7ffd =[http://purl.org/dc/terms/title]=> Company@en
http://url/ =[http://purl.org/dc/terms/publisher]=> -f6d9b42:13f2e8dc5fb:-7ffd =[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]=> http://xmlns.com/foaf/0.1/Organization
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://purl.org/dc/terms/modified]=> 2012-11-09T16:23:22^^http://www.w3.or/2001/XMLSchema#date
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://purl.org/dc/terms/format]=> -f6d9b42:13f2e8dc5fb:-7ffe
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://purl.org/dc/terms/format]=> -f6d9b42:13f2e8dc5fb:-7ffe =[http://www.w3.org/2000/01/rdf-schema#label]=> pdf
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://purl.org/dc/terms/format]=> -f6d9b42:13f2e8dc5fb:-7ffe =[http://www.w3.org/1999/02/22-rdf-syntax-ns#value]=> application/pdf
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://purl.org/dc/terms/format]=> -f6d9b42:13f2e8dc5fb:-7ffe =[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]=> http://purl.org/dc/terms/IMT
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://www.w3.org/ns/dcat#accessURL]=> http:/url/
http://url/ =[http://www.w3.org/ns/dcat#distribution]=> -f6d9b42:13f2e8dc5fb:-7fff =[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]=> http://www.w3.org/ns/dcat#Download
http://url/ =[http://www.w3.org/ns/dcat#keyword]=> Keyword1@ca
http://url/ =[http://purl.org/dc/terms/license]=> http://creativecommons.org/licenses/by/3.0/
http://url/ =[http://purl.org/dc/terms/description]=> Description@ca
http://url/ =[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]=> http://www.w3.org/ns/dcat#Dataset
This isn't as pretty as the output you described, but it seems to be what you want.
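The `foaf:homepage` triple is missing from this output because of the `visited` set: the organization's homepage is `http://url/`, the dataset itself, which is already in `visited`, so `rdfDFS` returns before printing that edge. Since one `visited` set is shared across the whole traversal from a dataset, any node reachable by two routes is likewise printed only once. As a rough sketch of output closer to the format described in the question, the Jena-free class below tracks only the nodes on the current path (enough to stop cycles), skips `rdf:type` edges but uses a node's type as a path label, and prints local names. The `PathPrinter` class, its string graph, and the predicate name `"type"` standing in for `rdf:type` are assumptions of this sketch; with Jena the edges would come from `Resource.listProperties()` and the names from `getLocalName()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Jena-free sketch: nodes and predicates are plain strings
// (local names), and the predicate "type" stands in for rdf:type.
class PathPrinter {
    static final String TYPE = "type";

    static final class Edge {
        final String predicate;
        final String object;

        Edge(String predicate, String object) {
            this.predicate = predicate;
            this.object = object;
        }
    }

    // subject -> outgoing edges, kept in insertion order
    private final Map<String, List<Edge>> edges = new LinkedHashMap<>();

    void add(String subject, String predicate, String object) {
        edges.computeIfAbsent(subject, k -> new ArrayList<>()).add(new Edge(predicate, object));
    }

    // One "a -> Type -> b: value" line per leaf reachable from start.
    List<String> paths(String start) {
        List<String> lines = new ArrayList<>();
        Set<String> onPath = new HashSet<>();
        onPath.add(start);
        walk(start, "", onPath, lines);
        return lines;
    }

    private void walk(String node, String prefix, Set<String> onPath, List<String> lines) {
        List<Edge> out = edges.get(node);
        if (out == null) {
            return;
        }
        for (Edge e : out) {
            if (e.predicate.equals(TYPE)) {
                continue; // types become path labels, not leaves
            }
            // Print as a leaf if the object has no properties, or if expanding it
            // would revisit a node on the current path (a cycle).
            if (!edges.containsKey(e.object) || onPath.contains(e.object)) {
                lines.add(prefix + e.predicate + ": " + e.object);
                continue;
            }
            String type = typeOf(e.object);
            String next = prefix + e.predicate + " -> " + (type == null ? "" : type + " -> ");
            onPath.add(e.object);
            walk(e.object, next, onPath, lines);
            onPath.remove(e.object); // backtrack so other branches may reach this node
        }
    }

    private String typeOf(String node) {
        for (Edge e : edges.get(node)) {
            if (e.predicate.equals(TYPE)) {
                return e.object;
            }
        }
        return null;
    }
}
```

Backtracking `onPath` instead of keeping one global visited set means a node shared by several branches is expanded under each of them, which can produce a lot of output on densely connected graphs.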
A note on RDF as a graph representation
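The question speaks of "every statement directly under dcat:Dataset", so, in case there's any confusion, I think it's worth pointing out that RDF is a graph-based representation. The RDF/XML serialization can indeed be used to produce nicely structured, human-readable XML, but nothing requires the XML to have that structure. To see the difference, note that the following RDF/XML represents exactly the same graph as the data posted earlier in this answer.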
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dctypes="http://purl.org/dc/dcmitype/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
  <rdf:Description rdf:nodeID="A0">
    <dct:modified rdf:datatype="http://www.w3.or/2001/XMLSchema#date">2012-11-09T16:23:22</dct:modified>
    <dct:format rdf:nodeID="A1"/>
    <dcat:accessURL>http:/url/</dcat:accessURL>
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Download"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://uri/">
    <dcat:dataset rdf:resource="http://url/"/>
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Catalog"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://url/">
    <dct:publisher rdf:nodeID="A2"/>
    <dcat:distribution rdf:nodeID="A0"/>
    <dcat:keyword xml:lang="ca">Keyword1</dcat:keyword>
    <dct:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
    <dct:description xml:lang="ca">Description</dct:description>
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A2">
    <foaf:homepage rdf:resource="http://url/"/>
    <dct:title xml:lang="en">Company</dct:title>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A1">
    <rdfs:label>pdf</rdfs:label>
    <rdf:value>application/pdf</rdf:value>
    <rdf:type rdf:resource="http://purl.org/dc/terms/IMT"/>
  </rdf:Description>
</rdf:RDF>
The RDF graph is exactly the same, even though the XML structure is very different. I bring this up only to emphasize that it's important to work with RDF as a graph rather than as hierarchical XML, even if a particular serialization might suggest that we can do the latter.
https://stackoverflow.com/questions/17024419