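I am trying to index a nested structure as shown below, and I am having a hard time indexing it with SolrJ and DIH. I have been struggling with this for a while and would appreciate some help.
How can I solve this with SolrJ or DIH? Thanks.
I want my data to look like this in the index:
"docs": [
{
"name": "Mr Incredible",
"id": 101,
"job": "super hero",
"_version_": "1483934897344086016",
"children": [
{ "c_name": "Violet", "c_age": 10, "c_gender": "female" },
{ "c_name": "Dash", "c_age": 8, "c_gender": "male" }
]
}
]
My schema.xml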
<schema name="datasearch" version="1.5">
<uniqueKey>id</uniqueKey>
<fields>
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="job" type="string" indexed="true" stored="true"/>
<!-- I want to add children here -->
<!-- <field name="children" indexed="true" stored="true"/> -->
<field name="c_name" type="string" indexed="true" stored="true"/>
<field name="c_age" type="int" indexed="true" stored="true"/>
<field name="c_gender" type="string" indexed="true" stored="true"/>
</fields>
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" />
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" />
<fieldType name="long" class="solr.TrieLongField" />
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
</types>
<defaultSearchField>name</defaultSearchField>
</schema>
SolrJ attempt
val serverUrl = current.configuration.getString("solr.server.url").get
val solr = new HttpSolrServer(serverUrl)
def testAddChildDoc={
val doc = {
new SolrInputDocument(){
addField("id", "101")
addField("name", "Mr Incredible")
}
}
val c1 = new SolrInputDocument(){
addField("c_name", "violet")
addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
addField("c_name", "dash")
addField("c_age", 8)
}
doc.addChildDocument(c1)
doc.addChildDocument(c2)
solr.deleteByQuery("*:*")
solr.add(doc)
solr.commit(true, true)
}
Response
=>ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: [doc=null] missing required field: id
[RemoteSolrException: [doc=null] missing required field: id]
So I went ahead and added an id to the childDocs, changing the code above to
...
val c1 = new SolrInputDocument(){
addField("id", "101")
addField("c_name", "violet")
addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
addField("id", "101")
addField("c_name", "dash")
addField("c_age", 8)
}
.....
Then I re-ran the get-all query, and now I get the following result.
SolrJ attempt 2, followed by the get-all query
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1415194092582",
"wt": "json"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [
{
"id": 101,
"c_name": "violet",
"c_age": "10"
},
{
"id": 101,
"c_name": "dash",
"c_age": "8"
},
{
"id": 101,
"name": "Mr Incredible",
"_version_": "1483938552238571520"
}
]
}
}
So I gave up on that approach and tried DIH instead, as shown below.
db-dataconfig.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="org.postgresql.Driver"
url="jdbc:postgresql://xxx:5432/xxxx"
user="xx" password="xx"
readOnly="true" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" />
<document>
<entity name="parent" query="select id,name, job from PARENTS LIMIT 1" >
<field column="name"/>
<field column="id"/>
<field column="job"/>
<entity child="true" name="children" query="select c_name, c_gender, c_age from CHILDREN" where="pid = ${parent.id}" processor="CachedSqlEntityProcessor">
<field column="c_age" />
<field column="c_gender" />
<field column="c_name"/>
</entity>
</entity>
</document>
</dataConfig>
Running the get-all query after a full import gives the result below; no children are indexed.
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1415195060664",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"name": "Mr Incredible",
"id": 101,
"_version_": "1483939357483073536"
}
]
}
}
Answered 2014-11-05 20:10:08
To be able to use child="true" in DIH, apply the patch from https://issues.apache.org/jira/browse/SOLR-5147 (I think it is the same DIH patch as in SOLR-3076).
The patch itself seems to be incompatible with the current trunk only in minor details.
Answered 2014-11-28 11:32:37
To get the following response from Solr 4.10.1
{
"name": "MR INCREDIBLE ",
"id": 101,
"job": "super hero",
"type": "parent",
"_root_":"101",
"_version_": "1483934897344086016",
"childDocuments": [
{
"c_name":"Violet",
"c_age":10,
"c_gender":"female",
"id":"101_Violet",
"_root_":"101"
},
{
"c_name":"Dash",
"c_age":8,
"c_gender":"male",
"id":"101_Dash",
"_root_":"101"
}
]
}
a "type" field needs to be defined in the schema to distinguish parent documents from child documents:
<fields>
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="job" type="string" indexed="true" stored="true"/>
<field name="c_name" type="string" indexed="true" stored="true"/>
<field name="c_age" type="int" indexed="true" stored="true"/>
<field name="c_gender" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true" />
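</fields>
Child documents also need a unique "id", just like any other document. Every document in the index should be part of a parent/child relationship, otherwise queries may return unexpected results. If you need documents that are neither parents nor children, assign them a dummy parent.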
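The unique-id rule can be checked before anything is sent to Solr. The following is only a minimal sketch in plain Java with no SolrJ dependency. The class name `ChildIdSketch` and its methods are hypothetical helpers, not part of any Solr API. It assumes the same `parentId + "_" + name` scheme used in this answer, and it assumes child names are unique within one parent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of SolrJ): derives child ids from the parent id.
class ChildIdSketch {

    // Builds an id such as "101_Violet" from a parent id and a child-specific key.
    static String childId(String parentId, String childKey) {
        return parentId + "_" + childKey;
    }

    // Returns one id per child, in input order.
    // Fails fast if two children of the same parent would share an id.
    static List<String> childIds(String parentId, List<String> childKeys) {
        List<String> ids = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String key : childKeys) {
            String id = childId(parentId, key);
            if (!seen.add(id)) {
                throw new IllegalArgumentException("duplicate child id: " + id);
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Violet");
        names.add("Dash");
        // Prints the ids that would be assigned to the two child documents.
        System.out.println(childIds("101", names));
    }
}
```

If child names can repeat under one parent, some other distinguishing key (for example a database primary key) would be needed in place of the name.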
SolrJ
To work with child/parent documents, solrj.jar version 4.5 or later is required.
SolrServer solr = new HttpSolrServer(serverUrl);
SolrInputDocument doc = new SolrInputDocument();
String id = "101";
doc.addField("id", id);
doc.addField("name", "Mr Incredible");
doc.addField("job", "super hero");
doc.addField("type", "parent");
SolrInputDocument childDoc1 = new SolrInputDocument();
String name1 = "Violet";
childDoc1.addField("id", id + "_" + name1);
childDoc1.addField("c_name", name1);
childDoc1.addField("c_age", 10);
childDoc1.addField("c_gender", "female");
doc.addChildDocument(childDoc1);
SolrInputDocument childDoc2 = new SolrInputDocument();
String name2 = "Dash";
childDoc2.addField("id", id + "_" + name2);
childDoc2.addField("c_name", name2);
childDoc2.addField("c_age", 8);
childDoc2.addField("c_gender", "male");
doc.addChildDocument(childDoc2);
solr.add(doc);
solr.commit();
Finally, query as follows:
http://localhost/solr/core/select?q={!parent which='type:parent'}&fl=*,[child parentFilter=type:parent]&wt=json&indent=true
To get only the results where the gender is female:
http://localhost/solr/core/select?q={!parent which='type:parent'}c_gender:female&fl=*,[child parentFilter=type:parent childFilter=c_gender:female]&wt=json&indent=true
https://stackoverflow.com/questions/26759366
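The block-join query URLs in this answer contain braces, brackets, quotes and spaces, so a client that builds them in code will usually need to URL-encode the parameter values. The sketch below shows one way to assemble them using only the Java standard library. `BlockJoinQuerySketch` and its method names are hypothetical, the base URL is the one from the examples in this answer, and reusing the child query as the `childFilter` is an assumption that matches the second example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: assembles the block-join select URLs shown in this answer.
class BlockJoinQuerySketch {

    // URL-encodes a single parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // q parameter: parents (matched by parentFilter) that have a child matching childQuery.
    // An empty childQuery reproduces the first example query.
    static String parentQuery(String parentFilter, String childQuery) {
        return "{!parent which='" + parentFilter + "'}" + childQuery;
    }

    // fl parameter: all fields plus the [child] transformer, optionally limited by childFilter.
    static String fieldList(String parentFilter, String childFilter) {
        String transformer = "[child parentFilter=" + parentFilter;
        if (!childFilter.isEmpty()) {
            transformer += " childFilter=" + childFilter;
        }
        return "*," + transformer + "]";
    }

    // Full select URL with encoded q and fl values.
    static String selectUrl(String baseUrl, String parentFilter, String childQuery) {
        return baseUrl + "/select?q=" + enc(parentQuery(parentFilter, childQuery))
                + "&fl=" + enc(fieldList(parentFilter, childQuery))
                + "&wt=json&indent=true";
    }

    public static void main(String[] args) {
        String base = "http://localhost/solr/core";
        System.out.println(selectUrl(base, "type:parent", ""));
        System.out.println(selectUrl(base, "type:parent", "c_gender:female"));
    }
}
```

Keeping the unencoded `q` and `fl` strings in separate methods makes it easy to compare them against the hand-written URLs before encoding is applied.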