Optimizing an aggregate query against Wikidata

Stack Overflow user
Asked 2019-05-03 10:38:48
Answers: 1 · Views: 79 · Followers: 0 · Votes: 1

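I am running an aggregate query against Wikidata. The query computes the average duration of films, grouped by genre and year of publication.

The multiple groupings/subqueries in the query are intended to preserve an n-1 relationship from films to the grouping criteria (year and genre) and a 1-1 relationship between a film and its duration. The reason is to get meaningfully correct aggregates (the n-1 relationship will be familiar to OLAP and data warehouse practitioners).

More explanation is embedded in the query as comments. For these reasons I cannot drop the grouping done in the subqueries, the IF expressions, or the group_concat calls. The query times out on the Wikidata SPARQL endpoint.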

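Question

I need suggestions for improving performance. Are there any optimization hints? If that is not possible, does anyone know of an authenticated way to query Wikidata (so that they know I am not just playing around) that comes with a longer timeout, or a way to increase the timeout in general?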

    # Average duration of films, grouped by their genre and the year of publication       
SELECT  
        ?genre1                    # film genre
        ?year1                     # film year of publication
        (AVG(?duration1) AS ?avg)   # film average duration

WHERE
        {      
            # Calculate the average duration of each individual film.
            # Some films have multiple durations, so the durations are
            # averaged by grouping on film.
            # The subquery therefore projects a single duration per film.
            {
              select ?film (avg(?duration) as ?duration1)  
              where{
                ?film   <http://www.wikidata.org/prop/direct/P2047>   ?duration .    
              }group by ?film
            }

            # Here the grouping criteria (genre and year) are calculated.
            # The criteria are grouped by film, so that if a film has multiple
            # genres or multiple years, they are all concatenated
            # (group_concat) into a single value.
            # If a film has no year or genre value, a dummy value
            # "OtherYear"/"OtherGenre" is generated.
            {
              select ?film (
                                IF
                                (
                                    group_concat(distinct ?year ; separator="-- ") != "", 
                                    # If a film has multiple years, they are all concatenated into a single value.
                                    group_concat(distinct ?year ; separator="-- "), 
                                    # If a film has no year value, the dummy value "OtherYear" is generated.
                                    "OtherYear"                                        
                                )
                                as ?year1
                              )
                                (
                                IF
                                (
                                    group_concat(distinct ?genre ; separator="-- ") != "",
                                    # If a film has multiple genres, they are all concatenated into a single value.
                                    group_concat(distinct ?genre ; separator="-- "), 
                                    # If a film has no genre value, the dummy value "OtherGenre" is generated.
                                    "OtherGenre"  
                                )
                                as ?genre1
                              ) 

              where 
              {
                ?film  <http://www.wikidata.org/prop/direct/P31>  <http://www.wikidata.org/entity/Q11424> .
                 optional {
                   ?film   <http://www.wikidata.org/prop/direct/P577>  ?date .
                   BIND(year(?date) AS ?year)
                 }
                 optional {
                   ?film <http://www.wikidata.org/prop/direct/P136>  ?genre .
                 }
              } group by ?film              
          }

        } GROUP BY ?year1 ?genre1

1 Answer

Stack Overflow user

Answered 2019-05-03 11:18:59

After replacing the two IF expressions with a simple sample (which picks one arbitrary value from the group), the query seems to work:

    (sample(?year) as ?year1)
    (sample(?genre) as ?genre1) 
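
For context, here is a sketch of how the whole query might look with that substitution applied. The wd:/wdt: prefixes are just shorthand for the full IRIs used in the question. The COALESCE wrappers are an assumption added here to bring back the "OtherYear"/"OtherGenre" dummy values; they are not part of the answer, and it has not been verified that this variant still finishes within the endpoint's timeout.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Average film duration per genre and publication year,
# using SAMPLE instead of the GROUP_CONCAT/IF expressions.
SELECT ?genre1 ?year1 (AVG(?duration1) AS ?avg)
WHERE {
  # One duration per film: multiple durations are averaged first.
  {
    SELECT ?film (AVG(?duration) AS ?duration1)
    WHERE { ?film wdt:P2047 ?duration . }
    GROUP BY ?film
  }
  # One year and one genre per film: SAMPLE picks an arbitrary value.
  # COALESCE is an assumption (not in the answer): it falls back to the
  # dummy value when a film has no year or no genre.
  {
    SELECT ?film
           (COALESCE(STR(SAMPLE(?year)), "OtherYear")  AS ?year1)
           (COALESCE(SAMPLE(?genre),     "OtherGenre") AS ?genre1)
    WHERE {
      ?film wdt:P31 wd:Q11424 .
      OPTIONAL {
        ?film wdt:P577 ?date .
        BIND(YEAR(?date) AS ?year)
      }
      OPTIONAL { ?film wdt:P136 ?genre . }
    }
    GROUP BY ?film
  }
}
GROUP BY ?year1 ?genre1
```

Note that this changes the semantics compared with the question: a film with several genres or years is counted under only one of them, chosen arbitrarily, instead of under the concatenated combination.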

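So the cost of group_concat seems to be the main problem. I do not find this intuitive, and I have no explanation for it.

Perhaps the version with sample is good enough, or at least it gives you a baseline for further improvement.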
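
One way to judge whether sample is "good enough" is to look at how many films actually carry more than one genre, since those are the films for which an arbitrary pick loses information. The following is only a sketch: the LIMIT value is arbitrary, the variable name ?genreCount is made up for illustration, and the query has not been checked against the endpoint's timeout.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Films that have more than one distinct genre.
# For these films SAMPLE(?genre) keeps one genre and drops the rest.
SELECT ?film (COUNT(DISTINCT ?genre) AS ?genreCount)
WHERE {
  ?film wdt:P31  wd:Q11424 ;   # instance of: film
        wdt:P136 ?genre .      # genre
}
GROUP BY ?film
HAVING (COUNT(DISTINCT ?genre) > 1)
LIMIT 100
```

The same pattern with wdt:P577 and YEAR(?date) in place of the genre would show films with more than one publication year.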

Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/55968131
