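I am running an aggregation query against Wikidata. The query computes the average duration of films, grouped by genre and year of publication.
The multiple groupings/subqueries in the query are meant to preserve the n-1 relationship from a film to the grouping criteria (year and genre) and the 1-1 relationship between a film and its duration. The point is to get a roughly correct aggregation (the n-1 relationship will be familiar to OLAP and data warehouse practitioners).
More explanation is embedded in the query itself. For these reasons I cannot drop the grouping done in the subqueries, the IF expressions, or the group_concat calls. The query times out on the Wikidata SPARQL endpoint.
Question
I need some advice on improving performance. Are there any optimization hints? If that is not possible, does anyone know of an authenticated way to query Wikidata (so they know I am not playing around) that allows a longer timeout, or any general way to increase the timeout?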
# Average duration of films, grouped by their genre and the year of publication
SELECT
  ?genre1                     # film genre
  ?year1                      # film year of publication
  (AVG(?duration1) AS ?avg)   # film average duration
WHERE
{
  # Calculate the average duration for each single film.
  # As there are films with multiple durations, these durations are
  # averaged by grouping by film.
  # Hence, a single duration for each film is projected out from the subquery.
  {
    select ?film (avg(?duration) as ?duration1)
    where {
      ?film <http://www.wikidata.org/prop/direct/P2047> ?duration .
    } group by ?film
  }
  # Here the grouping criteria (genre and year) are calculated.
  # The criteria are grouped by film, so that in case multiple
  # genres/multiple years exist for a single film, all of them are
  # concatenated into a single value.
  # Also, if a specific film lacks a year or genre value,
  # a dummy value "OtherYear"/"OtherGenre" is generated.
  {
    select ?film
      (
        IF
        (
          group_concat(distinct ?year ; separator="-- ") != "",
          # In case multiple years exist for a single film, all of them are concatenated into a single value.
          group_concat(distinct ?year ; separator="-- "),
          # If a specific film lacks a year value, a dummy value "OtherYear" is generated.
          "OtherYear"
        )
        as ?year1
      )
      (
        IF
        (
          group_concat(distinct ?genre ; separator="-- ") != "",
          # In case multiple genres exist for a single film, all of them are concatenated into a single value.
          group_concat(distinct ?genre ; separator="-- "),
          # If a specific film lacks a genre value, a dummy value "OtherGenre" is generated.
          "OtherGenre"
        )
        as ?genre1
      )
    where
    {
      ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> .
      optional {
        ?film <http://www.wikidata.org/prop/direct/P577> ?date .
        BIND(year(?date) AS ?year)
      }
      optional {
        ?film <http://www.wikidata.org/prop/direct/P136> ?genre .
      }
    } group by ?film
  }
} GROUP BY ?year1 ?genre1
Answered 2019-05-03 11:18:59
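After replacing the two IF expressions with a simple sample (which picks an arbitrary value from the group), the query seems to work:
(sample(?year) as ?year1)
(sample(?genre) as ?genre1)
So the cost of group_concat seems to be the main problem. I don't find this very intuitive, and I have no explanation for it.
Perhaps the version with sample is good enough, or at least it gives you a baseline for further improvement.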
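For reference, here is a sketch of what the whole query might look like with that substitution applied. It assumes nothing else changes; the `wd:`/`wdt:` prefixes are only shorthand for the full IRIs used in the question. One difference to be aware of: as far as I understand, a film with no year or genre leaves `?year1`/`?genre1` unbound here rather than yielding "OtherYear"/"OtherGenre", and a film with several genres is counted under just one arbitrary genre.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Average duration of films, grouped by genre and year of publication
SELECT ?genre1 ?year1 (AVG(?duration1) AS ?avg)
WHERE {
  # One averaged duration per film (unchanged from the question).
  {
    SELECT ?film (AVG(?duration) AS ?duration1)
    WHERE { ?film wdt:P2047 ?duration . }
    GROUP BY ?film
  }
  # One arbitrary year and one arbitrary genre per film,
  # replacing the IF + group_concat expressions.
  {
    SELECT ?film (SAMPLE(?year) AS ?year1) (SAMPLE(?genre) AS ?genre1)
    WHERE {
      ?film wdt:P31 wd:Q11424 .
      OPTIONAL {
        ?film wdt:P577 ?date .
        BIND(YEAR(?date) AS ?year)
      }
      OPTIONAL { ?film wdt:P136 ?genre . }
    }
    GROUP BY ?film
  }
}
GROUP BY ?year1 ?genre1
```

Whether this stays within the endpoint's timeout on any given day is not something I can promise; treat it as a starting point.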
https://stackoverflow.com/questions/55968131