摘要:我有一个概念表concept(id, definition, parent-id, path-to-root),我想创建一个扩展的概念表extended_concept(concept-id, definition, parent-id, siblings-ids),其中siblings-ids是一个逗号分隔的字符串,比如'sibling-id-1,sibling-id-2,..',兄弟姐妹的最小数量等于3。
详情:我有以下资料:
表示本体(图)的概念表(
concepts(id, definition, parent-id, path-to-root) )。它包含10,000,000条记录。以下是一些样本记录:
('A28681773', "definition2", 'A32452653', 'A24672666.A24681708.A24674976.A27414154.A32452653')
('A33193765', "definition1", 'A24670862', 'A24672666.A24681708.A24674976.A27414154.A27414079.A24679016.A24670862')(concept-id, definition, parent-id, siblings-ids)的大型.csv文件,不包括少于3个兄弟级的概念()
我认为依靠C#构建这个.csv文件在时间和空间上都是非常复杂的。我无法想出一个sql查询来构建这样一个表,用-by的方式--将表导出为.csv文件就很容易了。
有什么想法吗?
发布于 2022-09-03 09:44:03
解决了..。
group_concat聚合函数将所有子对象置于其父.旁边以逗号分隔的字符串中。
create view parent as
(
# group_concat is the aggregation function that will concat the fields of Id Column in a comma-separated string
select Parent_Id as Id, group_concat(Id) as kids, count(Id) as kids_count
from concept
group by Parent_Id
having kids_count > 3); # must be > 3, not >= 3, because later we will exclude the concept from being a sibling for itself..parent和concept表,其中concept.Parent_Id = parent.Id。需要进行一些字符串操作(通过replace和trim函数)来修复当我们将该概念排除在其自身之外时发生的错误。create table extended_concept as
(
select
concept.Id,
concept.Parent_Id,
concept.Definition,
# excluding the concept from being a sibling to itself will produce errors like: it will convert 'Id1,Id2,Id3' to ',Id2,Id3' or 'Id1,,Id3' or 'Id1,Id2,'
# so to fix it we will trim BOTH leading and trailing',', and replace ',,' with only one ','
TRIM(BOTH ',' from
REPLACE(
REPLACE(parent.kids,
concept.Id,
''),
',,', ',')
)
as siblings,
(parent.kids_count - 1) as siblings_count # because we excluded the concept from being a sibling for itself
from concept
inner join
parent
where concept.Parent_Id = parent.Id
order by siblings_count ASC
); -- count = 9,759extended_concept作为一个.csv文件导出,然后与父视图一起放置。
drop view parent;
drop table extended_concept;https://stackoverflow.com/questions/73572890
复制相似问题