EF6 aggregation on a large dataset

Stack Overflow user
Asked 2017-05-23 15:43:08
1 answer, 114 views, 0 followers, 1 vote

I have two tables, Events and Octaves:

+---------+-------+
| EventId | Time  |
+---------+-------+

+----------+---------+-----------+-------+
| OctaveId | EventId | Frequency | Value |
+----------+---------+-----------+-------+
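
For reference, the queries in this post touch `Time`, `Data`, `Event`, `Frequency` and `Value`, so the entity classes presumably look something like the sketch below. It is inferred from the code only: the property types are assumptions, and the generated SQL shows the time column as `TimeEnd`, so the real mapping may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical POCOs inferred from the queries in this question;
// the real classes and column mappings are not shown in the post.
public class Event
{
    public int EventId { get; set; }

    // Appears as [TimeEnd] in the generated SQL.
    public DateTime Time { get; set; }

    // Navigation property used as x.Data in the first query.
    public virtual ICollection<Octave> Data { get; set; } = new List<Octave>();
}

public class Octave
{
    public int OctaveId { get; set; }
    public int EventId { get; set; }

    // Assumed type; the post does not state it.
    public double Frequency { get; set; }

    // Averaged as a float in the generated SQL.
    public double Value { get; set; }

    // Navigation property used as x.Event in the "flipped" query.
    public virtual Event Event { get; set; }
}
```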

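Each event has about 10 octaves on average, and an event is recorded every 10 seconds; there are now roughly 400,000 events and 4 million octaves. I want to filter the events to a given time range, group them by hour, and for each hour return the average value of the octaves that share the same frequency. The EF6 LINQ code I'm using is: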

_context.Events
      .Where(x => x.Time >= afterDate)
      .Where(x => x.Time <= beforeDate)
      .Select(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour, data = x.Data })
      .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
      .Where(x => x.Any())
      .Select(x => new
      {
         Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
         Data = x.SelectMany(y => y.data).GroupBy(y => new { frequency = y.Frequency }).Select(y => new
         {
            frequency  = y.Key.frequency,
            value = Math.Round(y.Average(z => z.Value), 1),
         })

      })
        .OrderByDescending(m => m.Time)
        .Take(limit);

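This works, but only when the time span is small (a few hours). If I increase it to several days, the query seems to run forever. Am I asking too much of SQL Server? Or is there a better way to run this query or structure my data? If I remove the SelectMany(...).GroupBy(...), it is no longer insanely slow.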

The generated SQL query is:

SELECT 
    [Project5].[C1] AS [C1], 
    [Project5].[C2] AS [C2], 
    [Project5].[C3] AS [C3], 
    [Project5].[C4] AS [C4], 
    [Project5].[C5] AS [C5], 
    [Project5].[C6] AS [C6], 
    [Project5].[C8] AS [C7], 
    [Project5].[Frequency] AS [Frequency], 
    [Project5].[C7] AS [C8]
    FROM ( SELECT 
        [Limit1].[C1] AS [C1], 
        [Limit1].[C2] AS [C2], 
        [Limit1].[C3] AS [C3], 
        [Limit1].[C4] AS [C4], 
        [Limit1].[C5] AS [C5], 
        [Limit1].[C6] AS [C6], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS float) ELSE ROUND([GroupBy1].[A1], 1) END AS [C7], 
        [GroupBy1].[K1] AS [Frequency], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C8]
        FROM   (SELECT TOP (10000) [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], [Project4].[C4] AS [C4], [Project4].[C5] AS [C5], [Project4].[C6] AS [C6]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                [Project2].[C2] AS [C2], 
                [Project2].[C3] AS [C3], 
                [Project2].[C4] AS [C4], 
                1 AS [C5], 
                convert (datetime2,right('000' + convert(varchar(255), [Project2].[C1]), 4) + '-' + convert(varchar(255), [Project2].[C2]) + '-' + convert(varchar(255), [Project2].[C3]) + ' ' + convert(varchar(255), [Project2].[C4]) + ':' + convert(varchar(255), 0) + ':' + str(cast(0 as float(53)), 10, 7), 121) AS [C6]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1], 
                    [Distinct1].[C2] AS [C2], 
                    [Distinct1].[C3] AS [C3], 
                    [Distinct1].[C4] AS [C4]
                    FROM ( SELECT DISTINCT 
                        DATEPART (year, [Extent1].[TimeEnd]) AS [C1], 
                        DATEPART (month, [Extent1].[TimeEnd]) AS [C2], 
                        DATEPART (day, [Extent1].[TimeEnd]) AS [C3], 
                        DATEPART (hour, [Extent1].[TimeEnd]) AS [C4]
                        FROM [dbo].[Events] AS [Extent1]
                        WHERE ([Extent1].[TimeEnd] >= @p__linq__1) AND ([Extent1].[TimeEnd] <= @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
                WHERE  EXISTS (SELECT 
                    1 AS [C1]
                    FROM [dbo].[Events] AS [Extent2]
                    WHERE ([Extent2].[TimeEnd] >= @p__linq__1) AND ([Extent2].[TimeEnd] <= @p__linq__2) AND (([Project2].[C1] = (DATEPART (year, [Extent2].[TimeEnd]))) OR (([Project2].[C1] IS NULL) AND (DATEPART (year, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C2] = (DATEPART (month, [Extent2].[TimeEnd]))) OR (([Project2].[C2] IS NULL) AND (DATEPART (month, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C3] = (DATEPART (day, [Extent2].[TimeEnd]))) OR (([Project2].[C3] IS NULL) AND (DATEPART (day, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C4] = (DATEPART (hour, [Extent2].[TimeEnd]))) OR (([Project2].[C4] IS NULL) AND (DATEPART (hour, [Extent2].[TimeEnd]) IS NULL)))
                )
            )  AS [Project4]
            ORDER BY [Project4].[C6] DESC ) AS [Limit1]
        OUTER APPLY  (SELECT 
            [Extent4].[Frequency] AS [K1], 
            AVG([Extent4].[Value]) AS [A1]
            FROM  [dbo].[Events] AS [Extent3]
            INNER JOIN [dbo].[Octaves] AS [Extent4] ON [Extent3].[EventId] = [Extent4].[EventId]
            WHERE ([Extent3].[TimeEnd] >= @p__linq__1) AND ([Extent3].[TimeEnd] <= @p__linq__2) AND (([Limit1].[C1] = (DATEPART (year, [Extent3].[TimeEnd]))) OR (([Limit1].[C1] IS NULL) AND (DATEPART (year, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C2] = (DATEPART (month, [Extent3].[TimeEnd]))) OR (([Limit1].[C2] IS NULL) AND (DATEPART (month, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C3] = (DATEPART (day, [Extent3].[TimeEnd]))) OR (([Limit1].[C3] IS NULL) AND (DATEPART (day, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C4] = (DATEPART (hour, [Extent3].[TimeEnd]))) OR (([Limit1].[C4] IS NULL) AND (DATEPART (hour, [Extent3].[TimeEnd]) IS NULL)))
            GROUP BY [Extent4].[Frequency] ) AS [GroupBy1]
    )  AS [Project5]
    ORDER BY [Project5].[C6] DESC, [Project5].[C1] ASC, [Project5].[C2] ASC, [Project5].[C3] ASC, [Project5].[C4] ASC, [Project5].[C8] ASC

Update 1

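I tried "flipping" the query by querying Octaves directly, and the results are much better. I first group by date and frequency and compute the average, then group again by time. It isn't elegant at all, but it's the first solution that actually works. If the grouping is done in a different order (e.g. by time, then by frequency, then averaging), it still doesn't work.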

 _context.Octaves
.Where(x => x.Event.Time >= afterDate)
.Where(x => x.Event.Time <= beforeDate)
.GroupBy(x => new { year = x.Event.Time.Year, month = x.Event.Time.Month, day = x.Event.Time.Day, hour = x.Event.Time.Hour, freq = x.Frequency })
.Select(x => new
{
  year = x.Key.year,
  month = x.Key.month,
  day = x.Key.day,
  hour = x.Key.hour,
  freq = x.Key.freq,
  value = Math.Round(x.Average(y => y.Value), 1)

})
.GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
.Select(x => new
{
  timeEnd = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
  data = x.Select(y=> new {freq = y.freq, value = y.value })

})
.OrderByDescending(m => m.timeEnd)
.Take(limit);
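
To make the shape of that two-stage grouping concrete, here is a minimal in-memory sketch using LINQ to Objects rather than EF6, so it says nothing about the SQL that EF would generate. The `Row`, `HourBucket` and `HourlyOctaveAverages` names are hypothetical, and `Row` stands in for an octave already joined to its event's time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HourlyOctaveAverages
{
    // Hypothetical flattened row: one octave plus the time of its event.
    public sealed class Row
    {
        public DateTime Time { get; set; }
        public double Frequency { get; set; }
        public double Value { get; set; }
    }

    // Hypothetical result: one bucket per hour, holding (frequency, average) pairs.
    public sealed class HourBucket
    {
        public DateTime TimeEnd { get; set; }
        public List<KeyValuePair<double, double>> Data { get; set; }
    }

    public static List<HourBucket> Aggregate(
        IEnumerable<Row> rows, DateTime after, DateTime before, int limit)
    {
        return rows
            .Where(r => r.Time >= after && r.Time <= before)
            // Stage 1: one group per (hour, frequency), reduced to its average.
            .GroupBy(r => new
            {
                Hour = new DateTime(r.Time.Year, r.Time.Month, r.Time.Day, r.Time.Hour, 0, 0),
                r.Frequency
            })
            .Select(g => new
            {
                g.Key.Hour,
                g.Key.Frequency,
                Value = Math.Round(g.Average(r => r.Value), 1)
            })
            // Stage 2: regroup the already-aggregated rows by hour only.
            .GroupBy(a => a.Hour)
            .Select(g => new HourBucket
            {
                TimeEnd = g.Key,
                Data = g.Select(a => new KeyValuePair<double, double>(a.Frequency, a.Value)).ToList()
            })
            .OrderByDescending(b => b.TimeEnd)
            .Take(limit)
            .ToList();
    }
}
```

The point of the ordering is that the averaging happens once, over a single flat grouping, and the second grouping only rearranges the small aggregated result into per-hour buckets.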

1 Answer

Stack Overflow user

Answered 2017-05-24 05:27:39

I'm not sure, but you might want to try this. It could turn out worse; I can't say.

_context.Events.AsNoTracking()
    .Where(x => x.Time >= afterDate && x.Time <= beforeDate)
    // Group the events by hour; the key parts come from x.Time.
    .GroupBy(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour })
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        // Averages per event and frequency, then flattens the results for the hour,
        // so the same frequency can appear several times within one hour.
        Data = x.SelectMany(h =>
            h.Data.GroupBy(o => o.Frequency).Select(g => new
            {
                frequency = g.Key,
                value = Math.Round(g.Average(z => z.Value), 1)
            }))
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/44139430
