ClickHouse SQL: 如何获取每天数据并填充缺失日期

您可能遇到过这样的问题:使用 ClickHouse 查询得到的数据,日期并不连续,例如只包含 2021-11-27 和 2022-05-07 的数据,而您想要获取每天的数据,并对没有数据的日期默认赋值为 0。

原始查询语句:

SELECT ts,
  if(i >= 31, arraySum(arraySlice(total_increase, i-30, 31)), arraySum(arraySlice(total_increase, 1, i))) as total_increase_num
FROM 
  (
    SELECT groupArray(Ds) as t,
      groupArray(app_count) as total_increase
    from (
        SELECT min_Ds as Ds,
          count(distinct AppIdentifier) as app_count
        FROM (
            select AppIdentifier,
              min(Ds) as min_Ds
            from log_iMonkey_iOS_overview
            GROUP BY AppIdentifier
          )
        GROUP BY min_Ds
        order by min_Ds
      ) 
  ) ARRAY
  JOIN t as ts,
  arrayEnumerate(t) as i where ts >= toDate(#monthly_active_new_business.start#)

查询到数据如下:

| ts | total_increase_num | |---|---| | 2021-11-27 | 1 | | 2022-05-07 | 2 |

解决方案:

可以使用左连接(LEFT JOIN)将日期表与查询结果表连接起来,将没有数据的日期赋值为0。具体实现方法如下:

WITH dates AS (
  SELECT toDate(#monthly_active_new_business.start#) AS date
  UNION ALL
  SELECT date + 1
  FROM dates
  WHERE date < toDate(today())
)
SELECT dates.date AS ts,
  if(i >= 31, arraySum(arraySlice(total_increase, i-30, 31)), arraySum(arraySlice(total_increase, 1, i))) AS total_increase_num
FROM dates
LEFT JOIN (
  SELECT groupArray(Ds) AS t,
    groupArray(app_count) AS total_increase
  FROM (
    SELECT min_Ds AS Ds,
      count(distinct AppIdentifier) AS app_count
    FROM (
      SELECT AppIdentifier,
        min(Ds) AS min_Ds
      FROM log_iMonkey_iOS_overview
      GROUP BY AppIdentifier
    )
    GROUP BY min_Ds
    ORDER BY min_Ds
  )
) ARRAY ON 1 = 1
LEFT JOIN arrayEnumerate(t) AS i ON dates.date >= ts
ORDER BY ts

代码解释:

  1. 使用一个递归公共表表达式(CTE)来创建日期表 dates,从查询结果的最早日期 #monthly_active_new_business.start# 到今天 today()
  2. 使用左连接将 dates 表与查询结果表连接起来,对于没有数据的日期,将 total_increase_num 赋值为 0。
  3. 最后按照日期 ts 排序输出。

通过这种方法,您就可以轻松地获取每天的数据,并填充缺失的日期,方便后续的统计分析。

ClickHouse SQL: 如何获取每天数据并填充缺失日期

原文地址: https://www.cveoy.top/t/topic/oDm7 著作权归作者所有。请勿转载和采集!

免费AI点我,无需注册和登录