ClickHouse SQL: 如何获取每天数据并填充缺失日期
ClickHouse SQL: 如何获取每天数据并填充缺失日期
您可能遇到过这样的问题:使用 ClickHouse 查询得到的数据,日期并不连续,例如只包含 2021-11-27 和 2022-05-07 的数据,而您想要获取每天的数据,并对没有数据的日期默认赋值为 0。
原始查询语句:
SELECT ts,
if(i >= 31, arraySum(arraySlice(total_increase, i-30, 31)), arraySum(arraySlice(total_increase, 1, i))) as total_increase_num
FROM
(
SELECT groupArray(Ds) as t,
groupArray(app_count) as total_increase
from (
SELECT min_Ds as Ds,
count(distinct AppIdentifier) as app_count
FROM (
select AppIdentifier,
min(Ds) as min_Ds
from log_iMonkey_iOS_overview
GROUP BY AppIdentifier
)
GROUP BY min_Ds
order by min_Ds
)
) ARRAY
JOIN t as ts,
arrayEnumerate(t) as i where ts >= toDate(#monthly_active_new_business.start#)
查询到数据如下:
| ts | total_increase_num | |---|---| | 2021-11-27 | 1 | | 2022-05-07 | 2 |
解决方案:
可以使用左连接(LEFT JOIN)将日期表与查询结果表连接起来,将没有数据的日期赋值为0。具体实现方法如下:
WITH dates AS (
SELECT toDate(#monthly_active_new_business.start#) AS date
UNION ALL
SELECT date + 1
FROM dates
WHERE date < toDate(today())
)
SELECT dates.date AS ts,
if(i >= 31, arraySum(arraySlice(total_increase, i-30, 31)), arraySum(arraySlice(total_increase, 1, i))) AS total_increase_num
FROM dates
LEFT JOIN (
SELECT groupArray(Ds) AS t,
groupArray(app_count) AS total_increase
FROM (
SELECT min_Ds AS Ds,
count(distinct AppIdentifier) AS app_count
FROM (
SELECT AppIdentifier,
min(Ds) AS min_Ds
FROM log_iMonkey_iOS_overview
GROUP BY AppIdentifier
)
GROUP BY min_Ds
ORDER BY min_Ds
)
) ARRAY ON 1 = 1
LEFT JOIN arrayEnumerate(t) AS i ON dates.date >= ts
ORDER BY ts
代码解释:
- 使用一个递归公共表表达式(CTE)来创建日期表
dates,从查询结果的最早日期#monthly_active_new_business.start#到今天today()。 - 使用左连接将
dates表与查询结果表连接起来,对于没有数据的日期,将total_increase_num赋值为 0。 - 最后按照日期
ts排序输出。
通过这种方法,您就可以轻松地获取每天的数据,并填充缺失的日期,方便后续的统计分析。
原文地址: https://www.cveoy.top/t/topic/oDm7 著作权归作者所有。请勿转载和采集!