ClickHouse SQL: Calculate Daily Unique App Identifiers with 30-Day Rolling Window
This ClickHouse SQL query calculates the number of unique App Identifiers appearing in the last 30 days, providing a daily breakdown. It also includes a rolling sum of new App Identifiers for each day.
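SELECT
    t[i] AS date,
    -- distinct App Identifiers in the window ending at position i (at most 31 entries)
    if(i >= 31, length(arrayDistinct(arrayFlatten(arraySlice(total, i - 30, 31)))), length(arrayDistinct(arrayFlatten(arraySlice(total, 1, i))))) AS total_num,
    -- App Identifiers first seen within the same window
    if(i >= 31, arraySum(arraySlice(total_increase, i - 30, 31)), arraySum(arraySlice(total_increase, 1, i))) AS total_increase_num
FROM
(
    SELECT
        groupArray(day) AS t,
        groupArray(app_id_list) AS total,
        groupArray(app_count) AS total_increase
    FROM
    (
        SELECT dates.day AS day, A.app_id_list AS app_id_list, ifNull(C.app_count, 0) AS app_count
        FROM
        (
            -- dates table: one row per calendar day from 2022-01-01 through today
            SELECT toDate('2022-01-01') + number AS day
            FROM numbers(toUInt64(dateDiff('day', toDate('2022-01-01'), today()) + 1))
        ) AS dates
        LEFT JOIN
        (
            -- App Identifiers seen on each day (empty array for days without data)
            SELECT date, groupArray(AppIdentifier) AS app_id_list
            FROM
            (
                SELECT AppIdentifier, toDate(Ds) AS date
                FROM log_iMonkey_iOS_overview
                GROUP BY AppIdentifier, date
            )
            GROUP BY date
        ) AS A ON dates.day = A.date
        LEFT JOIN
        (
            -- number of App Identifiers first seen on each day
            SELECT min_date AS date, count(DISTINCT AppIdentifier) AS app_count
            FROM
            (
                SELECT AppIdentifier, toDate(min(Ds)) AS min_date
                FROM log_iMonkey_iOS_overview
                GROUP BY AppIdentifier
            )
            GROUP BY min_date
        ) AS C ON dates.day = C.date
        -- sort before groupArray so the three arrays are aligned in date order
        ORDER BY day
    )
)
ARRAY JOIN arrayEnumerate(t) AS i
WHERE date >= toDate(#monthly_active_new_business.start#) AND date <= today()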
Explanation:
- Daily Breakdown: The query uses toDate(Ds) to convert the Ds column to a date, which makes it possible to group by day. The dates table, generated using the numbers function, provides a sequence of dates starting from 2022-01-01. Joining on this table ensures that all dates are represented in the final output.
- Unique App Identifiers: The arrayDistinct(arrayFlatten(arraySlice(...))) expression counts the unique App Identifiers within the rolling window. arraySlice(total, i-30, 31) extracts the entry for the current day plus the 30 entries before it from the total array, arrayFlatten merges those per-day lists into a single array, arrayDistinct removes duplicates, and length returns the count. The if statement handles the first 30 positions, where a full window is not yet available, by slicing from the first entry up to the current one.
- Rolling Sum: The arraySum(arraySlice(...)) expression calculates the rolling sum of new App Identifiers for each day. arraySlice extracts the same window from the total_increase array, and arraySum adds up its values. The if statement handles the first 30 positions in the same way, using only the data available so far.
- Efficiency: The per-day arrays are built once, and each window is then computed by slicing them, so the table is not re-scanned or self-joined for every output date. arrayEnumerate produces the positions 1 to length(t), which gives each output row its index i in the date array and therefore the end of its rolling window.
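The slicing logic described above can be tried on literal arrays without touching the log table. The following is a minimal sketch: the dates, identifier lists, and counts are made up for illustration, and the window is shortened to 3 entries (instead of 31) so the result is easy to check by hand.

```sql
-- Toy data (hypothetical): five days of per-day identifier lists and new-identifier counts.
WITH
    ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'] AS t,
    [['a', 'b'], ['b'], ['c', 'a'], ['d'], ['a']] AS total,
    [2, 0, 1, 1, 0] AS total_increase
SELECT
    -- one output row per array position
    arrayJoin(arrayEnumerate(t)) AS i,
    t[i] AS date,
    -- window of up to 3 per-day lists ending at position i
    if(i >= 3, arraySlice(total, i - 2, 3), arraySlice(total, 1, i)) AS window_lists,
    -- flatten the lists, drop duplicates, count what is left
    length(arrayDistinct(arrayFlatten(window_lists))) AS total_num,
    -- sum of new identifiers over the same window
    arraySum(if(i >= 3, arraySlice(total_increase, i - 2, 3), arraySlice(total_increase, 1, i))) AS total_increase_num

-- Expected (i, total_num, total_increase_num):
-- 1, 2, 2
-- 2, 2, 2
-- 3, 3, 3
-- 4, 4, 2
-- 5, 3, 2
```

Position 4 is the first row where the window no longer starts at the first entry: it covers positions 2 to 4, so the identifiers counted are b, c, a, and d.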
Key Points:
- This query provides daily insights into unique App Identifier counts and new App Identifier growth over a 30-day rolling window. This data can be useful for analyzing trends and identifying patterns in app usage.
- ClickHouse's array functions are well-suited for handling rolling calculations and generating dynamic time series data.
- You can adjust the start date in the dates table to customize the query's date range.
- The query assumes the log_iMonkey_iOS_overview table has columns named Ds (date) and AppIdentifier. The app_count value is derived inside the query: it is the number of App Identifiers whose first appearance falls on a given day.
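As a sketch of how the date range can be adjusted, the dates table can be run on its own to inspect the sequence it produces. The start date 2022-06-01 below is a hypothetical example, and the second form assumes that your ClickHouse version accepts a constant expression as the argument of numbers.

```sql
-- Fixed length: 7 consecutive days starting at a hypothetical start date
-- (2022-06-01 through 2022-06-07).
SELECT toDate('2022-06-01') + number AS date
FROM numbers(7);

-- Open-ended: every day from the start date through today, inclusive.
SELECT toDate('2022-06-01') + number AS date
FROM numbers(toUInt64(dateDiff('day', toDate('2022-06-01'), today()) + 1));
```

Starting the sequence at least 30 days before the first date you want to report gives the earliest reported rows a full window of history.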
Note: The #monthly_active_new_business.start# placeholder represents the start date of the desired time period. Replace it with your actual start date.