This ClickHouse SQL query returns one row per day with two rolling figures: the number of distinct App Identifiers seen in a trailing window (that day plus up to 30 preceding days in the data), and the sum of newly seen App Identifiers over the same window.

SELECT date,
  if(i >= 31, length(arrayDistinct(arrayFlatten(arraySlice(total, i-30, 31)))), length(arrayDistinct(arrayFlatten(arraySlice(total, 1, i))))) as total_num,
  if(i >= 31, arraySum(arraySlice(total_increase, i-30, 31)), arraySum(arraySlice(total_increase, 1, i))) as total_increase_num
FROM 
  (
    SELECT groupArray(A.date) as t,
      groupArray(app_id_list) as total,
      groupArray(app_count) as total_increase
    from (
        SELECT date, groupArray(AppIdentifier) as app_id_list
        FROM (
            select AppIdentifier, toDate(Ds) as date
            from log_iMonkey_iOS_overview
            GROUP BY AppIdentifier, date
          )
        GROUP BY date
        order by date
      ) as A 
      join 
      (
        SELECT min_date as date,
          count(distinct AppIdentifier) as app_count
        FROM (
            select AppIdentifier,
              toDate(min(Ds)) as min_date
            from log_iMonkey_iOS_overview
            GROUP BY AppIdentifier
          )
        GROUP BY min_date
        order by min_date
      ) as C on A.date = C.date
  ) ARRAY JOIN t as date, arrayEnumerate(t) as i
WHERE date >= toDate(#monthly_active_new_business.start#) AND date <= today()

Explanation:

  1. Daily Breakdown: The query uses toDate(Ds) to convert the Ds column to a date, so rows can be grouped per day. The inner subqueries collect the per-day results into the parallel arrays t, total, and total_increase, and ARRAY JOIN t as date, arrayEnumerate(t) as i expands them back into one row per date together with its 1-based position i. The WHERE clause then limits the output to dates from the start placeholder through today(). Because A and C are combined with an inner join, a day appears only if it exists in both subqueries; the window is counted in array positions, so it matches calendar days only when no dates are missing.

  2. Unique App Identifiers: The length(arrayDistinct(arrayFlatten(arraySlice(...)))) expression counts the unique App Identifiers inside the window ending at position i. arraySlice(total, i-30, 31) extracts 31 consecutive per-day lists (the current day and the 30 before it), arrayFlatten merges them into a single array, arrayDistinct removes duplicates, and length counts what remains. The if() function handles the first 30 positions, where a full window does not exist yet, by slicing from position 1 up to i.

  3. Rolling Sum: The arraySum(arraySlice(...)) expression calculates the rolling sum of new App Identifiers over the same window. arraySlice(total_increase, i-30, 31) extracts the per-day counts of first-time App Identifiers, and arraySum adds them up. As above, the if() function falls back to arraySlice(total_increase, 1, i) for the first 30 positions.

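  4. Efficiency: The table is aggregated into arrays once, and every window is then computed from those arrays instead of re-scanning the table for each day. arrayEnumerate(t) yields the 1-based position of each date, which marks the end of that date's window. Note that groupArray without a GROUP BY collapses everything into a single row, so all per-day identifier lists are held in memory at the same time.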
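
The windowing steps above can be tried on toy data. The following is a minimal sketch, not the original query: the arrays are hand-written stand-ins for t, total, and total_increase, the window is shortened to 3 positions so the effect is visible, and greatest/least replace the if() branches as an equivalent way to clamp the slice at the start of the array.

```sql
-- Toy data (assumed, not from the real table): 4 days, window of 3 positions.
SELECT
    date,
    -- distinct identifiers in the window ending at position i
    length(arrayDistinct(arrayFlatten(
        arraySlice(total, greatest(i - 2, 1), least(i, 3))))) AS total_num,
    -- sum of first-time identifiers in the same window
    arraySum(arraySlice(total_increase, greatest(i - 2, 1), least(i, 3))) AS total_increase_num
FROM
(
    SELECT
        [toDate('2022-01-01'), toDate('2022-01-02'),
         toDate('2022-01-03'), toDate('2022-01-04')] AS t,
        [['a', 'b'], ['b', 'c'], ['c'], ['d']] AS total,
        [2, 1, 0, 1] AS total_increase
)
ARRAY JOIN t AS date, arrayEnumerate(t) AS i
ORDER BY date
-- Expected rows (date, total_num, total_increase_num):
--   2022-01-01  2  2
--   2022-01-02  3  3
--   2022-01-03  3  3
--   2022-01-04  3  2
```

On the last day the window no longer covers 2022-01-01, so identifier a drops out of the distinct count and its contribution leaves the rolling sum.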

Key Points:

  • This query provides daily figures for unique App Identifier counts and new App Identifier growth over a rolling window of the current day plus the 30 preceding days. This data can be useful for analyzing trends and identifying patterns in app usage.
  • ClickHouse's array functions are well-suited for efficiently handling rolling calculations and generating dynamic time series data.
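  • You can adjust the reported date range by changing the start date supplied through the #monthly_active_new_business.start# placeholder; the end of the range is today().
  • The query assumes the log_iMonkey_iOS_overview table has columns named Ds (a date or timestamp) and AppIdentifier. app_count is not a table column: subquery C derives it as the number of App Identifiers whose earliest Ds falls on a given day.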
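
To make the app_count definition concrete, here is a small sketch of subquery C run against inline rows. The values() table function stands in for log_iMonkey_iOS_overview, and the DateTime type for Ds as well as the sample identifiers are assumptions made for illustration only.

```sql
-- Inline stand-in for log_iMonkey_iOS_overview; Ds as DateTime is an assumption.
SELECT
    min_date AS date,
    count(DISTINCT AppIdentifier) AS app_count
FROM
(
    -- earliest day on which each identifier was seen
    SELECT AppIdentifier, toDate(min(Ds)) AS min_date
    FROM values('AppIdentifier String, Ds DateTime',
        ('app.a', '2022-01-01 09:00:00'),
        ('app.a', '2022-01-02 10:00:00'),
        ('app.b', '2022-01-02 11:00:00'),
        ('app.c', '2022-01-02 12:00:00'))
    GROUP BY AppIdentifier
)
GROUP BY min_date
ORDER BY min_date
```

With these rows, app.a counts toward 2022-01-01 only, even though it appears again the next day, while app.b and app.c both count toward 2022-01-02. A day on which no identifier appears for the first time produces no row here, which is why such days are missing after the inner join with subquery A.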

Note: The #monthly_active_new_business.start# placeholder represents the start date of the desired time period. Replace it with your actual start date.

ClickHouse SQL: Calculate Daily Unique App Identifiers with 30-Day Rolling Window

Original source: https://www.cveoy.top/t/topic/oBb4 Copyright belongs to the author. Do not repost or scrape.
