Recently I had received an apparently simple task: “Count the number of returning users, per month age.”, and it should be presented like a cohort analysis.

It should be solved as a simple query and returning something like this:

year_month_first_buy | month_age | count
----------------------+-----------+-------
               200904 |         0 |    10
               200904 |         1 |    8
               200904 |         2 |    5
               200904 |         4 |    1

Then I realized that I have no returning users that bought 3 months after his first buy.

And of course, the better way to show this kind of data is not easy to create in a simple query.

It should be presented like this:

 year_month_first_buy |  0 |  1 |  2 |  3 | 4
----------------------+----+----+----+----+---
               200904 | 10 |  8 |  5 |  0 | 1
               200905 | 15 | 11 |  9 |  8 |
               200906 | 25 | 20 | 18 |    |

After some research, I found the generate_series function in postgreSQL in order to solve the interval problem.

I have to create one record for each month from the starting date to current month and then I will count the number of users.

But first I need to calculate the difference in months between today and the date I started to sell, you can see below.

SELECT (date_part('year', f) * 12 + date_part('month', f))::integer
FROM age(NOW(), '2009-04-01') f

Then I created one record for each month with the generate_series function, see the code below.

SELECT * 
FROM generate_series(0, 
    (SELECT 
        (date_part('year', f) * 12 + date_part('month', f))::integer
    FROM age(NOW(), '2009-04-01') f)) i

Now I know how to get the month age, I need to create the month list since I started to sell. My first sell was in April 2009, so I need to create one record for each month since this date. The code below generates one row for each month from the start date until now.

SELECT
  t1.year, t2.month
FROM
  (SELECT * 
   FROM generate_series(2009, 
                        date_part('year', NOW())::integer) year) t1,
  (SELECT * 
   FROM generate_series(1, 12) month) t2
WHERE
  (t1.year = 2009 AND t2.month >= 4)
OR
  (t1.year > 2009 AND t1.year < date_part('year', NOW())::integer)
OR
  (t1.year = date_part('year', NOW())::integer 
   AND t2.month <= date_part('month', NOW())::integer)

The result should be something like this

 year | month
------+-------
 2009 | 4
 2009 | 5
 2009 | 6
...
 2016 | 8
 2016 | 9

 

Using all together

I have the year/month list, I know how to calculate the month count between 2 dates and how to create a series from 0 to this month count.

Now I have to mix all this data in order to generate a list (or a table) to group the count of users.

I have done this with a lot of subqueries, but it will be fast due to the limited amount of data I have.

SELECT year_month, age 
FROM (
  SELECT
    t1.year || RIGHT('0'||t2.month, 2)::varchar year_month,
    (SELECT (date_part ('year', f) * 12 + 
             date_part ('month', f))::integer 
     FROM age(NOW(), (t1.year::varchar||'-'||t2.month::varchar
                      ||'-01')::timestamp) f) month_age
  FROM
    (SELECT * 
     FROM generate_series(2009, 
                          date_part('year', NOW())::integer) year) t1,
    (SELECT * FROM generate_series(1, 12) month) t2
  WHERE
    (t1.year = 2009 AND t2.month >= 4)
  OR
    (t1.year > 2009 AND t1.year < date_part('year', NOW())::integer)
  OR
    (t1.year = date_part('year', NOW())::integer 
     AND t2.month <= date_part('month', NOW())::integer)
  ) months,
  generate_series(0, months.month_age) age

The result is:

year_month | age
-----------+-----
    200904 | 0
    200904 | 1
...
    200904 | 89
    200905 | 0
    200905 | 1
...
    200905 | 88
    200906 | 0
...
    201609 | 0

Now the count is the easy part, just select how many users who had done his first buy in the year_month and also bought age months after.

And then you will have to use some function or startegy to convert the age into columns, for some version of PostgreSQL you can use the crosstab function.

Conclusion

It was really difficult to explain, maybe you get confused about some step. Don’t hesitate to leave your comment if you have any question or issue.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s