The client expected to be able to run the analysis with three possible aggregations: weekly, monthly, and annually. Since the historical data spanned ten years, we decided that forcing the database engine to compute all the aggregations on the fly made no sense. Once imported, aggregated historical data is immutable, so we aggregate whenever a new batch is imported and write the results to separate tables (one per aggregation).

We noticed that the data is sparse: for most products, a single significant value (for example, 5 items sold in a given month) was followed by several or even a dozen periods with no value at all (that is, zero). Storing zeros explicitly would have meant adding an entry for each of the 15,000 products under analysis in every period: almost one million entries yearly ((52 weeks + 12 months + 1 yearly aggregation) * 15,000), of which only some carry real information. Granted, zero is also information, but it was simpler and more economical to treat a missing value as a value equal to zero.

The data was presented as a table (rendered by a JavaScript framework, with the data fetched from the API). Each product occupied one row, with its details in the proper columns and, depending on the chosen time span and aggregation, a series of columns with historical sales and sales forecasts. As simple as that: we added all the necessary fields to the serializer (Django REST Framework) dynamically, and the Django ORM's base SQL query was extended automatically.

Unfortunately, the effect was disappointing. For each aggregation period the query contained a CASE-WHEN clause, repeated for every period and every product. The query covering 30 months (18 historical, 12 forecast, by default) for one hundred products took 42(!) seconds to generate.

There would have been no problem if we had skipped this design optimization and stored all the zero entries in the database. But switching back to that solution would have meant adding a zero value for every historical period with no entry (up to 10 years back). Moreover, whenever a new product was added, we would have had to backfill its entries starting from 2007. No way, there had to be a better solution.

First we tried to optimize the query part added by the serializer, but without good results: the cost always came down to more or less expensive operations on each period of the time span, for each product separately. So we looked at the problem from another side. Retrieving historical data and forecasts for one product over one time span took only a few milliseconds.

That per-product query can easily be extended with missing labels by adding them as a temporary table and using the current query in a LEFT OUTER JOIN. By adding a predefined ordering column (ord) we can drop the period_start column (ordering inside a subquery has no effect on the main query anyway), though it is always a good idea to keep period_start in the subquery's WHERE clause, so the database engine processes a smaller amount of data. Thanks to this, we get a table in which labels with no data are filled with empty values. It's easy to see that this table, once transposed, is perfect to use in a JOIN with the main product query: all the label values become separate columns, each holding a single value (amount). Obviously, Postgres has its own function for that.
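For illustration, the per-period CASE-WHEN SQL that the dynamic serializer fields produced might be sketched roughly like this. The table and column names (sales_monthly, product_id, period_start, amount) are my assumptions, not the project's actual schema:

```python
# A hedged sketch of per-period CASE-WHEN SQL generation: one aggregate
# column per period. All identifiers are assumed for illustration only.
def build_case_when_query(periods):
    """Return a SELECT with one CASE-WHEN aggregate column per period."""
    columns = [
        f"SUM(CASE WHEN period_start = '{p}' THEN amount ELSE 0 END) "
        f'AS "amount_{p}"'
        for p in periods
    ]
    return (
        "SELECT product_id, "
        + ", ".join(columns)
        + " FROM sales_monthly GROUP BY product_id"
    )

print(build_case_when_query(["2017-01-01", "2017-02-01", "2017-03-01"]))
```

With 30 periods the query grows one CASE-WHEN branch per period, each of which the engine must evaluate for every scanned row, which is consistent with the slow result the article describes.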
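What the transposition buys can be modeled in a few lines of Python: sparse (label, amount) rows for one product become a single dense row with one column per label, with zero filled in where no entry exists. The labels below are invented for illustration:

```python
# A minimal model of the pivot step: sparse per-period rows become one
# dense row, with zero substituted for missing periods.
def pivot(rows, labels):
    """rows: iterable of (label, amount) pairs; labels: the full,
    ordered label list. Missing labels are treated as zero."""
    by_label = dict(rows)
    return [by_label.get(label, 0) for label in labels]

labels = ["2017-01", "2017-02", "2017-03", "2017-04"]
sparse = [("2017-01", 5), ("2017-03", 2)]  # months with no sales are simply absent
assert pivot(sparse, labels) == [5, 0, 2, 0]
```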
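In Postgres itself, the built-in way to do such a pivot is the crosstab() function from the tablefunc extension, which is presumably the function the text refers to. A rough shape of such a query, with assumed table and column names, kept here as a string:

```python
# Rough shape of a Postgres crosstab() call (tablefunc extension).
# The first query yields (row key, category, value); the second yields
# the full, ordered category list, so periods with no data come back as
# NULL. A real query lists one output column per category in the AS
# clause. All table/column names are assumptions for illustration.
CROSSTAB_SQL = """
SELECT *
FROM crosstab(
    $$ SELECT product_id, period_label, amount
       FROM sales_monthly
       WHERE period_start >= '2016-01-01'   -- keep the filter in the subquery
       ORDER BY 1, 2 $$,
    $$ SELECT period_label FROM period_labels ORDER BY ord $$
) AS pivoted(product_id int, "2016-01" numeric, "2016-02" numeric);
"""
```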