GROUP BY Functions in Oracle
We will have an overview of the functionality available for aggregation in data warehouses
Setup
The examples in this article will be run against the following simple dimension table.
DROP TABLE dimension_tab; CREATE TABLE dimension_tab ( fact_1_id NUMBER NOT NULL, fact_2_id NUMBER NOT NULL, fact_3_id NUMBER NOT NULL, fact_4_id NUMBER NOT NULL, sales_value NUMBER(10,2) NOT NULL ); INSERT INTO dimension_tab SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id, TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id, TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id, TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id, ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value FROM dual CONNECT BY level <= 1000; COMMIT;
To keep the queries and their output simple I am going to ignore the fact tables and also limit the number of distinct values in the columns of the dimension table.
GROUP BY
Let’s start be reminding ourselves how the GROUP BY
clause works. An aggregate function takes multiple rows of data returned by a query and aggregates them into a single result row.
SELECT SUM(sales_value) AS sales_value FROM dimension_tab; SALES_VALUE ----------- 50528.39 1 row selected. SQL>
Including the GROUP BY
clause limits the window of data processed by the aggregate function. This way we get an aggregated value for each distinct combination of values present in the columns listed in theGROUP BY
clause. The number of rows we expect can be calculated by multiplying the number of distinct values of each column listed in the GROUP BY
clause. In this case, if the rows were loaded randomly we would expect the number of distinct values for the first three columns in the table to be 2, 5 and 10 respectively. So using the fact_1_id
column in the GROUP BY
clause should give us 2 rows.
SELECT fact_1_id, COUNT(*) AS num_rows, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY fact_1_id ORDER BY fact_1_id; FACT_1_ID NUM_ROWS SALES_VALUE ---------- ---------- ----------- 1 478 24291.35 2 522 26237.04 2 rows selected. SQL>
Including the first two columns in the GROUP BY
clause should give us 10 rows (2*5), each with its aggregated values.
SELECT fact_1_id, fact_2_id, COUNT(*) AS num_rows, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY fact_1_id, fact_2_id ORDER BY fact_1_id, fact_2_id; FACT_1_ID FACT_2_ID NUM_ROWS SALES_VALUE ---------- ---------- ---------- ----------- 1 1 83 4363.55 1 2 96 4794.76 1 3 93 4718.25 1 4 105 5387.45 1 5 101 5027.34 2 1 109 5652.84 2 2 96 4583.02 2 3 110 5555.77 2 4 113 5936.67 2 5 94 4508.74 10 rows selected. SQL>
Including the first three columns in the GROUP BY
clause should give us 100 rows (2*5*10).
SELECT fact_1_id, fact_2_id, fact_3_id, COUNT(*) AS num_rows, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY fact_1_id, fact_2_id, fact_3_id ORDER BY fact_1_id, fact_2_id, fact_3_id; FACT_1_ID FACT_2_ID FACT_3_ID NUM_ROWS SALES_VALUE ---------- ---------- ---------- ---------- ----------- 1 1 1 10 381.61 1 1 2 6 235.29 1 1 3 7 270.7 1 1 4 13 634.05 1 1 5 10 602.36 1 1 6 7 538.41 1 1 7 5 245.87 1 1 8 8 435.54 1 1 9 8 506.59 1 1 10 9 513.13 ... 2 5 1 14 714.84 2 5 2 13 686.56 2 5 3 13 579.5 2 5 4 10 336.87 2 5 5 5 215.17 2 5 6 4 268.72 2 5 7 14 667.22 2 5 8 7 451.29 2 5 9 8 365.24 2 5 10 6 223.33 100 rows selected. SQL>
Top Posts
- RANK, DENSE_RANK, FIRST and LAST Analytic Functions
- Partitioned Tables And Indexes in Oracle
- PIVOT Function in Oracle
- Cardinality and Index in Oracle
- Oracle Architecture – SQL Query Processing