GROUP BY Functions in Oracle

oracle-techonol-prajwal

GROUP BY Functions in Oracle

We will have an overview of the functionality available for aggregation in data warehouses

Setup

The examples in this article will be run against the following simple dimension table.

DROP TABLE dimension_tab;
CREATE TABLE dimension_tab (
  fact_1_id   NUMBER NOT NULL,
  fact_2_id   NUMBER NOT NULL,
  fact_3_id   NUMBER NOT NULL,
  fact_4_id   NUMBER NOT NULL,
  sales_value NUMBER(10,2) NOT NULL
);

INSERT INTO dimension_tab
SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id,
       ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value
FROM   dual
CONNECT BY level <= 1000;
COMMIT;

To keep the queries and their output simple I am going to ignore the fact tables and also limit the number of distinct values in the columns of the dimension table.

GROUP BY

Let’s start be reminding ourselves how the GROUP BY clause works. An aggregate function takes multiple rows of data returned by a query and aggregates them into a single result row.

SELECT SUM(sales_value) AS sales_value
FROM   dimension_tab;

SALES_VALUE
-----------
   50528.39

1 row selected.

SQL>

Including the GROUP BY clause limits the window of data processed by the aggregate function. This way we get an aggregated value for each distinct combination of values present in the columns listed in theGROUP BY clause. The number of rows we expect can be calculated by multiplying the number of distinct values of each column listed in the GROUP BY clause. In this case, if the rows were loaded randomly we would expect the number of distinct values for the first three columns in the table to be 2, 5 and 10 respectively. So using the fact_1_id column in the GROUP BY clause should give us 2 rows.

SELECT fact_1_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id
ORDER BY fact_1_id;

 FACT_1_ID   NUM_ROWS SALES_VALUE
---------- ---------- -----------
         1        478    24291.35
         2        522    26237.04

2 rows selected.

SQL>

Including the first two columns in the GROUP BY clause should give us 10 rows (2*5), each with its aggregated values.

SELECT fact_1_id,
       fact_2_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID   NUM_ROWS SALES_VALUE
---------- ---------- ---------- -----------
         1          1         83     4363.55
         1          2         96     4794.76
         1          3         93     4718.25
         1          4        105     5387.45
         1          5        101     5027.34
         2          1        109     5652.84
         2          2         96     4583.02
         2          3        110     5555.77
         2          4        113     5936.67
         2          5         94     4508.74

10 rows selected.

SQL>

Including the first three columns in the GROUP BY clause should give us 100 rows (2*5*10).

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id, fact_3_id
ORDER BY fact_1_id, fact_2_id, fact_3_id;

 FACT_1_ID  FACT_2_ID  FACT_3_ID   NUM_ROWS SALES_VALUE
---------- ---------- ---------- ---------- -----------
         1          1          1         10      381.61
         1          1          2          6      235.29
         1          1          3          7       270.7
         1          1          4         13      634.05
         1          1          5         10      602.36
         1          1          6          7      538.41
         1          1          7          5      245.87
         1          1          8          8      435.54
         1          1          9          8      506.59
         1          1         10          9      513.13
...
         2          5          1         14      714.84
         2          5          2         13      686.56
         2          5          3         13       579.5
         2          5          4         10      336.87
         2          5          5          5      215.17
         2          5          6          4      268.72
         2          5          7         14      667.22
         2          5          8          7      451.29
         2          5          9          8      365.24
         2          5         10          6      223.33

100 rows selected.

SQL>

Top Posts

  1. RANK, DENSE_RANK, FIRST and LAST Analytic Functions
  2. Partitioned Tables And Indexes in Oracle
  3. PIVOT Function in Oracle
  4. Cardinality and Index in Oracle
  5. Oracle Architecture – SQL Query Processing

 

 

error: Content is protected !!