Introduction
In the realm of SQL, the Cartesian join, also known as CROSS JOIN, is a powerful tool for generating paired combinations of rows from two tables. In this blog post, we’ll explore how to leverage Cartesian joins to extract valuable insights from a hypothetical employee_work_location
table.
Scenario
Consider a scenario where we have an employee_work_location
table with columns such as employee_name
, work_location
, start_date
, and end_date
. The goal is to extract data showing employee names and work locations for each month.
Example Data
employee_name | work_location | start_date | end_date |
---|---|---|---|
emp_a | Office A | 2022-01-01 | 2022-12-31 |
emp_b | Office B | 2021-05-15 | 2022-02-28 |
emp_c | Office C | 2022-03-10 | 2023-01-15 |
emp_d | Office A | 2021-08-20 | 2022-11-30 |
emp_a | Office B | 2021-03-05 | 2022-09-15 |
emp_b | Office C | 2022-06-12 | 2023-03-01 |
emp_c | Office A | 2021-12-01 | 2022-06-30 |
emp_d | Office B | 2022-02-15 | 2022-12-15 |
emp_e | Office C | 2021-07-10 | 2022-08-25 |
emp_f | Office A | 2022-04-18 | 2023-02-28 |
Querying with Cartesian Joins
To achieve our goal, we can use a single SQL query with the help of Cartesian joins and a temporary date series. The query generates a series of months from the minimum start date to the maximum end date, performing a cross join with the employee_work_location
table. This ensures that we capture all relevant combinations.
SQL Query:
WITH date_series AS (
SELECT generate_series(
(SELECT DATE_TRUNC('month', MIN(start_date)) FROM employee_work_location),
(SELECT DATE_TRUNC('month', MAX(end_date)) + INTERVAL '1 month' FROM employee_work_location),
'1 month'::interval
)::DATE AS work_month
)
SELECT
ds.work_month,
ew.employee_name,
ew.work_location
FROM
date_series ds
CROSS join employee_work_location ew
where DATE_TRUNC('month', ew.start_date) and DATE_TRUNC('month', ew.end_date)
ORDER BY
ds.work_month, ew.employee_name, ew.work_location;
Note:
We are truncating the month to get the interval from beginning of the month, and adding 1 month to the interval through
+ INTERVAL '1 month'
to not miss the max end month
Result
Now output of the our query presents a comprehensive view of employee work locations for each month.
work_month | employee_name | work_location |
---|---|---|
2021-03-01 | emp_a | Office B |
2021-04-01 | emp_a | Office B |
2021-01-01 | emp_a | Office B |
2021-06-01 | emp_a | Office B |
2021-06-01 | emp_b | Office B |
… | … | … |
2022-12-01 | emp_b | Office C |
2022-12-01 | emp_c | Office C |
2022-12-01 | emp_d | Office B |
2022-12-01 | emp_f | Office A |
2023-01-01 | emp_b | Office C |
2023-01-01 | emp_c | Office C |
2023-01-01 | emp_f | Office A |
2023-02-01 | emp_b | Office C |
2023-02-01 | emp_f | Office A |
2023-03-01 | emp_b | Office C |
Further Enhancements: Separating Months and Years
For additional clarity, we can further refine the result by extracting the year and month using PostgreSQL’s EXTRACT()
function.
WITH date_series AS (
SELECT generate_series(
(SELECT DATE_TRUNC('month', MIN(start_date)) FROM employee_work_location),
(SELECT DATE_TRUNC('month', MAX(end_date)) + INTERVAL '1 month' FROM employee_work_location),
'1 month'::interval
)::DATE AS work_month
)
SELECT
extract(year FROM ds.work_month) as "year",
extract(month FROM ds.work_month) as "month",
ew.employee_name,
ew.work_location
FROM
date_series ds
CROSS join employee_work_location ew
where DATE_TRUNC('month', ds.work_month) between DATE_TRUNC('month', ew.start_date) and DATE_TRUNC('month', ew.end_date)
ORDER BY
"year", "month", ew.employee_name, ew.work_location;
The result now includes separate columns for the year and month, providing a more organized view of the data.
year | month | employee_name | work_location |
---|---|---|---|
2021 | 3 | emp_a | Office B |
2021 | 4 | emp_a | Office B |
2021 | 4 | emp_a | Office B |
2021 | 5 | emp_a | Office B |
2021 | 6 | emp_a | Office B |
… | … | … | … |
2023 | 1 | emp_c | Office C |
2023 | 1 | emp_f | Office A |
2023 | 2 | emp_b | Office C |
2023 | 2 | emp_f | Office A |
2023 | 3 | emp_b | Office C |
With these SQL techniques, we can efficiently extract and organize data, gaining valuable insights into temporal relationships.