Working from a list of my ancestors’ occupations I wanted to list the top 5 occupations in each occupational category per decade, so that given
year category description total ... 1851 Agriculture Ag Lab 4 1851 Agriculture Farm Bailiff 1 1851 Agriculture Farm Laborer 2 1851 Crafts Blacksmith 1 1851 Crafts Carpenter 3 1851 Crafts Shoe Maker 1 1852 Crafts Shoe Maker 1 1853 Crafts Blacksmith 1 1853 Crafts Shoe Maker 1 1854 Crafts Blacksmith 1 ....
I would get
decade category description total rank ... 1850 Agriculture Ag Lab 4 1 1850 Agriculture Farm Laborer 2 2 1850 Agriculture Farm Bailiff 1 3 1850 Crafts Blacksmith 3 1 1850 Crafts Carpenter 3 2 1850 Crafts Shoe Maker 3 3 ...
Using the windows functions introduced in version 8.4, this was the solution I ended up with:
- The outermost loop selects the top 5
- The middle loop ranks the data
- The innermost loop sums the data by decade
SELECT * FROM (
SELECT decade,
category,
description,
total,
rank() OVER (PARTITION BY decade,
category
ORDER BY total DESC, description) AS pos
FROM (
SELECT
CASE
WHEN year BETWEEN 1840 AND 1849 THEN 1840
WHEN year BETWEEN 1850 AND 1859 THEN 1850
WHEN year BETWEEN 1860 AND 1869 THEN 1860
WHEN year BETWEEN 1870 AND 1879 THEN 1870
WHEN year BETWEEN 1880 AND 1889 THEN 1880
WHEN year BETWEEN 1890 AND 1899 THEN 1890
ELSE 1900
END as decade,
category,
description,
SUM(total) as total
FROM occupations
GROUP BY 1,2,3) as orig
) AS ss
WHERE pos < 6;
The method for grouping by decade is suggested on grokbase.com.
The ranking then filtering is discussed on the PostgreSQL site:
SELECT depname, empno, salary, enroll_date
FROM
(SELECT depname, empno, salary, enroll_date,
rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
FROM empsalary
) AS ss
WHERE pos < 3;