Practical example demonstrating how to query for min/max values of grouped documents in MongoDB, including additional non-grouped field values, using the $push, $map and $filter aggregation operators.
Query task: initial data situation & first trivial approach
Let’s assume you have the following documents with hourly measured temperatures in a collection called temperatures in your MongoDB…
[
{
"date" : ISODate("2022-05-10T09:00:00.000+02:00"),
"temp" : 11.3
},
{
"date" : ISODate("2022-05-10T10:00:00.000+02:00"),
"temp" : 11.7
},
{
"date" : ISODate("2022-05-10T11:00:00.000+02:00"),
"temp" : 11.5
},
{
"date" : ISODate("2022-05-11T10:00:00.000+02:00"),
"temp" : 14.3
},
{
"date" : ISODate("2022-05-11T11:00:00.000+02:00"),
"temp" : 14.7
},
{
"date" : ISODate("2022-05-11T12:00:00.000+02:00"),
"temp" : 14.9
}
]
The task now is to query for the maximum temperature of every day together with the time(s) at which this temperature occurred on that day. So the desired output is:
- On 2022-05-10 the max. temperature was 11.7 degrees at 10:00:00
- On 2022-05-11 the max. temperature was 14.9 degrees at 12:00:00
Selecting only the max temperature per day can be easily achieved with a trivial $group aggregation pipeline stage, like this:
db.temperatures.aggregate(
[
{
$group: {
_id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
tempMax: { $max: '$temp' }
}
}
]);
The result would be:
[
{
"_id" : "2022-05-10",
"tempMax" : 11.7
},
{
"_id" : "2022-05-11",
"tempMax" : 14.9
}
]
The question now is how additional fields like the time(s) can be added for every day. Obviously, they cannot be retrieved with any aggregation operator like $min/$max/$avg in the grouping stage because these are non-aggregated values. With traditional SQL, this could be achieved using joins or sub-selects, but how can it be done in MongoDB?
Solution: projecting non-aggregated fields to a group using $push, $map and $filter
To achieve the desired output, we will use some more MongoDB aggregation operators and extend the first query by the following steps:
- Collect all documents per day using $push and $$CURRENT in a helper field items in the grouping stage.
- Add a projection stage where a new field tempMaxDates is created and filled with all the dates out of the group's collected items where temp is equal to tempMax of the grouping stage. To do so, the $map and $filter aggregation operators are used.
The final query is:
db.temperatures.aggregate(
[
{ $group: {
_id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
tempMax: { $max: '$temp' },
items: { $push: '$$CURRENT' } } },
{ $project: {
tempMax: 1,
tempMaxDates: {
$map: {
input: {
$filter: {
input: '$items', as: 'i',
cond: { $eq: [ '$$i.temp', '$tempMax' ] }
}
},
as: 'maxOccur',
in: '$$maxOccur.date' }
}
}
}
]);
With that we get the desired result including the date(s) for the max temperatures per day:
[
{
"_id" : "2022-05-10",
"tempMax" : 11.7,
"tempMaxDates" : [
ISODate("2022-05-10T10:00:00.000+02:00")
]
},
{
"_id" : "2022-05-11",
"tempMax" : 14.9,
"tempMaxDates" : [
ISODate("2022-05-11T12:00:00.000+02:00")
]
}
]
Note: Since tempMaxDates is an array into which all timestamps of the max temperature are collected, this solution also works when the maximum temperature occurs more than once per day (or group).
Please keep in mind that aggregations are executed directly on the MongoDB server, and that the helper field items holds a full copy of every document in its group. Depending on your concrete scenario (number of docs, resulting groups, parallel queries etc.), this query may be slow or cause trouble on your server.
If you encounter these problems, an additional matching stage to narrow down the processed docs could be a possible mitigation, like so:
{ $match: { date: { $gte: dateStart, $lte: dateEnd } } }
The $match step should be the first element of the aggregation pipeline, placed before $group and $project.
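To make the stage order concrete, here is a minimal sketch that assembles the full pipeline with $match in front. The helper name buildPipeline and the sample date range are assumptions for illustration and not part of the original query; the function only builds the array, so it runs in plain Node.js as well as in mongosh.

```javascript
// Hypothetical helper (not part of MongoDB): returns the aggregation
// pipeline with the $match stage placed before $group and $project.
function buildPipeline(dateStart, dateEnd) {
  return [
    // 1. Narrow down the processed documents first.
    { $match: { date: { $gte: dateStart, $lte: dateEnd } } },
    // 2. Group per day, keep the max and collect the raw documents.
    { $group: {
        _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
        tempMax: { $max: '$temp' },
        items: { $push: '$$CURRENT' } } },
    // 3. Keep only the dates of the documents that hit the max.
    { $project: {
        tempMax: 1,
        tempMaxDates: {
          $map: {
            input: {
              $filter: {
                input: '$items', as: 'i',
                cond: { $eq: [ '$$i.temp', '$tempMax' ] }
              }
            },
            as: 'maxOccur',
            in: '$$maxOccur.date' }
        }
      }
    }
  ];
}

// Assumed sample range covering the two days of the example data.
const pipeline = buildPipeline(
  new Date('2022-05-10T00:00:00+02:00'),
  new Date('2022-05-11T23:59:59+02:00')
);

// In mongosh (requires a live database): db.temperatures.aggregate(pipeline);
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// → [ '$match', '$group', '$project' ]
```

Placing $match first also allows MongoDB to use an index on date, if one exists, before any grouping happens.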
In-depth: explanation of the solution
Let’s have a closer look at the crucial parts of the solution.
Collecting raw data for each group
items: { $push: '$$CURRENT' }
This line in the grouping stage will cause MongoDB to create a field items which contains all original documents that are grouped together, in our case all documents of a day. In other words, items will give us access to the ‘raw data’ of each group in the following stages.
$push adds elements to an array and $$CURRENT references the currently processed/grouped document.
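The effect of the grouping stage can be mimicked in plain JavaScript without a database. The following is only a sketch of the idea, not how MongoDB implements it: the function name groupByDay is made up, and as a simplifying assumption the dates are ISO strings whose first 10 characters serve as the day key, whereas the real pipeline applies $dateToString to date values.

```javascript
// Reduced sample data; dates are plain ISO strings here (assumption).
const docs = [
  { date: '2022-05-10T09:00:00', temp: 11.3 },
  { date: '2022-05-10T10:00:00', temp: 11.7 },
  { date: '2022-05-11T12:00:00', temp: 14.9 },
];

// Hypothetical helper that imitates the $group stage of the article.
function groupByDay(input) {
  const groups = new Map();
  for (const doc of input) {
    const day = doc.date.slice(0, 10); // stands in for $dateToString
    if (!groups.has(day)) {
      groups.set(day, { _id: day, tempMax: -Infinity, items: [] });
    }
    const group = groups.get(day);
    group.tempMax = Math.max(group.tempMax, doc.temp); // $max: '$temp'
    group.items.push(doc);                             // $push: '$$CURRENT'
  }
  return [...groups.values()];
}

const grouped = groupByDay(docs);
console.log(JSON.stringify(grouped, null, 2));
```

Each resulting group carries its aggregated tempMax next to the untouched original documents in items, which is what makes the non-aggregated date values reachable in a later stage.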
Projecting non-aggregated fields from collected raw data
tempMaxDates: {
$map: {
input: {
$filter: {
input: '$items', as: 'i',
cond: { $eq: [ '$$i.temp', '$tempMax' ] }
}
},
as: 'maxOccur',
in: '$$maxOccur.date' }
}
This part of the query constructs a new field called tempMaxDates in the projection stage.
For that, the items field from the grouping stage is first filtered to get only the documents of the group where the temperature equals the maximum. Note that $ is used to reference root document fields whereas $$ is used to reference variables.
- $$i.temp refers to the temperature field of each element in the items array, which has the variable name i here
- $tempMax refers to the maximum temperature field determined in the previous grouping stage
From the filtered docs in variable maxOccur, only the date field is mapped to the resulting array using $$maxOccur.date as the mapping expression.
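For readers more familiar with JavaScript arrays, $filter and $map behave much like Array.prototype.filter and Array.prototype.map. The sketch below applies them to a single hand-written document shaped like the output of the grouping stage; the second 11.7 reading is hypothetical and was added only to show what happens on a tie, and dates are plain ISO strings for brevity.

```javascript
// One document as it would leave the $group stage (hand-written sample;
// the 11:00 reading of 11.7 is hypothetical, added to produce a tie).
const group = {
  _id: '2022-05-10',
  tempMax: 11.7,
  items: [
    { date: '2022-05-10T09:00:00', temp: 11.3 },
    { date: '2022-05-10T10:00:00', temp: 11.7 },
    { date: '2022-05-10T11:00:00', temp: 11.7 },
  ],
};

const tempMaxDates = group.items
  // $filter: the variable i plays the role of '$$i', group.tempMax of '$tempMax'
  .filter((i) => i.temp === group.tempMax)
  // $map: the variable maxOccur plays the role of '$$maxOccur'
  .map((maxOccur) => maxOccur.date);

console.log(tempMaxDates);
// → [ '2022-05-10T10:00:00', '2022-05-10T11:00:00' ]
```

Because the filter keeps every element that equals the maximum, both timestamps end up in the array when the maximum occurs twice.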
Bonus: SQL solution based on PostgreSQL
If you have an equivalent data-set in a traditional SQL database like PostgreSQL…
test=> select * from temperatures order by date;
date | temp
---------------------+-------
2022-05-10 09:00:00 | 11.30
2022-05-10 10:00:00 | 11.70
2022-05-10 11:00:00 | 11.50
2022-05-11 10:00:00 | 14.30
2022-05-11 11:00:00 | 14.70
2022-05-11 12:00:00 | 14.90
…a possible solution could be to use a WITH clause (or Common Table Expression [CTE]) to extract the maximum temperature per day and then join back to the original data for getting all the timestamps, like so:
test=> with ttt as
(
select date_trunc('day', t.date) as date, max(t.temp) as temp
from temperatures t
group by date_trunc('day', t.date)
)
select t.date, t.temp
from temperatures t, ttt
where ttt.date = date_trunc('day', t.date) and t.temp = ttt.temp;
date | temp
---------------------+-------
2022-05-10 10:00:00 | 11.70
2022-05-11 12:00:00 | 14.90
Happy querying 🙂