Query min/max values with additional non-grouped fields in MongoDB

Practical example demonstrating how to query for min/max values of grouped documents in MongoDB including additional non-grouped field values using $push, $map and $filter aggregation operators.

Query task: initial data situation & first trivial approach

Let’s assume you have the following documents with hourly measured temperatures in a collection called temperatures in your MongoDB…

[
  {
    "date" : ISODate("2022-05-10T09:00:00.000+02:00"),
    "temp" : 11.3
  },
  {
    "date" : ISODate("2022-05-10T10:00:00.000+02:00"),
    "temp" : 11.7
  },
  {
    "date" : ISODate("2022-05-10T11:00:00.000+02:00"),
    "temp" : 11.5
  },
  {
    "date" : ISODate("2022-05-11T10:00:00.000+02:00"),
    "temp" : 14.3
  },
  {
    "date" : ISODate("2022-05-11T11:00:00.000+02:00"),
    "temp" : 14.7
  },
  {
    "date" : ISODate("2022-05-11T12:00:00.000+02:00"),
    "temp" : 14.9
  }
]

The task is to query the maximum temperature for every day, together with the time(s) at which that temperature occurred. The desired output is:

  • On 2022-05-10 the max. temperature was 11.7 degrees at 10:00:00
  • On 2022-05-11 the max. temperature was 14.9 degrees at 12:00:00

Selecting only the max temperature per day can be easily achieved with a trivial $group aggregation pipeline stage, like this:

db.temperatures.aggregate(
[
  {
    $group: { 
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } }, 
      tempMax: { $max: '$temp' }
    }
  }
]);

The result would be:

[
  {
    "_id" : "2022-05-10",
    "tempMax" : 11.7
  },
  {
    "_id" : "2022-05-11",
    "tempMax" : 14.9
  }
]
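
One caveat about the grouping key: $dateToString formats dates in UTC unless a timezone is passed. The sample timestamps carry a +02:00 offset, so a measurement shortly after local midnight would be counted towards the previous day. The following is a minimal sketch of a grouping stage with an explicit timezone; the zone name 'Europe/Berlin' and the late-night timestamp are assumptions for illustration and are not part of the sample data.

```javascript
// Hypothetical variant of the grouping stage with an explicit timezone.
// 'Europe/Berlin' is an assumption; use the zone your measurements belong to.
const groupByLocalDay = {
  $group: {
    _id: {
      $dateToString: { format: '%Y-%m-%d', date: '$date', timezone: 'Europe/Berlin' }
    },
    tempMax: { $max: '$temp' }
  }
};

// Plain JavaScript illustration of the problem (not a MongoDB call):
// a made-up measurement taken at 00:30 local time on May 11 ...
const lateNight = new Date('2022-05-11T00:30:00.000+02:00');
// ... falls on May 10 when its calendar day is taken in UTC.
const utcDay = lateNight.toISOString().slice(0, 10);
console.log(utcDay); // 2022-05-10
```

In mongosh, the stage could be used as db.temperatures.aggregate([groupByLocalDay]).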

The question now is how additional fields, like the time(s), can be added for every day. They cannot be retrieved with an accumulator like $min/$max/$avg in the grouping stage, because they are non-aggregated values. With traditional SQL, this could be achieved using joins or sub-selects – but how to do that in MongoDB?

Solution: projecting non-aggregated fields to a group using $push, $map and $filter

To achieve the desired output, we will use some more MongoDB aggregation operators and extend the first query by the following steps:

  • Collect all documents per day using $push and $$CURRENT in a helper field items in the grouping stage.
  • Add a projection stage that creates a new field tempMaxDates and fills it with the dates of those collected items of the group whose temp equals the tempMax determined in the grouping stage. To do so, the $map and $filter aggregation operators are used.

The final query is:

db.temperatures.aggregate(
[
  // Stage 1: group by day, determine the maximum and collect the raw documents
  {
    $group: {
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
      tempMax: { $max: '$temp' },
      items: { $push: '$$CURRENT' }
    }
  },
  // Stage 2: keep only the dates of the items whose temp equals tempMax
  {
    $project: {
      tempMax: 1,
      tempMaxDates: {
        $map: {
          input: {
            $filter: {
              input: '$items', as: 'i',
              cond: { $eq: [ '$$i.temp', '$tempMax' ] }
            }
          },
          as: 'maxOccur',
          in: '$$maxOccur.date'
        }
      }
    }
  }
]);

With that we get the desired result including the date(s) for the max temperatures per day:

[
  {
    "_id" : "2022-05-10",
    "tempMax" : 11.7,
    "tempMaxDates" : [ 
      ISODate("2022-05-10T10:00:00.000+02:00")
    ]
  },
  {
    "_id" : "2022-05-11",
    "tempMax" : 14.9,
    "tempMaxDates" : [ 
      ISODate("2022-05-11T12:00:00.000+02:00")
    ]
  }
]

Note: Since tempMaxDates is an array that receives every timestamp at which the maximum temperature was measured, this solution also works when the maximum temperature occurs more than once per day (or group).
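
To illustrate the tie case without a running server, here is a rough emulation of the two pipeline stages in plain JavaScript. It is only a sketch of the logic, not MongoDB itself: the function name maxTempPerDay is made up, the data is hypothetical (11.7 degrees measured twice on the same day), and days are derived in UTC.

```javascript
// Hypothetical sample data: the maximum of 11.7 occurs twice on one day.
const docs = [
  { date: new Date('2022-05-10T09:00:00.000+02:00'), temp: 11.3 },
  { date: new Date('2022-05-10T10:00:00.000+02:00'), temp: 11.7 },
  { date: new Date('2022-05-10T11:00:00.000+02:00'), temp: 11.7 }
];

// Plain JavaScript emulation of the $group and $project stages.
function maxTempPerDay(documents) {
  // Like $group with $push: collect the documents of each (UTC) day
  const groups = new Map();
  for (const doc of documents) {
    const day = doc.date.toISOString().slice(0, 10);
    if (!groups.has(day)) groups.set(day, []);
    groups.get(day).push(doc);
  }
  // Like $max followed by $filter and $map on the collected items
  return [...groups].map(([day, items]) => {
    const tempMax = Math.max(...items.map((i) => i.temp));
    const tempMaxDates = items
      .filter((i) => i.temp === tempMax)
      .map((maxOccur) => maxOccur.date);
    return { _id: day, tempMax, tempMaxDates };
  });
}

const result = maxTempPerDay(docs);
console.log(result[0].tempMaxDates.length); // 2
```

Both timestamps with 11.7 degrees end up in tempMaxDates, which mirrors what the $filter/$map projection does for a group with several maxima.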

Please keep in mind that aggregations are operations directly executed on the MongoDB server. Depending on your concrete scenario (number of docs, resulting groups, parallel queries etc.) this query may be slow or cause trouble on your server.

If you encounter these problems, an additional matching stage that narrows down the processed docs could be a possible mitigation, like so:

{ $match: { date: { $gte: dateStart, $lte: dateEnd } } }

The $match stage should be the first stage of the aggregation pipeline, placed before $group and $project, so that only the matching documents are grouped.
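
Put together, the pipeline with a leading $match could look like the sketch below. The helper name buildMaxTempPipeline and the chosen date range are assumptions for illustration. A $match at the start of a pipeline can also make use of an index on date, if one exists.

```javascript
// Hypothetical helper: builds the pipeline with $match as the first stage.
function buildMaxTempPipeline(dateStart, dateEnd) {
  return [
    // Narrow down the processed documents before grouping
    { $match: { date: { $gte: dateStart, $lte: dateEnd } } },
    {
      $group: {
        _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
        tempMax: { $max: '$temp' },
        items: { $push: '$$CURRENT' }
      }
    },
    {
      $project: {
        tempMax: 1,
        tempMaxDates: {
          $map: {
            input: {
              $filter: {
                input: '$items', as: 'i',
                cond: { $eq: ['$$i.temp', '$tempMax'] }
              }
            },
            as: 'maxOccur',
            in: '$$maxOccur.date'
          }
        }
      }
    }
  ];
}

// Example range (an assumption): the two days of the sample data
const pipeline = buildMaxTempPipeline(
  new Date('2022-05-10T00:00:00.000+02:00'),
  new Date('2022-05-11T23:59:59.999+02:00')
);
console.log(Object.keys(pipeline[0])[0]); // $match
```

In mongosh, the result would be passed on as db.temperatures.aggregate(pipeline).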

In-depth: explanation of the solution

Let’s have a closer look at the crucial parts of the solution.

Collecting raw data for each group

items: { $push: '$$CURRENT' }

This line in the grouping stage will cause MongoDB to create a field items which contains all original documents that are grouped together – in our case all documents of a day. In other words, items will give us access to the ‘raw data’ of each group in the following stages.

$push adds elements to an array and $$CURRENT references the currently processed/grouped document.
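
$push is not limited to whole documents. If the real documents carry more fields than the two in this example, a possible variation is to push only the fields that the projection stage needs, which keeps the items array smaller. The following is a sketch of such a grouping stage; the variable name groupStage is made up for illustration.

```javascript
// Sketch of a grouping stage that collects a reduced sub-document per item
// instead of the whole document ($$CURRENT).
const groupStage = {
  $group: {
    _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
    tempMax: { $max: '$temp' },
    // Only date and temp are needed by the later $filter/$map projection
    items: { $push: { date: '$date', temp: '$temp' } }
  }
};

console.log(Object.keys(groupStage.$group.items.$push)); // [ 'date', 'temp' ]
```

The projection stage stays unchanged, because it only reads temp and date from each element of items.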

Projecting non-aggregated fields from collected raw data

tempMaxDates: {
  $map: {
    input: {
      $filter: {
        input: '$items', as: 'i',
        cond: { $eq: [ '$$i.temp', '$tempMax' ] }
      }
    },
    as: 'maxOccur',
    in: '$$maxOccur.date'
  }
}

This part of the query constructs a new field called tempMaxDates in the projection stage.

For that, the items field from the grouping stage is first filtered to get only the documents of the group where the temperature equals the maximum. Note that $ is used to reference root document fields whereas $$ is used to reference variables.

  • $$i.temp refers to the temp field of each element of the items array; the element currently being checked is bound to the variable i
  • $tempMax refers to the maximum temperature field determined in the previous grouping stage

From the filtered docs in variable maxOccur, only the date field is mapped to the resulting array using $$maxOccur.date as the mapping expression.
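
For readers who are more at home with JavaScript arrays: $filter and $map behave much like Array.prototype.filter and Array.prototype.map. The sketch below applies them to one grouped document as it would look after the grouping stage for the sample data of 2022-05-11. This is an analogy in plain JavaScript, not how MongoDB executes the projection.

```javascript
// One grouped document after the $group stage (values from the sample data)
const group = {
  _id: '2022-05-11',
  tempMax: 14.9,
  items: [
    { date: new Date('2022-05-11T10:00:00.000+02:00'), temp: 14.3 },
    { date: new Date('2022-05-11T11:00:00.000+02:00'), temp: 14.7 },
    { date: new Date('2022-05-11T12:00:00.000+02:00'), temp: 14.9 }
  ]
};

const tempMaxDates = group.items
  // $filter: keep items where $$i.temp equals $tempMax
  .filter((i) => i.temp === group.tempMax)
  // $map: reduce every remaining item to $$maxOccur.date
  .map((maxOccur) => maxOccur.date);

// Printed in UTC, hence 10:00 instead of 12:00+02:00
console.log(tempMaxDates.map((d) => d.toISOString())); // [ '2022-05-11T10:00:00.000Z' ]
```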

Bonus: SQL solution based on PostgreSQL

If you have an equivalent data-set in a traditional SQL database like PostgreSQL…

test=> select * from temperatures order by date;
        date         | temp  
---------------------+-------
 2022-05-10 09:00:00 | 11.30
 2022-05-10 10:00:00 | 11.70
 2022-05-10 11:00:00 | 11.50
 2022-05-11 10:00:00 | 14.30
 2022-05-11 11:00:00 | 14.70
 2022-05-11 12:00:00 | 14.90

…a possible solution could be to use a WITH clause (a Common Table Expression [CTE]) to extract the maximum temperature per day and then join back to the original data to get all the timestamps, like so:

test=> with ttt as 
(
  select date_trunc('day', t.date) as date, max(t.temp) as temp 
  from temperatures t 
  group by date_trunc('day', t.date)
)
select t.date, t.temp 
from temperatures t, ttt 
where ttt.date = date_trunc('day', t.date) and t.temp = ttt.temp;

        date         | temp  
---------------------+-------
 2022-05-10 10:00:00 | 11.70
 2022-05-11 12:00:00 | 14.90

Happy querying 🙂
