Understanding the Aggregation Pipeline in MongoDB for Building Applications

by Deborah Emeni

19 min read

The term 'Aggregation Pipeline' comes from the word 'Aggregation', which simply means the gathering of several things into a cluster.

In simple terms, Aggregation in MongoDB processes the data in a MongoDB Collection and produces computed results.

This article will give Node.js software developers a detailed explanation with examples of the Aggregation Pipeline in MongoDB, along with illustrations showing how the Aggregation Pipeline works!

I will also include real-life examples of the use cases of the Aggregation Pipeline, describing the use of Aggregation Stages with examples. I will show you how these Aggregation Stages work in an actual code example using Nodejs and a MongoDB Collection to build a simple application. 

What Is an Aggregation Pipeline?

You need to understand what the Aggregation Framework is in MongoDB before jumping into the Aggregation Pipeline. 

The Aggregation Framework is simply a framework, like the React and Vue.js frameworks for JavaScript applications, although the Aggregation Framework has its own unique purpose.

The Aggregation Framework in MongoDB is used for running various analyses on MongoDB Collections: for example, filtering data from a given Collection in the database and performing operations on that data, such as retrieving it based on conditions defined in Stages called Aggregation Stages. The Aggregation Framework performs these analyses via the Aggregation Pipeline.
Look at the illustration below, which shows a typical representation of the Aggregation Pipeline.

From the illustration above, you can see that the Aggregation Pipeline works with MongoDB Collections by accepting a single MongoDB Collection and passing it through one or more stages. With these stages, you can perform operations on the Collection. As the name Pipeline implies, the documents from the Collection are taken as input by one stage, and that stage's output is passed into the next stage in the Pipeline as its input.

At the end of the Pipeline, there will be just one output, which is the transformed or Aggregated data!
One of the capabilities of the Aggregation Pipeline is that it can process the data from a given MongoDB Collection without necessarily saving the result back into the Collection; it performs further processing/analysis on the data and returns the desired result (the Aggregated data).
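As a quick illustration, here is a minimal sketch of a two-stage pipeline run from the MongoDB shell; the collection and field names (orders, status, customerId, amount) are hypothetical and only for illustration:

// Stage 1 filters the documents; stage 2 receives only that filtered output and summarizes it.
db.orders.aggregate([
    { $match: { status: 'completed' } },                              // stage 1: keep only completed orders
    { $group: { _id: '$customerId', totalSpent: { $sum: '$amount' } } } // stage 2: total the amounts per customer
])

The result of the final stage is the Aggregated data; nothing is written back to the collection.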

Real World Use Cases Of the Aggregation Pipeline

There are many areas where the Aggregation Pipeline can be applied. These areas of application are what I refer to as the use cases of the Aggregation Pipeline. An application area or use case of the Aggregation Pipeline is in a real-world application that involves Users’ Transactions. 

In building this real-world application, there may be a challenge with data fetching.

For instance, this application is built with a database, and let’s assume MongoDB is the database that is used to store data for this application. MongoDB deals with Collections, and so for this application, we have two Collections. One Collection is for the Users and the second Collection is for the Transactions performed on the application by the Users. 

If you are building this application, you will need to keep track of which Users perform Transactions on your application! To keep track of the Transactions done by Users on your application, you will have the documents in your Transaction Collection linked to the Users in the Users Collection by the User ID.
So, you have the User Collection in your application's database, which is an independent Collection, and the Transaction Collection, which is a dependent Collection that depends on the User Collection.

If you have a view on the frontend of your application that will show the Transactions as well as the Users that performed the Transactions, with the Users’ information and not only the Users’ ID, you will need to be able to pull Transactions. Then, those Transactions will pull the Users’ information!

In this case, you will need to create an Aggregation Pipeline that will analyze the data in your database and return your desired result or Aggregated data. 

First, you will create an Aggregation Pipeline that pulls the Transactions. Then, the Aggregation Pipeline will pull the Users that performed those Transactions while appending the Users' details to the Transaction documents.

Now that you have a fully modified document with all the details you need, you do not need to fetch the Transactions first and then search for the Users in the Users Collection one by one. The Aggregation Pipeline does everything for you: it gets the Transactions and the Users' information from the Collections. Then, you can send that Aggregated data result to the front-end of your application!
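Here is a hedged sketch of what such a pipeline could look like in the MongoDB shell. The collection and field names (transactions, users, userId, userDetails) are assumptions for illustration only:

db.transactions.aggregate([
    {
        $lookup: {
            from: 'users',          // the independent Users Collection
            localField: 'userId',   // the User ID stored on each Transaction document
            foreignField: '_id',    // the matching _id in the Users Collection
            as: 'userDetails'       // the User's information appended to each Transaction
        }
    }
])

The $lookup stage is what appends each User's details to the matching Transaction, so a single pipeline run returns the joined result.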

Let’s get more practical and write some code!

The Syntax of the Aggregation Pipeline

In this section, you will learn how to work with the Aggregation Pipeline; the Aggregation Stages, and with MongoDB Collections. To follow along with this tutorial, ensure you have the following:

  • Nodejs, Express, and Git (Install Nodejs, Git)
  • Visual Studio Code editor (Install here)
  • Postman account (Install here)
  • MongoDB (Download here)
  • MongoDB Compass (Download here)

Let’s Begin!

We are going to create a simple application with Nodejs and use MongoDB as the database with the following steps:

Setting up the File Structure

Let’s start with setting up the working directory for this application:

  • In your terminal, type the following command with the name of the folder ‘TransactionApp’:
mkdir TransactionApp
  • Next, change the directory into the 'TransactionApp' working directory with the following command:
cd TransactionApp
  • Inside the directory, run the following command to open this folder in your Code editor:
code .

The following window will be opened as shown below:

  • With your VS code opened, create a file server.js in the TransactionApp directory as shown below:
  • In your VS Code, at the top of the editor, click on Terminal > New Terminal, and a terminal window will be opened in VS Code:
  • Next, initialize your application with the following command that creates the package.json file, which will contain information about your application:
npm init

After running this command, you will be prompted to enter basic information for your application, like this:

At the prompt, type transaction as the name of your application. Then, press Enter for the other prompts, but when you are prompted for the author’s name, type your name as the owner of the application. Then, press Enter and your package.json file will be created as follows:

  • Next, you need to install some npm packages as dependencies for your application:
    • To install the express, morgan, body-parser, and mongoose packages for your application, run the following command in your terminal:
npm install express mongoose morgan body-parser

Then, you can check the uses of these packages by clicking on these links: express, mongoose, morgan, body-parser.


Once this command runs successfully, open your package.json file and you will see the packages installed as follows:

  • To use the packages that you just installed in your application, you need to require them in your ‘server.js’ file. In your ‘server.js’ file, include the following code:
const express = require('express');
const mongoose = require('mongoose');
const morgan = require('morgan');
const bodyParser = require('body-parser');
  • After requiring the packages, you need to connect to your database with the mongoose package that you required. 
  • To get the connection string you will use to connect to your database, sign in to your MongoDB Atlas account here. Once you’ve signed in, your dashboard window will open as follows:
  • Then, once your MongoDB dashboard is open, click on the Connect button showing on your screen to choose your connection method. For this tutorial, we are using the MongoDB Compass connection method as shown below:

With that window open, click the Connect using MongoDB Compass connection method, and the following window will open:

Ensure that you have MongoDB Compass installed, then choose the highlighted green I have MongoDB Compass option. Under step 2, Copy the connection string, then open MongoDB Compass, you will see your database connection string. Copy that connection string to your clipboard, as you will be using it in your application.


Then, click on the close button to close the window. 

  • Open MongoDB Compass and paste the connection string you copied where you see Paste your connection string, as seen below:

Paste the connection string, ensure you have entered the correct password (which you can set here), and click the green Connect button. This window will open, showing a list of your databases:

  • To connect to your database, open the server.js file in your code editor and type the following code:
mongoose.connect('mongodb+srv://<username>:<password>@cluster0.sagwd.mongodb.net/<database Name>', {useNewUrlParser: true, useUnifiedTopology: true})

Replace ‘<username>’ and ‘<password>’ with your database username and password, and ‘<database Name>’ with the name of your database in MongoDB. Then, get a reference to the connection and register its event handlers as follows:

const db = mongoose.connection;
db.on('error', (err) => {
  console.log(err);
});
db.once('open', () => {
  console.log('Database Connection Established')
});
  • Next, we need to instantiate the application with express and use the morgan and body-parser modules we installed. Then, declare the PORT that the application will listen on, defaulting to 3000. Type the following code:
const app = express();
app.use(morgan('dev'));
app.use(bodyParser.urlencoded({extended: true}));
app.use(bodyParser.json());
const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`)
});
  • To ensure that your application is running, type the following command in the VS Code terminal:
npm start

When you run the command, you will see the following output in your terminal:

  • To have any changes you make to the code picked up and re-run automatically, let’s install another module called ‘nodemon’. So, end the process in the terminal with CTRL + C and paste this command in your terminal:
npm install nodemon --save

After installation, check your package.json file to confirm that the nodemon module has been installed. 

  • Next, open your package.json file and replace the start script with the following code:
nodemon server.js

See the following screenshot of the start script in the package.json file below:
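Since the screenshot is not reproduced here, the scripts section of your package.json should now look roughly like this:

"scripts": {
    "start": "nodemon server.js"
}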

  • Run the npm start command in your terminal to ensure that the application is still running, and you will see the following output in your terminal:

Now that you have successfully set up your application to run on PORT 3000 and connected your application to your database via mongoose with MongoDB Compass, we can move forward to write some code to get the application working fully!

Setting up the Controllers, Models, and Routes for the Application

Here, we are going to build the routes, controllers, and models for the application with the following steps:

  • In your VS Code, create these three folders within your working directory with the names:
    • controllers
    • models
    • routes
  • Open your newly created folder ‘models’ and create a file ‘Data.js’ within the directory and type the following code in it:
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const userSchema = new Schema({
    name: {
        type: String
    },
    designation: {
        type: String
    },
    email: {
        type: String
    },
    phone: {
        type: String
    },
    age: {
        type: Number
    }
}, {timestamps: true})
const User = mongoose.model('User', userSchema)
module.exports = User

Your ‘Data.js’ file should look like this in your code editor:

  • Navigate to the ‘controllers’ folder you created and create a file ‘UserController.js’ and type the following code in it:
const User = require('../models/Data');
// Fetches all Users
const users = (req, res, next) => {
    User.find()
    .then(response => {
        res.json({
            response
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })
}
//Fetches a Single User
const one_user = (req, res, next) => {
    let userID = req.body.userID
    User.findById(userID)
    .then(response => {
        res.json({
            response
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })
}
// Saves user to the database
const store = (req, res, next) => {
    let user = new User({
        name: req.body.name,
        designation: req.body.designation,
        email: req.body.email,
        phone: req.body.phone,
        age: req.body.age
    })
    user.save()
    .then(response => {
        res.json({
            message: 'User added successfully!'
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    }) 
}

// Updates user
const update = (req, res, next) => {
    let userID = req.body.userID
    let updatedData = {
        name: req.body.name,
        designation: req.body.designation,
        email: req.body.email,
        phone: req.body.phone,
        age: req.body.age
    }
    User.findByIdAndUpdate(userID, {$set: updatedData})
    .then(() => {
        res.json({
            message: 'User updated Successfully!'
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })
}
// Deletes a user
const destroy = (req, res, next) => {
    let userID = req.body.userID
    User.findByIdAndRemove(userID)
    .then(() => {
        res.json({
            message: 'User deleted Successfully!'
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })
}
module.exports = {
    users, one_user, store, update, destroy
}

Your ‘UserController.js’ file should look like the code in these screenshots below in your code editor:

  • Next, navigate to the ‘routes’ folder and create a file ‘user.js’ within it and type the following code in the ‘user.js’ file:
const express = require('express');
const router = express.Router()
const UserController = require('../controllers/UserController');
router.get('/', UserController.users);
router.post('/one_user', UserController.one_user);
router.post('/store', UserController.store);
router.post('/update', UserController.update);
router.post('/delete', UserController.destroy);
module.exports = router

Your ‘user.js’ file should look like this in your code editor:

  • Import the route defined in the ‘user.js’ file into your ‘server.js’ file. Immediately after the require statements for your installed packages/modules, and before the line of code that connects to your database with mongoose, enter the following code:
const UserRoute = require('./routes/user');

Still in the ‘server.js’ file, after the line of code that listens on the PORT the application is running on, type the following code:

app.use('/api/user', UserRoute)

Your ‘server.js’ file should look like the code in these screenshots below in your code editor:
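Since the screenshots are not reproduced here, below is a consolidated sketch of how your ‘server.js’ file should now look, assembled from the snippets above (remember to substitute your own connection string):

const express = require('express');
const mongoose = require('mongoose');
const morgan = require('morgan');
const bodyParser = require('body-parser');
const UserRoute = require('./routes/user');

// Connect to MongoDB with the connection string copied from MongoDB Compass
mongoose.connect('mongodb+srv://<username>:<password>@cluster0.sagwd.mongodb.net/<database Name>', {useNewUrlParser: true, useUnifiedTopology: true})
const db = mongoose.connection;
db.on('error', (err) => {
    console.log(err);
});
db.once('open', () => {
    console.log('Database Connection Established')
});

const app = express();
app.use(morgan('dev'));
app.use(bodyParser.urlencoded({extended: true}));
app.use(bodyParser.json());

const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
    console.log(`Server is running on port ${PORT}`)
});

app.use('/api/user', UserRoute)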

Now, run your application and ensure that your application is connected to the database by typing the following command in your terminal:

npm start

You should see the following output in your terminal:

Testing the Application’s Routes and Endpoints

Here, we will use the Postman tool to test the application. To ensure that the routes and endpoints work, do the following:

  • Open your Postman tool and create a Collection with the name ‘Aggregation Tutorial’.
  • Within this Collection, create a ‘POST’ request and paste the following URL:
http://localhost:3000/api/user/store
  • For the request body, select raw and the JSON (application/json) data format, then type the following JSON data:
{
    "name": "Fred Loly",
    "designation": "Software Developer",
    "email": "[email protected]",
    "phone": "2366494049630",
    "age": 23
}
  • Next, click the ‘Send’ button and you should see the message ‘User added successfully’ in the response body as follows:

Repeat the process and add more Users with different names as follows:

{
    "name": "Anita Bells",
    "designation": "Mechanical Engineer",
    "email": "[email protected]",
    "phone": "258909275388",
    "age": 30
}
{
    "name": "Emma Bostain",
    "designation": "Lawyer",
    "email": "[email protected]",
    "phone": "2589093573388",
    "age": 30
}
{
    "name": "Cray Selton",
    "designation": "Lawyer",
    "email": "[email protected]",
    "phone": "385208573388",
    "age": 40
}
{
    "name": "Hilton Motly",
    "designation": "Mechanical Engineer",
    "email": "[email protected]",
    "phone": "2573950975388",
    "age": 36
}
  • Open your MongoDB Compass and you will notice that the users have been saved in the database as follows:

Let’s test another endpoint for getting all the users in the application that have been saved to the database by creating another request in the ‘Aggregation Tutorial’ collection. 

  • Create a ‘GET’ request with the following URL:
http://localhost:3000/api/user

You’d get the following response:
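Since the screenshot is not reproduced here, the shape of that JSON response will be roughly as follows (only the first user is shown; the _id values are whatever MongoDB generated for your documents):

{
    "response": [
        {
            "_id": "<ObjectId generated by MongoDB>",
            "name": "Fred Loly",
            "designation": "Software Developer",
            "email": "[email protected]",
            "phone": "2366494049630",
            "age": 23
        }
    ]
}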


Now that we have successfully built our simple application with Node.js, Express, and MongoDB, I will use this application to demonstrate how the Aggregation Pipeline works in MongoDB!

Using the Application to Illustrate How Aggregation Pipeline Works

In the database, we want to access the ‘users’ collection and aggregate its data using stages in the aggregation pipeline. In the first stage, we will match all documents inside the ‘users’ collection whose ‘designation’ is set to ‘Mechanical Engineer’.

Then the second stage is going to take the aggregated result from the previous stage as input. In this second stage, we will sort by ‘name’ in descending order, where ‘name’ is a field inside a single document in the ‘users’ collection.
Finally, we are only going to project the ‘name’ field, specifying in the $project stage that we do not want the ‘_id’ shown but the ‘name’ field only.

So, we are going to create an aggregation pipeline that uses these stages, with the following steps:

  1. Create a new controller function in the file ‘UserController.js’ with the following code:
const getUserData = (req, res, next) => {}

Here, you are creating a new function in your controller file to be used to handle the route for the Aggregated data that will be produced. 

  2. In the function, we will use the ‘Data.js’ model to access the ‘User’ collection and we will use the aggregate() method on it as follows:
User.aggregate();

The Aggregation Pipeline is a MongoDB feature but mongoose gives us access to it. 


Note that the Aggregation Pipeline is like a regular MongoDB query; the only difference with Aggregation is that you can manipulate the data in multiple steps.
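To see that similarity, here is a minimal sketch using the User model from this tutorial: the regular query and the single-stage pipeline below return the same matching documents.

// Regular query: a filter object passed to find()
User.find({ designation: 'Mechanical Engineer' })
    .then(users => console.log(users));

// Equivalent single-stage pipeline: the same filter expressed as a $match stage
User.aggregate([ { $match: { designation: 'Mechanical Engineer' } } ])
    .then(users => console.log(users));

The pipeline version becomes more powerful once you chain additional stages after the $match, which is exactly what we do next.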

3. To define these stages, we’ll pass in an array of stages in the aggregate() method as follows:

User.aggregate([
        { $match : { designation : 'Mechanical Engineer' } },
        { $sort : {name:-1}  },
        { $project : { _id : 0, name : 1  } }
])

In our Aggregation Pipeline, we used the $match, $sort, and $project stages as follows:

  • First Stage: The $match stage is used to select or filter certain documents; it acts like the filter you pass to a regular find() query in MongoDB. We are using the $match stage in our Aggregation Pipeline to match the ‘designation’ field in the User Collection when it is equal to ‘Mechanical Engineer’. That is, we are going to match all Documents inside the User Collection with ‘designation’ set to ‘Mechanical Engineer’.
  • Second Stage: The $sort stage takes the Aggregated result from the first stage as input. In this stage, we are sorting the Documents in the User Collection by the ‘name’ field in descending order.
  • Note that ‘name’ refers to a field in a single Document inside the User MongoDB Collection.
  • Third Stage: In the $project stage, we only want to project the name field. By setting _id: 0 and name: 1, the output shows only the name field and not the _id field.

The Documents in your MongoDB Collection will pass through these Aggregation Stages in the defined sequence, and each of the stages is an object containing the name of the Stage. There are many Aggregation Stages you can choose from. Check out the MongoDB Documentation to see the Aggregation Stages and Operators that MongoDB has made available for your use.

The Documentation is an easy go-to for any stages you decide to use to manipulate any data in your application!
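For instance, here is a hedged sketch of another common stage, $group, which is not part of this tutorial's pipeline but shows how you could count the users per designation with the same User model:

// Sketch only: group the users by designation and count how many fall into each group
User.aggregate([
    { $group: { _id: '$designation', total: { $sum: 1 } } }
])
.then(result => console.log(result))
.catch(error => console.log(error));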

4. Next, you need to return the Aggregated data by using the .then() method to handle the promise and respond with the JSON (JavaScript Object Notation) Aggregated data. Then we also need to handle the errors by using the .catch() method that responds with a JSON message as follows:

.then(response => {
        res.json({
            response
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })

Take a look at the complete code in the Aggregation Pipeline here:

const getUserData = (req, res, next) => {
    User.aggregate([
        { $match : { designation : 'Mechanical Engineer' } },
        { $sort : {name:-1}  },
        { $project : { _id : 0, name : 1  } }
    ])
    .then(response => {
        res.json({
            response
        })
    })
    .catch(error => {
        res.json({
            message: 'An error occurred!'
        })
    })
}

5. Now that you have your Aggregated data function in your Controller file, you need to add the function getUserData to the list of functions you are exporting from this ‘UserController.js’ file as follows:

module.exports = {
    users, one_user, store, update, destroy, getUserData
}

6. You need to create a route that will handle this Controller function. So, open your ‘user.js’ file in the routes folder and add the following line of code:

router.get('/getUserData', UserController.getUserData);

7. Next, open your terminal and type the following command to run the application:

npm start

Once you run this command, ensure that you can see the server running on PORT 3000 and Database Connection Established as follows:

Testing Our Aggregation Pipeline With Postman

Now that you have your application up and running, the next thing you need to do is test it with Postman as follows:

  • Open your Postman tool, create a GET request and paste the following URL:
http://localhost:3000/api/user/getUserData
  • Click the highlighted ‘Send’ button on the right and watch the response. Your response should be the Aggregated data as follows:
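Based on the sample users added earlier, only Anita Bells and Hilton Motly have the ‘Mechanical Engineer’ designation, so with the descending sort on ‘name’ the Aggregated data should look roughly like this:

{
    "response": [
        { "name": "Hilton Motly" },
        { "name": "Anita Bells" }
    ]
}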

Conclusion

The Aggregation Pipeline is a great tool for processing and analyzing data in MongoDB Collections, with an easily accessible documentation and reference section covering the Aggregation Pipeline Stages and Operators you can use in your application.

In this tutorial, we covered an illustrated explanation of the Aggregation Pipeline, real-world use cases, the syntax of the Aggregation Pipeline with code examples, an application built with Node.js and MongoDB that demonstrates how the Aggregation Pipeline works, and testing the Aggregation Pipeline with Postman.

With the information you learned in this tutorial, you can set up your own Aggregation Pipeline for your own business or application needs.


FAQs

Q: How does the aggregation pipeline impact database performance?
The aggregation pipeline can affect MongoDB performance, especially with large datasets or complex operations. Properly indexed data and optimized aggregation queries help mitigate performance impacts.
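For example, here is a hedged sketch (using the mongoose schema from this tutorial) of an index that supports the $match stage used earlier; the choice of field is an illustration, not a prescription:

// Sketch only: index the field that the pipeline's $match stage filters on,
// so the first stage can use the index instead of scanning the whole collection.
userSchema.index({ designation: 1 });

You would add this line in ‘Data.js’ before the mongoose.model() call.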
Q: Can aggregation pipelines be used for real-time data processing?
Yes, aggregation pipelines can be used for real-time data processing. This enables the efficient analysis and aggregation of data streams as data is being inserted or updated in MongoDB.
Q: What are the limitations of using the aggregation pipeline in MongoDB?
Limitations include memory usage restrictions per stage, complexity increasing with multiple stages, and potential performance issues with large datasets if not properly optimized or indexed.
Deborah Emeni
Software Engineer

Deborah is a Software Engineer and Technical Writer who specializes in Node.js and JavaScript. She is passionate about technology and sharing knowledge.

Expertise
  • Node.js
  • JavaScript
  • Serverless
  • Next.JS
