Thoughts on Node middleware being an impure function

Dong Chen
5 min read · Sep 12, 2018
“selective focus photography of pile of decorative stones” by Jeppe Hove Jensen on Unsplash

This post discusses a usage pattern of middleware in Node.js and my thoughts on its pros and cons. I’ll start with a brief introduction to the concept of middleware. If you are already familiar with middleware, you can jump to the “Con of middleware” section.

What is middleware

Middleware are functions that handle a request before it reaches the route handler. Each one has access to the request object, the response object, and a `next` function. The main use of middleware is to augment the request or response with new data, which is then accessible in subsequent middleware and route handlers. Calling `next` passes the augmented request and response to the next handler. Calling `next` with an error object skips the remaining handlers and hands control to error handling. A middleware can be applied to all paths or only to a specific path (so only requests to that path go through it).
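To make the mechanics concrete, here is a minimal sketch in plain JavaScript. The `runChain` helper is my own hypothetical stand-in for what a framework such as Express does when it dispatches a request; `attachRequestTime` and `handler` are made-up names as well.

```javascript
// Hypothetical stand-in for a framework's dispatcher: calls each
// middleware in order, passing req, res, and next.
function runChain(middlewares, req, res, onError) {
  let index = 0;
  function next(err) {
    // next(err) skips the remaining middleware and goes to error handling
    if (err) return onError(err, req, res);
    const mw = middlewares[index++];
    if (mw) mw(req, res, next);
  }
  next();
}

// A middleware augments req, then hands control on with next().
function attachRequestTime(req, res, next) {
  req.receivedAt = 1536710400000; // fixed timestamp keeps the sketch deterministic
  next();
}

// The final handler can read what earlier middleware attached.
function handler(req, res) {
  res.body = { receivedAt: req.receivedAt };
}

const req = {};
const res = {};
runChain([attachRequestTime, handler], req, res, (err, req, res) => {
  res.error = err.message;
});
```

Note that `attachRequestTime` returns nothing useful; its whole effect is the mutation of `req`.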

A function that augments, or mutates, its parameters is known as an impure function. Middleware is an impure function, which brings both pros and cons.

Pro of middleware

With middleware we can augment the request object by attaching data to it. Popular uses of middleware include parsing the request body, authentication, and logging. In fact, these uses are so common that many third-party middleware packages have been published for sharing.

With middleware we can avoid duplication when the same logic is needed in multiple places. For example, if our routes are only accessible to authenticated users, we can apply an authentication middleware to those routes instead of explicitly doing authentication in every route handler.

Con of middleware

Middleware is such a cool pattern that we can turn a route handler into a list of middleware.
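A minimal sketch of what such a route could look like. The middleware names and the placeholder data are hypothetical, and the small `run` helper only stands in for the framework so the snippet runs on its own.

```javascript
// Hypothetical middleware: each attaches one piece of data to req.clientData.
function getUser(req, res, next) {
  req.clientData = req.clientData || {};
  req.clientData.user = { id: 1, name: 'Ada' }; // placeholder data
  next();
}

function getAmount(req, res, next) {
  req.clientData = req.clientData || {};
  req.clientData.amount = 100; // placeholder data
  next();
}

// The last middleware sends the accumulated data back to the client.
function sendClientData(req, res, next) {
  res.json(req.clientData);
}

// The "route handler" is just the stack. With Express this would be
// roughly app.get('/account', accountHandler).
const accountHandler = [getUser, getAmount, sendClientData];

// Stand-in for the framework: run the stack in order.
const run = (stack, req, res) =>
  stack.reduceRight((next, mw) => () => mw(req, res, next), () => {})();

// Fake response object that records what would be sent.
const res = { json(data) { this.sent = data; } };
run(accountHandler, {}, res);
```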

In each middleware, we get the data the client needs and attach it to the request object. Each middleware is a self-contained component that gets its job done, and all we need to do is stack these pieces together. In the last middleware, we send the data back to the client.

We can reuse all these middleware. Depending on the response data a route requires, we can combine them in different ways to form its handler. Cool?

Yet it has problems. In general, impure functions have known problems with testability, reproducibility, etc. In the Node middleware context, I see the following issues.

First, it is difficult to track where and how req is mutated. For example, say you expect amount in req.clientData but something goes wrong. How do you debug? You have to look into each middleware to find out which one attaches amount to req.clientData. Things get worse if a middleware's name does not match what it does.

Second, it is possible for a later middleware to overwrite data in the req object. For example, req.clientData.amount that is set in midA can be accidentally modified or overwritten by midB.
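A minimal sketch of how that can happen, assuming midB wants to add a hypothetical currency field:

```javascript
function midA(req, res, next) {
  req.clientData = { amount: 100 };
  next();
}

function midB(req, res, next) {
  // Bug: assigns a brand new object instead of adding a property
  req.clientData = { currency: 'USD' };
  next();
}

const req = {};
midA(req, {}, () => midB(req, {}, () => {}));
// req.clientData is now { currency: 'USD' }; amount is gone
```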

If midB accidentally reassigns req.clientData, it completely overwrites whatever data earlier middleware attached to it!

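Yes, you can probably guard against this, for example by having midB add to the existing req.clientData object instead of replacing it.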
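A minimal sketch of one such guard, using object spread to merge into whatever is already on req.clientData. The currency field is again hypothetical.

```javascript
function midA(req, res, next) {
  req.clientData = { ...req.clientData, amount: 100 };
  next();
}

function midB(req, res, next) {
  // Copy the existing fields first, then add the new one
  req.clientData = { ...req.clientData, currency: 'USD' };
  next();
}

const req = {};
midA(req, {}, () => midB(req, {}, () => {}));
// req.clientData now holds both amount and currency
```

Every middleware has to remember to follow this convention, and nothing enforces it.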

This avoids the problem, but still is not ideal.

Third, the ordering of middleware can matter. midB may rely on data generated in midA, so midB must be placed after midA. Yet this kind of dependency is not explicitly represented anywhere. Imagine you have dozens of middleware and want to reorder them: how do you make sure things do not break?
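A small sketch of such a hidden dependency. The amountInCents field and the `run` helper (a stand-in for the framework) are assumptions made for illustration.

```javascript
function midA(req, res, next) {
  req.clientData = { amount: 100 };
  next();
}

function midB(req, res, next) {
  // Implicit dependency: assumes midA already created req.clientData.amount
  req.clientData.amountInCents = req.clientData.amount * 100;
  next();
}

// Stand-in for the framework: run the stack in order.
const run = (stack, req, res) =>
  stack.reduceRight((next, mw) => () => mw(req, res, next), () => {})();

// Intended order works.
const ok = {};
run([midA, midB], ok, {});

// Swapping the order breaks midB, and nothing in its signature warned us.
const broken = {};
let failure = null;
try {
  run([midB, midA], broken, {});
} catch (err) {
  failure = err; // TypeError: req.clientData is undefined when midB runs first
}
```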

Fourth, async operations cannot run in parallel. Middleware are designed to be executed in sequence. If midA and midB each include an async operation (e.g. a service request or a database operation) and the operations are independent of each other, ideally we should start them in parallel. Yet that is not possible when they live in separate middleware, because midB does not start until midA calls next. This adds response latency unnecessarily.
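The sketch below records when each operation starts and ends. `fetchValue` is a hypothetical stand-in for a service or database call, and the nested callbacks mimic how a framework would chain the two middleware.

```javascript
const log = [];

// Hypothetical async operation standing in for a service or database call.
function fetchValue(name, ms) {
  log.push(`start ${name}`);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve(name);
    }, ms);
  });
}

function midA(req, res, next) {
  fetchValue('A', 20).then((value) => { req.a = value; next(); });
}

function midB(req, res, next) {
  fetchValue('B', 20).then((value) => { req.b = value; next(); });
}

// As middleware, B cannot start until A has finished and called next().
function viaMiddleware() {
  return new Promise((resolve) => {
    const req = {};
    midA(req, {}, () => midB(req, {}, () => resolve(req)));
  });
}

const done = viaMiddleware().then(() => log);
// log ends up as: start A, end A, start B, end B
```

Even though the two operations are independent, the total wait is roughly the sum of both.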

Suggestion

I don’t really have an ideal solution, but by following some conventions we can probably keep the benefits of middleware while running into fewer problems.

  1. Use middleware for data that is supposed to be read-only. For example, you can use authentication as a middleware and save the info in req.user (rather than req.clientData.user) because you probably won’t need to modify it in any other middleware.
  2. For data that will be modified, keep all the relevant logic in one place. For example, you probably need to save multiple pieces of data in req.clientData. Instead of using multiple middleware to get the data and modify req.clientData, put them in one function. That function can be a middleware or a route handler.

This way we have a clear view of what goes into the client data and where it comes from. The ordering dependencies are clear, and we can run promises in parallel.
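Here is a rough sketch of both conventions together. `fetchAmount` and `fetchOrders` are hypothetical data sources, and the fake `res` object only exists so the snippet runs without a framework.

```javascript
const started = [];

// Hypothetical data sources standing in for service or database calls.
function fetchAmount(user) {
  started.push('amount');
  return new Promise((resolve) => setTimeout(() => resolve(100), 20));
}

function fetchOrders(user) {
  started.push('orders');
  return new Promise((resolve) => setTimeout(() => resolve(['order-1']), 20));
}

// Convention 1: read-only data such as req.user still comes from middleware.
function authenticate(req, res, next) {
  req.user = { id: 1 }; // placeholder user
  next();
}

// Convention 2: everything that builds the client data lives in one handler,
// so independent requests can be started together with Promise.all.
async function accountHandler(req, res, next) {
  try {
    const [amount, orders] = await Promise.all([
      fetchAmount(req.user),
      fetchOrders(req.user),
    ]);
    res.json({ user: req.user, amount, orders });
  } catch (err) {
    next(err);
  }
}

const req = {};
const res = { json(data) { this.sent = data; } };
authenticate(req, res, () => {});
const done = accountHandler(req, res, (err) => { throw err; });
```

Both requests are started before either one finishes, and the full shape of the response is visible in a single place.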

Conclusion

Being an impure function is a double-edged sword for middleware. On one hand, it makes passing data convenient; on the other hand, it makes debugging harder, hides the dependency relationships between middleware, and prevents parallel async requests. My suggestion is that instead of spreading side effects across many middleware, we keep all the “dirty” things in one place, so that dependencies are explicit and parallel calls are easy to make.

Follow me on Twitter!

If you find my post useful, don’t forget to give claps! I’m starting to use Twitter, so follow me there: https://twitter.com/imDongCHEN


Dong Chen

Web engineer @robinhood; PhD in Human-Computer Interaction