This post discusses a usage pattern of middleware in Node.js and my thoughts on its pros and cons. I’ll start with a brief introduction to the concept of middleware. If you are familiar with middleware, you can jump to “Con of middleware”.
What is middleware
Middleware are functions that handle a request before the route handler. Each one has access to the `request` object, the `response` object, and a `next` function. The main use of middleware is to augment `request` or `response` with new data, which is then accessible in subsequent middleware and route handlers. Calling `next` passes the augmented `request` and `response` to the next handler. Calling `next` with an error object skips the remaining handlers and hands the request over to the framework’s error handling (the error-handling middleware in Express, for example). A middleware can be applied to all paths, or only to a specific path (so that only requests to that path go through it).
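To make that contract concrete, here is a minimal sketch. It assumes a tiny hypothetical `run` helper in place of a real framework so that it runs without dependencies; `attachRequestId`, `requireToken`, and `handler` are illustrative names of mine, not part of any library.

```javascript
// Hypothetical stand-in for a framework's dispatcher (not Express itself):
// calls each middleware in order and stops at the first error.
function run(stack, req, res, onError) {
  let index = 0;
  function next(err) {
    if (err) return onError(err, req, res);
    const middleware = stack[index++];
    if (middleware) middleware(req, res, next);
  }
  next();
}

// Augments the request, then passes control on.
function attachRequestId(req, res, next) {
  req.id = 'req-1'; // illustrative value
  next();
}

// Calls next with an error, which skips the remaining handlers.
function requireToken(req, res, next) {
  if (!req.headers.token) return next(new Error('missing token'));
  next();
}

// Route handler: sees whatever earlier middleware attached to req.
function handler(req, res) {
  res.body = `handled ${req.id}`;
}

// Assumed error handler for this sketch: records the message on the response.
const onError = (err, req, res) => { res.error = err.message; };

const okRes = {};
run([attachRequestId, requireToken, handler], { headers: { token: 'abc' } }, okRes, onError);

const failedRes = {};
run([attachRequestId, requireToken, handler], { headers: {} }, failedRes, onError);

console.log(okRes, failedRes);
```

In the second call the route handler never runs, because `requireToken` passes an error to `next`.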
A function that augments, or mutates, its parameters is known as an impure function. Middleware is an impure function, and that brings both pros and cons.
Pro of middleware
With middleware we can augment the `request` object by attaching data to it. Popular uses of middleware include parsing the request body, authentication, and logging. In fact, these uses are so common that many third-party middleware packages have been developed and shared.
With middleware we can also avoid duplication when the same function is needed in multiple places. For example, if our routes are only accessible to authenticated users, we can apply an authentication middleware to those routes instead of explicitly doing authentication in every route handler.
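A rough sketch of that reuse, assuming a hypothetical `run` dispatcher instead of a real framework, a placeholder token check, and made-up route names:

```javascript
// Hypothetical dispatcher standing in for a framework's routing layer.
function run(stack, req, res) {
  let index = 0;
  const next = (err) => {
    if (err) { res.status = 401; res.body = err.message; return; }
    const fn = stack[index++];
    if (fn) fn(req, res, next);
  };
  next();
  return res;
}

// Shared authentication step; the token comparison is a placeholder assumption.
function authenticate(req, res, next) {
  if (req.headers.token !== 'secret') return next(new Error('unauthorized'));
  req.user = { name: 'alice' };
  next();
}

// Route handlers contain no authentication code of their own.
const getProfile = (req, res) => { res.body = `profile of ${req.user.name}`; };
const getOrders = (req, res) => { res.body = `orders of ${req.user.name}`; };

// The same middleware is applied to every route that needs it.
const routes = {
  '/profile': [authenticate, getProfile],
  '/orders': [authenticate, getOrders],
};

const profile = run(routes['/profile'], { headers: { token: 'secret' } }, {});
const orders = run(routes['/orders'], { headers: { token: 'secret' } }, {});
const rejected = run(routes['/orders'], { headers: {} }, {});

console.log(profile, orders, rejected);
```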
Con of middleware
Middleware is such a cool pattern that we can turn a route handler into a list of middleware.
In each middleware, we get data the client needs and attach it to the `request` object. Each middleware is a self-contained component that gets its job done, and all we need to do is stack these pieces together. In the last middleware, we can send the data back to the client.
We can reuse all these middleware. Depending on the response data a route requires, we can combine those middleware in different ways to form its handler. Cool?
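Here is roughly what such a stacked handler could look like. This is only a sketch: the `run` helper is a hypothetical stand-in for a framework, and the data attached by `midA` and `midB` (an amount and a currency) is made up for illustration.

```javascript
// Hypothetical dispatcher: runs each middleware in order.
function run(stack, req, res) {
  let index = 0;
  const next = () => {
    const fn = stack[index++];
    if (fn) fn(req, res, next);
  };
  next();
  return res;
}

// Each middleware gets one piece of data and attaches it to req.clientData.
function midA(req, res, next) {
  req.clientData = req.clientData || {};
  req.clientData.amount = 42; // placeholder for a real lookup
  next();
}

function midB(req, res, next) {
  req.clientData = req.clientData || {};
  req.clientData.currency = 'USD'; // placeholder for a real lookup
  next();
}

// The last middleware sends the accumulated data back to the client.
function sendClientData(req, res) {
  res.body = req.clientData;
}

// Different routes combine the same pieces in different ways.
const amountOnly = run([midA, sendClientData], {}, {});
const amountAndCurrency = run([midA, midB, sendClientData], {}, {});

console.log(amountOnly.body, amountAndCurrency.body);
```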
Yet it has problems. In general, impure functions have known problems with testability, reproducibility, and so on. In the Node middleware context, I see the following issues.
First, it is difficult to track where and how `req` is mutated. For example, you are expecting `amount` in `req.clientData`, but something goes wrong. How do you debug? You have to look into each middleware and find out which one attaches `amount` to `req.clientData`. Things get worse if the name of that middleware does not match what it does.
Second, a later middleware may overwrite data on the `req` object. For example, `req.clientData.amount`, which is set in `midA`, can be accidentally modified or overwritten by `midB`. If `midB` accidentally reassigns `req.clientData`, it completely overwrites whatever data previous middleware have set on `req.clientData`!
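A small sketch of how that can happen, with made-up values and the two middleware chained by hand rather than through a framework:

```javascript
function midA(req, res, next) {
  req.clientData = { amount: 42 }; // illustrative value
  next();
}

// Bug: reassigns the whole object instead of adding a field to it.
function midB(req, res, next) {
  req.clientData = { currency: 'USD' };
  next();
}

const clobberedReq = {};
// Run midA, then midB, the way a framework would when each calls next().
midA(clobberedReq, {}, () => midB(clobberedReq, {}, () => {}));

console.log(clobberedReq.clientData); // amount set by midA is gone
```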
Yes, you probably can do this:
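One possible guard, sketched here as my assumption of what such a fix might look like rather than a definitive recipe, is to merge into the existing object instead of reassigning it. The values are made up.

```javascript
// Each middleware spreads the existing clientData before adding its own field.
// Spreading undefined is allowed in an object literal and yields no properties.
function midA(req, res, next) {
  req.clientData = { ...req.clientData, amount: 42 };
  next();
}

function midB(req, res, next) {
  req.clientData = { ...req.clientData, currency: 'USD' };
  next();
}

const mergedReq = {};
midA(mergedReq, {}, () => midB(mergedReq, {}, () => {}));

console.log(mergedReq.clientData); // both fields survive
```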
This avoids the problem, but it is still not ideal.
Third, the ordering of middleware can matter. `midB` may rely on data that was generated in `midA`, so `midB` must be placed after `midA`. Yet this type of dependency is not explicitly represented anywhere. Imagine you have tens of middleware and want to reorder them: how do you make sure things do not break?
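The hidden dependency can be sketched like this. The `run` helper is a hypothetical dispatcher that also catches the failure so the example can report it, and the doubling in `midB` is an arbitrary stand-in for any logic that reads `midA`’s data.

```javascript
// Hypothetical dispatcher that reports whether the chain completed.
function run(stack) {
  const req = {};
  let index = 0;
  const next = () => {
    const fn = stack[index++];
    if (fn) fn(req, {}, next);
  };
  try {
    next();
    return { ok: true, clientData: req.clientData };
  } catch (err) {
    return { ok: false, isTypeError: err instanceof TypeError };
  }
}

function midA(req, res, next) {
  req.clientData = { amount: 42 }; // illustrative value
  next();
}

// Implicitly depends on midA: reads req.clientData.amount.
function midB(req, res, next) {
  req.clientData.doubledAmount = req.clientData.amount * 2;
  next();
}

const inOrder = run([midA, midB]);
const reordered = run([midB, midA]); // nothing in the code warns against this

console.log(inOrder, reordered);
```

Nothing in the signatures of `midA` and `midB` says that one needs the other; the reordered stack only fails at runtime.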
Fourth, async operations in different middleware cannot run in parallel. Middleware are designed to be executed in sequence. If `midA` and `midB` each include an async operation (e.g. a service request or a database operation) and the operations are independent of each other, ideally we should send the requests in parallel. Yet that is not possible when each operation lives in its own middleware. This slows down server responses unnecessarily.
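To illustrate the sequencing, here is a sketch that logs when each operation starts and ends. The `runSequentially` helper is a hypothetical dispatcher, and `delay` stands in for a real service or database call.

```javascript
const events = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical dispatcher: the next middleware starts only when next() is called.
function runSequentially(stack, req) {
  let index = 0;
  return new Promise((resolve) => {
    const next = () => {
      const fn = stack[index++];
      if (fn) fn(req, {}, next);
      else resolve(req);
    };
    next();
  });
}

async function midA(req, res, next) {
  events.push('A start');
  await delay(20); // stands in for a service request
  events.push('A end');
  next();
}

async function midB(req, res, next) {
  events.push('B start');
  await delay(20); // independent of midA, but still has to wait for it
  events.push('B end');
  next();
}

const done = runSequentially([midA, midB], {}).then(() => {
  console.log(events.join(', ')); // A start, A end, B start, B end
  return events;
});
```

`midB` does not even start until `midA` has finished, although the two operations do not depend on each other.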
Suggestion
I don’t really have an ideal solution, but by following some conventions we can probably make better use of the strengths of middleware while running into fewer problems.
- Use middleware for data that is supposed to be read-only. For example, you can use authentication as a middleware and save the info in `req.user` (rather than `req.clientData.user`), because you probably won’t need to modify it in any other middleware.
- For data that will be modified, have all the relevant logic in one place. For example, you probably need to save multiple pieces of data in `req.clientData`. Instead of using multiple middleware to get the data and modify `req.clientData`, put them in one function. That function can be a middleware or a route handler.
This way we have a clear view of what gets into the client data and from where. The ordering dependencies are clear, and we can run promises in parallel.
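Both conventions could be sketched as follows. `fetchAmount`, `fetchCurrency`, and `clientDataHandler` are hypothetical names, the delays stand in for real async calls, and `req.user` is assumed to have been set by an earlier authentication middleware.

```javascript
const events = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical, independent data sources.
async function fetchAmount() {
  events.push('amount start');
  await delay(20);
  events.push('amount end');
  return 42;
}

async function fetchCurrency() {
  events.push('currency start');
  await delay(20);
  events.push('currency end');
  return 'USD';
}

// One function owns everything that goes into the client data.
// req.user is treated as read-only here.
async function clientDataHandler(req, res) {
  const [amount, currency] = await Promise.all([fetchAmount(), fetchCurrency()]);
  res.body = { user: req.user.name, amount, currency };
}

const res = {};
const done = clientDataHandler({ user: { name: 'alice' } }, res).then(() => {
  console.log(events.join(', '), res.body);
  return res;
});
```

Both operations start before either one finishes, and it is obvious from a single function where every field of the response comes from.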
Conclusion
Being an impure function is a double-edged sword for middleware. On one hand, it makes passing data convenient; on the other hand, it makes debugging harder, hides dependency relationships between middleware, and prevents parallel async requests. My suggestion is that, instead of spreading side effects across many middleware, we keep all the “dirty” things in one place, so that dependencies are explicit and parallel calls are easy to make.
Follow me on Twitter!
If you find my post useful, don’t forget to give claps! I’m starting to use Twitter, so follow me there: https://twitter.com/imDongCHEN