Thoughts on Node middleware being an impure function
This post discusses a usage pattern of middleware in Node.js and my thoughts on its pros and cons. I’ll start with a brief introduction to the concept of middleware. If you are already familiar with middleware, you can jump to “Con of middleware”.
What is middleware
Middleware are functions that handle a request before it reaches the route handler. Each middleware has access to the `request` object, the `response` object, and a `next` function. The main use of middleware is to augment `request` or `response` with new data, which is then accessible in subsequent middleware and route handlers. Calling `next` passes the augmented `request` and `response` to the next handler. Calling `next` with an error object directs the request to error handling. A middleware can be applied to all paths, or only to a specific path (so only requests to that path go through the middleware).
A function that augments, or mutates, its parameters is known as an impure function. Middleware is an impure function, and that has pros and cons.
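As a rough illustration of that shape, here is a minimal sketch that runs without any framework. `runChain` is a hypothetical stand-in for the dispatching a framework would do, and the middleware names and data are made up for this example.

```javascript
// Hypothetical stand-in for a framework's dispatcher: calls each function in
// order, and jumps to onError as soon as next() receives an error.
function runChain(fns, req, res, onError) {
  let i = 0;
  function next(err) {
    if (err) return onError(err, req, res);
    const fn = fns[i++];
    if (fn) fn(req, res, next);
  }
  next();
}

// Middleware: passes an error to next(), which skips the remaining functions.
function requireToken(req, res, next) {
  if (!req.headers.token) return next(new Error('missing token'));
  next();
}

// Middleware: mutates req (the impure part), then hands over to the next function.
function attachUser(req, res, next) {
  req.user = { name: 'alice' }; // made-up data for illustration
  next();
}

// Route handler: reads what the earlier middleware attached.
function handler(req, res) {
  res.body = `hello ${req.user.name}`;
}

const chain = [requireToken, attachUser, handler];
const onError = (err, req, res) => { res.body = err.message; };

const okRes = {};
runChain(chain, { headers: { token: 't' } }, okRes, onError);

const failRes = {};
runChain(chain, { headers: {} }, failRes, onError);

console.log(okRes.body);   // hello alice
console.log(failRes.body); // missing token
```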
Pro of middleware
With middleware we can augment the `request` object by attaching data to it. Popular uses of middleware include parsing the request body, authentication, and logging. In fact, these uses are so common that many third-party middleware packages have been published for sharing.
With middleware we can avoid duplication when the same logic is needed in multiple places. For example, if our routes are only accessible to authenticated users, we can apply an authentication middleware to those routes instead of explicitly authenticating in every route handler.
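A small sketch of that reuse, under stated assumptions: the token check is hard-coded, and `withAuth` and `call` are hypothetical helpers that only exist so the example runs on its own. In a framework such as Express, the middleware would normally be registered with `app.use` or listed before the handler instead.

```javascript
// Made-up authentication check, hard-coded for the sketch.
function authenticate(req, res, next) {
  if (req.headers.authorization !== 'secret-token') {
    res.statusCode = 401;
    return; // stop here: the route handler never runs
  }
  req.user = { id: 1 };
  next();
}

// Hypothetical helper: put the same middleware in front of any handler.
const withAuth = (handler) => (req, res) =>
  authenticate(req, res, () => handler(req, res));

// Neither handler repeats the authentication logic.
const getProfile = withAuth((req, res) => {
  res.body = `profile of user ${req.user.id}`;
});
const getOrders = withAuth((req, res) => {
  res.body = `orders of user ${req.user.id}`;
});

// Hypothetical helper that fakes an incoming request.
function call(route, headers) {
  const res = { statusCode: 200 };
  route({ headers }, res);
  return res;
}

console.log(call(getProfile, { authorization: 'secret-token' }).body); // profile of user 1
console.log(call(getOrders, {}).statusCode); // 401
```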
Con of middleware
Middleware is such a cool pattern that we can turn a route handler into a list of middleware:
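The snippet below is a sketch of what such a route could look like rather than a definitive implementation. `midA`, `midB` and `req.clientData` are the names used in the rest of this post; `sendClientData`, `runChain` and the values are assumptions made so that the example runs on its own.

```javascript
// Hypothetical stand-in for a framework's dispatcher: runs the list in order.
function runChain(fns, req, res) {
  let i = 0;
  const next = () => {
    const fn = fns[i++];
    if (fn) fn(req, res, next);
  };
  next();
}

function midA(req, res, next) {
  req.clientData = { amount: 100 }; // made-up value
  next();
}

function midB(req, res, next) {
  req.clientData.currency = 'USD'; // made-up value
  next();
}

// Last middleware: send whatever has been collected back to the client.
function sendClientData(req, res) {
  res.body = JSON.stringify(req.clientData);
}

// The "route handler" is just the list. In Express the same list would
// typically be passed to app.get(path, midA, midB, sendClientData).
const route = [midA, midB, sendClientData];

const res = {};
runChain(route, {}, res);
console.log(res.body); // {"amount":100,"currency":"USD"}
```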
In each middleware, we get the data the client needs and attach it to the `request` object. Each middleware is a self-contained component that gets its job done, and all we need to do is stack these pieces together. In the last middleware, we send the data back to the client.
We can reuse all these middleware. Depending on the response data a route requires, we can combine them in different ways as its handler. Cool? Maybe, but this pattern has several problems.
First, it is difficult to track where and how `req` is mutated. For example, you are expecting `req.clientData` but something goes wrong. How do you debug? You have to look into each middleware and find out which one attaches `req.clientData`. Things get worse if the name of a middleware does not match what it does.
Second, a later middleware can overwrite data in the `req` object. For example, `req.clientData.amount` that is set in `midA` can be accidentally modified or overwritten by `midB`. Or `midB` accidentally resets `req.clientData`, which completely overwrites whatever data previous middleware have set on `req.clientData`.
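A minimal sketch of the second case, with made-up values. The middleware are called by hand here, with `noop` standing in for `next`, so the example needs no framework.

```javascript
const noop = () => {}; // stands in for next() in this sketch

function midA(req, res, next) {
  req.clientData = { amount: 100 }; // made-up value
  next();
}

// Resets the whole object instead of adding one field to it.
function midB(req, res, next) {
  req.clientData = { currency: 'USD' }; // made-up value
  next();
}

const req = {};
midA(req, {}, noop);
midB(req, {}, noop);

console.log(req.clientData.amount);   // undefined: midA's data is gone
console.log(req.clientData.currency); // USD
```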
Yes, you probably can do this:
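One possible form of that workaround, as a sketch: every middleware merges into the existing object instead of reassigning it. The object spread is an assumption about how one might write it, and the values are made up.

```javascript
const noop = () => {}; // stands in for next() in this sketch

// Each middleware copies what is already there and adds its own field.
function midA(req, res, next) {
  req.clientData = { ...req.clientData, amount: 100 };
  next();
}

function midB(req, res, next) {
  req.clientData = { ...req.clientData, currency: 'USD' };
  next();
}

const req = {};
midA(req, {}, noop);
midB(req, {}, noop);

console.log(req.clientData); // { amount: 100, currency: 'USD' }
```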
This avoids the problem, but still is not ideal.
Third, the ordering of middleware can matter. `midB` may rely on data that was generated in `midA`, so `midB` must be placed after `midA`. Yet this type of dependency is not explicitly represented anywhere. Imagine you have tens of middleware and you want to reorder them: how do you make sure things do not break?
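The hidden dependency can be sketched like this. The `tax` field and the `run` helper are made up for illustration; nothing in the signature of `midB` says it needs `midA` to have run first.

```javascript
const noop = () => {}; // stands in for next() in this sketch

function midA(req, res, next) {
  req.clientData = { amount: 100 }; // made-up value
  next();
}

// Implicitly depends on midA: reads req.clientData.amount.
function midB(req, res, next) {
  req.clientData.tax = req.clientData.amount / 10;
  next();
}

// Hypothetical helper: runs the list in order and reports the outcome.
function run(chain) {
  const req = {};
  try {
    for (const mid of chain) mid(req, {}, noop);
    return req.clientData;
  } catch (err) {
    return err.name;
  }
}

console.log(run([midA, midB])); // { amount: 100, tax: 10 }
console.log(run([midB, midA])); // TypeError
```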
Fourth, promises cannot be run in parallel. Middleware are designed to be executed in sequence. If `midA` and `midB` each include an async operation (e.g. a service request or a database operation) and the operations are independent of each other, ideally we should start them in parallel. Yet that is not possible across separate middleware. This adds latency to every request unnecessarily.
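To see the sequencing, the sketch below logs when each operation starts and ends. `fakeFetch` is a hypothetical async operation simulated with a timer, and the delays are arbitrary.

```javascript
const log = [];

// Hypothetical async operation (e.g. a database call), simulated with a timer.
function fakeFetch(name, ms) {
  log.push(`${name} start`);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`${name} end`);
      resolve(`${name} result`);
    }, ms);
  });
}

function midA(req, res, next) {
  fakeFetch('A', 20).then((value) => { req.a = value; next(); });
}

function midB(req, res, next) {
  fakeFetch('B', 10).then((value) => { req.b = value; next(); });
}

// midB only runs once midA has called next(), i.e. after A has finished.
const done = new Promise((resolve) => {
  const req = {};
  midA(req, {}, () => midB(req, {}, () => resolve(log)));
});

done.then((events) => console.log(events.join(', ')));
// A start, A end, B start, B end
```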
I don’t really have an ideal solution, but by following some conventions we can probably get the benefits of middleware with fewer problems.
- Use middleware for data that is supposed to be read-only. For example, you can use authentication as a middleware and save the info on `req` (e.g. `req.clientData.user`), because you probably won’t need to modify it in any other middleware.
- For data that will be modified, keep all the relevant logic in one place. For example, you probably need to save multiple pieces of data in `req.clientData`. Instead of using multiple middleware to get the data and modify `req.clientData`, put them in one function. That function can be a middleware or a route handler.
This way we have a clear view of what goes into the client data and where it comes from. The ordering dependencies are clear, and we can run promises in parallel.
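The second convention could be sketched as follows. `loadClientData` and `fakeFetch` are hypothetical names, and the timers stand in for real service or database calls; the point is only that one function owns `req.clientData` and can start independent operations together with `Promise.all`.

```javascript
const log = [];

// Hypothetical async operation, simulated with a timer.
function fakeFetch(name, ms) {
  log.push(`${name} start`);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`${name} end`);
      resolve(`${name} result`);
    }, ms);
  });
}

// One function owns everything that goes into req.clientData.
async function loadClientData(req, res, next) {
  try {
    // Both operations are started before either one is awaited.
    const [a, b] = await Promise.all([fakeFetch('A', 20), fakeFetch('B', 10)]);
    req.clientData = { a, b };
  } catch (err) {
    return next(err); // hand failures to error handling
  }
  next();
}

const done = new Promise((resolve) => {
  const req = {};
  loadClientData(req, {}, () => resolve(req));
});

done.then((req) => {
  console.log(log.slice(0, 2)); // [ 'A start', 'B start' ]
  console.log(req.clientData);  // { a: 'A result', b: 'B result' }
});
```

Here both operations have started before either has finished, and the only place that writes `req.clientData` is visible in one function.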
Being an impure function is a double-edged sword for middleware. On one hand, it makes passing data convenient; on the other hand, it makes debugging harder, hides the dependencies between middleware, and prevents parallel async requests. My suggestion is that instead of spreading side effects across many middleware, keep all the “dirty” work in one place, so that dependencies are explicit and parallel calls are easy to make.
Follow me on Twitter!
If you find my post useful, don’t forget to give claps! I’m starting to use Twitter, so follow me there: https://twitter.com/imDongCHEN