How to personalise Next at scale

[Image: Next logo with a vector graphic depicting a server stack]

Have you ever wondered how you can serve personalised pages at scale with your Next application? Whether you use static pages, server-rendered pages, or the new app router, each option comes with the same drawback: what do you do when different users should get different versions of a page, at scale?

You may have faced this issue, for instance, while implementing A/B testing. One set of users must be shown page A and another set page B. Doesn’t that mean that every request must hit a server that computes the correct version of the page? No, it doesn’t! There’s one central concept you must know to understand how personalisation at scale is intended to work with Next.

The solution is simple on paper but may be challenging to implement. To understand why the required effort is worth it, let’s first talk about what personalisation is and why it is so important for web applications developed today.

What does personalisation mean?

Personalisation means any action you take to optimise the page content for the user who is accessing it. Examples include content recommendations, A/B testing, and customisation based on geolocation. Personalisation has played an important role in optimising page content for users for a long time.

With the onset of AI-based services, the options around personalisation will only grow. By using AI, you can create curated content for customer segments or even individual customers. Whereas previously you had to meticulously write a single post that works for all customers, you may now be able to leverage AI to create versions optimised for the customer segments that matter most to you.

Scaling a personalised Next application

How does Next scale? The answer is caching, even with personalised content. You may believe otherwise, as this is something Next alone can’t get done!

Let’s imagine that we have to A/B test two different titles for our web application’s home page:

  1. “Check out our app, you’ll love it! 🤩” and
  2. “You wouldn’t understand how cool the site is anyway, don’t bother”

Currently the home page uses the pages router and is a static page generated through the getStaticProps function. We want half of our visitors to see version 1 and half to see version 2. To do this, we can use a pseudorandom routine that assigns a visitor to one or the other.

import { InferGetStaticPropsType } from 'next';

type Props = InferGetStaticPropsType<typeof getStaticProps>;

export default function HomePage({ title }: Props): JSX.Element {
  return <h1>{title}</h1>;
}

export function getStaticProps() {
  // Runs at build time, not on every request.
  const variant = Math.random() <= 0.5 ? 'A' : 'B';
  const title = getTitleForVariant(variant);

  return {
    props: {
      title,
    },
  };
}

const getTitleForVariant = (variant: 'A' | 'B'): string =>
  ({
    A: 'Variant A',
    B: 'Variant B',
  }[variant]);

With this approach, each static generation produces variant A or variant B with equal probability.

Didn’t you say static generation? We want that variation on every other visit!

You are right! Without incremental static regeneration (ISR) enabled, the page every user sees is whichever variant happened to be generated at build time. With ISR toggled on, the user will just see the variant that happened to be generated during the last regeneration of the page. This doesn’t work!
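
For illustration, here is what the ISR version might look like (the revalidate value is an arbitrary choice for this sketch). Even with regeneration enabled, every visitor within the revalidation window sees the same variant:

export function getStaticProps() {
  const variant = Math.random() <= 0.5 ? 'A' : 'B';
  const title = getTitleForVariant(variant);

  return {
    props: {
      title,
    },
    // Regenerate the page in the background at most once every 60
    // seconds; all visitors in between get the previously generated
    // variant.
    revalidate: 60,
  };
}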

To make sure that the variant is calculated on every page visit, we must change the page from statically generated to server-rendered.

export function getServerSideProps() {
  // Runs on every request, so the variant is re-rolled each time.
  const variant = Math.random() <= 0.5 ? 'A' : 'B';
  const title = getTitleForVariant(variant);

  return {
    props: {
      title,
    },
  };
}

Eureka! It works. Now the content of the page switches roughly on every other request.

But it’s disappointing that we had to switch our perfectly static page over to server-side rendering. Now it consumes server resources on every request when the first result could simply be reused!

To mitigate this, we can introduce a caching layer that stops a request from reaching the server when the page has been requested recently. Instead of asking the server for the result, the result is served from the cache. The Next documentation provides instructions on how to achieve this. Let’s add a caching header as instructed; the exact content of the header depends on the caching platform in use.

import { GetServerSidePropsContext } from 'next';

export function getServerSideProps({ res }: GetServerSidePropsContext) {
  // Ask shared caches to reuse this response for five minutes.
  res.setHeader('Cache-Control', 'public, s-maxage=300');

  const variant = Math.random() <= 0.5 ? 'A' : 'B';
  const title = getTitleForVariant(variant);

  return {
    props: {
      title,
    },
  };
}
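
As a side note, the header could be tuned further; which directives are honoured depends on your caching platform. For instance, adding stale-while-revalidate lets the cache keep serving an expired result while it fetches a fresh one in the background, a variation along these lines:

res.setHeader(
  'Cache-Control',
  'public, s-maxage=300, stale-while-revalidate=60'
);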

Now, once the application runs in an infrastructure that honours this cache header, our users will be served a result from the cache. Our server no longer does unnecessary work.

You have the same problem as before! Users get served what’s in the cache, not a random variant.

Ah, shucks. You are correct. Users get served what’s in the cache, and the cache only updates when its content expires. At that point a request reaches the server, where either an A or a B result is generated, which is then stored in the cache and served to subsequent users.

In a traditional network infrastructure for a web application, the cache layer sits before any business logic. That means user-based personalisation can never be efficiently cached: whatever variant happens to be in the cache is the variant every user is served.
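
To see why, here is a deliberately simplified model of such a cache layer. This is a toy sketch to illustrate the concept, not a real implementation:

// A toy model of a traditional cache-before-logic setup.
const cache = new Map<string, string>();

async function handle(
  url: string,
  render: () => Promise<string>
): Promise<string> {
  // The cache key is derived from the URL alone, so every visitor
  // to '/' shares a single entry.
  const hit = cache.get(url);
  if (hit !== undefined) return hit;

  // Only on a miss does the business logic run; whichever variant
  // it happens to produce is what everyone gets until expiry.
  const page = await render();
  cache.set(url, page);
  return page;
}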

If that’s true, how can caching be the answer to scaling!?

This is among the reasons why Vercel’s marketing is full of the word edge. Vercel’s platform comes with edge middleware capabilities. When you serve your Next app with Vercel, your Next middleware automatically runs on the edge as edge middleware. That comes with many implications. First and foremost, edge computing is intended to allow a cloud application to execute closer to the geographic location of the user. This decreases latency thanks to a shorter travel time: instead of reaching a datacenter in Ireland from Düsseldorf, Germany, your request may be served from an edge network node located in Düsseldorf.

This kind of edge infrastructure can be used to serve cached content. But with edge computing, the edge node may also contain application logic. Instead of all logic living in the datacenter, some of it is pushed out and duplicated to the edges. Typically, this logic takes the form of a serverless function and has some limitations when it comes to, for instance, execution time.

On Vercel’s platform, the request flow is orchestrated so that edge middleware (a serverless function run close to the geographic location of the user) runs before the cache layer is checked for a hit. This means you can operate on the request before it’s forwarded to the caching mechanism. In essence, you get to pick which result from the cache is served to the user.

Vercel has built a platform that does many difficult things for you. One example of that is that the platform hoists your middleware into an edge middleware which gets executed before the cache layer is checked for a hit. Note that it doesn’t necessarily matter that the middleware is executed on the edge. What matters is that we can write logic that changes a request before it reaches the cache layer.
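
Again as a toy sketch, here is how the simplified cache model from before changes once the rewrite logic moves in front of the cache (illustrative only, not Vercel’s actual implementation):

// A toy model of the middleware-before-cache setup.
const cache = new Map<string, string>();

const selectVariant = (): string => (Math.random() <= 0.5 ? 'a' : 'b');

async function handle(
  url: string,
  render: (path: string) => Promise<string>
): Promise<string> {
  // The middleware runs first and rewrites the path per user.
  const path = url === '/' ? `/home-page/${selectVariant()}` : url;

  // The cache is consulted only after the rewrite, so each variant
  // gets its own entry and both can be served from the cache.
  const hit = cache.get(path);
  if (hit !== undefined) return hit;

  const page = await render(path);
  cache.set(path, page);
  return page;
}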

The solution

When you want to scale with Next, you should populate your cache with the different variants you want to offer and then pick a suitable result from the cached results for a given user. This is a feature of the platform you use to serve your Next application, not Next itself.

Let’s assume that we are using Vercel’s platform and adjust our example to deal with that. Now that we have a way to leverage caching, let’s revert to a statically generated page.

Whereas previously the page lived as index under the pages root, we’ll now move it to pages/home-page/[variant].tsx. We’ll also change the variant determination logic to use the new variant route parameter.

import { GetStaticPropsContext } from "next";

export function getStaticPaths() {
  return {
    // Pre-generate both variants of the page at build time.
    paths: [{ params: { variant: "a" } }, { params: { variant: "b" } }],
    fallback: false,
  };
}

export function getStaticProps({ params }: GetStaticPropsContext) {
  const variant = String(params?.variant).toUpperCase() as "A" | "B";
  const title = getTitleForVariant(variant);

  return {
    props: {
      title,
    },
  };
}

In addition, we have to create a middleware.ts file in the root of the project. It’s where we personalise the returned page for a given request.

import { NextRequest, NextResponse } from "next/server";

export const config = {
  matcher: ["/"],
};

export default function middleware(req: NextRequest) {
  const bucketCookie = req.cookies.get("bucket")?.value;
  const bucket = bucketCookie ?? selectBucket();

  // Create a rewrite to the page matching the bucket
  const url = req.nextUrl.clone();
  url.pathname = `/home-page/${bucket}`;
  const res = NextResponse.rewrite(url);

  if (!bucketCookie) {
    res.cookies.set("bucket", bucket);
  }

  return res;
}

const selectBucket = () => (Math.random() <= 0.5 ? "a" : "b");

This middleware is configured to run only on the application index. It first checks whether the user has previously been assigned to a bucket (stored in a cookie). If not, it selects a bucket for the user. It then creates a rewrite to the intended version of the home page, and if the bucket cookie wasn’t set yet, it sets it on the response. Finally, the response is returned. The user sees the root URL of the web application but is actually served either home-page/a or home-page/b.
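
As a small refinement (my addition, not something the setup strictly requires), you could give the bucket cookie explicit attributes so the assignment stays stable for a predictable period:

if (!bucketCookie) {
  res.cookies.set("bucket", bucket, {
    path: "/",
    // Keep the user in the same bucket for 30 days; pick a window
    // that matches how long your experiment runs.
    maxAge: 60 * 60 * 24 * 30,
  });
}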

What if we are not using Vercel?

If you can’t use Vercel, you may have to configure the cloud infrastructure yourself. What you need to do depends on the platform you are on. Some platforms offer similar support for Next applications as Vercel does, but by no means all. In some cases you may not be able to implement a similar pattern in full, or at all; on some platforms it may take considerable effort.

Note that this isn’t some trick by Vercel to make money. Leveraging caching for scaling has been a continuing trend in the industry. This approach enables personalisation while relying on the cache to control your costs and performance. Some application logic moves in front of the cache when it would traditionally sit behind it. The downside of this pattern is its complexity: managing application logic in front of the cache is not straightforward. Vercel has managed to provide an option that hides the complexity, resulting in a great DX. It’s commendable that most developers don’t even need to understand the conceptual leap it takes to move the middleware in front of the cache.

What would it take when using Azure?

As a way of showcasing the complexity, let’s think about how this could be achieved with Azure as your cloud platform. We won’t go in depth; we’ll just quickly touch on a possible high-level architecture that could get it done.

When it comes to Azure itself, you need three things: a caching mechanism, the logic that sits before the cache, and something that routes requests to that logic. In principle the setup could look like the following:

  1. Azure Front Door
  2. Azure Function
  3. Azure Front Door Cache
  4. Azure App Service (Next)

In this approach, Azure Front Door routes a request to an Azure Function, which contains the logic that would live in edge middleware on Vercel. The function then forwards the request back through Azure Front Door and the caching built into it. If no cache hit is found, the request reaches the actual service.
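
To make this more concrete, here is a minimal sketch of what the pre-cache logic might look like as an HTTP-triggered Azure Function using the Node.js v4 programming model. Everything specific here, the function name, the Front Door hostname, and the proxying approach, is an assumption for illustration, not a verified reference architecture:

import { app, HttpRequest, HttpResponseInit } from "@azure/functions";

// Hypothetical Front Door hostname; an assumption for this sketch.
const FRONT_DOOR_ORIGIN = "https://example.azurefd.net";

const selectBucket = () => (Math.random() <= 0.5 ? "a" : "b");

app.http("homePage", {
  methods: ["GET"],
  authLevel: "anonymous",
  handler: async (req: HttpRequest): Promise<HttpResponseInit> => {
    // Read the bucket cookie, if the user already has one.
    const cookieHeader = req.headers.get("cookie") ?? "";
    const bucket =
      cookieHeader.match(/(?:^|;\s*)bucket=(a|b)/)?.[1] ?? selectBucket();

    // Fetch the bucketed page back through Front Door so that its
    // built-in cache can answer without hitting the Next server.
    const upstream = await fetch(`${FRONT_DOOR_ORIGIN}/home-page/${bucket}`);

    return {
      status: upstream.status,
      headers: {
        "content-type": upstream.headers.get("content-type") ?? "text/html",
        // Persist the assignment, mirroring the Vercel middleware.
        "set-cookie": `bucket=${bucket}; Path=/; Max-Age=${60 * 60 * 24 * 30}`,
      },
      body: await upstream.text(),
    };
  },
});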

That’s the infrastructure side. Already, we need an engineer working on this who has a wide knowledge base. They need to know how Next works, but also be very familiar with Azure products and be able to combine several of them to get the intended outcome.

What the Azure solution does not include

But this alone does not cut it. Let’s take our focus away from the cloud architecture and look at the development experience. If we’ve used containers to deploy the application onto servers, we now need to introduce a new architectural pattern: serverless functions. This, again, places pressure on the skills of the engineering team. Even if the engineers are well versed in the technology, the project may now need to deal with concepts from containerised applications, server-run applications, and serverless applications alike. That’s a lot of complexity to add just to be able to run some logic before the cache.

But it does not end there. In Next with Vercel, the middleware file is hoisted into an edge middleware. In our model there’s no such support. In Vercel’s model the logic before the cache is developed with the same tooling as the rest of the application; it’s part of the same project. If we just add a serverless function into the mix per the instructions Azure gives us, we may end up with a separate project for the pre-cache logic. In that case we have something very custom that we must document well and maintain ourselves. I’m not saying it’s impossible to hoist the middleware in the same way Vercel’s platform does; I’m just highlighting that it’s something your own team needs to build.

Note that after all of that, we still haven’t talked about edge features. With this setup you do not have edge support. Azure has at least some edge capabilities, but again, your own team would need to build the support if those features are relevant to you.

Afterword

With all that said, there are a few conclusions to be drawn. First, if you are using Vercel, you are in luck: implementing this pattern of scaling will not require much extra setup. If you are serving your application on some other platform, you may have to go through more effort to use this model for scaling.

Secondly, this approach to scaling only works with content that caches well. There are cases of personalisation where the served content can’t be effectively cached. For instance, some machine learning models want every request to reach their service so that different content can be tested on the fly; in that case you have to bypass the cache. Other content can’t be cached for security or privacy reasons, such as user-specific data. In those cases you need to optimise in other ways.

Lastly, as always, you must ask yourself whether you need the scaling at all. In some cases it may not be worth your money. You could, for instance, use the middleware tricks introduced in this post without any custom cache, in which case you might be able to avoid infrastructure work entirely.