Optimizing site performance by "lazy loading" images

Recently, I've been spending some time making performance improvements to my site. In my previous blog post on this topic, I described my progress optimizing the JavaScript and CSS usage on my site, and concluded that image optimization was the next step.

Last summer I published a blog post about my vacation in Acadia National Park. Included in that post are 13 photos with a combined size of about 4 MB.

When I benchmarked that post with https://webpagetest.org, it showed that it took 7.275 seconds (blue vertical line) to render the page.

The graph shows that the browser downloaded all 13 images to render the page. Why would a browser download all images if most of them are below the fold and not shown until a user starts scrolling? It makes very little sense.

As you can see from the graph, downloading all 13 images takes a very long time (purple horizontal bars). No matter how much you optimize your CSS and JavaScript, this particular blog post will remain slow until you optimize how its images are loaded.

A WebPageTest waterfall diagram for the Acadia National Park blog post before the image optimizations

"Lazy loading" images is one solution to this problem. Lazy loading means that the images aren't loaded until the user scrolls and the images come into the browser's viewport.

You might have seen lazy loading in action on websites like Facebook, Pinterest or Medium. It usually goes like this:

  • You visit a page as you normally would, scrolling through the content.
  • Instead of the actual image, you see a blurry placeholder image.
  • Then, the placeholder image gets swapped out with the final image as quickly as possible.

An animated GIF of a user scrolling a webpage and placeholder images being replaced by the final images

To support lazy loading images on my blog I do three things:

  1. Automatically generate lightweight yet useful placeholder images.
  2. Embed the placeholder images directly in the HTML to speed up performance.
  3. Replace the placeholder images with the real images when they become visible.

Generating lightweight placeholder images

To generate lightweight placeholder images, I implemented a technique used by Facebook: create a tiny image that is a downscaled version of the original image, strip out the image's metadata to optimize its size, and let the browser scale the image back up.

To create lightweight placeholder images, I resized the original images to be 5 pixels wide. Because I have about 10,000 images on my blog, my Drupal-based site automates this for me, but here is how you create one from the command line using ImageMagick's convert tool:

$ convert -resize 5x -strip original.jpg placeholder.jpg
  • -resize 5x resizes the image to be 5 pixels wide while maintaining its aspect ratio.
  • -strip removes all comments and redundant headers in the image. This helps make the image's file size as small as possible.

The resulting placeholder images are tiny — often shy of 400 bytes.

Large metal pots with wooden lids in which lobsters are boiled
The original image that we need to generate a placeholder for.
An example placeholder image shows brown and black tones
The generated placeholder, scaled up by a browser from a tiny image that is 5 pixels wide. The size of this placeholder image is only 395 bytes.

Here is another example to illustrate how the colors in the placeholders nicely match the original image:

A sunrise with beautiful reds and black silhouettes
An example placeholder image that shows red and black tones

Even though the placeholder images should only be shown for a fraction of a second, making them relevant is a nice touch, as they hint at what is coming. It's also an important touch, because users are very impatient with load times on the web.

Embedding placeholder images directly in HTML

One not-so-well-known feature of the <img> element is that you can embed an image directly into the HTML document using the data URL scheme:

<img src="data:image/jpg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD/2wB
  DAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQICAQECAQEB
  AgICAgICAgICAQICAgICAgICAgL/2wBDAQEBAQEBAQEBAQECAQEBAgICAgICA
  gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgL/wA
  ARCAADAAUDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAACf/EAB8QAAM
  AAQMFAAAAAAAAAAAAAAECAwcEBQYACAkTMf/EABUBAQEAAAAAAAAAAAAAAAAA
  AAIG/8QAJBEAAQIFAgcAAAAAAAAAAAAAAQURAAIDBDEGFBIVQUVRcYH/2gAMA
  wEAAhEDEQA/AAeyb5HO8o8lSLZd01Jz2nbKoK4yxDVvZqYl7uaV4CWdmZQSSS
  ST86FJBsafEJK15KD05ioNk4G6Yeg0V9bVCmZpXt08sB2hJ8DJ2Tn7H/2Q==" />

Data URLs are composed of four parts: the data: prefix, a media type indicating the type of data (image/jpeg), an optional base64 token indicating that the data is base64 encoded, and the base64 encoded image data itself.

data:[<media type>][;base64],<data>

To base64 encode an image from the command line, use:

$ base64 placeholder.jpg

To base64 encode an image in PHP, use:

$data = base64_encode(file_get_contents('placeholder.jpg'));

What is the advantage of embedding a base64 encoded image using a data URL? It eliminates HTTP requests as the browser doesn't have to set up new HTTP connections to download the images. Fewer HTTP requests usually means faster page load times.
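
If you generate the markup with Node.js instead of PHP, the same idea looks roughly like this. This is a minimal sketch, not the code my site uses; it assumes a local file named placeholder.jpg and simply prints the resulting <img> tag:

// Read the placeholder image and base64 encode it (Node.js).
const fs = require('fs');

const base64Data = fs.readFileSync('placeholder.jpg').toString('base64');

// Assemble the data URL and wrap it in an <img> tag.
const dataUrl = 'data:image/jpeg;base64,' + base64Data;
console.log('<img src="' + dataUrl + '" />');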

Replacing placeholder images with real images

Next, I used JavaScript's IntersectionObserver to replace the placeholder image with the actual image when it comes into the browser's viewport. I followed Jeremy Wagner's approach, shared in Google's Web Fundamentals guide on lazy loading images, with some adjustments.

It starts with the following HTML markup:

<img class="lazy" src="placeholder.jpg" data-src="original.jpg" />

The three relevant pieces are:

  1. The class="lazy" attribute, which is used to select the element in JavaScript.
  2. The src attribute, which references the placeholder image that will appear when the page first loads. Instead of linking to placeholder.jpg, I embed the image data using the data URL technique explained above.
  3. The data-src attribute, which contains the URL of the original image that will replace the placeholder when it comes into the viewport.

Next, we use JavaScript's IntersectionObserver to replace the placeholder images with the actual images:

document.addEventListener('DOMContentLoaded', function() {
  var lazyImages = [].slice.call(document.querySelectorAll('img.lazy'));

  if ('IntersectionObserver' in window) {
    let lazyImageObserver = new IntersectionObserver(
      function(entries, observer) {
        entries.forEach(function(entry) {
          if (entry.isIntersecting) {
            // Swap in the real image and stop observing this element.
            let lazyImage = entry.target;
            lazyImage.src = lazyImage.dataset.src;
            lazyImageObserver.unobserve(lazyImage);
          }
        });
      });

    lazyImages.forEach(function(lazyImage) {
      lazyImageObserver.observe(lazyImage);
    });
  }
  else {
    // For browsers that don't support IntersectionObserver yet,
    // load all the images now:
    lazyImages.forEach(function(lazyImage) {
      lazyImage.src = lazyImage.dataset.src;
    });
  }
});

This JavaScript code queries the DOM for all <img> elements with the lazy class. The IntersectionObserver is used to replace the placeholder image with the original image when the img.lazy elements enter the viewport. When IntersectionObserver is not supported, the images are replaced on the DOMContentLoaded event.

By default, the IntersectionObserver's callback is triggered the moment a single pixel of the image enters the browser's viewport. However, using the rootMargin property, you can trigger the image swap before the image enters the viewport. This reduces or eliminates the visual or perceived lag time when swapping a placeholder image for the actual image.

I implemented that on my site as follows:

const config = {
  // If the image gets within 250px of the browser's viewport,
  // start the download:
  rootMargin: '250px 0px',
};

let lazyImageObserver = new IntersectionObserver(..., config);

Lazy loading images drastically improves performance

After making these changes to my site, I did a new https://webpagetest.org benchmark run:

A diagram that shows page load times for dri.es after adding lazy loading of images

You can clearly see that the page became a lot faster to render:

  • The document is complete after 0.35 seconds (blue vertical line) instead of the original 7.275 seconds.
  • No images are loaded before the document is complete, compared to 13 images being loaded before.
  • After the document is complete, one image (purple horizontal bar) is downloaded. This is triggered by the JavaScript code as the result of one image being above the fold.

Lazy loading images improves web page performance by reducing the number of HTTP requests, and consequently reduces the amount of data that needs to be downloaded to render the initial page.

Is base64 encoding images bad for SEO?

Faster sites have an SEO advantage, as page speed is a ranking factor for search engines. But lazy loading might also hurt SEO, as search engines have to be able to discover the original images.

To find out, I headed to Google Search Console. Google Search Console has a "URL inspection" feature that allows you to look at a webpage through the eyes of Googlebot.

I tested it out with my Acadia National Park blog post. As you can see in the screenshot, the first photo in the blog post was not loaded. Googlebot doesn't seem to support data URLs for images.

A screenshot that shows Googlebot doesn't render placeholder images that are embedded using data URLs

Is IntersectionObserver bad for SEO?

The fact that Googlebot doesn't appear to support data URLs does not have to be a problem. The real question is whether Googlebot will scroll the page, execute the JavaScript, replace the placeholders with the actual images, and index those. If it does, it doesn't matter that Googlebot doesn't understand data URLs.

To find out, I decided to conduct an experiment. For the experiment, I published a blog post about Matt Mullenweg and me visiting a museum together. The images in that blog post are lazy loaded and can only be discovered by Google if its crawler executes the JavaScript and scrolls the page. If those images show up in Google's index, we know there is no SEO impact.

I only posted that blog post yesterday. I'm not sure how long it takes for Google to make new posts and images available in its index, but I'll keep an eye out for it.

If the images don't show up in Google's index, lazy loading might impact your SEO. My solution would be to selectively disable lazy loading for the most important images only. (Note: even if Google finds the images, there is no guarantee that it will decide to index them — short blog posts and images are often excluded from Google's index.)

Conclusions

Lazy loading images improves web page performance by reducing the number of HTTP requests and data needed to render the initial page.

Ideally, over time, browsers will support lazy loading images natively, and some of the SEO challenges will no longer be an issue. Until then, consider adding support for lazy loading yourself. For my own site, it took about 40 lines of JavaScript code and 20 lines of additional PHP/Drupal code.

I hope that by sharing my experience, more people are encouraged to run their own sites and to optimize their sites' performance.

Two internet entrepreneurs walk into an old publishing house

A month ago, Matt Mullenweg, co-founder of WordPress and founder of Automattic, visited me in Antwerp, Belgium. While I currently live in Boston, I was born and raised in Antwerp, and also started Drupal there.

We spent the morning together walking around Antwerp and visited the Plantin Moretus Museum.

The museum is the old house of Christophe Plantin, where he lived and worked around 1575. At the time, Plantin had the largest printing shop in the world, with 56 employees and 16 printing presses. These presses printed 1,250 sheets per day.

Today, the museum hosts the two oldest printing presses in the world. In addition, the museum has original lead types of fonts such as Garamond and hundreds of ancient manuscripts that tell the story of how writing evolved into the art of printing.

The old house, printing business, presses and lead types are the earliest witnesses of a landmark moment in history: the invention of printing, and by extension, the democratization of publishing, long before our digital age. It was nice to visit that together with Matt as a break from our day-to-day focus on web publishing.

An old printing press at the Plantin Moretus Museum

Dries and Matt in front of the oldest printing presses in the world

An old globe at the Plantin Moretus Museum

Why the EU Copyright Directive is a threat to the Open Web

An image of a copyright sign

After much debate, the EU Copyright Directive is now moving to a final vote in the European Parliament. The directive, if you are not familiar, was created to prohibit spreading copyrighted material on internet platforms, protecting the rights of creators (for example, many musicians have supported this overhaul).

The overall idea behind the directive — compensating creators for their online works — makes sense. However, the implementation and execution of the directive could have a very negative impact on the Open Web. I'm surprised more has not been written about this within the web community.

For example, Article 13 requires for-profit online services to implement copyright filters for user-generated content, which includes comments on blogs, reviews on commerce sites, code on programming sites or possibly even memes and cat photos on discussion forums. Any for-profit site would need to apply strict copyright filters on content uploaded by a site's users. If sites fail to correctly filter copyrighted materials, they will be directly liable to rights holders for expensive copyright infringement violations.

While implementing copyright filters may be doable for large organizations, it may not be for smaller organizations. Instead, small organizations might decide to stop hosting comments or reviews, or to stop allowing the sharing of code, photos or videos. The only for-profit organizations potentially excluded from these requirements are companies earning less than €10 million a year, and only until they have been in business for three years. It's not a very useful exclusion, because a lot of online communities have been around for more than three years and don't make anywhere near €10 million a year.

The EU tends to lead the way when it comes to internet legislation. For example, GDPR has proven successful for consumer data protection and has sparked state-by-state legislation in the United States. In theory, the EU Copyright Directive could do the same thing for modern internet copyright law. My fear is that in practice, these copyright filters, if too strict, could discourage the free flow of information and sharing on the Open Web.

Optimizing site performance by reducing JavaScript and CSS

I've been thinking about the performance of my site and how it affects the user experience. There are real, ethical concerns to poor web performance. These include accessibility, inclusion, waste and environmental concerns.

A faster site is more accessible, and therefore more inclusive for people visiting from a mobile device, or from areas in the world with slow or expensive internet.

For those reasons, I decided to see if I could improve the performance of my site. I used the excellent https://webpagetest.org to benchmark a simple blog post https://dri.es/relentlessly-eliminating-barriers-to-growth.

A diagram that shows page load times for dri.es before making performance improvements

The image above shows that it took a browser 0.722 seconds to download and render the page (see blue vertical line):

  • The first 210 milliseconds are used to set up the connection, which includes the DNS lookup, TCP handshake and the SSL negotiation.
  • The next 260 milliseconds (from 0.21 seconds to 0.47 seconds) are spent downloading the rendered HTML file, two CSS files and one JavaScript file.
  • After everything is downloaded, the final 330 milliseconds (from 0.475 seconds to 0.8 seconds) are used to lay out the page and execute the JavaScript code.

By most standards, 0.722 seconds is pretty fast. In fact, according to HTTP Archive, it takes more than 2.4 seconds to download and render the average web page on a laptop or desktop computer.

Regardless, I noticed that the horizontal green bars and the horizontal yellow bar were relatively long compared to the blue bar. In other words, a lot of time is spent downloading JavaScript (yellow horizontal bar) and CSS (two green horizontal bars) instead of the HTML, which includes the actual content of the blog post (blue bar).

To fix this, I did two things:

  1. Use vanilla JavaScript. I replaced my jQuery-based JavaScript with vanilla JavaScript. Without impacting the functionality of my site, the amount of JavaScript went from almost 45 KB to 699 bytes, a reduction of more than 98 percent (a simplified example of this kind of conversion follows this list).
  2. Conditionally include CSS. For example, I use Prism.js for syntax highlighting code snippets in blog posts. prism.css was downloaded for every page request, even when there were no code snippets to highlight. Using Drupal's render system, it's easy to conditionally include CSS. By taking advantage of that, I was able to reduce the amount of CSS downloaded by 47 percent — from 4.7 KB to 2.5 KB.
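
To give a feel for the first change, here is a hypothetical before-and-after. It is not the actual code from my site and the class names are made up; it just shows how a common jQuery pattern translates to vanilla JavaScript:

// jQuery version (hypothetical):
//
// $('.nav-toggle').on('click', function () {
//   $('.nav-menu').toggleClass('is-open');
// });

// Vanilla JavaScript equivalent, with no library to download:
document.querySelectorAll('.nav-toggle').forEach(function (toggle) {
  toggle.addEventListener('click', function () {
    document.querySelector('.nav-menu').classList.toggle('is-open');
  });
});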

According to the January 1st, 2019 run of HTTP Archive, the median page requires 396 KB of JavaScript and 60 KB of CSS. I'm proud that my site is well under these medians.

  • JavaScript: 45 KB before, 699 bytes after (world-wide median: 396 KB).
  • CSS: 4.7 KB before, 2.5 KB after (world-wide median: 60 KB).

Because the new JavaScript and CSS files are significantly smaller, it takes the browser less time to download, parse and render them. As a result, the same blog post is now available in 0.465 seconds instead of 0.722 seconds, or 35% faster.

After a new https://webpagetest.org test run, you can clearly see that the bars for the CSS and JavaScript files became visually shorter:

A diagram that shows reduced page load times for dri.es after making performance improvements

To optimize the user experience of my site, I want it to be fast. I hope that others will see that bloated websites can come at a great cost, and will consider using tools like https://webpagetest.org to make their sites more performant.

I'll keep working on making my website even faster. As a next step, I plan to make pages with images faster by using lazy image loading.

Headless CMS: REST vs JSON:API vs GraphQL

An abstract image of three boxes

The web used to be server-centric in that web content management systems managed data and turned it into HTML responses. With the rise of headless architectures, a portion of the web is becoming server-centric for data but client-centric for its presentation; increasingly, data is rendered into HTML in the browser.

This shift of responsibility has given rise to JavaScript frameworks, while on the server side, it has resulted in the development of JSON:API and GraphQL to better serve these JavaScript applications with content and data.

In this blog post, we will compare REST, JSON:API and GraphQL. First, we'll look at an architectural, CMS-agnostic comparison, followed by evaluating some Drupal-specific implementation details.

It's worth noting that there are, of course, lots of intricacies and "it depends" caveats when comparing these three approaches. When we discuss REST, we mean the "typical REST API" as opposed to one that is extremely well-designed or that follows a specification (not REST as a concept). When we discuss JSON:API, we're referring to implementations of the JSON:API specification. Finally, when we discuss GraphQL, we're referring to GraphQL as it is used in practice; formally, it is only a query language, not a standard for building APIs.

The architectural comparison should be useful for anyone building decoupled applications regardless of the foundation they use because the qualities we will evaluate apply to most web projects.

To frame our comparisons, let's establish that most developers working with web services care about the following qualities:

  1. Request efficiency: retrieving all necessary data in a single network round trip is essential for performance. The size of both requests and responses should make efficient use of the network.
  2. API exploration and schema documentation: the API should be quickly understandable and easily discoverable.
  3. Operational simplicity: the approach should be easy to install, configure, run, scale and secure.
  4. Writing data: not every application needs to store data in the content repository, but when it does, it should not be significantly more complex than reading.

We summarized our conclusions in the table below, and we discuss each of the four categories (the rows in the table) in more depth afterwards. If you aggregate the rankings in the table, you see that we rank JSON:API above GraphQL, and GraphQL above REST, for Drupal core's needs.

Request efficiency
  • REST: Poor; multiple requests are needed to satisfy common needs, and responses are bloated.
  • JSON:API: Excellent; a single request is usually sufficient for most needs, and responses can be tailored to return only what is required.
  • GraphQL: Excellent; a single request is usually sufficient for most needs, and responses include only exactly what was requested.

Documentation, API explorability and schema
  • REST: Poor; no schema, not explorable.
  • JSON:API: Acceptable; generic schema only; links and error messages are self-documenting.
  • GraphQL: Excellent; precise schema and excellent tooling for exploration and documentation.

Operational simplicity
  • REST: Acceptable; works out of the box with CDNs and reverse proxies; few to no client-side libraries required.
  • JSON:API: Excellent; works out of the box with CDNs and reverse proxies; no client-side libraries needed, but many are available and useful.
  • GraphQL: Poor; extra infrastructure is often necessary, client-side libraries are a practical necessity, and specific patterns are required to benefit from CDNs and browser caches.

Writing data
  • REST: Acceptable; HTTP semantics give some guidance, but the specifics are left to each implementation; one write per request.
  • JSON:API: Excellent; how writes are handled is clearly defined by the specification; one write per request, though support for multiple writes is being added to the specification.
  • GraphQL: Poor; how writes are handled is left to each implementation and there are competing best practices; it's possible to execute multiple writes in a single request.

If you're not familiar with JSON:API or GraphQL, I recommend you watch the following two short videos. They will provide valuable context for the remainder of this blog post:

Request efficiency

Most REST APIs tend toward the simplest implementation possible: a resource can only be retrieved from one URI. If you want to retrieve article 42, you have to retrieve it from https://example.com/article/42. If you want to retrieve article 42 and article 72, you have to perform two requests: one to https://example.com/article/42 and one to https://example.com/article/72. If the article's author information is stored in a different content type, you have to make two additional requests, say to https://example.com/author/3 and https://example.com/author/7. Furthermore, you can't send these requests until you've requested, retrieved and parsed the articles (you wouldn't know the author IDs otherwise).

Consequently, client-side applications built on top of basic REST APIs tend to need many successive requests to fetch their data. Often, these requests can't be sent until earlier requests have been fulfilled, resulting in a sluggish experience for the website visitor.

GraphQL and JSON:API were developed to address the typical inefficiency of REST APIs. Using JSON:API or GraphQL, you can use a single request to retrieve both article 42 and article 72, along with the author information for each. It simplifies the developer experience, but more importantly, it speeds up the application.
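
As a rough sketch of the difference, here is how fetching one article and its author might look in client-side JavaScript. The URLs follow the hypothetical example.com endpoints above, and authorId is a made-up field name:

// Typical REST API: two sequential round trips; the second request can only
// be sent after the article response has been parsed.
async function fetchArticleWithRest() {
  const article = await fetch('https://example.com/article/42')
    .then(function (response) { return response.json(); });
  const author = await fetch('https://example.com/author/' + article.authorId)
    .then(function (response) { return response.json(); });
  return { article: article, author: author };
}

// JSON:API: a single request that embeds the related author resource.
async function fetchArticleWithJsonApi() {
  const response = await fetch('https://example.com/jsonapi/article/42?include=author');
  return response.json();
}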

Finally, both JSON:API and GraphQL have a solution to limit response sizes. A common complaint against typical REST APIs is that their responses can be incredibly verbose; they often respond with far more data than the client needs. This is both annoying and inefficient.

GraphQL eliminates this by requiring the developer to explicitly add each desired resource field to every query. This makes it difficult to over-fetch data but easily leads to very large GraphQL queries, making (cacheable) GET requests impossible.

JSON:API solves this with the concept of sparse fieldsets: lists of desired resource fields. These behave in much the same fashion as GraphQL's field selection; however, when they're omitted, JSON:API will typically return all fields. An advantage, though, is that when a JSON:API query gets too large, the sparse fieldsets can be omitted so that the request remains cacheable.
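
Here is a rough sketch of both approaches to trimming a response. The endpoint, schema and field names are hypothetical:

// GraphQL: every desired field is listed explicitly in the query, so the
// response contains nothing else.
const graphqlQuery = `{
  article(id: 42) {
    title
    body
    author { name }
  }
}`;

// JSON:API: sparse fieldsets are ordinary query parameters. If they're
// omitted, the server falls back to its default set of fields, and the
// request remains a plain, cacheable GET.
const jsonApiUrl = 'https://example.com/jsonapi/article/42'
  + '?include=author'
  + '&fields[article]=title,body,author'
  + '&fields[author]=name';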

Multiple data objects in a single response
  • REST: Usually, but every implementation is different (for Drupal: a custom "REST Export" view or a custom REST plugin is needed).
  • JSON:API: Yes.
  • GraphQL: Yes.

Embed related data (e.g. the author of each article)
  • REST: No.
  • JSON:API: Yes.
  • GraphQL: Yes.

Only the needed fields of a data object
  • REST: No.
  • JSON:API: Yes; servers may choose sensible defaults, but developers must be diligent to prevent over-fetching.
  • GraphQL: Yes; strict, which eliminates over-fetching but, at the extreme, can lead to poor cacheability.

Documentation, API explorability and schema

As a developer working with web services, you want to be able to discover and understand the API quickly and easily: what kinds of resources are available, what fields each of them has, how they are related, and so on. And if a field is a date or time, what machine-readable format is it specified in? Good documentation and API exploration can make all the difference.

Auto-generated documentation
  • REST: Depends; only if using the OpenAPI standard.
  • JSON:API: Depends; only if using the OpenAPI standard (formerly Swagger).
  • GraphQL: Yes; various tools available.

Interactivity
  • REST: Poor; navigable links are rarely available.
  • JSON:API: Acceptable; observing the available fields and links in its responses enables exploration of the API.
  • GraphQL: Excellent; autocompletion, instant results or compilation errors, and complete, contextual documentation.

Validatable and programmable schema
  • REST: Depends; only if using the OpenAPI standard.
  • JSON:API: Depends; the JSON:API specification defines a generic schema, but a reliable field-level schema is not yet available.
  • GraphQL: Yes; a complete and reliable schema is provided (with very few exceptions).

GraphQL has superior API exploration thanks to GraphiQL (demonstrated in the video above), an in-browser IDE of sorts, which lets developers iteratively construct a query. As the developer types the query out, likely suggestions are offered and can be auto-completed. At any time, the query can be run and GraphiQL will display real results alongside the query. This provides immediate, actionable feedback to the query builder. Did they make a typo? Does the response look like what was desired? Additionally, documentation can be summoned into a flyout, when additional context is needed.

On the other hand, JSON:API is more self-explanatory: APIs can be explored with nothing more than a web browser. From within the browser, you can browse from one resource to another, discover its fields, and more. So, if you just want to debug or try something out, JSON:API is usable with nothing more than cURL or your browser. Or, you can use Postman (demonstrated in the video above) — a standalone environment for developing on top of any HTTP-based API. Constructing complex queries requires some knowledge, however, and that is where GraphQL's GraphiQL shines compared to JSON:API.

Operational simplicity

We use the term operational simplicity to encompass how easy it is to install, configure, run, scale and secure each of the solutions.

The table should be self-explanatory, though it's important to make a remark about scalability. To scale a REST-based or JSON:API-based web service so that it can handle a large volume of traffic, you can use the same approach websites (and Drupal) already use, including reverse proxies like Varnish or a CDN. To scale GraphQL, you can't rely on HTTP caching as you can with REST or JSON:API unless you use persisted queries. Persisted queries are not part of the official GraphQL specification, but they are a widely-adopted convention amongst GraphQL users. They essentially store a query on the server, assign it an ID and permit the client to get the result of the query using a GET request with only that ID. Persisted queries add more operational complexity, and they also mean the architecture is no longer fully decoupled — if a client wants to retrieve different data, server-side changes are required.
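
As a rough sketch of the difference, consider the two requests below. The endpoint and the queryId parameter are hypothetical; the exact convention varies between GraphQL servers:

// Without persisted queries: the full query travels in a POST body, which
// reverse proxies and CDNs won't cache by default.
async function runFullQuery() {
  const response = await fetch('https://example.com/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: '{ article(id: 42) { title } }' }),
  });
  return response.json();
}

// With persisted queries: the query is stored on the server ahead of time and
// the client only sends its ID, so the request becomes a cacheable GET.
async function runPersistedQuery() {
  const response = await fetch('https://example.com/graphql?queryId=article-teaser');
  return response.json();
}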

Scalability: additional infrastructure requirements
  • REST: Excellent; same as a regular website (Varnish, a CDN, etc.).
  • JSON:API: Excellent; same as a regular website (Varnish, a CDN, etc.).
  • GraphQL: Usually poor; only the simplest queries can use GET requests, and to reap the full benefit of GraphQL, servers need their own tooling.

Tooling ecosystem
  • REST: Acceptable; lots of developer tools are available, but for the best experience they need to be customized for the implementation.
  • JSON:API: Excellent; lots of developer tools are available, and they don't need to be implementation-specific.
  • GraphQL: Excellent; lots of developer tools are available, and they don't need to be implementation-specific.

Typical points of failure
  • REST: Fewer; server, client.
  • JSON:API: Fewer; server, client.
  • GraphQL: Many; server, client, client-side caching, client and build tooling.

Writing data

For most REST APIs and for JSON:API, writing data is as easy as fetching it: if you can read information, you also know how to write it. Instead of GET requests, you use POST and PATCH requests. JSON:API improves on typical REST APIs by eliminating differences between implementations: there is just one way to do things, which enables better, generic tooling and less time spent on server-side details.
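
For example, a JSON:API update is a PATCH request whose body mirrors the document you get back when reading the resource. This is a minimal sketch with hypothetical IDs and field names:

// Updating a resource with JSON:API: the document structure is prescribed by
// the specification, so every compliant server handles it the same way.
async function updateArticleTitle() {
  const response = await fetch('https://example.com/jsonapi/article/42', {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/vnd.api+json' },
    body: JSON.stringify({
      data: {
        type: 'article',
        id: '42',
        attributes: { title: 'An updated title' },
      },
    }),
  });
  return response.json();
}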

The nature of GraphQL's write operations (called mutations) means that you must write custom code for each write operation; unlike the JSON:API specification, GraphQL doesn't prescribe a single way of handling write operations to resources, so there are many competing best practices. In essence, the GraphQL specification is optimized for reads, not writes.
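
By contrast, a GraphQL write might look like the sketch below, where the mutation's name, arguments and return shape are entirely up to whoever designed the schema. Everything in this example is hypothetical:

// A GraphQL mutation: nothing about its shape is prescribed by the GraphQL
// specification, so different APIs can expose writes in very different ways.
async function updateArticleTitleWithGraphql() {
  const mutation = `
    mutation {
      updateArticle(id: 42, input: { title: "An updated title" }) {
        id
        title
      }
    }`;
  const response = await fetch('https://example.com/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: mutation }),
  });
  return response.json();
}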

On the other hand, the GraphQL specification automatically supports bulk/batch operations for the mutations you've already implemented, whereas the JSON:API specification does not. The ability to perform batch write operations can be important. In our running example, adding a new tag to an article would require two JSON:API requests: one to create the tag and one to update the article. That said, support for bulk/batch writes in JSON:API is on the specification's roadmap.

Writing data
  • REST: Acceptable; every implementation is different; no bulk support.
  • JSON:API: Excellent; the specification prescribes a complete solution for handling writes; bulk operations are coming soon.
  • GraphQL: Poor; GraphQL supports bulk/batch operations, but writes can be tricky to design and implement, and there are competing conventions.

Drupal-specific considerations

Up to this point we have provided an architectural and CMS-agnostic comparison; now we also want to highlight a few Drupal-specific implementation details. For this, we can look at the ease of installation, automatically generated documentation, integration with Drupal's entity and field-level access control systems and decoupled filtering.

Drupal 8's REST module is practically impossible to set up without the contributed REST UI module, and its configuration can be daunting. Drupal's JSON:API module is far superior to Drupal's REST module at this point. It is trivial to set up: install it and you're done; there's nothing to configure. The GraphQL module is also easy to install but does require some configuration.

Client-generated collection queries allow a consumer to filter an application's data down to just what they're interested in. This is a bit like a Drupal View except that the consumer can add, remove and control all the filters. This is almost always a requirement for public web services, but it can also make development more efficient because creating or changing a listing doesn't require server-side configuration changes.
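
For instance, a client-generated collection query with JSON:API is nothing more than a URL that the consumer assembles itself. This sketch uses hypothetical field names; Drupal's actual JSON:API paths and filter syntax look slightly different:

// The consumer controls the filters, sorting and page size entirely from the
// URL, without any server-side configuration.
const collectionUrl = 'https://example.com/jsonapi/article'
  + '?filter[status]=1'
  + '&sort=-created'
  + '&page[limit]=10';

fetch(collectionUrl)
  .then(function (response) { return response.json(); })
  .then(function (collection) {
    console.log('Fetched ' + collection.data.length + ' articles.');
  });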

Drupal's REST module does not support client-generated collection queries. It requires a "REST Views display" to be set up by a site administrator, and because these displays have to be manually configured in Drupal, a client can't craft its own queries with the filters it needs.

With JSON:API and GraphQL, clients are able to perform their own content queries without the need for server-side configuration. This means that they can be truly decoupled: changes to the front end don't always require a back-end configuration change.

These client-generated queries are a bit simpler to use with the JSON:API module than with the GraphQL module because of how each module handles Drupal's extensive access control mechanisms. By default, JSON:API ensures that these are respected by altering the incoming query. GraphQL instead requires the consumer to have permission to bypass access restrictions entirely.

Most projects using GraphQL that cannot grant this permission use persisted queries instead of client-generated queries. This means a return to a more traditional Views-like pattern because the consumer no longer has complete control of the query's filters. To regain some of the efficiencies of client-generated queries, the creation of these persisted queries can be automated using front-end build tooling.

Ease of installation and configuration
  • REST: Poor; requires the contributed REST UI module, and it's easy to break clients by changing configuration.
  • JSON:API: Excellent; zero configuration!
  • GraphQL: Poor; more complex to use, and it may require additional permissions, configuration or custom code.

Automatically generated documentation
  • REST: Acceptable; requires the contributed OpenAPI module.
  • JSON:API: Acceptable; requires the contributed OpenAPI module.
  • GraphQL: Excellent; GraphQL Voyager is included.

Security: content-level access control (entity and field access)
  • REST: Excellent; content-level access control is respected.
  • JSON:API: Excellent; content-level access control is respected, even in queries.
  • GraphQL: Acceptable; some use cases require the consumer to have permission to bypass all entity and/or field access.

Decoupled filtering (client can craft queries without server-side intervention)
  • REST: No.
  • JSON:API: Yes.
  • GraphQL: Depends; only in some setups and with additional tooling/infrastructure.

What does this mean for Drupal's roadmap?

Drupal grew up as a traditional web content management system but has since evolved for this API-first world and industry analysts are praising us for it.

As Drupal's project lead, I've been talking about adding out-of-the-box support for both JSON:API and GraphQL for a while now. In fact, I've been very bullish about GraphQL since 2015. My optimism was warranted; GraphQL is undergoing a meteoric rise in interest across the web development industry.

Based on this analysis, for Drupal core's needs, we rank JSON:API above GraphQL and GraphQL above REST. As such, I want to change my recommendation for Drupal 8 core. Instead of adding both JSON:API and GraphQL to Drupal 8 core, I believe only JSON:API should be added. That said, Drupal's GraphQL implementation is fantastic, especially when you have the developer capacity to build a bespoke API for your project.

On the four qualities by which we evaluated the REST, JSON:API and GraphQL modules, JSON:API has outperformed its contemporaries. Its web standards-based approach, its ability to handle reads and writes out of the box, its security model and its ease of operation make it the best choice for Drupal core. Additionally, where JSON:API underperformed, I believe that we have a real opportunity to contribute back to the specification. In fact, one of the JSON:API module's maintainers and co-authors of this blog post, Gabe Sullice (Acquia), recently became a JSON:API specification editor himself.

This decision does not mean that you can't or shouldn't use GraphQL with Drupal. While I believe JSON:API covers the majority of use cases, there are valid use cases where GraphQL is a great fit. I'm happy that Drupal is endowed with such a vibrant contributed module ecosystem that provides so many options to Drupal's users.

I'm excited to see where both the JSON:API specification and Drupal's implementation of it go in the coming months and years. As a first next step, we're preparing the JSON:API module to be added to Drupal 8.7.

Special thanks to Wim Leers (Acquia) and Gabe Sullice (Acquia) for co-authoring this blog post and to Preston So (Acquia) and Alex Bronstein (Acquia) for their feedback during the writing process.
