Why the EU Copyright Directive is a threat to the Open Web

An image of a copyright sign

After much debate, the EU Copyright Directive is now moving to a final vote in the European Parliament. The directive, if you are not familiar, was created to prohibit spreading copyrighted material on internet platforms, protecting the rights of creators (for example, many musicians have supported this overhaul).

The overall idea behind the directive — compensating creators for their online works — makes sense. However, the implementation and execution of the directive could have a very negative impact on the Open Web. I'm surprised more has not been written about this within the web community.

For example, Article 13 requires for-profit online services to implement copyright filters for user-generated content, which includes comments on blogs, reviews on commerce sites, code on programming sites or possibly even memes and cat photos on discussion forums. Any for-profit site would need to apply strict copyright filters on content uploaded by a site's users. If sites fail to correctly filter copyrighted materials, they will be directly liable to rights holders for expensive copyright infringement violations.

While implementing copyright filters may be doable for large organizations, it may not be for smaller organizations. Instead, small organizations might decide to stop hosting comments or reviews, or allowing the sharing of code, photos or videos. The only for-profit organizations potentially excluded from these requirements are companies earning less than €10 million a year, until they have been in business for three years. It's not a great exclusion, because there are a lot of online communities that have been around for more than three years and don't make more than €10 million a year.

The EU tends to lead the way when it comes to internet legislation. For example, GDPR has proven successful for consumer data protection and has sparked state-by-state legislation in the United States. In theory, the EU Copyright Directive could do the same thing for modern internet copyright law. My fear is that in practice, these copyright filters, if too strict, could discourage the free flow of information and sharing on the Open Web.

Optimizing site performance by reducing JavaScript and CSS

I've been thinking about the performance of my site and how it affects the user experience. There are real, ethical concerns to poor web performance. These include accessibility, inclusion, waste and environmental concerns.

A faster site is more accessible, and therefore more inclusive for people visiting from a mobile device, or from areas in the world with slow or expensive internet.

For those reasons, I decided to see if I could improve the performance of my site. I used the excellent https://webpagetest.org to benchmark a simple blog post https://dri.es/relentlessly-eliminating-barriers-to-growth.

A diagram that shows page load times for dri.es before making performance improvements

The image above shows that it took a browser 0.722 seconds to download and render the page (see blue vertical line):

  • The first 210 milliseconds are used to set up the connection, which includes the DNS lookup, TCP handshake and the SSL negotiation.
  • The next 260 milliseconds (from 0.21 seconds to 0.47 seconds) are spent downloading the rendered HTML file, two CSS files and one JavaScript file.
  • After everything is downloaded, the final 330 milliseconds (from 0.475 seconds to 0.8 seconds) are used to layout the page and execute the JavaScript code.

By most standards, 0.722 seconds is pretty fast. In fact, according to HTTP Archive, it takes more than 2.4 seconds to download and render the average web page on a laptop or desktop computer.

Regardless, I noticed that the length of the horizontal green bars and the horizontal yellow bar was relatively long compared to that of the blue bar. In other words, a lot of time is spent downloading JavaScript (yellow horizontal bar) and CSS (two green horizontal bars) instead of the HTML, including the actual content of the blog post (blue bar).

To fix, I did two things:

  1. Use vanilla JavaScript. I replaced my jQuery-based JavaScript with vanilla JavaScript. Without impacting the functionality of my site, the amount of JavaScript went from almost 45 KB to 699 bytes, good for a savings of over 6,000 percent.
  2. Conditionally include CSS. For example, I use Prism.js for syntax highlighting code snippets in blog posts. prism.css was downloaded for every page request, even when there were no code snippets to highlight. Using Drupal's render system, it's easy to conditionally include CSS. By taking advantage of that, I was able to reduce the amount of CSS downloaded by 47 percent — from 4.7 KB to 2.5 KB.

According to the January 1st, 2019 run of HTTP Archive, the median page requires 396 KB of JavaScript and 60 KB of CSS. I'm proud that my site is well under these medians.

File type Dri.es before Dri.es after World-wide median
JavaScript 45 KB 669 bytes 396 KB
CSS 4.7 KB 2.5 KB 60 KB

Because the new JavaScript and CSS files are significantly smaller, it takes the browser less time to download, parse and render them. As a result, the same blog post is now available in 0.465 seconds instead of 0.722 seconds, or 35% faster.

After a new https://webpagetest.org test run, you can clearly see that the bars for the CSS and JavaScript files became visually shorter:

A diagram that shows page load times for dri.es after making performance improvements

To optimize the user experience of my site, I want it to be fast. I hope that others will see that bloated websites can come at a great cost, and will consider using tools like https://webpagetest.org to make their sites more performant.

I'll keep working on making my website even faster. As a next step, I plan to make pages with images faster by using lazy image loading.

Headless CMS: REST vs JSON:API vs GraphQL

An abstract image of three boxes

The web used to be server-centric in that web content management systems managed data and turned it into HTML responses. With the rise of headless architectures a portion of the web is becoming server-centric for data but client-centric for its presentation; increasingly, data is rendered into HTML in the browser.

This shift of responsibility has given rise to JavaScript frameworks, while on the server side, it has resulted in the development of JSON:API and GraphQL to better serve these JavaScript applications with content and data.

In this blog post, we will compare REST, JSON:API and GraphQL. First, we'll look at an architectural, CMS-agnostic comparison, followed by evaluating some Drupal-specific implementation details.

It's worth noting that there are of course lots of intricacies and "it depends" when comparing these three approaches. When we discuss REST, we mean the "typical REST API" as opposed to one that is extremely well-designed or following a specification (not REST as a concept). When we discuss JSON:API, we're referring to implementations of the JSON:API specification. Finally, when we discuss GraphQL, we're referring to GraphQL as it used in practice. Formally, it is only a query language, not a standard for building APIs.

The architectural comparison should be useful for anyone building decoupled applications regardless of the foundation they use because the qualities we will evaluate apply to most web projects.

To frame our comparisons, let's establish that most developers working with web services care about the following qualities:

  1. Request efficiency: retrieving all necessary data in a single network round trip is essential for performance. The size of both requests and responses should make efficient use of the network.
  2. API exploration and schema documentation: the API should be quickly understandable and easily discoverable.
  3. Operational simplicity: the approach should be easy to install, configure, run, scale and secure.
  4. Writing data: not every application needs to store data in the content repository, but when it does, it should not be significantly more complex than reading.

We summarized our conclusions in the table below, but we discuss each of these four categories (or rows in the table) in more depth below. If you aggregate the colors in the table, you see that we rank JSON:API above GraphQL and GraphQL above REST for Drupal core's needs.

Request efficiency Poor; multiple requests are needed to satisfy common needs. Responses are bloated. Excellent; a single request is usually sufficient for most needs. Responses can be tailored to return only what is required. Excellent; a single request is usually sufficient for most needs. Responses only include exactly what was requested.
Documentation, API explorability and schema Poor; no schema, not explorable. Acceptable; generic schema only; links and error messages are self-documenting. Excellent; precise schema; excellent tooling for exploration and documentation.
Operational simplicity Acceptable; works out of the box with CDNs and reverse proxies; few to no client-side libraries required. Excellent; works out of the box with CDNs and reverse proxies, no client-side libraries needed, but many are available and useful. Poor; extra infrastructure is often necessary client side libraries are a practical necessity, specific patterns required to benefit from CDNs and browser caches.
Writing data Acceptable; HTTP semantics give some guidance but how specifics left to each implementation, one write per request. Excellent; how writes are handled is clearly defined by the spec, one write per request, but multiple writes is being added to the specification. Poor; how writes are handled is left to each implementation and there are competing best practices, it's possible to execute multiple writes in a single request.

If you're not familiar with JSON:API or GraphQL, I recommend you watch the following two short videos. They will provide valuable context for the remainder of this blog post:

Request efficiency

Most REST APIs tend toward the simplest implementation possible: a resource can only be retrieved from one URI. If you want to retrieve article 42, you have to retrieve it from https://example.com/article/42. If you want to retrieve article 42 and article 72, you have to perform two requests; one to https://example.com/article/42 and one to https://example.com/article/72. If the article's author information is stored in a different content type, you have to do two additional requests, say to https://example.com/author/3 and https://example.com/author/7. Furthermore, you can't send these requests until you've requested, retrieved and parsed the article requests (you wouldn't know the author IDs otherwise).

Consequently, client-side applications built on top of basic REST APIs tend to need many successive requests to fetch their data. Often, these requests can't be sent until earlier requests have been fulfilled, resulting in a sluggish experience for the website visitor.

GraphQL and JSON:API were developed to address the typical inefficiency of REST APIs. Using JSON:API or GraphQL, you can use a single request to retrieve both article 42 and article 72, along with the author information for each. It simplifies the developer experience, but more importantly, it speeds up the application.

Finally, both JSON:API and GraphQL have a solution to limit response sizes. A common complaint against typical REST APIs is that their responses can be incredibly verbose; they often respond with far more data than the client needs. This is both annoying and inefficient.

GraphQL eliminates this by requiring the developer to explicitly add each desired resource field to every query. This makes it difficult to over-fetch data but easily leads to very large GraphQL queries, making (cacheable) GET requests impossible.

JSON:API solves this with the concept of sparse fieldsets or lists of desired resource fields. These behave in much the same fashion as GraphQL does, however, when they're omitted JSON:API will typically return all fields. An advantage, though, is that when a JSON:API query gets too large, sparse fieldsets can be omitted so that the request remains cacheable.

Multiple data objects in a single response Usually; but every implementation is different (for Drupal: custom "REST Export" view or custom REST plugin needed). Yes Yes
Embed related data (e.g. the author of each article) No Yes Yes
Only needed fields of a data object No Yes; servers may choose sensible defaults, developers must be diligent to prevent over-fetching. Yes; strict, but eliminates over-fetching, at the extreme, it can lead to poor cacheability.

Documentation, API explorability and schema

As a developer working with web services, you want to be able to discover and understand the API quickly and easily: what kinds of resources are available, what fields does each of them have, how are they related, etc. But also, if this field is a date or time, what machine-readable format is the date or time specified in? Good documentation and API exploration can make all the difference.

Auto-generated documentation Depends; if using the OpenAPI standard. Depends; if using the OpenAPI standard (formerly, Swagger). Yes; various tools available.
Interactivity Poor; navigable links rarely available. Acceptable; observing available fields and links in its responses enable exploration of the API. Excellent; autocomplete feature, instant results or compilation errors, complete and contextual documentation.
Validatable and programmable schema. Depends; if using the OpenAPI standard. Depends; the JSON:API specification defines a generic schema, but a reliable field-level schema is not yet available. Yes; a complete and reliable schema is provided (with very few exceptions).

GraphQL has superior API exploration thanks to GraphiQL (demonstrated in the video above), an in-browser IDE of sorts, which lets developers iteratively construct a query. As the developer types the query out, likely suggestions are offered and can be auto-completed. At any time, the query can be run and GraphiQL will display real results alongside the query. This provides immediate, actionable feedback to the query builder. Did they make a typo? Does the response look like what was desired? Additionally, documentation can be summoned into a flyout, when additional context is needed.

On the other hand, JSON:API is more self-explanatory: APIs can be explored with nothing more than a web browser. From within the browser, you can browse from one resource to another, discover its fields, and more. So, if you just want to debug or try something out, JSON:API is usable with nothing more than cURL or your browser. Or, you can use Postman (demonstrated in the video above) — a standalone environment for developing on top of an any HTTP-based API. Constructing complex queries requires some knowledge, however, and that is where GraphQL's GraphiQL shines compared to JSON:API.

Operational simplicity

We use the term operational simplicity to encompass how easy it is to install, configure, run, scale and secure each of the solutions.

The table should be self-explanatory, though it's important to make a remark about scalability. To scale a REST-based or JSON:API-based web service so that it can handle a large volume of traffic, you can use the same approach websites (and Drupal) already use, including reverse proxies like Varnish or a CDN. To scale GraphQL, you can't rely on HTTP caching as with REST or JSON:API without persisted queries. Persisted queries are not part of the official GraphQL specification but they are a widely-adopted convention amongst GraphQL users. They essentially store a query on the server, assign it an ID and permit the client to get the result of the query using a GET request with only the ID. Persisted queries add more operational complexity, and it also means the architecture is no longer fully decoupled — if a client wants to retrieve different data, server-side changes are required.

Scalability: additional infrastructure requirements Excellent; same as a regular website (Varnish, CDN, etc). Excellent; same as a regular website (Varnish, CDN, etc). Usually poor; only the simplest queries can use GET requests; to reap the full benefit of GraphQL, servers needs their own tooling.
Tooling ecosystem Acceptable; lots of developer tools available, but for the best experience they need to be customized for the implementation. Excellent; lots of developer tools available; tools don't need to be implementation-specific. Excellent; lots of developer tools available; tools don't need to be implementation-specific.
Typical points of failure Fewer; server, client. Fewer; server, client. Many; server, client, client-side caching, client and build tooling.

Writing data

For most REST APIs and JSON:API, writing data is as easy as fetching it: if you can read information, you also know how to write it. Instead of using the GET HTTP request type you use POST and PATCH requests. JSON:API improves on typical REST APIs by eliminating differences between implementations. There is just one way to do things and that enabled better, generic tooling and less time spent on server-side details.

The nature of GraphQL's write operations (called mutations) means that you must write custom code for each write operation; unlike JSON:API the specification, GraphQL doesn't prescribe a single way of handling write operations to resources, so there are many competing best practices. In essence, the GraphQL specification is optimized for reads, not writes.

On the other hand, the GraphQL specification supports bulk/batch operations automatically for the mutations you've already implemented, whereas the JSON:API specification does not. The ability to perform batch write operations can be important. For example, in our running example, adding a new tag to an article would require two requests; one to create the tag and one to update the article. That said, support for bulk/batch writes in JSON:API is on the specification's roadmap.

Writing data Acceptable; every implementation is different. No bulk support. Excellent; JSON:API prescribes a complete solution for handling writes. Bulk operations are coming soon. Poor; GraphQL supports bulk/batch operations, but writes can be tricky to design and implement. There are competing conventions.

Drupal-specific considerations

Up to this point we have provided an architectural and CMS-agnostic comparison; now we also want to highlight a few Drupal-specific implementation details. For this, we can look at the ease of installation, automatically generated documentation, integration with Drupal's entity and field-level access control systems and decoupled filtering.

Drupal 8's REST module is practically impossible to set up without the contributed REST UI module, and its configuration can be daunting. Drupal's JSON:API module is far superior to Drupal's REST module at this point. It is trivial to set up: install it and you're done; there's nothing to configure. The GraphQL module is also easy to install but does require some configuration.

Client-generated collection queries allow a consumer to filter an application's data down to just what they're interested in. This is a bit like a Drupal View except that the consumer can add, remove and control all the filters. This is almost always a requirement for public web services, but it can also make development more efficient because creating or changing a listing doesn't require server-side configuration changes.

Drupal's REST module does not support client-generated collection queries. It requires a "REST Views display" to be setup by a site administrator and since these need to be manually configured in Drupal; this means a client can't craft its own queries with the filters it needs.

JSON:API and GraphQL, clients are able to perform their own content queries without the need for server-side configuration. This means that they can be truly decoupled: changes to the front end don't always require a back-end configuration change.

These client-generated queries are a bit simpler to use with the JSON:API module than they are with the GraphQL module because of how each module handles Drupal's extensive access control mechanisms. By default JSON:API ensures that these are respected by altering the incoming query. GraphQL instead requires the consumer to have permission to simply bypass access restrictions.

Most projects using GraphQL that cannot grant this permission use persisted queries instead of client-generated queries. This means a return to a more traditional Views-like pattern because the consumer no longer has complete control of the query's filters. To regain some of the efficiencies of client-generated queries, the creation of these persisted queries can be automated using front-end build tooling.

Ease of installation and configuration Poor; requires contributed module REST UI, easy to break clients by changing configuration. Excellent; zero configuration! Poor; more complex to use, may require additional permissions, configuration or custom code.
Automatically generated documentation Acceptable; requires contributed module OpenAPI. Acceptable; requires contributed module OpenAPI. Excellent; GraphQL Voyager included.
Security: content-level access control (entity and field access) Excellent; content-level access control respected. Excellent; content-level access control respected, even in queries. Acceptable; some use cases require the consumer to have permission to bypass all entity and/or field access.
Decoupled filtering (client can craft queries without server-side intervention) No Yes Depends; only in some setups and with additional tooling/infrastructure.

What does this mean for Drupal's roadmap?

Drupal grew up as a traditional web content management system but has since evolved for this API-first world and industry analysts are praising us for it.

As Drupal's project lead, I've been talking about adding out-of-the-box support for both JSON:API and GraphQL for a while now. In fact, I've been very bullish about GraphQL since 2015. My optimism was warranted; GraphQL is undergoing a meteoric rise in interest across the web development industry.

Based on this analysis, for Drupal core's needs, we rank JSON:API above GraphQL and GraphQL above REST. As such, I want to change my recommendation for Drupal 8 core. Instead of adding both JSON:API and GraphQL to Drupal 8 core, I believe only JSON:API should be added. That said, Drupal's GraphQL implementation is fantastic, especially when you have the developer capacity to build a bespoke API for your project.

On the four qualities by which we evaluated the REST, JSON:API and GraphQL modules, JSON:API has outperformed its contemporaries. Its web standards-based approach, its ability to handle reads and writes out of the box, its security model and its ease of operation make it the best choice for Drupal core. Additionally, where JSON:API underperformed, I believe that we have a real opportunity to contribute back to the specification. In fact, one of the JSON:API module's maintainers and co-authors of this blog post, Gabe Sullice (Acquia), recently became a JSON:API specification editor himself.

This decision does not mean that you can't or shouldn't use GraphQL with Drupal. While I believe JSON:API covers the majority of use cases, there are valid use cases where GraphQL is a great fit. I'm happy that Drupal is endowed with such a vibrant contributed module ecosystem that provides so many options to Drupal's users.

I'm excited to see where both the JSON:API specification and Drupal's implementation of it goes in the coming months and years. As a first next step, we're preparing the JSON:API to be added to Drupal 8.7.

Special thanks to Wim Leers (Acquia) and Gabe Sullice (Acquia) for co-authoring this blog post and to Preston So (Acquia) and Alex Bronstein (Acquia) for their feedback during the writing process.

Drupal helps rescue ultra marathon runner

I'm frequently sent examples of how Drupal has changed the lives of developers, business owners and end users. Recently, I received a very different story of how Drupal had helped in a rescue operation that saved a man's life.

The Snowdonia Ultra Marathon website

In early 2018, Race Director Mike Jones was looking to build a new website for the Ultra-Trail Snowdonia ultra marathon. He reached out to a good friend and developer, Rob Edwards, to lead the development of the website.

A photo of a runner at the Ultra-trail Snowdonia ultramarathon
© Ultra-trail Snowdonia and No Limits Photography

Rob chose Drupal for its flexibility and extensibility. As an organization supported heavily by volunteers, open source also fit the Snowdonia team's belief in community.

The resulting website, https://apexrunning.co/, included a custom-built timing module. This module allowed volunteers to register each runner and their time at every aid stop.

A runner goes missing

Rob attended the first day of Ultra-Trail Snowdonia to ensure the website ran smoothly. He also monitored the runners at the end of the race to certify they were all accounted for.

Monitoring the system into the early hours of the morning, Rob noticed one runner, after successfully completing checkpoints one and two, hadn't passed through the third checkpoint.

A photo of a runner at the Ultra-trail Snowdonia ultramarathon
© Ultra-trail Snowdonia and No Limits Photography

Each runner carried a mobile phone with them for emergencies. Mike attempted to make contact with the runner via phone to ensure he was safe. However, this specific area was known for its poor signal and the connection was too weak to get through.

After some more time eagerly watching the live updates, it was clear the runner hadn't reached checkpoint four and more likely hadn't ever made it past checkpoint three. The Ogwen Mountain Rescue were called to action.

Due to the terrain and temperature, searching for the lost runner on foot would be too slow. Instead, the mountain rescue volunteers used a helicopter to scan the area and locate the runner.

How Drupal came to rescue

The area covered by runners in an ultra marathon like this one is vast. The custom-built timing module helped rescuers narrow down the search area; they knew the runner passed the second checkpoint but never made it to the third.

After following the fluorescent orange markers in the area pinpointed by the Drupal website, the team quickly found the individual. He had fallen and become too injured to carry on. A mild case of hypothermia had set in. The runner was airlifted to the hospital for appropriate care. The good news: the runner survived.

Without Drupal, it might have taken much longer to notify anyone that a runner had gone missing, and there would have been no way to tell when he had dropped off.

NFC and GPS devices are now being explored for these ultra marathon runners to carry with them to provide location data as an extra safety precaution. The Drupal system will be used alongside these devices for more accurate time readings, and Rob is looking into an API to pull this additional data into the Drupal website.

Stories about Drupal having an impact on organizations and individuals, or even helping out in emergencies, drive my sense of purpose. Feel free to keep sending them my way!

Special thanks to Rob Edwards, Poppy Heap (CTI Digital) and Paul Johnson (CTI Digital) for their help with this blog post.

Pulling the plug on Facebook

Facebook social decay
© Andrei Lacatusu

Exactly one year ago, I decided to use social media less and blog more. I uninstalled the Facebook application from my phone, but kept my Facebook account for the time being.

The result is that I went from checking Facebook several times a day to once or twice a month.

Facebook can't be trusted

At the time I uninstalled the Facebook application from my phone, Mark Zuckerberg promised that he would fix Facebook. He didn't.

The remainder of 2018 was filled with Facebook scandals, including continued mishandling of personal data and privacy breaches, more misinformation, and a multitude of shady business practices.

Things got worse, not better.

The icing on the cake is that a few weeks ago we learned that Facebook knowingly duped children and their parents out of money, in some cases hundreds or even thousands of dollars, and often refused to give the money back.

And just last week, it was reported that Facebook had been collecting users' data by getting people to install a mobile application that gave Facebook root access to their network traffic.

It's clear that Facebook can't be trusted. And for that reason, I'm out.

I deleted my Facebook account twenty minutes ago.

An image of Facebook's account deletion confirmation screen

Social media's dark side

Social media, in general, have been enablers of community, transparency and positive change, but also of abuse, hate speech, bullying, misinformation, government manipulation and more. In just the past year, more and more users have woken up to the dark side of social media. Open Web and privacy advocates, on the other hand, have seen this coming for awhile.

Technological change is a wonderful thing, as it can bring unprecedented improvements to billions around the globe. As a technologist, I believe in the power of the web to improve the world for many, but we also need to make sure that technology disruption is positive for all of us.

Last week, we heard that Facebook intends to further blend Instagram and WhatsApp with Facebook. If I were to guess, they want to make it harder to split up Facebook later (and harder for users to know what is happening with their data). Regulators should be all over this right now.

My social detox

I plan to stay off Facebook indefinitely, unless maybe there is a new CEO and better regulatory oversight.

I already stopped using Twitter to share personal updates and use it almost exclusively for Drupal-related updates. It remains a valuable channel to reach many people, but I wouldn't categorize my use as social anymore.

For now, I'm still on Instagram, but it's hard to ignore that Instagram is owned by Facebook. I will probably uninstall that next.

A call to rejoin the Open Web

Instant gratification and network effects have made social media successful, at the sacrifice of blogs and the Open Web.

I've always been driven by a sense of idealism. I'm optimistic that the movement away from social media is good for the Open Web.

Since I scaled back my use of social media a year ago, I blogged more, re-subscribed to many RSS feeds, and grew increasingly interested in the IndieWeb — all small shifts back to the Open Web's roots.

I plan to continue to work on my POSSE plan, and hope to share more thoughts on this topic in the coming weeks.

I'd love to see thousands more people join or rejoin the Open Web, and help innovate on top of it.