Overton: My name is William Overton. I’m a Senior Serverless Solutions Architect at Fastly. About 10 years ago, Fastly was founded with a clear vision in mind: to create a platform that enables developers to move data and applications to the edge. We created a type of CDN that didn’t really exist at the time. A big differentiator was that we gave developers control like no other CDN before us. With traditional CDNs, config changes could take over an hour to propagate globally. That was simply not acceptable for modern applications, where things can change often and very quickly. When we launched, we handed developers instant config rollout and fine-grained control over how their services worked.
How Does All This Work?
How did it all work? Our delivery platform is based on Varnish, which is configured via VCL. Varnish is a caching HTTP proxy, not unique to Fastly; it is similar to Nginx. It is able to serve HTTP requests with custom responses based on config provided by our customers. This is what VCL, the Varnish Configuration Language, looks like. It works like a normal programming language: it allows developers to have conditional logic, read request metadata such as headers, and control their configuration as code rather than huge XML files or magic buttons in a UI. It’s awesome. Fastly customers have been able to do things that they simply couldn’t do before, such as A/B testing, advanced caching, edge redirects, authentication at the edge, and a lot more as well. It can be limiting, though. In VCL, there are no for-loops. There’s also no recursion, and this can limit the sorts of applications that developers can build at the edge. There’s also no access to the request or response body. Say you wanted to manipulate some JSON going through an API, or change some HTML: you can’t do it with VCL. There’s only one language, VCL. If a developer wants to use something else, they simply can’t.
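The VCL shown on screen isn’t in the transcript; as a minimal sketch of the conditional logic being described (the backend name and header values here are hypothetical, not from the talk), it might look something like:

```
sub vcl_recv {
  # Conditional logic reading request metadata such as headers:
  # route requests for an API hostname to a different backend.
  if (req.http.Host == "api.example.com") {
    set req.backend = F_api_backend;
  }

  # Configuration as code: tag requests from users in an experiment.
  if (req.http.Cookie ~ "beta=true") {
    set req.http.X-Experiment = "beta";
  }
}
```

Note that everything here operates on request metadata; consistent with the limitations described above, there is no loop construct and no way to touch the request or response body.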
What Is the Next Gen of Edge Computing?
What does the next generation of edge computing look like? First, we need to look at the requirements. What are the things that we want to include in our new platform? The first thing is, what languages should we support? We want to support every language possible, so that we can have the best developer ecosystem, so that you don’t have to learn new tools just to use the platform, and so that we reach a wide market. Where should the code run? Everywhere. We don’t want our platform to be so heavy that it has to run in fewer, larger data centers.
What about latency? It needs to be really quick. If we’re going to be adding latency into the flow of HTTP requests through our platform, that would be a real problem.
What are the options? Containers. Containers are great. They’re incredibly versatile, and already very popular and well understood. However, they’re just too heavy for what we want to do. The infrastructure needed for every service in Fastly to live inside a container is ridiculous. We’d be talking about hundreds of thousands of containers across the platform. Not only is that incredibly CPU intensive, managing all of them would be a very difficult task. They’re also not as secure as many people believe. There have been plenty of Docker escape vulnerabilities over the years. In order to mitigate that and run containers securely, we’d also end up having to run thousands of virtual machines. That simply won’t work.
The Bytecode Alliance – Wasmtime
You may have noticed I was saying Lucet was rather than is. What was the reason for that? When Fastly made Lucet, we knew it’d be useful beyond that platform. That’s one of the reasons we formed the Bytecode Alliance, an open source community focused on improving WebAssembly, alongside Mozilla, Intel, Google, Microsoft, Amazon, and more. After the formation of the Bytecode Alliance, it turned out Mozilla had also been working on a WebAssembly compiler and a runtime called Wasmtime. Instead of separately working on two competing systems, the Bytecode Alliance decided to unite the two to get the best of both worlds. Recently, Lucet was end-of-life’d.
Everything I just told you about Lucet is still true, because all that logic was incorporated into Wasmtime.
Using Wasmtime. Say we want to run a small Rust app like this in WebAssembly, rather than on the native system. All we need to do is use a toolchain native to that language. In this case, it’s Rust, so we’ll use the Rust commands. We’ll run rustup to add the wasm32-wasi target to our system, if we don’t already have it; this only needs to be done once per system. Then we’ll use rustc to compile our code into a WebAssembly file. Then we’ll use Wasmtime to run our code. Now we get the output in our console. It ran fine. That code was executed as WebAssembly rather than as a native application.
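The small Rust program shown in the talk isn’t in the transcript; a minimal stand-in (the filename hello.rs and the message are assumptions) could be as simple as:

```rust
// hello.rs: an ordinary Rust program; nothing WebAssembly-specific here.
// The same source compiles natively or to a .wasm file.
fn greeting() -> &'static str {
    "Hello from WebAssembly!"
}

fn main() {
    println!("{}", greeting());
}
```

Following the steps described above, it would be compiled and run with `rustup target add wasm32-wasi` (once), then `rustc --target wasm32-wasi hello.rs`, then `wasmtime hello.wasm`.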
What It Gives Us (Startup Time)
What does that give us? The first thing is a really fast startup time. If we look at containers, they’re simply not an option here, with how long they take to start up. The only way we could make this work is if we kept containers running for a long time to handle multiple requests. We don’t want to do that from a security point of view. Even ignoring that, we would still have long startup delays on any scaling operation. Imagine we had 10 containers for a service; as soon as we needed 100 because of increased load, that would cause a lot of cold starts while trying to scale. V8 is definitely respectable here, but WebAssembly is simply quicker. WebAssembly being so quick to start gives us a really powerful ability: we can run it anywhere. WebAssembly starts incredibly quickly, and is incredibly light to run. That means we can run customer code on our existing delivery platforms at the edge of the network, close to users. We don’t need to have large data centers for compute like we would with containers or virtual machines. When a developer deploys code to Fastly, we can make it immediately global in every one of our points of presence.
Having code deployed globally means that it can run really close to users, and that gives us the ability to offload work from the client onto the server without adding noticeable latency. For example, this application here on the left takes inputs from the user in the form of Markdown, and then on the right renders HTML. Every time I type, a request goes to the server. You can see that, as I type, it looks like it’s happening locally, it’s that quick. If I open up the DevTools here, we can start making requests. See that all of these complete really quickly, with the average time probably being 12 or 13 milliseconds. With this power, we can do things differently. Say we have a phone app that might be on an underpowered device, but we still want to do processing such as generating images. Before, we’d have to go all the way to the origin to do that, or do it slowly on the device. Now we can do it at the edge, which is still very close to the user, so communication time is almost negligible, and we can run on a powerful environment, and also a known, secure environment. A lot of the time we do logic at the origin because we can’t trust the client to do it. Now we have this middle ground at the edge, inside the network, rather than at the origin or the client.
Another thing we can do, because that code is able to start so quickly, is actually write a CDN config in code. There’s a framework for this called Flight Path. What it allows you to do is define routes and middleware that you want to run on requests. This is some example code for Flight Path. At the core of it, there’s this router object. What that allows you to do is bind functions that will run under certain conditions. First, here, we have router.use. That creates a middleware, which is a function that will run on every request, no matter what. What we do here is simply add a header, x-powered-by, and set it to FlightPath. Then after that, we have router.get, then a path, an HTTP URL path, and then a function that will handle that request. The get in this function name is the HTTP method. This will trigger on an HTTP GET request to the route /. Our code will return the string Hello World as the response using res.send. Giving developers this power is really important. It used to be that CDN configuration and application code were handled by completely separate teams with different skills. With the rise of DevOps, we are seeing developers also be in control of their infrastructure. Allowing developers to configure the CDN in a language they’re already experts in dramatically speeds up development time and allows for a lot more possibilities.
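The Flight Path example itself isn’t reproduced in the transcript. As a self-contained JavaScript sketch of the routing pattern being described (this reimplements the idea in plain JavaScript; it is not the actual Flight Path library, and names like createRouter are invented here):

```javascript
// Minimal sketch of the router/middleware pattern described in the talk.
function createRouter() {
  const middleware = [];
  const routes = {};
  return {
    // Middleware: runs on every request, no matter the path or method.
    use(fn) { middleware.push(fn); },
    // Route handler: runs only on a GET request for the given path.
    get(path, fn) { routes["GET " + path] = fn; },
    // Dispatch a request through the middleware chain, then the handler.
    handle(method, path) {
      const req = { method, path };
      const res = { headers: {}, body: "", send(b) { this.body = b; } };
      middleware.forEach((fn) => fn(req, res));
      const handler = routes[method + " " + path];
      if (handler) handler(req, res); else res.body = "Not Found";
      return res;
    },
  };
}

// Usage mirroring the example in the talk.
const router = createRouter();
router.use((req, res) => { res.headers["x-powered-by"] = "FlightPath"; });
router.get("/", (req, res) => { res.send("Hello World"); });
```

A GET request to / then returns "Hello World" with the x-powered-by header set by the middleware.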
Languages Integrated by Fastly
How Do We Run Untrusted Code Safely?
I’ve talked for ages about how we can run WebAssembly code. I also want to cover some of the security benefits that WebAssembly gives us. What we are building is a platform where developers can write any code and run it on our shared infrastructure. To do that, we need to be incredibly sure that we’re running this code in a safe environment. I want to give a quick rundown of how a typical program does anything useful. Applications don’t want to have to reinvent the wheel every time they do something on the system. If they want to read a file, draw to the screen, or play music, they talk to the kernel. The kernel is what offers the APIs for the outside world, as far as applications are concerned. This is useful because it means your note-taking app doesn’t have to include file system drivers for saving files. You can simply go, “Kernel, please put these bytes in a file called notes.txt,” and it’ll work. The problem with this is that the kernel can’t know if something is malicious. If a program says, please give me the contents of this file and send it to this IP address, the kernel will happily do it. This might be exactly what you want to happen, but it could also be something we definitely do not want to happen, like in this example here. An application that you download, once it’s running on your system, can do anything. If you expect it to be that simple note-taking app, but instead it decides to open your Bitcoin wallet and send it to an IP address, the kernel is fine with that.
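The note-taking example above can be sketched in a few lines of Rust; the program simply asks the operating system, via the standard library, which wraps the kernel’s system calls, to store and retrieve some bytes (notes.txt and its contents are arbitrary choices for illustration):

```rust
use std::fs;

fn main() {
    // "Kernel, please put these bytes in a file called notes.txt."
    fs::write("notes.txt", "my note").expect("write failed");

    // The kernel serves the read back just as happily; it has no idea
    // whether this program is a note-taking app or malware about to
    // send the contents to a remote IP address.
    let back = fs::read_to_string("notes.txt").expect("read failed");
    println!("{}", back);
}
```

The point is that the kernel enforces no intent: the same two calls are all a well-behaved app or a malicious one needs.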
What about WebAssembly? WebAssembly is always sandboxed by default. This means there is no input or output of any kind, including stdin and stdout. That means you can run untrusted code safely, because all that code could do is spin CPU cycles doing math. However, the important part is that WebAssembly is sandboxed by default. If we couldn’t talk to the outside world in any way, WebAssembly wouldn’t be very useful. The way you talk to the outside world is by using imported functions from the host environment. That means a host environment can run some WebAssembly and offer it functions that it can use. Want it to print to the screen? Offer it a printing function. If your code isn’t supposed to be reading or writing files, simply don’t import those functions into the WebAssembly guest. Your application can read a notes file, but it can’t open network connections.
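In WebAssembly’s text format, that import boundary is explicit. In a module sketch like the following (the module name "host" and the print function are illustrative, not a standard interface), the guest can only ever call what the host chose to hand it:

```
(module
  ;; The only capability this module has: a host-supplied print function
  ;; taking a pointer and a length into the module's linear memory.
  (import "host" "print" (func $print (param i32 i32)))

  ;; No file or network imports exist here, so nothing in this module's
  ;; code can open files or sockets, no matter what it tries to do.
  (memory (export "memory") 1)
  (data (i32.const 0) "hello")

  (func (export "run")
    (call $print (i32.const 0) (i32.const 5))))
```

If the host omits the import, instantiation simply fails; there is no ambient authority for the guest to reach for.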
This allows us to run completely untrusted code by giving it only the functions that we want it to have, and even defining what those functions are. We can give it known-safe functions. If a library you were using suddenly goes rogue and starts including malicious code, it won’t suddenly gain new capabilities. If you haven’t given it permission to read and write files, then even though the code has changed, it doesn’t suddenly gain those permissions. In this example here, we have a malicious program that wants to read Bitcoins and send them over the wire. It can’t, because it simply doesn’t have those permissions, even though the host environment does have those capabilities.
The Future of Wasm
What about the future for WebAssembly? What can we do with some of the new exciting things in the specifications? While we’re talking about importing functions from the host, we should talk about WASI, the WebAssembly System Interface. The exciting thing about WASI is that it supplies a standard interface that languages can adopt, meaning your Rust app and your Go app can both interface with the host in the same way when they need to talk to the system. This is really useful because a language can implement WASI without the developer having to worry about it. Your code can read a file in a way that just works; it doesn’t have to do it in the special way that a particular host environment is expecting.
One of the cool parts of WASI that I’m really excited about is that WASI doesn’t define what the implementation of those functions is. It merely defines the specification, meaning a call like read_file doesn’t have to actually go to disk. We could instead implement read_file to do anything. For example, read_file could go to a database: instead of using some local disk storage, it goes to an external datastore. The WebAssembly module of the developer’s code gets to think it’s talking to the file system, when it’s actually not. This is really important because it means there’s no vendor lock-in anymore. You don’t have to use a proprietary API to interface with the host system. If you’re running in Fastly, you wouldn’t need a Fastly SDK. That means code not made for Fastly will suddenly work in Fastly. It also means code that’s running in Fastly would work in other places. That’s the dream that Fastly is after. We want our competitors to adopt this technology, because that just means there are more developers using WebAssembly every day for building applications.
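The idea that the host decides what read_file means can be sketched with a trait in Rust. This is an illustration of the pattern only, not the actual WASI API; the trait, struct, and datastore contents are all invented for the example:

```rust
use std::collections::HashMap;

// The "interface": what the guest code thinks it is calling.
trait FileSystem {
    fn read_file(&self, path: &str) -> Option<String>;
}

// One host could implement this by going to disk. This host instead
// answers from an in-memory key-value store, standing in for an
// external datastore: the caller cannot tell the difference.
struct KvBacked {
    store: HashMap<String, String>,
}

impl FileSystem for KvBacked {
    fn read_file(&self, path: &str) -> Option<String> {
        self.store.get(path).cloned()
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert("notes.txt".to_string(), "from the datastore".to_string());
    let host = KvBacked { store };
    // The "guest" calls read_file; no disk is ever touched.
    println!("{:?}", host.read_file("notes.txt"));
}
```

The same call signature could be satisfied by disk, a database, or an object store, which is exactly why standardizing the interface rather than the implementation removes vendor lock-in.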
Going a step further with this, WebAssembly components are portable, meaning they don’t have to be included at compile time. They could instead be linked dynamically when you run the application. Any application could use WebAssembly components. For example, a database could use components instead of stored procedures. If you’re working with a database like MySQL, why not let developers write the code that runs inside the database, instead of using SQL? You don’t have to have SQL experts on your Rust team to be able to use the system. We can take this to websites too: instead of offering webhooks, they could run Wasm code, which would make integration a lot easier. Instead of using webhooks and having to host services externally, keep them up, monitor them, and make sure they’re always available, just run your code inside the platform that was going to call the webhooks. The amazing people at Shopify are using this technology for their new store system.
We could take this approach and do something like that at Fastly. Fastly currently offers a lot of services via the host environment, such as WAFs and image processing. These services could become WebAssembly modules themselves, which could be included automatically in a user’s service. This would be really valuable because we would no longer have to maintain these running services. It would become much harder to have a WAF outage, because the WAF is no longer a running process. It’s instead a library being used in customer code; the same with image processing, and the same with other things that Fastly wants to add.
Why make this something purely internal? Fastly could, in the future, if we wanted to, make a public marketplace where developers could create their own components that can be dynamically added to running services. If we were to create something like a modify-a-request component interface, we could offer components that other people write, and that we can check match this interface, as plug-and-play style add-ons for Fastly. All enabled plugins would just be included in services at runtime. Some examples of that include something like a redirect plugin, or an authentication plugin. This isn’t something we are committing to today, just something that we could do with this technology.
The Next Generation of Fastly
Fastly really is using WebAssembly wherever possible. We’re building the next generation of our products on the Compute@Edge WebAssembly platform, such as the Fastly Next-Gen WAF, powered by Signal Sciences, and even our core CDN service, by compiling VCL, our current scripting language, into WebAssembly and running it on Compute@Edge, so we only have one unified platform that we have to maintain. Importantly, we’re doing this while being committed to adopting open standards wherever possible. We’re not making black-box systems and licensing them to people. We’re working with open source projects like Wasmtime and WASI, and encouraging others to do the same.
Questions and Answers
Eberhardt: You’ve got two types of customer, you’ve got the ones who get interested in the tech and the others that just want to run stuff quickly, easily, cheaply on the edge network.
I first heard the term serverless when that pattern emerged from the big cloud vendors. I always thought it was a bit of a misnomer. It was serverless, but really, under the hood, there was quite a lot of infrastructure being spun up. Yes, there was a server underneath. I know you do effectively have a server there too, but the way that you’re doing it, that genuinely lightweight approach, does feel like a much better embodiment of the word serverless.
Overton: It’s always really interesting, because before I ever joined Fastly, I worked at a company where we did a big migration onto AWS Lambda. When you start doing that, you find out the truth: it’s actually Docker containers. Then they scale them for you, and they handle that for you. In the current state of serverless today, the reality doesn’t really match the idea. I want to deploy my serverless code. Ok, where? No, I don’t care about where, I just want to deploy it. It’s like, pick a region. Ok, I’ll pick a region. Then, what concurrency do you want? It’s like, hang on. You get to the point where you end up configuring just as much infrastructure as you do with containers.
Eberhardt: There’ll be state bleeding between sessions, because really, we are running servers under the hood, and you’re exposed to that.
Overton: Where Fastly did it completely differently is, we don’t have containers, we don’t have virtual machines. Every server that Fastly is using to serve web traffic has the customer code available to it. When a request comes in, that machine will run it in the same way that those machines handle our customers’ CDN config. It’s all multitenant at the highest level, so when you deploy a WebAssembly function to Fastly, it’s either deployed or not deployed. Those are your two options. There’s no “deployed with this concurrency.” Whether you’re doing one request per second or 100,000 requests per second, there’s actually the same amount of infrastructure managing that service.
Eberhardt: I’ve got some stuff running on AWS that I really should move on to Fastly just to see what the experience is like.
The first question is regarding user supplied WebAssembly versus webhooks. Can WebAssembly be used to mine Bitcoin or perform nefarious activities?
Overton: It all boils down to the access you give to WebAssembly. You know how we talked about importing host functions? If I’m operating a payment provider and I want someone to be able to add a function that triggers when someone buys something, does that function need to have GPU access? Probably not. Does it need to be able to make arbitrary web requests? Probably not. You would import into it only the functions that it needs. Technically, WebAssembly is great at doing lots of math, but you also have a lot of ways to limit that. For example, when we run code in Fastly, we don’t run customer code with unlimited memory and unlimited amounts of time. We say, for this request, you can have 100 megabytes of memory, and you can run for 1 second maximum. You can protect yourself by doing that. Just because you’re running WebAssembly doesn’t mean you have to give someone unlimited power. You can give them a very controlled, limited environment.
Eberhardt: If I write a serverless function, deploy it to Fastly using WebAssembly, do I have the opportunity to make network requests within my serverless function?
Overton: In Fastly, you do, but only because we’ve given you that ability. We took the time to create a function for making web requests and import it into the guest environment. If we didn’t do that, you wouldn’t be able to do it.
Eberhardt: You’d be quite limited if you couldn’t do that.
Overton: With a CDN, it’d be quite limiting, yes, not being able to do web traffic.
Eberhardt: How do we scale up or down individual components in WebAssembly? I think maybe they’re asking more about serverless than WebAssembly. The general question is, how does serverless scale?
Overton: I’ve already talked about this a lot, but within Fastly, there is no concurrency or capacity setting for a service: either all the machines have the service ready, or none of them do, deployed or not deployed. We basically ship the code to shared storage so that all the servers can view it, and they have it in memory ready to go. They can just serve the request, because one machine will handle requests for every customer. It just depends on how much traffic that machine gets, or how much code it runs. We take the scaling factors away from you. Obviously, Fastly as a CDN, we have to do a lot of scaling work. We operate on the whole of our customer base, so we have quite smooth curves. If you’re a single company, say a news website, you can have a big spike in traffic and have to do lots of scaling. Whereas at the scale that Fastly is at, one of our customers that’s a news website having a big spike in traffic is barely noticeable. Scaling at our level becomes a lot easier, and so we just have physical machines in data centers ready.
Eberhardt: What’s the current state of work on memory management, particularly with regard to cross module sharing? I’m guessing this is a couple of WebAssembly modules sharing linear memory.
Eberhardt: Also, the question was about shared linear memory, which is such a low-level concept that most of us users would not be sharing linear memory directly with the goal of sharing data between a couple of different languages. Generally speaking, that’s the job of the people writing the compilers or the higher-level APIs, things like interface types, and so on, that allow us to share data between WebAssembly modules.
Overton: Even when you are working at the low level, the tooling is designed to hide that away from you. If you use something like wit-bindgen to generate bindings in Rust for using WASI, it will generate a bindings file that handles all the memory management. There are very few use cases where a user really wants to be playing in that space. It’s definitely an important thing that needs to exist, but as a developer who’s working on my website, or my database, or something, I don’t actually want to have to think about that.