Transcript
Hello and welcome to this episode of IOU and Explanation. I'm Dan Cerulli, I lead the cloud native business here at Nutanix. We all know how fast technology changes. Well, this podcast aims to cut through the confusion, process the input with an expert, and deliver a clean, meaningful output you can actually use. Joining us today is someone I've known for a very long time and whom I've got tremendous respect for. Frankly, someone who's responsible for writing some of the code that needs explaining. It's none other than Louis Ryan, Chief Technology Officer of Solo.io, co-founder of the Istio project, co-inventor of gRPC, two foundational technologies for modern cloud native service communication. Louis is perfectly positioned to give us an explanation about service communication. Pleasure to be here, Dan. Welcome to the show, Louis. Really glad to have you. Louis and I worked together for seven-plus years at Google. I was lucky enough to ride his coattails for quite a while and use that momentum to get where I am today. Before we get into any of that, we're going to start with a segment we call the before times. We're going to pick a technology and talk about what life was like at home or at work before that technology entered our lives. I think you're old enough to help me out with this. We had a topic coming in, but based on the attire of one of the crew members today, we've shifted topics a little bit. And I want to talk about what life was like before we had connectivity between computers, before you could use the internet to download something to install. Let's talk about life in the floppy disk era. What was work like before you had, you know, high bandwidth connectivity between computers? Well, speaking as somebody who did IT help desk in the era before there was connectivity, it meant carrying around a lot of things. Like you had to go set up somebody's computer, go install Windows on that. Okay, I need 15 floppy disks to install that. 
Yeah. I have to make sure I have them in the right order and they're all numbered. So I'm just carrying stuff around the whole time. That little box is a floppy disk. Yeah, yeah, it might be. So for those who don't know, a floppy disk could hold, I think 1.2 megabytes on a three and a half inch floppy. Oh yeah, the advanced ones. And so if you were installing something like Windows, yeah, or even Word would be multiple floppy disks. Oh, 20 or 30 disks. Yes, so it would be a box full of these little plastic cartridges and it would say, insert number one, you would insert, and then we'd go, ah, ah. We'd get paid to feed the machine. Your job is to sit there, push the button, and then it spits it back out. Turn the handle so it comes out again, pull it out, stick the next one in, and then maybe it would yell at you for putting the wrong one in. Oh, so turn the handle. You're talking five and a quarter. Oh yeah, there was even a little button on some of them. Right, yeah, those floppy disks were truly floppy. And if you wanted to get data to someone across town or across the country, it would be- A courier. Yeah, a courier. Yeah, we literally had people back and forth. So yeah, the internet and good connectivity in the office and then between offices really changed all that, changed installs, really, really changed it. But life was different in the before times. All right, that's enough of the before times. Now we do have connectivity between computers. Incredible bandwidth. It's led to an explosion of communication between services. We didn't have that back then. As soon as we connected computers, we had the ability to communicate between them. Of course, IP happened, HTTP happened. A lot of things have happened. But I want to talk to you about a few of the foundational technologies to give someone who might've heard these terms but not really know what they mean well, give them some grounding. So I'd like to start with what really revolutionized APIs. 
We're gonna skip over CORBA. We're gonna skip over COM and OLE. All right, we're not gonna get into DCOM. We're gonna go to what really revolutionized things in the 21st century, which was RESTful APIs, JSON-based HTTP APIs, and the OpenAPI specification, or as it used to be called, the Swagger spec. So tell me, what is a RESTful API to you? What does it mean when someone says, oh yeah, our company exposes RESTful APIs? When we say something like a RESTful API or an OpenAPI spec, right, it's a specification to somebody else about how they can call a service that you built. All these other technologies that you kind of alluded to that came before it required a lot of specialized tooling that, right, you had to build a certain way. They were kind of quirky. Some were very proprietary. So really at that kind of phase of kind of RESTful APIs, there was a democratization, I would say, right? There was an alignment around JSON, which was already widely used in the browser space, for all the tooling on the backend side. And so the tooling became more ubiquitous, more consistent, also easier to read for humans. It's really hard to read a CORBA payload on the wire if you've ever tried to look at one. And then because HTTP was in wide use on the web, right, there were already standards around how I would access documents. I would get a document. I could post an update to a document. I could put a new document. And those same basic kind of idempotency principles, right, about how you manipulate the documents could be applied to smaller things, resources. We'd start to use that term in general. And so those joining together, those two technologies kind of created this kind of massive momentum around RESTful services, because it also made them available to the browsers, right? There was also the demand side of this kind of working in tandem. You know, Web 2.0 started to drive demand to build APIs this way. So that's really how it all, you know, that ball got rolling right then. 
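To make the get, put, and post idea above concrete, here is a minimal sketch of the idempotency point. It is not a real HTTP server: `DocumentStore`, its method names, and the `doc-N` id scheme are all hypothetical, chosen only to show that retrying a put leaves one document while retrying a post creates another.

```python
import itertools


class DocumentStore:
    """Toy in-memory stand-in for an HTTP resource collection (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def get(self, doc_id):
        # GET: read-only, so it is safe to repeat.
        return self._docs.get(doc_id)

    def put(self, doc_id, body):
        # PUT: store the body at a caller-chosen id; repeating it changes nothing.
        self._docs[doc_id] = dict(body)
        return doc_id

    def post(self, body):
        # POST: the store picks a fresh id each time, so a retry adds a duplicate.
        doc_id = f"doc-{next(self._ids)}"
        self._docs[doc_id] = dict(body)
        return doc_id

    def count(self):
        return len(self._docs)


store = DocumentStore()

store.put("readme", {"title": "Hello"})
store.put("readme", {"title": "Hello"})  # retry of the same PUT
count_after_puts = store.count()

store.post({"title": "Hello"})
store.post({"title": "Hello"})  # retry of the same POST
count_after_posts = store.count()

print(count_after_puts, count_after_posts)
```

The practical consequence is that a client can blindly retry a put after a timeout, while a retried post needs some extra care on the server side.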
So it allowed, because you didn't have to have that complex tooling, it then let you easily, and then this is where the OpenAPI specification comes in, describing for the world, hey, this is what my API looks like. This is how you call me, which opened up vast areas of commerce, right? Suddenly you could transact over the web, right? Yeah, previously you had kind of these closed ecosystems, right? Vendors were saying, if you want to transact with each other, we'll build you a marketplace and you have to come here and abide by our standards. And if you do those things, then you can all interact. And this just blew that wide open and said, no, right? You as an individual company or a person could just build an API and then you could make it available to other people and generate organic demand against it, which is what happened. So OpenAPI just really gave people a way to say, look, this is what my surface looks like. So if you need to figure out how to call me, you don't have to call me up on the phone and ask me anymore, right? Like here it is. If you do this, it'll work. Or if it doesn't work, then you can call me up and ask me. That just facilitated a very large ecosystem effect, right? Because people didn't have to constantly interact to start actually consuming stuff. And so that was a really big step forward, right? In that progression of like alignment around technologies, having the right consumption and supporting infrastructure and consumption, and then, okay, democratization of consumption. And you were at Google at the time. Yeah. We worked together and Google exposed web APIs and then eventually published the descriptions in an OpenAPI spec. What did that do for the public at large, the world at large's ability to integrate with Google services and consume them? It just meant we had to do less work to educate people, right? And at Google scale, that's a huge thing, right? 
Like if there's a hundred million people going to consume your APIs, you can't train them all individually, right? That's just not a practical thing. Yes, there have to be ways for them to learn about them and to build tooling around them, right? Because it's not just, oh, do I read the specification as a human, right? When you start seeing people consume lots of APIs, right? As a consumer, right? They're going to start to automate those processes, right? And they need those kinds of specifications to help them with that automation to make sure they're doing it the right way, that they have controls in place over that, right? And obviously when we start to talk about API management as a business, right? It's all about control. Who can call this API? How often can they call it? Do we charge them when they call it? Right, both from the, like as a provider of the API, but even maybe more so now, as a consumer of APIs, I want those same kind of controls for myself, not just as the provider of those APIs. So that all kind of fed into that kind of nascent API management business that came about. Because there's a standard for how these APIs communicate. Now someone can build intermediaries. Some of them might be debugging things. We have tools like Wireshark that can tell you what was that API call, right? Things that are intermediaries. And even things like how do you publish the documentation, right? You can take a tool that knows how to take that standard description and say, okay, let me publish that documentation. Yeah, or, you know, you've got your security hat on as somebody trying to abuse the API in particular ways. Right, either as an internal consumer of the API or externally, like are they misusing the way the API is intended to be used? Right, you mentioned JSON, which we didn't define. JavaScript Object Notation is an ASCII format for describing the payload of an API, right? It's the information that's going across. 
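As a rough illustration of a machine-readable API description and the tooling it enables, here is a small sketch. The API itself (an account deposit endpoint, its path, and its summary text) is invented for this example; only the overall layout of `openapi`, `info`, and `paths` follows the OpenAPI 3.0 document structure. The `list_operations` helper is a hypothetical stand-in for a documentation generator.

```python
import json

# A hypothetical API described in the shape of an OpenAPI 3.0 document.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Bank API", "version": "1.0.0"},
    "paths": {
        "/accounts/{accountId}/deposits": {
            "post": {
                "summary": "Deposit money into an account",
                "parameters": [
                    {
                        "name": "accountId",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {"201": {"description": "Deposit recorded"}},
            }
        }
    },
}


def list_operations(spec):
    """Turn the description into one human-readable line per operation."""
    lines = []
    for path, methods in spec["paths"].items():
        for method, operation in methods.items():
            lines.append(f"{method.upper()} {path}: {operation['summary']}")
    return lines


operations = list_operations(spec)

print(json.dumps(spec, indent=2))  # the JSON form you would publish
print("\n".join(operations))       # what a docs tool might render from it
```

Because the description is plain data, the same document can feed documentation, client generation, or policy checks without anyone picking up the phone.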
And ASCII is really nice because it's, as you said, JSON is very human-readable. You can open up that file or you can look at that API call and say, oh, I can see it's creating an account for Dan Cerulli and he's depositing $100. You can see all that. That's very nice for some things, right? But I want to move on to another technology that you worked heavily on, and that's gRPC. So why don't you start by talking about what led you to create gRPC? Sure, so we think about like RESTful web services, JSON, and browsers interacting with APIs, or maybe other parties who are physically distant from each other talking to APIs. And that model, like browsers and then they had smartphones, they kind of work the same way. When the consumers of your APIs are running on the same physical infrastructure in the same data center as the API that they're calling, then the costs and the overheads that REST adds to an API call start to become much more meaningful, right? If I'm 80 milliseconds away, processing a REST call is not that much of a relative overhead to the API call. When I'm 50 microseconds away, now it's a really big deal. And by processing that REST call, you mean I've got some digital information in computer memory, essentially, right? And I need to send that over and you place an API call. By processing, you mean, okay, well, that means I have to translate that into ASCII, into JSON, and it goes from maybe a little bit of information to now it's- Yeah, and into some kind of internal programmatic structure that the code is going to be written against to implement the API, right? So eventually, like everything that's legible on the wire becomes something a computer is dealing with. And the cost of that transformation is material. And so when we started looking at gRPC, we were looking at it because at Google, we were building APIs for cloud services that were going to be called by VMs and other hosted compute on Google's cloud. 
And they're just so close that that overhead is too much realistically. And so that was one of the big motivators, right? Internally, Google had a similar system for its own services calling things and they used this binary representation called Protobuf, which is extremely efficient on the wire. And now you're basically putting your customers in the same position as your own internal services. You're going to need the same set of capabilities. So when you say extremely efficient on the wire, how is that? What do you mean? So it's very easy for some computer code to take a piece of data, put it into a form that can be sent to the network, and then read by the other side. So it's, in kind of layman's terms, it stays in binary. It stays in computer language. You don't translate it to something humans can read and then translate it back from something humans can read. Yeah, it's not like, you know, I take my floppy disk, I print out the document that's on it, and then I send you the document and you scan it back in and put it into a, right? Oh, that's a great analogy. That is a great analogy. Yeah. Instead, you want to ship the floppy disk. It goes over the wire. Yeah, yeah, yeah. Great, great. So why didn't you just do this? Why didn't you just use Stubby? Why didn't you use something proprietary to expose those services? So for those who don't know, Stubby is what Google's internal system was called. Like most things that have been built inside large companies over the years, it was fairly proprietary. It had quirks that were specific to the company and that weren't really going to be relevant in an open source product that we were going to give the world. And we also wanted to make sure we aligned with standards. And at the time, there was an evolution going on in the HTTP ecosystem. Something called HTTP/2 came out, which is just a way for the web to work more efficiently. And that was a good vehicle for running this type of service on top. 
So we decided to align gRPC with the HTTP/2 specification, which was going on in the IETF, the Internet Engineering Task Force. Those are the folks who figure out how the web works, basically. And so we decided to align with that technology because we knew that building on top of widely used standards and infrastructure would help adoption and make building and delivering easier. Just like with REST, right? REST sat on top of JSON and HTTP, which were widely adopted technologies. Right. And, you know, speaking commercially from Google's perspective, if you were going to bring a database to market, like Bigtable, and you had a custom protocol to communicate with it, it might give people pause about integrating it. But if that were an open source protocol that everyone was using, then, you know, then you stood a fighting chance, right? Yeah. And, you know, generally, if you want a technology to be successful, not a product, a technology, you should probably open source it, right? Because that's how you're going to foster adoption. That's how you're going to get people comfortable with it, right? And that's how you're also going to get them to help you make it better. It's one of my favorite successes in open source because there was no product, right? And Google wasn't trying to promote gRPC as a product. And yet, every company I've been at since Google uses gRPC as a very functional technology. So one way to think of this, the way I think of it in my head is when you're communicating to the outside world, you want lots of developers to be able to consume this. You want it to be very easy, convenient for the developer. And performance isn't the most critical thing. REST and JSON are really good. When you've got more tightly bound things, maybe internally in your own company, performance is really important. A strong contract is really important. You can control both ends of the communication. gRPC is really efficient. Developers really love it. 
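The text-versus-binary tradeoff described above can be sketched with nothing but the Python standard library. To be clear about the assumptions: this is not Protobuf and not gRPC's wire format. The fixed layout used here (one length byte, the name, then a 4-byte unsigned amount) and the field names are hypothetical, picked only to show that both ends agreeing on a binary contract yields fewer bytes than the human-readable JSON form of the same deposit.

```python
import json
import struct

# The deposit from the conversation, as an in-memory structure.
deposit = {"account": "Dan Cerulli", "amount_usd": 100}

# Human-readable form: anyone can inspect it on the wire.
json_bytes = json.dumps(deposit).encode("utf-8")


def encode_binary(d):
    """Pack into a hypothetical layout: 1-byte name length, name, 4-byte amount."""
    name = d["account"].encode("utf-8")
    # "!" means network byte order with no padding between fields.
    return struct.pack(f"!B{len(name)}sI", len(name), name, d["amount_usd"])


def decode_binary(buf):
    """Reverse of encode_binary; both sides must agree on the layout up front."""
    name_len = buf[0]
    name, amount = struct.unpack(f"!{name_len}sI", buf[1:])
    return {"account": name.decode("utf-8"), "amount_usd": amount}


binary_bytes = encode_binary(deposit)

print(len(json_bytes), "bytes as JSON")
print(len(binary_bytes), "bytes in the binary layout")
print(decode_binary(binary_bytes) == deposit)
```

The binary form only works because both ends share the contract ahead of time, which is the "you control both ends" condition mentioned above; the JSON form carries its field names with it, which is what makes it readable and also larger.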
Yeah, and you get like, there's a lot of tooling to help you build in all the languages that you might want to build kind of that style of software in. So you get a kind of big standardized leg up, right? In building that type of software. So if you can control your ecosystem, that's really, really helpful. If we can jump back to before times for a minute, my first job out of college was writing software that communicated with electric power meters over modem. And it was literally every different manufacturer had a different protocol. Byte by byte, like I was writing the code, it was byte by byte controlling what went on the wire because there was no standard. It was before HTTP, before REST, obviously before gRPC. And it was an incredibly laborious task. Right, and that was true in pretty much every industry. That was just how everything was done. There was no commonality at all, right? Some people, yes, like they were byte oriented protocols that you had to have a manual to understand. And like, if you dealt with 50 of them, there were 50 manuals. Exactly right, I had a shelf full of them. So now I'd like to get to the last bit of service communication I want to talk about, something else that you've been heavily involved in, both internally at Google with the One Platform project and then externally with Istio and what you do now at Solo.io, which is around this term that you hear called service mesh. So tell us what is a service mesh? What is that used for? It kind of grew out of the API world, right? So mostly today in the industry, when we think about APIs, we think about like a contract that a company has with the outside world, right? So your customers, your consumers will call your APIs as Nutanix or someone like that. In reality, inside like a big company like yours, there are lots and lots of little APIs all talking to each other all the time. You know, we often call this the microservices pattern. 
I don't really care about whether it's micro or macro or how big these things are, but there are lots and lots of internal APIs. And what has become evident over the years is that you need almost the same amount of control over those that you do for your external ones. A large part of that has to do with security and the realization that inside threats are maybe the biggest threat that we face from a security perspective. And so we want security controls on the internal consumption of APIs within our organization. We need to know how much they're being used. We may need to quota them because we may have different business departments and they are going to charge each other money and make those chargebacks. So there are all these kind of internal APIs and they need a system to kind of layer control over the top. And Service Mesh is really designed to do that, right? Our mission is to secure, connect, and observe the API contracts between internal applications. And it's kind of like the same things we talked about with RESTful APIs of the outside world. Who can call this? How much can they call it? And what is it allowed to do? Like, it's kind of the same things, but it's not going through an API proxy. Describe a couple of implementations that you've, maybe the term sidecar and ambient. How do you affect this? So I know that- Yeah, so I mean, to layer control into a network, you're obviously going to send the traffic through something. For want of a better term, that something is always a form of proxy. The real question is how you deploy it, right? And the first iteration in Istio, we used something we called a sidecar proxy, which is next to every unit of running application, we would put a proxy. And so everything going into and out of that application would go through that sidecar. 
And then there would be a system we call a control plane, and it would send configuration down to all of these proxies at scale, because there are lots of applications running in lots of places, to make sure that all the controls, which were written as policies, are put into effect in that network. That model works pretty well. It gave us the ability to deliver all of those features, the security and connectivity stuff, but it can be kind of expensive. In terms of processor time. Yes. Because you're deploying a lot. And maintenance in particular. Maintenance became a big problem with sidecars, because if I have to upgrade the technology we use to do it, that means I have to run a rolling update of every application in my fleet. And that's painful, to put it mildly. That is the biggest motivator for why we decided to go to a new model, where we said, okay, we're not going to put a heavyweight proxy next to everything that provides all the features. Instead, we're going to find a way to kind of push a lighter weight version of that down in the infrastructure stack, and put the heavier weight features in the network. Right. And that's what we call now the ambient mesh model in Istio. And that has, at this point, proven itself to be not just more efficient in terms of resources, because people have to pay for computers and memory, but also much easier to install and maintain. And that's probably the biggest value proposition of that, right? And the other benefit of service mesh in general, true even more than ever with ambient, is that the application that's making the call, the service that's making the call and the service that's receiving it, they don't have to change their code at all. For them, it's all transparent. Just, I make a call and something's going to make sure I'm allowed to, and that I'm not exceeding any rates, that it's going to the right place. The service mesh makes sure that happens. 
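The control plane and proxy split described above can be sketched in a few lines. This is a toy model under stated assumptions, not Istio's API or configuration format: `Policy`, `Proxy`, `ControlPlane`, the service names, and the simple per-caller call budget (standing in for a real time-windowed rate limit) are all hypothetical. The point it illustrates is that the application function stays unchanged while the proxy in front of it enforces who can call and how much.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    allowed_callers: set
    max_calls: int  # per-caller budget; a stand-in for a real rate limit


class Proxy:
    """Sits in front of one application and enforces whatever policy it holds."""

    def __init__(self, service_name, handler):
        self.service_name = service_name
        self._handler = handler  # the unmodified application code
        self._policy = None
        self._calls = {}

    def apply(self, policy):
        self._policy = policy

    def handle(self, caller, request):
        policy = self._policy
        # Deny by default until the control plane has pushed a policy.
        if policy is None or caller not in policy.allowed_callers:
            return (403, "caller not allowed")
        used = self._calls.get(caller, 0)
        if used >= policy.max_calls:
            return (429, "quota exceeded")
        self._calls[caller] = used + 1
        return (200, self._handler(request))


class ControlPlane:
    """Pushes policies down to every registered proxy for a given service."""

    def __init__(self):
        self._proxies = []

    def register(self, proxy):
        self._proxies.append(proxy)

    def push(self, service_name, policy):
        for proxy in self._proxies:
            if proxy.service_name == service_name:
                proxy.apply(policy)


def billing_app(request):
    # Application logic knows nothing about callers, quotas, or policies.
    return f"billed {request}"


control_plane = ControlPlane()
billing = Proxy("billing", billing_app)
control_plane.register(billing)
control_plane.push("billing", Policy(allowed_callers={"checkout"}, max_calls=2))

results = [
    billing.handle("checkout", "order-1"),
    billing.handle("checkout", "order-2"),
    billing.handle("checkout", "order-3"),   # over the budget
    billing.handle("reporting", "order-4"),  # not an allowed caller
]
for status, body in results:
    print(status, body)
```

Whether the proxy runs as a sidecar next to each application or lower in the infrastructure, as in the ambient model, is a deployment choice; the enforcement logic sketched here is the same either way.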
Yeah, so if we kind of consider what were the logical competitors to service meshes, probably the most widely deployed one is, okay, I've paid for an API gateway that's going to give me all the control features that I want. Instead of having applications talk to each other directly in the network, they're all going to go and talk to the API gateway to get back into the network. Go out through the back door and in through the front door. So we call this pattern hairpinning. And if your friendly neighborhood API management vendor was charging you money for every call, well, usually there's two orders of magnitude more internal calls than there are external calls. So your bill tends to go up a lot. Right. That's not a great model. Also, if your API management thing failed, then your entire network goes out. It's not a great failure mode. That was also the kind of competitive landscape into which this had to fit. And service mesh, I think at this point, has proven that it is a better way of approaching this problem. All right, so we're going to wrap up with hot take, warm take, cold take. And we're going to keep it related to service communication. Something we've talked about a little bit. Everything that we just talked about for the most part is used kind of synchronously. You place a call, something receives that call, whichever of those things we talked about. There's a lot of asynchronous calls that happen these days too. So hot take, cold take, warm take. Asynchronous communication will replace synchronous communication as the dominant form of inter-service communication. Go. Cold take. Cold take on that. Tell me why. Just complexity, right? How you have to think about building your software. If you don't start thinking asynchronously, it's really hard to start rewriting your system to become async when it was sync. The overwhelming majority of software today is intrinsically sync. So that's hard. Async definitely has its place. 
I think at certain layers in the stack, you'll see it used quite widely. Certainly when you go to the database and you come back out of the database, that's kind of intrinsically async. Like a database is an async thing, right? It's like I write data so somebody else can read it later. We're already used to that pattern at scale. We do that a lot. So people are kind of getting that benefit anyway without needing a dedicated new API mechanism or scheme to do it. Eventing is real, right? Like queues and events, right? A thing happens in a system and I need to be told about it. But the receiving of the event is usually sync-ish. So that's the primary use case, I think. And I think AWS with Lambda kind of proved that out, right? Like for that use case of eventing, where I transition from an async to a sync point. Yeah, it's pretty well established in the industry at this point. All right, that cold take is a hot take. I like it. Thank you very much, Louis. This has been a fantastic conversation. We will be releasing new episodes once a month on YouTube or wherever you get your podcasts. We've conquered service communication. Subscribe and we'll see you here the next time you need an explanation. Thank you.