AWS Live re:Inforce – Security Benefits of the EC2 Nitro Architecture



Thanks for joining us today. I'm Brandon West, I lead the AWS technical evangelism team for the Americas region, and I'm joined by Mark.

Hi there, happy to be here.

Would you mind introducing yourself?

Sure. I work in our security organization for our CISO, Steve Schmidt. I'm director of the Office of the Chief Information Security Officer.

Cool, sounds like a difficult job.

I get to do all kinds of fun stuff, including this, which is one of my hobbies: Nitro security.

Cool. This is super interesting to me because I've learned a little bit about what Nitro is, and I'm really interested to learn more, because it seems like such a cool, fundamental piece of how we offer compute and many other services.

Yeah, absolutely. We launched it at re:Invent a year and a half ago. There were a couple of talks at that point and then another talk last year, so we've been getting the information out about how we've decomposed traditional virtualization into these hardware and software components that allow us to do some really cool things. But we haven't ever really put the security spin on that story. This conference is our opportunity to take information that's out there but hasn't been coalesced into "here's the security perspective," and I'll give a talk tomorrow that covers this in depth.

It seems very interesting, but let's set the stage for the audience right now. Let's talk a little bit about some definitions: we've got EC2, Nitro, and security. Start with EC2. Hopefully everyone is familiar with it, but it's one of the foundational services of AWS. It's the "serverful" compute, I guess, not serverless compute; I mean, there are servers somewhere doing some stuff. EC2 is that very foundational technology on which we can build other cool things like containers and Lambda, all running fundamentally on EC2.

And actually, over time more and more of our services will run on EC2, so EC2 is
becoming even more of a building block for the services we provide to customers.

So EC2 runs virtual machines on top of host machines, and there's a thing called a hypervisor that manages how all these VMs work, how they get assets assigned from that host machine, and how resources are distributed.

How to carve it up, basically.

So Nitro, for us, is our hardware-based, custom-designed hypervisor. Is that accurate?

Let's call it a system, because in some cases there is no hypervisor: the so-called bare metal instances. I shouldn't say "so-called"; they are absolutely bare metal instances. What's happened in Nitro is that we're able to provide you with bare metal, with the full Intel processor and memory and associated capabilities, because we've removed some core things that used to be done by hypervisors and put them into dedicated hardware and software components that run in the same physical box.

OK. So before, you had to have this hypervisor running on the host machine, consuming resources itself to manage the state of the virtual machines. Now we've offloaded that to a separate piece of hardware, so that when people spin up their EC2 instances they get the full capability of what they've paid for.

That's right. In most cases people will still use a hypervisor, because these are hefty machines with tons of cores and tons of memory, and they're relatively expensive. Bare metal has its use cases, but the vast majority of our customers will still use the hypervisor scenarios. The difference is that in this case the hypervisor is super small and simple, because all it does is divide up CPU and memory and set up connections over the PCI bus to I/O devices. Then it just kind of goes to sleep. It is there for creating the security boundaries, but it doesn't actually do a lot of the work that used to be done in a special copy of an operating system, which was called dom0 in Xen terminology. Every hypervisor, whether it's Hyper-V, VMware, or Xen, they all have a
copy of an operating system running as a privileged thing, which created what was called a device model. All our viewers know what device drivers are, right? Getting your device drivers to work with a million different things is hard, and so the makers of hypervisors said: we're not going to bother with that. We're going to have a privileged operating system that mediates all access to those billions of things out there that need device drivers, and then that privileged operating system will expose virtual devices to all the guest VMs. That was a way to keep hypervisors from becoming super complicated, with millions of device drivers to manage. But it had the disadvantage that you've got a full copy of an operating system running on that same host, taking CPU and taking memory. It's also a big, complicated piece of software, which isn't really that great from a security perspective; you really want minimalistic things to be most secure.

So with Nitro, what we've done is say: if we can take all the software that used to run on dom0, and all those drivers and device models, and move it off the main processor board, then our hypervisor can be super thin and minimalistic. It just allocates memory and CPU and responds to privileged instructions; otherwise all the work is done by other computers in that same host, with specialized hardware and software that we built, which we call the Nitro System.

OK. And one thing that I think is super cool about this is that it's actually custom-designed silicon, right?

Yes. We acquired a company called Annapurna, that's right, and they built this for us.

So it's an ASIC, right? An application-specific integrated circuit?

Yes.

So it's designed completely just to do this one task. That's part of how we're able to offer EC2 instances at such low cost: we have these economies of scale that let us do stuff like custom-designed silicon. I don't know of many
people running on-premises data centers who are designing their own hardware.

Yeah, the acquisition of Annapurna Labs was a real milestone. We had actually worked with them as a supplier for an earlier EC2 instance where we wanted to offload some things. They were the first vendor out there that would enable you to expose an NVMe device, which looks like a local SSD drive, while a special processor was actually emulating it. That's where we first offloaded EBS processing to their processor card. Once we had worked with them and realized their abilities and capabilities, and we knew that going forward we wanted a lot of custom hardware and software combinations, it was a great step forward to just say: you know what, become our in-house fabless design firm. To be clear, a lot of these systems still have a general-purpose processor; they have multi-core ARM processors for general software. But they can also build custom ASICs when necessary to do special acceleration. For example, we do a lot of encryption offload in these devices, because a lot of what they're doing is giving you full line-rate encryption at super high rates, and encryption is a great thing to build into hardware when you have to make it really fast.

Yeah, and some of those announcements that we heard in the re:Inforce keynote were talking about how we can essentially... sorry, the train just left the station there.

Well, let's talk about the three major I/O devices in the Nitro System. You have EBS, you have local storage, what's called instance storage, and you have VPC, so your network. All three of those major I/O subsystems now run on special hardware and software that do the bulk of the work, including all the encryption. Take the encryption keys for EBS: we just announced recently that you can now have an account-wide flag that just says "encrypt all my EBS volumes," full stop. No IAM policies, none of
the more complex ways of doing that we had before. In the Nitro systems, it's the Nitro card that interacts with KMS. It gets a copy of a data key, an encrypted and an unencrypted copy, over TLS. It keeps the unencrypted copy for the actual line-rate encryption, but also stores the encrypted copy of the data key with the volume; that way the volume and the instance can have different lifetimes. All of that work is being done in a coprocessor, or really just another computer. The reason I hesitate to call them coprocessors is that the Nitro computers are now sort of in charge of the whole system. You could almost say that the mainboard with the Intel or AMD processor is the coprocessor, for customer workloads, but it's not as trusted as the Nitro systems, which securely boot the system, maintain system integrity, and scan the firmware to make sure it's safe. All those things you would want to do to maintain a highly secure system are done by the Nitro System, and then eventually software comes up on the mainboard, but that's the less trusted part of these devices. So whenever I start to call the Nitro cards coprocessors I feel kind of funny; in some ways they're the main processor on these systems. But they do these specialized offload tasks, including line-rate encryption of EBS and line-rate encryption of instance storage. Even an I3 instance, which has millions of IOPS and can push a huge number of bits per second of I/O to local storage, is encrypting every single one of those packets before it writes it to local storage, and that's all done in these accelerators.

So this is how we're able to offer that encryption by default but still provide the guaranteed throughput we've said you're going to get.

Right. And there's the thing we mentioned today, which was in the keynote, and then Colm gave a talk about encryption features. Our most advanced instance types are the "n" families. They have the latest, most
powerful networking capability, 100 gigabit per second networking. They also have a sufficiently powerful Nitro processor that they can actually encrypt all the traffic at 100 gigabits per second. So now, by default, you don't have to do anything: whenever an "n" instance type is talking to another "n" instance type, they recognize that fact and encrypt everything between them, even if they're in the same rack, much less the same data center. To get that kind of performance you have to have dedicated hardware, so we can't have a big-bang upgrade where everything in a data center is suddenly encrypted. But over time, and pretty rapidly, we'll begin aging in the newest generations of hardware, and within a year or two most of the traffic you'll see in an EC2 environment will be encrypted all the time, regardless of whether any customer makes a decision about it or not.

Just in time for the launch of some quantum computing service to come along.

Exactly. I think we're still a few years away from that.

Cool. So that's the encryption side of security: storing data at rest and in motion, and how Nitro can help with both of those things. What are some of the other security things that are enabled by using Nitro?

Well, I mentioned one already, but it's a nice defense in depth. Normally the hypervisor protects the virtual machine server from guest operating systems doing something directly to the hardware. If you try to update the firmware, that's an illegal instruction and it'll just fail. There are various other things you can do, but it's possible that hypervisors could have bugs in them, so it would really be nice to know that the firmware is protected even if there weren't a hypervisor. Because we've launched these bare metal types, we've built the protections into the fundamental hardware platform, so that the Nitro controller, every time it allocates an instance and boots it, basically for
bare metal that would be basically rebooting between customers, or in the case of non-bare-metal, every time the system boots, what the Nitro card does is go in and scan all the firmware on the motherboard and ask: does the hash of this firmware match a valid, known-good configuration? If it doesn't, it throws an alarm and says, "Sorry, I'm not available."

Which has never happened?

Yeah, keep your fingers crossed, except in test mode. But we do have that ability to detect any tampering with the hardware by using another root of trust that's not the mainboard and the main processor. Now, there are things like trusted computing within typical motherboards, but they're really complicated pieces of hardware and software: BIOS and TPMs and Intel chips and all these different things have to cooperate to get the equivalent of a secure boot. We're able to isolate that off to a much simpler system where we control everything top to bottom, software and hardware, and use that to verify the integrity of the main system. And those little Nitro controllers boot off a humble little SATA drive that's physically in the box. All it does is boot the Nitro controller, and then it goes away, so there's no way for it to be tampered with from the main system. So we've built in a lot of checks and controls for security.

The next thing I would mention is that we built the system around what we call passive systems. When we want to talk to a Nitro system, the control plane always reaches out to the Nitro card. The card is always listening on the network, but it's not proactively reaching out on the network. We constantly check it: we health check it, we check it for CloudWatch metrics, and if we send it an API call saying "create a new EBS volume," that will happen. But what's not happening is that code on the Nitro processor is not
proactively going out to the network and initiating things. That's a really good security property, because if we ever see any traffic initiated from that device, we know there's something wrong and we can deal with it very effectively. Similarly, the hypervisor itself, sitting on the mainboard, doesn't have access to the EC2 privileged network. It literally has to go through the Nitro processor to get to the network, so there's no way for even bad software on the motherboard to reach out, except inside the VPC encapsulation. You can be inside your VPC, where you have VPC Flow Logs, you've got Traffic Mirroring, you've got other ways of dealing with potentially bad things happening, but you're never going to get out to that core trusted substrate, which is a very cool property of these systems.

Yeah, that's extremely cool. So basically the internal EC2 network is completely protected from anything that might happen in user space.

Yeah, or in kernel space.

Kernel space, yeah.

On these hosts, with these special properties.

Very, very cool.

And of course we built Firecracker on top of the bare metal instance type, because now you're fully inside of EC2: you have CloudWatch metrics, you've got the instance metadata service, you've got VPC, EBS, all those features, but you're a bare metal thing. That's a great platform for the microVMs we've launched with Firecracker, where we can have super fast boot and super dynamic VM environments running on bare metal with super high performance, without having to do layered hypervisors, which is inefficient, yet still being a fully controlled EC2 instance. So it's been a win.

So Firecracker is also one of those things that's super interesting to me. Is there some origin story of Nitro and Firecracker that's intertwined at all, that you can tell us a tale about?

Well, I think, you know, the rise of
containerization and Lambda quickly made clear to us that, while we need the strong isolation properties of full virtualization, we had to come up with a way to do that which was really fast and really inexpensive to start and stop, with very simple device models and so forth. So the goal was: can we create VM technology that's literally as fast as containers? That's what Firecracker enables. It's really the need to run containers and Lambda-type functions with full tenant isolation. And we can even go beyond that: with Lambda, we can actually isolate at the function level. You can have seven different functions, and each one of those will run in a separate microVM, so you get even better security properties than using EC2 boundaries. It was a really good decision from the start to always use VM isolation to separate customers, but we also knew there was a lot of inefficiency, because sometimes these functions last just a few milliseconds, and having full-blown VMs sitting around waiting for that kind of occasional invocation is just not super efficient. So the microVM, Firecracker, is a nice bridge between traditional virtualization and the future of functions as a service and containerization.

Cool. One of my favorite things about Firecracker is that it's open source, so you can dig into it and play around with spinning up your own microVMs if you want to. It's not something I'm personally interested in; I'd much rather pay someone who's good at all of this security-of-the-cloud stuff we've been talking about because, as you can tell, it's a lot. It's not an easy thing to do.

Yeah, there's a ton of focus on security. Those who go look deeply at that open source project will notice some funny things, like the name of the virtual machine monitor. It uses KVM primitives and creates a VM, but there's a process running there
kind of managing that VM, and the process is called the jailer. Why is that? Because we want very good isolation. It uses some really cool features; even within, say, Linux you have isolation features. And the actual container or Lambda function still runs in a container inside a copy of Linux managed by the microVM. So it's a very security-focused but super efficient model.

And then if you go far enough, you eventually hit the simulation that we're running in, and the VM that's running that, layer upon layer upon layer. It's VMs all the way down.

Great. Well, this has been super enlightening for me. I've learned a ton about what Nitro is and about how it helps inform our security posture at AWS. We have a few minutes left. Is there anything else about Nitro that is just a super fun fact you want to share, something you think our...

Well, I've covered a lot of these cool details, but if you step back, what you really see is that we've applied something like a microservices architecture to hardware design. Each of these components has separate pieces of software and separate hardware, built by separate teams, that can be deployed in different ways. They're like building blocks at the hardware and software level that can be composed into actual running hosts or systems. And we get separation of duties: one team doesn't have the right to call certain APIs that other teams' code can. So you get micro-segmentation and microservices pushed down into the actual design of the hardware, with many of the same benefits. I can ship features every two weeks and you can ship every four weeks, and it doesn't matter as long as we meet these contracts. In this case, sometimes the contracts, instead of being software API contracts, are hardware command contracts, like: hey, I look like an NVMe controller on the system bus, I look just like an SSD drive. Are you an SSD? Gosh, no,
I'm an EBS thing, but I meet the contract: you send me NVMe commands, I do data storage for you.

That's awesome.

So we've separated out those things into these components, and the ability now to recompose and rejigger things and create new things out of these building blocks will enable us to be more innovative, move forward much more rapidly, ship new instance types more rapidly, et cetera.

Yeah, super cool. So many things always end up reminding me of the UNIX philosophy: have a piece of code that does one thing well; if you need to do another thing, create a new program, don't extend the existing code; and then create common interfaces that everything can communicate through. It feels like: keep those API contracts.

Yes, exactly. Don't constantly change your binary or other interface.

Right. I hadn't yet thought of that model applied to hardware, but I like it.

Yeah, right on.

Well, thank you so much, Mark. I had a great time chatting with you, and I hope everyone out there enjoyed it as well.

Yeah, thanks for watching. Take care.
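
The firmware check described in the conversation (hash each firmware image, compare it to a known-good configuration, and refuse to offer the host if anything fails to match) can be sketched in a few lines of Python. This is only an illustration of the idea, not AWS's implementation: the function names, the component names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Set


def firmware_digest(image: bytes) -> str:
    """Hex SHA-256 digest of a firmware image (hash choice is an assumption)."""
    return hashlib.sha256(image).hexdigest()


def tampered_components(
    images: Dict[str, bytes],
    known_good: Dict[str, Set[str]],
) -> List[str]:
    """Return the sorted names of components whose digest is not on the allow-list.

    `images` maps a hypothetical component name (e.g. "bios") to its firmware bytes.
    `known_good` maps the same names to the set of accepted digests.
    A component with no allow-list entry is treated as a failure.
    """
    failures = []
    for name, image in images.items():
        if firmware_digest(image) not in known_good.get(name, set()):
            failures.append(name)
    return sorted(failures)


def host_available(
    images: Dict[str, bytes],
    known_good: Dict[str, Set[str]],
) -> bool:
    """The host is offered only if every scanned image matches a known-good digest."""
    return not tampered_components(images, known_good)
```

The point of the design, as described in the talk, is that this comparison runs on a separate root of trust rather than on the mainboard being checked, so software on the main processor cannot influence the result.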
