gVisor

Sentry

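  • Provides all system operations to the layer above (the containerized application), but does not pass them through to the underlying host

    • Exposes only a restricted API

    • Provides security through isolation

  • File system operations are sent to the Gofer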

  • The Sentry process is started in a restricted seccomp container without
    access to file system resources.
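
The interception model described above can be sketched in a few lines of Go. This is a toy illustration, not gVisor's implementation: the `Sentry` and `Gofer` types, the handler table, and the string-based call names are all assumptions made for the example. It only shows the shape of the idea: every call is handled inside the sandbox, unimplemented calls fail instead of reaching the host, and file access is delegated to a separate component.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoSys mimics ENOSYS: the sandbox kernel does not implement the call.
var ErrNoSys = errors.New("function not implemented")

// Gofer is a hypothetical stand-in for the file proxy process.
type Gofer interface {
	Open(path string) (string, error)
}

// memGofer serves file contents from a map instead of a real host file system.
type memGofer map[string]string

func (g memGofer) Open(path string) (string, error) {
	data, ok := g[path]
	if !ok {
		return "", fmt.Errorf("open %s: no such file", path)
	}
	return data, nil
}

// Sentry holds the table of calls it implements itself (hypothetical type).
type Sentry struct {
	gofer    Gofer
	handlers map[string]func(arg string) (string, error)
}

// NewSentry builds a Sentry with a deliberately small set of handlers.
func NewSentry(g Gofer) *Sentry {
	s := &Sentry{gofer: g}
	s.handlers = map[string]func(string) (string, error){
		// Answered entirely inside the sandbox.
		"getpid": func(string) (string, error) { return "1", nil },
		// File system operations are forwarded to the Gofer.
		"open": func(path string) (string, error) { return s.gofer.Open(path) },
	}
	return s
}

// Syscall dispatches an intercepted call. Nothing falls through to the host.
func (s *Sentry) Syscall(name, arg string) (string, error) {
	h, ok := s.handlers[name]
	if !ok {
		return "", ErrNoSys
	}
	return h(arg)
}

func main() {
	s := NewSentry(memGofer{"/etc/hostname": "sandbox"})
	for _, call := range [][2]string{{"getpid", ""}, {"open", "/etc/hostname"}, {"reboot", ""}} {
		out, err := s.Syscall(call[0], call[1])
		fmt.Printf("%s(%q) -> %q, err=%v\n", call[0], call[1], out, err)
	}
}
```

The important design point is the default branch: a call with no handler is rejected inside the sandbox rather than forwarded.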

KVM platform (experimental): customized KVM

The KVM platform allows the Sentry to act as both guest OS and VMM, switching
back and forth between the two worlds seamlessly.

Gofer

  • Communicates with the Sentry over the 9P protocol. 9P works both as a
    distributed file system and as a network-transparent, language-agnostic
    "API".

  • Provides an additional layer of protection for file system access

sandbox

  • Communicate with a Gofer process via a connected socket. The sandbox may
    receive new file descriptors from the Gofer process, corresponding to opened
    files. These files can then be read from and written to by the sandbox.

  • Make a minimal set of host system calls. The calls do not include the
    creation of new sockets (unless host networking mode is enabled) or opening
    files. The calls include duplication and closing of file descriptors,
    synchronization, timers and signal management.

  • Read and write packets to a virtual ethernet device. This is not required if
    host networking is enabled (or networking is disabled).

  • The gVisor sandbox can only manipulate virtualized system resources
    (e.g. the system time, kernel settings or filesystem attributes), not the
    underlying host system resources.
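
The first point, receiving already-opened file descriptors over a connected socket, relies on a standard Unix mechanism (`SCM_RIGHTS` ancillary data). The following is a minimal single-process sketch of that mechanism using Go's `syscall` package on Linux. It is not gVisor code: the `sendFD`, `recvFD` and `demo` helpers are names made up for this example, and a socket pair stands in for the Gofer/sandbox connection.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFD passes an open file descriptor over a connected Unix socket.
func sendFD(sock, fd int) error {
	rights := syscall.UnixRights(fd) // SCM_RIGHTS control message
	return syscall.Sendmsg(sock, []byte{0}, rights, nil, 0)
}

// recvFD receives exactly one file descriptor from the socket.
func recvFD(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return fds[0], nil
}

// demo plays both roles in one process: the "gofer" opens a file and sends
// the descriptor, the "sandbox" reads from it without ever calling open.
func demo() (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	// Gofer side: create and open the file, then hand over the descriptor.
	f, err := os.CreateTemp("", "gofer-demo")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("hello from gofer"); err != nil {
		return "", err
	}
	if err := sendFD(pair[0], int(f.Fd())); err != nil {
		return "", err
	}

	// Sandbox side: receive the descriptor and read through it.
	fd, err := recvFD(pair[1])
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	defer received.Close()
	data := make([]byte, 64)
	// ReadAt avoids depending on the file offset shared with the sender.
	n, err := received.ReadAt(data, 0)
	if err != nil && err != io.EOF {
		return "", err
	}
	return string(data[:n]), nil
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Because the receiver only ever holds descriptors it was given, the party that opens files can apply its own checks before handing anything over.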

Principles: Defense-in-Depth

  • No system call is passed through directly to the host.

  • Only common, universal functionality is implemented.

  • The host surface exposed to the Sentry is minimized.

    1. The Sentry is not permitted to open new files, create new sockets or
      do many other things on the host that an attacker could exploit.
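
A minimized host surface is, in effect, an allowlist of host system calls. The sketch below models that idea as a plain Go map. The call names are illustrative examples chosen to match the categories mentioned earlier (descriptor duplication and closing, synchronization, timers, signals); they are an assumption for this example and not gVisor's actual filter list, and a real filter would be enforced by seccomp in the kernel rather than by application code.

```go
package main

import "fmt"

// hostAllowlist returns the host system calls the sandbox process may issue.
// The names are illustrative, not gVisor's real filter list.
func hostAllowlist(hostNetworking bool) map[string]bool {
	allowed := map[string]bool{
		"dup":          true, // duplicating file descriptors
		"close":        true, // closing file descriptors
		"futex":        true, // synchronization
		"nanosleep":    true, // timers
		"rt_sigaction": true, // signal management
	}
	// New sockets are only permitted in host networking mode.
	if hostNetworking {
		allowed["socket"] = true
	}
	return allowed
}

// permitted reports whether a host system call passes the filter.
// Anything not listed is denied by default.
func permitted(allowed map[string]bool, name string) bool {
	return allowed[name]
}

func main() {
	filter := hostAllowlist(false)
	for _, name := range []string{"dup", "socket", "open"} {
		fmt.Printf("%-6s allowed=%v\n", name, permitted(filter, name))
	}
}
```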

Engineering-level restrictions

  1. Unsafe code is carefully controlled

  2. No CGo is allowed. The Sentry must be a pure Go binary

  3. External imports are not generally allowed within the core packages. Only
    limited external imports are used within the setup code.
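
Rules like these can be checked mechanically. As a hedged illustration (this is not the tooling gVisor uses), the sketch below parses a Go source file with the standard `go/parser` package and reports imports that would break the three rules. The `violations` function, the sample source, and the import path `github.com/example/dep` are hypothetical, and the "first path element contains a dot" test is only a rough heuristic for spotting external packages.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// sample is a made-up source file that breaks all three rules.
const sample = `package core

import "C"
import (
	"fmt"
	"unsafe"
	"github.com/example/dep"
)
`

// violations lists the policy problems found in the imports of src.
func violations(src string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	file, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, spec := range file.Imports {
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		first := strings.SplitN(p, "/", 2)[0]
		switch {
		case p == "C":
			found = append(found, "cgo import is not allowed")
		case p == "unsafe":
			found = append(found, "unsafe import needs review")
		case strings.Contains(first, "."):
			// Heuristic: standard library paths have no dot in the first element.
			found = append(found, "external import: "+p)
		}
	}
	return found, nil
}

func main() {
	v, err := violations(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, msg := range v {
		fmt.Println(msg)
	}
}
```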

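The design is somewhat similar to the mechanism used by User-Mode Linux (UML).

Performance

  • System calls cost more than with plain runc, because each call is intercepted and handled by the Sentry (written in Go)

  • Network and file I/O performance is also somewhat lower, again because of interception and forwarding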

gVisor supports only x86_64 and requires Linux 4.14.77 or later.

Kata Containers

kata-runtime creates a QEMU/KVM virtual machine for each container or pod.

kata-agent

  • gRPC server in the guest using a VIRTIO serial or VSOCK interface which QEMU
    exposes as a socket file on the host

  • A kata-agent sandbox is a container sandbox defined by a set of namespaces

  • In the case of Docker, kata-runtime creates a single container per pod.
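
The agent is reached from the host through a socket file, with an RPC protocol on top. The sketch below shows that pattern only. It deliberately swaps gRPC for Go's standard `net/rpc` package so that it stays self-contained, and it uses an ordinary Unix socket in a temporary directory in place of the VIRTIO serial or VSOCK channel. The `Agent` type, its `CreateContainer` method, and the request and reply types are hypothetical and do not reflect the real kata-agent API.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"os"
	"path/filepath"
)

// Agent is a hypothetical stand-in for the in-guest agent service.
type Agent struct{}

// CreateRequest and CreateReply are made-up message types for this sketch.
type CreateRequest struct{ ContainerID string }
type CreateReply struct{ Status string }

// CreateContainer pretends to start a container inside the guest.
func (a *Agent) CreateContainer(req CreateRequest, reply *CreateReply) error {
	reply.Status = "created " + req.ContainerID
	return nil
}

// roundTrip starts the agent on a socket file and calls it once, the way a
// runtime on the host would.
func roundTrip(id string) (string, error) {
	dir, err := os.MkdirTemp("", "kata-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "agent.sock")

	// "Guest" side: serve RPC requests on the socket file.
	srv := rpc.NewServer()
	if err := srv.Register(&Agent{}); err != nil {
		return "", err
	}
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go srv.Accept(ln)

	// "Host" side: connect to the socket file and issue a call.
	client, err := rpc.Dial("unix", sock)
	if err != nil {
		return "", err
	}
	defer client.Close()
	var reply CreateReply
	err = client.Call("Agent.CreateContainer", CreateRequest{ContainerID: id}, &reply)
	if err != nil {
		return "", err
	}
	return reply.Status, nil
}

func main() {
	status, err := roundTrip("c1")
	fmt.Println(status, err)
}
```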

Kata Containers proxy (kata-proxy)

Proxies I/O operations

Guest assets

  • Guest kernel

  • Guest image

runtime

  • kata-agent communicates with the other Kata components over gRPC.

  • kata-runtime can run several containers per VM to support container engines
    that require multiple containers running inside a pod.

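Introduction to virtio-fs

  • A solution for sharing a file system between guests

  • virtio-fs maps files (via mmap) into the QEMU process address space and lets different guests access that memory through DAX

  • Both DAX data access and metadata access go through shared memory, which avoids unnecessary communication between the VM and the hypervisor (as long as the metadata has not changed)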

    1. Kata Containers utilizes the Linux kernel DAX (Direct Access filesystem)
      feature to efficiently map some host-side files into the guest VM space.
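
The shared-memory idea behind DAX can be illustrated on an ordinary Linux host with `mmap`. In the sketch below, two independent `MAP_SHARED` mappings of the same file stand in for the host-side and guest-side views: a write through one mapping is visible through the other without any explicit copy or message. This is only an analogy within a single process, not virtio-fs or DAX itself, and the `sharedMappingDemo` function is a name invented for this example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sharedMappingDemo writes through one mapping of a file and reads the same
// bytes back through a second, independent mapping.
func sharedMappingDemo() (string, error) {
	f, err := os.CreateTemp("", "dax-demo")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	const size = 4096
	if err := f.Truncate(size); err != nil {
		return "", err
	}
	fd := int(f.Fd())

	// "Host" view: read-write shared mapping.
	host, err := syscall.Mmap(fd, 0, size, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(host)

	// "Guest" view: a second, read-only shared mapping of the same file.
	guest, err := syscall.Mmap(fd, 0, size, syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(guest)

	const msg = "metadata v1"
	copy(host, msg) // write through the host mapping

	// Both mappings are backed by the same pages, so the write is visible here.
	return string(guest[:len(msg)]), nil
}

func main() {
	out, err := sharedMappingDemo()
	fmt.Println(out, err)
}
```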

Supported virtual machine monitors (VMMs) and hypervisors

  • QEMU (since release 1.0): upstream QEMU, with support for hotplug and
    filesystem sharing.

  • NEMU (since release 1.4): deprecated, removed as of the 1.10 release. A
    slimmed-down fork of QEMU, with experimental support for virtio-fs.

  • Firecracker (since release 1.5): upstream Firecracker, rust-VMM based; no
    VFIO, no FS sharing, no memory/CPU hotplug.

  • QEMU-virtio-fs (since release 1.7): upstream QEMU with support for
    virtio-fs. Will be removed once virtio-fs lands in upstream QEMU.

  • Cloud Hypervisor (since release 1.10): rust-VMM based; includes VFIO and FS
    sharing through virtio-fs, no hotplug.

Virtual machine monitors (VMM)

  • QEMU

  • Hyper-V

  • https://github.com/kata-containers/documentation/blob/master/design/virtualization.md

History

  • Launched in December 2017

  • shim-v2 has been supported since release 1.5
