1. Building

First clone the source: git clone https://github.com/firecracker-microvm/firecracker

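1.1. Building firecracker

firecracker is written in Rust, but building it does not require a local Rust toolchain; the build runs inside docker. It uses the docker image public.ecr.aws/firecracker/fcuvm:v35, which is 3.25G.

Because the build target is x86_64-unknown-linux-musl, the resulting executable is statically linked.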

  • Debug build (the default):
    tools/devtool build
    Output:
    build/cargo_target/x86_64-unknown-linux-musl/debug/firecracker 38M, statically linked, with symbols

  • Release build:
    tools/devtool build --release
    Output:
    build/cargo_target/x86_64-unknown-linux-musl/release/firecracker 4.1M, statically linked, with symbols
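
The two output paths above differ only in the profile directory. A minimal sketch that derives the path and, if the binary exists, checks how it is linked; fc_binary_path is a hypothetical helper and not part of devtool:

```shell
#!/usr/bin/env bash
# Sketch: derive the output path produced by `tools/devtool build`.
# fc_binary_path is a hypothetical helper, not part of devtool.
fc_binary_path() {
    local profile="${1:-debug}"                 # debug | release
    local target="x86_64-unknown-linux-musl"    # target named in the notes above
    case "$profile" in
        debug|release) ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    echo "build/cargo_target/$target/$profile/firecracker"
}

# If the binary has been built, let `file` describe it; for a static ELF it
# typically reports "statically linked" or "static-pie linked".
bin="$(fc_binary_path release)"
if [ -f "$bin" ] && command -v file >/dev/null; then
    file "$bin"
fi
```

Run it from the repository root after a build; when nothing has been built yet it prints nothing.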

1.2. Building the kernel

tools/devtool build_kernel -c resources/guest_configs/microvm-kernel-x86_64-5.10.config -n 8
Output:
build/kernel/linux-5.10/vmlinux-5.10-x86_64.bin 42M, a Linux ELF with symbols; all modules are built into the kernel.

1.3. Building the rootfs

tools/devtool build_rootfs -s 300MB
Output:
build/rootfs/bionic.rootfs.ext4 300M

2. Running

2.1. Running from a config file

build/cargo_target/x86_64-unknown-linux-musl/release/firecracker --api-sock /tmp/firecracker.socket --config-file myvmconfig.json
This prints the kernel boot log and automatically logs in as root.

myvmconfig.json contains:

{
  "boot-source": {
    "kernel_image_path": "build/kernel/linux-5.10/vmlinux-5.10-x86_64.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    "initrd_path": null
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "build/rootfs/bionic.rootfs.ext4",
      "is_root_device": true,
      "partuuid": null,
      "is_read_only": false,
      "cache_type": "Unsafe",
      "io_engine": "Sync",
      "rate_limiter": null
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024,
    "smt": false,
    "track_dirty_pages": false
  },
  "balloon": null,
  "network-interfaces": [],
  "vsock": null,
  "logger": null,
  "metrics": null,
  "mmds-config": null
}
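  • The guest runs Ubuntu with systemd
  • It boots quickly
  • reboot makes the kernel exit, but the VM does not restart
  • There is no network interface
  • The root filesystem is mounted from /dev/vda
  • The VM is configured with 1024M of memory, but at runtime the firecracker process uses 95M resident and 1032M virtual memory.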

2.2. Running through the REST API

firecracker must be given an API socket at startup, one per VM. Through this socket, the VM can be started and managed with REST API calls.
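
As a rough sketch of that workflow, the calls below configure and boot a VM with curl over the Unix socket, assuming firecracker was started with only --api-sock /tmp/firecracker.socket (no --config-file). The endpoints /boot-source, /drives/{drive_id} and /actions are part of the Firecracker API; the wrapper fc_api, the start_vm function and the FC_DRY_RUN switch are hypothetical names introduced here for illustration:

```shell
#!/usr/bin/env bash
# Sketch: drive a running firecracker process through its API socket.
# fc_api, start_vm and FC_DRY_RUN are hypothetical; with FC_DRY_RUN=1 the
# curl command is printed instead of executed.
FC_SOCK="${FC_SOCK:-/tmp/firecracker.socket}"

fc_api() {
    local method="$1" path="$2" body="$3"
    local cmd=(curl --silent --show-error --unix-socket "$FC_SOCK"
               -X "$method" "http://localhost$path"
               -H "Content-Type: application/json" -d "$body")
    if [ "${FC_DRY_RUN:-0}" = "1" ]; then
        printf '%q ' "${cmd[@]}"; echo
    else
        "${cmd[@]}"
    fi
}

start_vm() {
    # Same kernel, boot args and rootfs as in myvmconfig.json.
    fc_api PUT /boot-source '{"kernel_image_path": "build/kernel/linux-5.10/vmlinux-5.10-x86_64.bin", "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}'
    fc_api PUT /drives/rootfs '{"drive_id": "rootfs", "path_on_host": "build/rootfs/bionic.rootfs.ext4", "is_root_device": true, "is_read_only": false}'
    fc_api PUT /actions '{"action_type": "InstanceStart"}'
}

# Show the three requests without contacting any socket.
FC_DRY_RUN=1 start_vm
```

To issue the requests for real, call start_vm without FC_DRY_RUN while firecracker is listening on the socket.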

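3. The devctr image

devctr is the image used during development; every operation runs through it.

  • Based on Ubuntu 18.04
  • Common development tools are installed:
    binutils-dev
    clang
    cmake
    gcc
    and so on

  • Rust is installed: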
    curl https://sh.rustup.rs -sSf | sh -s -- -y
    rustup target add x86_64-unknown-linux-musl
    rustup component add rustfmt
    rustup component add clippy-preview
    rustup install "stable"
    
  • Uses an open-source init program (tini), in its statically compiled build:
    # Add the tini init binary.
    ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION_TAG}/tini-static-amd64 /sbin/tini
    RUN chmod +x /sbin/tini
    WORKDIR  "$FIRECRACKER_SRC_DIR"
    ENTRYPOINT ["/sbin/tini", "--"]
    

4. Top-level Cargo manifest

Cargo.toml

[workspace]
members = ["src/firecracker", "src/jailer", "src/seccompiler", "src/rebase-snap"]
default-members = ["src/firecracker"]

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
lto = true

[patch.crates-io]
kvm-bindings = { git = "https://github.com/firecracker-microvm/kvm-bindings", tag = "v0.5.0-1", features = ["fam-wrappers"] }

cargo's build system automatically maintains Cargo.lock to record the exact dependency versions. The following commands update the locked versions:

$ cargo update            # updates all dependencies
$ cargo update -p regex   # updates just "regex"

5. The firecracker/tools/devtool script

# By default, all devtool commands run the container transparently, removing
# it after the command completes. Any persisting files will be stored under
# build/.
# If, for any reason, you want to access the container directly, please use
# `devtool shell`. This will perform the initial setup (bind-mounting the
# sources dir, setting privileges) and will then drop into a BASH shell inside
# the container.
#
# Building:
#   Run `./devtool build`.
#   By default, the debug binaries are built and placed under build/debug/.
#   To build the release version, run `./devtool build --release` instead.
#   You can then find the binaries under build/release/.
#
# Testing:
#   Run `./devtool test`.
#   This will run the entire integration test battery. The testing system is
#   based on pytest (http://pytest.org).
#
# Opening a shell prompt inside the development container:
#   Run `./devtool shell`.
#
# Additional information:
#   Run `./devtool help`.

The run_devctr function is well written. The z option on docker -v relabels the content so that it can be shared between containers; see https://docs.docker.com/storage/bind-mounts/#configure-the-selinux-label

# Helper function to run the dev container.
# Usage: run_devctr <docker args> -- <container args>
# Example: run_devctr --privileged -- bash -c "echo 'hello world'"
run_devctr() {
    docker_args=()
    ctr_args=()
    docker_args_done=false
    while [[ $# -gt 0 ]];  do
        [[ "$1"  =  "--" ]] && {
            docker_args_done=true
            shift
            continue
        }
        [[ $docker_args_done =  true ]] && ctr_args+=("$1") || docker_args+=("$1")
        shift
    done

    # If we're running in a terminal, pass the terminal to Docker and run
    # the container interactively
    [[ -t 0 ]] && docker_args+=("-i")
    [[ -t 1 ]] && docker_args+=("-t")

    # Try to pass these environments from host into container for network proxies
    proxies=(http_proxy HTTP_PROXY https_proxy HTTPS_PROXY no_proxy NO_PROXY)
    for i in  "${proxies[@]}";  do
        if [[ !  -z ${!i} ]];  then
            docker_args+=("--env") && docker_args+=("$i=${!i}")
        fi
    done

    # Finally, run the dev container
    # Use 'z' on the --volume parameter for docker to automatically relabel the
    # content and allow sharing between containers.
    docker run "${docker_args[@]}" \
        --rm \
        --volume /dev:/dev \
        --volume "$FC_ROOT_DIR:$CTR_FC_ROOT_DIR:z" \
        --env OPT_LOCAL_IMAGES_PATH="$(dirname "$CTR_MICROVM_IMAGES_DIR")" \
        --env PYTHONDONTWRITEBYTECODE=1 \
        "$DEVCTR_IMAGE"  "${ctr_args[@]}"
}
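
The two idioms that carry run_devctr are splitting the argument list at "--" and forwarding proxy variables through indirect expansion (${!name}). A stripped-down sketch with `docker run` replaced by printing; split_args is a hypothetical name, and only three of the six proxy variables are checked to keep it short:

```shell
#!/usr/bin/env bash
# Sketch of the argument-splitting idiom from run_devctr.
# split_args is a hypothetical name; it prints instead of calling docker.
split_args() {
    local docker_args=() ctr_args=() docker_args_done=false
    while [[ $# -gt 0 ]]; do
        if [[ "$1" = "--" ]]; then
            docker_args_done=true          # everything after "--" goes to the container
        elif [[ $docker_args_done = true ]]; then
            ctr_args+=("$1")
        else
            docker_args+=("$1")
        fi
        shift
    done

    # Forward proxy variables that are set on the host.
    # ${!name} expands to the value of the variable whose name is stored in $name.
    local name
    for name in http_proxy https_proxy no_proxy; do
        if [[ -n "${!name:-}" ]]; then
            docker_args+=("--env" "$name=${!name}")
        fi
    done

    echo "docker: ${docker_args[*]}"
    echo "container: ${ctr_args[*]}"
}

split_args --privileged -- bash -c "echo 'hello world'"
```

Keeping both lists as bash arrays, as the original does, is what lets arguments containing spaces survive intact when they are later expanded with "${docker_args[@]}".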

5.1. cmd_build

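Defaults to a debug build. The default libc is musl, so the target is x86_64-unknown-linux-musl.

5.1.1. Build seccompiler first

seccompiler is a standalone binary that converts JSON into BPF programs and saves them to a file.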

    # Build seccompiler-bin.
    run_devctr \
        --user "$(id -u):$(id -g)" \
        --workdir "$CTR_FC_ROOT_DIR" \
        ${extra_args} \
        -- \
        cargo build -p seccompiler --bin seccompiler-bin \
            --target-dir "$CTR_CARGO_SECCOMPILER_TARGET_DIR" \
            "${cargo_args[@]}"
    ret=$?

Note:

  • -p seccompiler: build only the seccompiler package

5.1.2. Then build rebase-snap

Tool that copies all the non-sparse sections from a diff file onto a base file

    # Build rebase-snap.
    run_devctr \
        --user "$(id -u):$(id -g)" \
        --workdir "$CTR_FC_ROOT_DIR" \
        ${extra_args} \
        -- \
        cargo build -p rebase-snap \
            --target-dir "$CTR_CARGO_REBASE_SNAP_TARGET_DIR" \
            "${cargo_args[@]}"
    ret=$?

5.1.3. Build firecracker

    # Build Firecracker.
    run_devctr \
        --user "$(id -u):$(id -g)" \
        --workdir "$CTR_FC_ROOT_DIR" \
        ${extra_args} \
        -- \
        cargo build \
            --target-dir "$CTR_CARGO_TARGET_DIR" \
            "${cargo_args[@]}"
    ret=$?

5.1.4. Build jailer

    # Build jailer only in case of musl for compatibility reasons.
    if [ "$libc" == "musl" ];then
        run_devctr \
            --user "$(id -u):$(id -g)" \
            --workdir "$CTR_FC_ROOT_DIR" \
            ${extra_args} \
            -- \
            cargo build -p jailer \
                --target-dir "$CTR_CARGO_TARGET_DIR" \
                "${cargo_args[@]}"
    fi
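
The four steps of cmd_build can be summarized as a dry run that only prints the cargo invocations in order. This is a sketch: print_build_plan is a hypothetical name, the --target-dir options are left out, and the way cargo_args is filled in (a --target flag plus an optional --release) is an assumption rather than a copy of devtool:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the cmd_build order; prints commands instead of running them.
# print_build_plan and the contents of cargo_args are assumptions for illustration.
print_build_plan() {
    local libc="${1:-musl}" profile="${2:-debug}"
    local cargo_args=(--target "x86_64-unknown-linux-$libc")
    if [ "$profile" = "release" ]; then
        cargo_args+=(--release)
    fi
    echo "cargo build -p seccompiler --bin seccompiler-bin ${cargo_args[*]}"
    echo "cargo build -p rebase-snap ${cargo_args[*]}"
    # No -p here: the workspace default-members setting selects src/firecracker.
    echo "cargo build ${cargo_args[*]}"
    # jailer is only built for musl, mirroring the check in devtool.
    if [ "$libc" = "musl" ]; then
        echo "cargo build -p jailer ${cargo_args[*]}"
    fi
}

print_build_plan musl release
```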

5.2. build_kernel

For example: ./tools/devtool build_kernel -c resources/guest_configs/microvm-kernel-arm64-4.14.config

    # The vmlinux format differs by architecture
    arch=$(uname -m)
    if [ "$arch" = "x86_64" ]; then
        target="vmlinux"
        cfg_pattern="x86"
        format="elf"
    elif [ "$arch" = "aarch64" ]; then
        target="Image"
        cfg_pattern="arm64"
        format="pe"
    fi

    # make_kernel.sh is downloaded from a separate repository, rust-vmm/vmm-reference
    recipe_url="https://raw.githubusercontent.com/rust-vmm/vmm-reference/$recipe_commit/resources/kernel/make_kernel.sh"
    run_devctr \
        --user "$(id -u):$(id -g)" \
        --workdir "$kernel_dir_ctr" \
        -- /bin/bash -c "curl -LO "$recipe_url" && source make_kernel.sh && extract_kernel_srcs "$KERNEL_VERSION""
    cp "$KERNEL_CFG"  "$kernel_dir_host/linux-$KERNEL_VERSION/.config"
    KERNEL_BINARY_NAME="vmlinux-$KERNEL_VERSION-$arch.bin"

    # The actual kernel build
    run_devctr \
        --user "$(id -u):$(id -g)" \
        --workdir "$kernel_dir_ctr" \
        -- /bin/bash -c "source make_kernel.sh && make_kernel "$kernel_dir_ctr/linux-$KERNEL_VERSION" $format  $target "$nprocs" "$KERNEL_BINARY_NAME""
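
The architecture-dependent choices above (make target, config pattern, image format) can also be expressed as a single lookup. A minimal sketch, where kernel_params is a hypothetical helper and the rejection of other architectures is an assumption about how unsupported hosts should be handled:

```shell
#!/usr/bin/env bash
# Sketch: map an architecture to "<make target> <config pattern> <image format>".
# kernel_params is a hypothetical helper, not part of devtool.
kernel_params() {
    local arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64)  echo "vmlinux x86 elf" ;;
        aarch64) echo "Image arm64 pe" ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
}

# Split the three fields into the variable names used by devtool.
read -r target cfg_pattern format <<< "$(kernel_params x86_64)"
echo "target=$target cfg_pattern=$cfg_pattern format=$format"
```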

5.3. build_rootfs

The default rootfs size is 300M, the base is Ubuntu 18.04, and the output is $flavour.rootfs.ext4. A few C files are compiled first, presumably for use in tests:

        run_devctr \
        --workdir "$CTR_FC_ROOT_DIR" \
        -- /bin/bash -c "gcc -o  $rootfs_dir_ctr/init $resources_dir_ctr/init.c && \
        gcc -o  $rootfs_dir_ctr/fillmem $resources_dir_ctr/fillmem.c && \
        gcc -o  $rootfs_dir_ctr/readmem $resources_dir_ctr/readmem.c"

5.3.1. firecracker/resources/tests/init.c

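Before exec'ing /sbin/openrc-init, init writes the value 123 through /dev/mem to a specific physical address (for example 0x40000000, i.e. 1G, on aarch64) to notify the VMM that the kernel has finished booting.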

// Base address values are defined in arch/src/lib.rs as arch::MMIO_MEM_START.
// Values are computed in arch/src/<arch>/mod.rs from the architecture layouts.
// Position on the bus is defined by MMIO_LEN increments, where MMIO_LEN is
// defined as 0x1000 in vmm/src/device_manager/mmio.rs.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifdef __x86_64__
#define MAGIC_MMIO_SIGNAL_GUEST_BOOT_COMPLETE 0xd0000000
#endif
#ifdef __aarch64__
#define MAGIC_MMIO_SIGNAL_GUEST_BOOT_COMPLETE 0x40000000
#endif

#define MAGIC_VALUE_SIGNAL_GUEST_BOOT_COMPLETE 123

int main () {
   int fd = open("/dev/mem", (O_RDWR | O_SYNC | O_CLOEXEC));
   int mapped_size = getpagesize();

   char *map_base = mmap(NULL,
        mapped_size,
        PROT_WRITE,
        MAP_SHARED,
        fd,
        MAGIC_MMIO_SIGNAL_GUEST_BOOT_COMPLETE);

   *map_base = MAGIC_VALUE_SIGNAL_GUEST_BOOT_COMPLETE;
   msync(map_base, mapped_size, MS_ASYNC);

   const char *init = "/sbin/openrc-init";

   char *const argv[] = { "/sbin/init", NULL };
   char *const envp[] = { };

   execve(init, argv, envp);
}

5.3.2. firecracker/resources/tests/fillmem.c

Usage: ./fillmem mb_count
It first mmaps the requested amount of memory, then fills it with memset.

5.3.3. firecracker/resources/tests/readmem.c

Usage: ./readmem mb_count value

5.3.4. Building the image

This is done inside an Ubuntu 18.04 container:

truncate -s "$SIZE" "$img_file"
mkfs.ext4 -F "$img_file"
docker run -v "$FC_ROOT_DIR:/firecracker" ubuntu:18.04 bash -s <<'EOF'
...
source $resource_dir/setup_rootfs.sh
mount rootfs.ext4 mnt
# Calls a function from setup_rootfs.sh, which:
# installs udev systemd-sysv openssh-server iproute2
# bypasses the login prompt
# applies other customizations
prepare_fc_rootfs "$rootfs_dir" "$resource_dir"
dirs="bin etc home lib lib64 opt root sbin usr"
for d in $dirs; do tar c "/$d" | tar x -C $mnt_dir; done # copy system directories such as bin and etc from the container into rootfs.ext4
EOF
  • Note: with a quoted 'EOF' heredoc delimiter, variables inside the heredoc are not expanded by the host shell.
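
The difference between a quoted and an unquoted heredoc delimiter is easy to see side by side. A small self-contained sketch; the variable name and value are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Sketch: quoted vs. unquoted heredoc delimiters.
name="host-value"

# Quoted delimiter: the body is passed through literally, so $name stays as text
# and would only be expanded by whatever shell later reads the body.
quoted="$(cat <<'EOF'
name is $name
EOF
)"

# Unquoted delimiter: the host shell expands $name before the body is passed on.
unquoted="$(cat <<EOF
name is $name
EOF
)"

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

With the quoted form, any variable the inner script needs has to be defined inside the heredoc or passed into the container environment.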
