How to Build Your Own Container, Part 2

By 墨家总院

Last edited on December 21, 2018 at 20:41:52

(This article is published exclusively on the Kingdee Cloud Community.)

Preface

For reasons of length, this article covers two more namespaces: the PID namespace and the NS (FS) namespace, better known as the mount namespace.

PID namespace

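With a PID namespace, PID numbering starts over: put simply, the first process in the new namespace gets PID "1". You can think of this as a "chroot" for the process identifier tree. With process IDs this short, day-to-day work should even get a little easier.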

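Back to business. How do we use a PID namespace? As in the previous article, "How to Build Your Own Container, Part 1", it is simple: add the CLONE_NEWPID flag when calling clone. It can, of course, be combined with the other five namespace flags.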

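Once the PID namespace is active, getpid() in the child always returns 1. But then there are two processes with PID 1, the real init and our child. How does process management cope with that?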

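In fact, this really is much like chroot: it is a change of viewpoint. chroot changes the root directory that a process sees; a PID namespace changes which PIDs a process sees: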

Parent: all processes are visible, with global PIDs (init=1, …, child=xxx, …)
Container, i.e. the child: only the child and its descendants are visible, with local PIDs (child=1, …)

Here is a code example:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

// sync primitive
int checkpoint[2];

static char child_stack[STACK_SIZE];
char* const child_args[] = {
  "/bin/bash",
  NULL
};

int child_main(void* arg)
{
  char c;

  // init sync primitive
  close(checkpoint[1]);
  // wait...
  read(checkpoint[0], &c, 1);

  printf(" - [%5d] World !\n", getpid());
  sethostname("In Namespace", 12);
  execv(child_args[0], child_args);
  printf("Ooops\n");
  return 1;
}

int main()
{
  // init sync primitive
  pipe(checkpoint);

  printf(" - [%5d] Hello ?\n", getpid());

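  int child_pid = clone(child_main, child_stack+STACK_SIZE,
      CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | SIGCHLD, NULL);
  if (child_pid == -1) {
    // creating namespaces needs root (CAP_SYS_ADMIN)
    perror("clone");
    return 1;
  }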

  // further init here (nothing yet)

  // signal "done"
  close(checkpoint[1]);

  waitpid(child_pid, NULL, 0);
  return 0;
}

Compile and run it:

alexander@mohismn-desktop:~/src/container$ gcc -Wall main-3-pid.c -o ns && sudo ./ns
 - [ 4842] Hello ?
 - [    1] World !
root@In Namespace:~/blog# echo "=> My PID: $$"
=> My PID: 1
root@In Namespace:~/blog# exit

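As expected, although the parent's PID is "4842", the child's PID is "1". If you try to kill the parent from inside the child, it fails, because PID 4842 does not exist in the child's namespace: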

alexander@mohismn-desktop:~/src/container$ sudo ./ns
 - [ 4842] Hello ?
 - [    1] World !
root@In Namespace:~/blog# kill -KILL 4842
bash: kill: (4842) - No such process
root@In Namespace:~/blog# exit

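This shows that our isolation works as intended. As I wrote earlier, the behavior is much like chroot: running "top" or "ps exf" from the parent still shows the child process, under its unmapped (global) PID. This is essential for process control, since mechanisms such as "kill" and "cgroups" depend on it.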

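There is a catch, though: running "top" or "ps exf" inside the child shows exactly the same information as in the parent. Is the isolation broken? No. These tools read their information from the real "/proc" filesystem, which is not isolated yet. Isolating the filesystem view is the goal of the next section.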

NS (FS) namespace

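In the previous section we used a PID namespace to get a new child process with PID "1". But even with the PID namespace enabled, tools such as top are still not isolated, because they read their information from the "/proc" filesystem, and "/proc" is still fully shared between namespaces. In this section we introduce another namespace to solve this: "NS", the mount namespace. It was the first namespace in Linux history.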

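To activate the NS namespace, add the "CLONE_NEWNS" flag to the "clone" call; it can be combined with the other namespaces. Once it is active, mount and unmount operations in the child affect only the child, and vice versa, provided mount propagation is private. On many modern distributions systemd marks "/" as shared at boot; in that case the child should first remount "/" with MS_REC | MS_PRIVATE, otherwise its mounts propagate back to the parent.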

Let's experiment. Just enable NS in the previous example:

int child_pid = clone(child_main, child_stack+STACK_SIZE, 
      CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);

Now, if we run it and mount a fresh "/proc" inside the child, the problem from the previous section is solved:

alexander@mohism-Desktop:~/src/container$ gcc -Wall ns.c -o ns && sudo ./ns
 - [14472] Hello ?
 - [    1] World !
root@In Namespace:~/blog# mount -t proc proc /proc
root@In Namespace:~/blog# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  1.0  0.0  23620  4680 pts/4    S    00:07   0:00 /bin/bash
root        79  0.0  0.0  18492  1328 pts/4    R+   00:07   0:00 ps aux
root@In Namespace:~/blog# exit

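"/proc" now behaves as expected, and the parent's "/proc" is left intact. This assumes the mounts are private: on hosts where systemd has marked "/" as shared, first run mount --make-rprivate / inside the container, otherwise the new "/proc" mount propagates back to the host.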

Now let's automate the whole process in code:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

// sync primitive
int checkpoint[2];

static char child_stack[STACK_SIZE];
char* const child_args[] = {
  "/bin/bash",
  NULL
};

int child_main(void* arg)
{
  char c;

  // init sync primitive
  close(checkpoint[1]);

  // setup hostname
  printf(" - [%5d] World !\n", getpid());
  sethostname("In Namespace", 12);

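  // make all mounts in the new namespace private first; on systemd
  // hosts "/" is shared, and the /proc mount below would otherwise
  // propagate back to the parent
  if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
    perror("mount MS_PRIVATE");
    return 1;
  }

  // remount "/proc" to get accurate "top" && "ps" output
  if (mount("proc", "/proc", "proc", 0, NULL) == -1)
    perror("mount /proc");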

  // wait...
  read(checkpoint[0], &c, 1);

  execv(child_args[0], child_args);
  printf("Ooops\n");
  return 1;
}

int main()
{
  // init sync primitive
  pipe(checkpoint);

  printf(" - [%5d] Hello ?\n", getpid());

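  int child_pid = clone(child_main, child_stack+STACK_SIZE,
      CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
  if (child_pid == -1) {
    // creating namespaces needs root (CAP_SYS_ADMIN)
    perror("clone");
    return 1;
  }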

  // further init here (nothing yet)

  // signal "done"
  close(checkpoint[1]);

  waitpid(child_pid, NULL, 0);
  return 0;
}

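If you run this code, you should get exactly the same behavior as in the previous test, without having to remount "/proc" by hand and without messing up the real parent's "/proc", as long as mount propagation inside the child is private.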
