..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Intel Corporation.

.. _virtio_user_for_container_networking:

Virtio_user for Container Networking
====================================

Containers are becoming increasingly popular because of their low overhead,
fast boot-up time, and ease of deployment. How to use DPDK to accelerate
container networking has therefore become a common question. There are two
use models for running DPDK inside containers, as shown in
:numref:`figure_use_models_for_running_dpdk_in_containers`.

.. _figure_use_models_for_running_dpdk_in_containers:

.. figure:: img/use_models_for_running_dpdk_in_containers.*

   Use models of running DPDK inside container

This page covers only the aggregation model.

Overview
--------

The virtual device, virtio-user, together with an unmodified vhost-user
backend, is designed for high-performance user-space container networking or
inter-process communication (IPC).

An overview of accelerating container networking with virtio-user is shown
in :numref:`figure_virtio_user_for_container_networking`.

.. _figure_virtio_user_for_container_networking:

.. figure:: img/virtio_user_for_container_networking.*

   Overview of accelerating container networking by virtio-user

Unlike the virtio PCI devices normally used for para-virtualized I/O in a
QEMU/VM context, the basic idea here is to present a virtual device that DPDK
can attach and initialize directly. The device emulation layer that QEMU
provides in the VM case is avoided by registering a new kind of virtual device
in DPDK's ethdev layer. To minimize changes, the existing virtio PMD code
(``drivers/net/virtio/``) is reused.

Virtio is, in essence, a shared-memory-based solution for transmitting and
receiving packets. How is that memory shared? In the VM case, QEMU always
shares the whole guest physical memory layout with the vhost backend. A
container, however, is just a process, and it is not feasible for it to share
all of its virtual memory regions with the backend. So only selected virtual
memory regions (namely, the hugepages initialized by DPDK) are sent to the
backend. As a consequence, only addresses within these regions can be used to
transmit or receive packets.

Sample Usage
------------

Here we use Docker as the container engine. The same steps also apply to LXC
and Rocket with minor changes.

#. Write a Dockerfile like the one below. The heredoc delimiter is quoted so
   that ``$PATH`` is written literally and expanded by Docker, not by the
   host shell.

   .. code-block:: console

       cat <<'EOT' > Dockerfile
       FROM ubuntu:latest
       WORKDIR /usr/src/dpdk
       COPY . /usr/src/dpdk
       ENV PATH="$PATH:/usr/src/dpdk/<build_dir>/app/"
       EOT

#. Build a Docker image.

   .. code-block:: console

       docker build -t dpdk-app-testpmd .

#. Start testpmd on the host with a vhost-user port.

   .. code-block:: console

       <build_dir>/app/dpdk-testpmd -l 0-1 -n 4 --socket-mem 1024,1024 \
           --vdev 'eth_vhost0,iface=/tmp/sock0' \
           --file-prefix=host --no-pci -- -i

#. Start a container instance with a virtio-user port.

   .. code-block:: console

       docker run -i -t -v /tmp/sock0:/var/run/usvhost \
           -v /dev/hugepages:/dev/hugepages \
           dpdk-app-testpmd dpdk-testpmd -l 6-7 -n 4 -m 1024 --no-pci \
           --vdev=virtio_user0,path=/var/run/usvhost \
           --file-prefix=container \
           -- -i

Note: If all of the above setup runs directly on the host, this becomes a
shared-memory-based IPC.

Limitations
-----------

This solution has the following limitations:

* Cannot work with the ``--huge-unlink`` option, because the hugepage files
  need to be reopened to be shared with the vhost backend.
* Cannot work with the ``--no-huge`` option. With this option, DPDK currently
  uses anonymous mappings, which cannot be reopened to share with the vhost
  backend.
* Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS (8) hugepages.
  If you have more regions (especially when 2MB hugepages are used), the
  ``--single-file-segments`` option can help reduce the number of shared
  files.
* Applications should not use file names like HUGEFILE_FMT (``"%smap_%d"``).
  Such names cause confusion when hugepage files are shared with the backend
  by name.
* Root privilege is required. DPDK resolves the physical addresses of
  hugepages, which seems unnecessary here, and discussions are ongoing about
  removing this restriction.
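
Several of these limitations can be checked from the shell. As a minimal
sketch, the hypothetical POSIX shell helper below (``check_virtio_user_args``
is not part of DPDK) rejects the two incompatible EAL options and counts the
hugepage files created for a given ``--file-prefix``. It assumes the files
follow HUGEFILE_FMT (``"%smap_%d"``), so ``--file-prefix=container`` produces
names such as ``containermap_0``, and it treats each file as one shared memory
region, which only approximates what virtio-user actually passes to the
backend.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical helper, not part of DPDK.
    # Usage: check_virtio_user_args HUGEDIR FILE_PREFIX [EAL_ARGS...]

    # VHOST_MEMORY_MAX_NREGIONS, as stated in the limitations above.
    MAX_REGIONS=8

    check_virtio_user_args() {
        hugedir=$1
        prefix=$2
        shift 2

        # Reject options that stop hugepage files from being reopened
        # and shared with the vhost backend.
        for arg in "$@"; do
            case $arg in
            --huge-unlink|--no-huge)
                echo "error: $arg cannot be used with virtio-user" >&2
                return 1
                ;;
            esac
        done

        # Count files named "<prefix>map_<N>" (assumed HUGEFILE_FMT layout).
        # An unmatched glob stays literal, so the -e test filters it out.
        count=0
        for f in "$hugedir/${prefix}map_"*; do
            [ -e "$f" ] && count=$((count + 1))
        done

        # More files than regions means the backend cannot map them all.
        if [ "$count" -gt "$MAX_REGIONS" ]; then
            echo "error: $count hugepage files exceed $MAX_REGIONS regions;" \
                "try --single-file-segments" >&2
            return 1
        fi
        echo "ok: $count hugepage file(s)"
    }

For example, once the container instance from the sample above is running,
calling ``check_virtio_user_args /dev/hugepages container -l 6-7 --no-pci`` on
the host would count the ``containermap_*`` files in the shared hugepage
mount. The region limit is hard-coded from the limitation above.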