Thursday, June 7, 2012

RAM-only PXE boot & the "smallest" diskless Linux box

This is a how-to for easily creating a very small Linux box that runs purely in RAM and boots using PXE. It is an introductory topic (not new at all) for further posts about scalability, load balancing and high availability. That's why I mention clustering very often and start simple & small by preparing a single node that boots smoothly in a controlled environment.

Why RAM-only & diskless?

Such a configuration can bring many advantages if you plan to assemble a cheap cluster with low maintenance costs and no need for persistence. Consider the following:
  • RAM availability: RAM has become cheaper and faster over the past years
  • modern hardware, big RAM: a lot of hardware supports large amounts of installed RAM, from 1 GB to 128 GB and beyond
  • Linux rocks: 64-bit Linux systems are able to manage a lot of RAM efficiently
  • HDD & the planet: electromechanical HDDs have serious implications for energy consumption, recycling and NOISE, and are more susceptible to failures
  • SSD & your wallet: SSDs are more advanced than their electromechanical counterparts (less susceptible to physical shock, silent, with lower access time and latency) but, at present market prices, more expensive per unit of storage. So if your problem is not storage, just processing, caching and networking, you are in the right place!
  • less is sometimes cheaper: it's not a bad idea if you have a good chance to buy cheaper nodes, in parts or complete, w/o HDD
  • crashing doesn't matter: if a misbehaving node crashes you just need to restart it and it'll wake up again in a healthy state. A single node's state doesn't drift over time
  • scaling better: adding a node to the cluster is easy, just connect it, enable PXE boot and add an entry in DHCP config
  • network congestion is reduced: the RAM filesystem is copied only once per boot to the target node
Life is easy and the cluster's maintenance costs are reduced, but remember that this holds only if you don't need persistence on every single node, just CPU power, networking and RAM.

When don't I need persistence?

I don't have a full inventory of persistence-less, memory-and-network-only scenarios, just a short practical list. I'm sure you can see the benefits:
  • cryptographic stuff, privacy: you need to run a cryptographic algorithm and ensure a full cleanup of private keys after the execution is complete. Formatting an HDD is sometimes not enough, while recovering data from RAM after a full power-off is very difficult if not impossible. An encrypted filesystem on top of the RAM should also be challenging for attackers
  • caching efficiently: if you have enough RAM and your backend cluster is under a constantly growing demand for static content, you can delegate all your caching needs to a dedicated frontend cluster running purely in RAM and relieve the backend servers, which then only process dynamic content
  • time-only algorithms: many algorithms only need processing power and a low/medium memory footprint; some of them only need volatile (non-persistent) memory for allocating data structures
  • display-only apps: some software solutions only need to display incoming data via graphs, video streaming, etc. So a good display, a RAM-only system and a network are enough

What will I obtain at the end of this guide? A Linux box, named rambox, running purely in RAM, which means a root (/) filesystem mounted in RAM. That's why saving memory is a priority, as is avoiding a filesystem full of never-used files that would also increase memory usage.

We'll also compile a customized Kernel to shrink it, with a "minimal" set of features included. Keep it simple & small! At this point you should be careful not to omit mandatory kernel features. There's another set of features that are not mandatory but useful to obtain the best performance. They mainly depend on your hardware, so take care of them.

What's a RAM filesystem? A filesystem mounted in RAM isn't a new invention; it's an awesome Kernel feature mostly used to load firmware/modules before starting the normal boot process. It's called initrd or initramfs. There are differences between the two (see references) and we'll be using initramfs.

What do I need?

For this guide I use two KVM-virtualized computers running on a CentOS 6.2 host with bridged networking. For simplicity, the host and the two guests are in the same subnetwork:
  1. pxe: a server computer with CentOS 6.2 amd64 installed, w/ 16 GB HDD, 1 GB RAM, no GUI, networking. With DHCP and TFTP roles. Static IP = 192.168.24.202, subnet = 192.168.24.0/24
  2. rambox: a RAM-only computer, w/o HDD installed, w/ networking. With cluster node role. DHCP-assigned IP = 192.168.24.203, subnet = 192.168.24.0/24

NOTE: The BIOS used for QEMU and KVM virtual machines (SeaBIOS) supports an open source implementation of PXE named gPXE, so a KVM-based virtual machine is able to boot via network. Nowadays almost any motherboard should have a BIOS with PXE support. Ensure that your rambox supports it by checking the BIOS setup.

How does it work?

In summary, when the rambox with PXE boot activated wakes up:
  1. the BIOS PXE boot loader requests an address from the DHCP server
  2. the DHCP server offers an IP address, a TFTP server IP address (itself), and the Linux PXE boot loader's location on the TFTP server
  3. the BIOS PXE boot loader downloads the Linux PXE boot loader from the TFTP server
  4. the Linux PXE boot loader takes control and uses the same IP configuration to connect to the TFTP server and fetch two files: the kernel and the ramdisk
  5. the Kernel takes control and configures its network interface, either statically or by performing a second round of DHCP requests, depending on the boot parameters
  6. the Kernel uncompresses the ramdisk in memory
  7. the RAM disk is mounted on / and the /init script gets invoked
What do we have to configure and where? Everything takes place on the pxe server:
  1. Install and configure a DHCP server with support for PXE extensions
  2. Install and configure a TFTP server
  3. Create a reduced ramdisk with a minimal set of utils and programs
  4. Compile and optionally shrink the Kernel to include support for Kernel-level IP configuration, including NIC drivers
  5. Put everything in the correct place and wake up the rambox!
There are several detailed explanations of the Linux boot process; some of them are outdated but still useful. For now, I won't give a full description of every single step of the boot process, ramdisk, PXE, Kernel-level IP, etc. (see references)

Hands on Bash

Log in to pxe as a sudoer user (named bozz in this guide)

Installing phase

  1. Install the dhcp, tftp-server and syslinux packages (syslinux contains the Linux PXE boot loader):
  2. $ sudo yum install dhcp tftp-server syslinux
    
  3. Additionally, install some tools:
  4. $ sudo yum install bc wget
    
  5. Finally install kernel packages for kernel compilation. These packages ensure that you have all the required tools for the build:
  6. $ sudo yum install kernel-devel
    $ sudo yum groupinstall "Development Tools"
    
    # This is required to enable a make *config command to execute correctly. 
    $ sudo yum install ncurses-devel
    
    # These are required when building a CentOS-6 kernel. 
    $ sudo yum install hmaccalc zlib-devel binutils-devel elfutils-libelf-devel 
    
    # These are required when working with the full Kernel source
    $ sudo yum install rpm-build redhat-rpm-config unifdef
    
    # These are needed by kernel-2.6.32-220.el6
    $ sudo yum install xmlto asciidoc newt-devel python-devel perl-ExtUtils-Embed
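
Before moving on, it can help to confirm that the tools used later in this guide are actually on the PATH. The following is a minimal sketch, not part of the original setup: missing_tools is a hypothetical helper name, and the list of tools passed to it is only an example.

```shell
#!/bin/sh
# missing_tools: print each named command that is NOT found on PATH.
# Hypothetical helper; the tool names passed in are only examples.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: a few of the tools used later in this guide.
# Prints nothing when all of them are installed.
missing_tools bc wget make gcc cpio gzip
```

Anything printed by the last line still needs to be installed with yum.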
    
    

Configure DHCP

  1. Ensure that the dhcpd starts at boot time:
  2. $ sudo chkconfig --level 35 dhcpd on
    $ chkconfig --list dhcpd
    dhcpd              0:off    1:off    2:off    3:on    4:off    5:on    6:off
    
  3. Edit dhcpd.conf, adding the PXE-specific options:
  4. $ sudo nano /etc/dhcp/dhcpd.conf
    it should finally look like this:
    # dhcpd.conf
    #
    # DHCP configuration file for ISC dhcpd
    #
    
    # Use this to enable / disable dynamic dns updates globally.
    ddns-update-style none;
    
    # Definition of PXE-specific options
    # Code 1: Multicast IP address of boot file server
    # Code 2: UDP port that client should monitor for MTFTP responses
    # Code 3: UDP port that MTFTP servers are using to listen for MTFTP requests
    # Code 4: Number of seconds a client must listen for activity before trying
    #         to start a new MTFTP transfer
    # Code 5: Number of seconds a client must listen before trying to restart
    #         a MTFTP transfer
    option space PXE;
    option PXE.mtftp-ip               code 1 = ip-address;  
    option PXE.mtftp-cport            code 2 = unsigned integer 16;
    option PXE.mtftp-sport            code 3 = unsigned integer 16;
    option PXE.mtftp-tmout            code 4 = unsigned integer 8;
    option PXE.mtftp-delay            code 5 = unsigned integer 8;
    option PXE.discovery-control      code 6 = unsigned integer 8;
    option PXE.discovery-mcast-addr   code 7 = ip-address;
    
    subnet 192.168.24.0 netmask 255.255.255.0 {
    
      class "pxeclients" {
        match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
        option vendor-class-identifier "PXEClient";
        vendor-option-space PXE;
    
        # At least one of the vendor-specific PXE options must be set in
        # order for the client boot ROMs to realize that we are a PXE-compliant
        # server.  We set the MCAST IP address to 0.0.0.0 to tell the boot ROM
        # that we can't provide multicast TFTP (address 0.0.0.0 means no
        # address).
        option PXE.mtftp-ip 0.0.0.0;
    
        # This is the name of the file the boot ROMs should download.
        filename "pxelinux.0";
    
        # This is the name of the server they should get it from.
        next-server 192.168.24.202;
      }
    
      pool {
        max-lease-time 86400;
        default-lease-time 86400;
        range 192.168.24.203 192.168.24.203;
        deny unknown clients;
      }
    
      host rambox {
        hardware ethernet 08:00:07:26:c0:a5;
        fixed-address 192.168.24.203;
        hostname rambox01.home.dev;
      }
    
    }
    
    NOTE: In this configuration the nodes always get the same IP address, leased by MAC, and nodes with an unknown hardware address are rejected. You can easily change this behavior by replacing the "deny unknown clients" directive with "allow unknown clients" and deleting all the host entries.
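
Since adding a node means adding one more host entry, a tiny generator can save some typing. This is only a sketch under my own assumptions: dhcp_host_entry is a hypothetical helper, and the node name, MAC and IP in the example call are made-up illustration values. It prints text only and never touches dhcpd.conf.

```shell
#!/bin/sh
# dhcp_host_entry NAME MAC IP
# Prints a host declaration shaped like the "host rambox" block above.
# Hypothetical helper: all three values come from the caller.
dhcp_host_entry() {
  printf '  host %s {\n' "$1"
  printf '    hardware ethernet %s;\n' "$2"
  printf '    fixed-address %s;\n' "$3"
  printf '  }\n'
}

# Made-up second node, for illustration only
dhcp_host_entry rambox02 08:00:07:26:c0:a6 192.168.24.204
```

The output is meant to be pasted inside the subnet block. Afterwards, dhcpd -t -cf /etc/dhcp/dhcpd.conf checks the file syntax without starting the server, and a restart of the dhcpd service makes the change effective.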

Configuring TFTP

  1. To enable the TFTP server, edit /etc/xinetd.d/tftp replacing the word yes on the disable line with the word no. Then save the file and exit the editor:
  2. $ sudo nano /etc/xinetd.d/tftp
    it should finally look like:
    # default: off
    # description: The tftp server serves files using the trivial file transfer \
    # protocol.  The tftp protocol is often used to boot diskless \
    # workstations, download configuration files to network-aware printers, \
    # and to start the installation process for some operating systems.
    service tftp
    {
            socket_type             = dgram
            protocol                = udp
            wait                    = yes
            user                    = root
            server                  = /usr/sbin/in.tftpd
            server_args             = -s /var/lib/tftpboot
            disable                 = no
            per_source              = 11
            cps                     = 100 2
            flags                   = IPv4
    }
    
  3. Restart the xinetd daemon to reload configuration files:
  4. $ sudo service xinetd restart
  5. Verify that xinetd is started at boot time. It should be; if not, enable it with chkconfig as in the dhcpd step:
  6. $ chkconfig --list xinetd
    xinetd             0:off    1:off    2:off    3:on    4:on    5:on    6:off
    

Concerning the firewall

  • Allow access to TFTP via standard ports:
  • $ sudo iptables -I INPUT -p udp --dport 69 -j ACCEPT
    $ sudo iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 21 -j ACCEPT
    $ sudo service iptables save
    $ sudo service iptables restart
    

Configuring the PXE environment

  1. Copy the Linux PXE boot loader pxelinux.0 to the TFTP server's published root directory:
  2. $ sudo cp /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot
  3. Create the PXE config directory under the TFTP root. This directory will contain a single configuration file per node or per subnet:
  4. $ sudo mkdir -p /var/lib/tftpboot/pxelinux.cfg
  5. The Linux PXE boot loader uses its own IP address in uppercase hexadecimal format to look for a configuration file under the pxelinux.cfg directory. If it's not found, it removes the last hex digit and tries again, repeating until it runs out of digits. That's why I define a helper function to convert a dotted-decimal IPv4 address to a hexadecimal string:
  6. #/**
    # * converts an IPv4 address to hexadecimal format completing the missing 
    # * leading zero
    # * 
    # * @example:
    # *   $ hxip 10.10.24.203
    # *   0A0A18CB
    # *
    # * @param $1: the IPv4 address 
    # */
    hxip() {
      ( bc | sed 's/^\([[:digit:]]\|[A-F]\)$/0\1/' | tr -d '\n' ) <<< "obase=16; ${1//./;}"
    }
    
    test the function via command line:
    $ hxip 192.168.24.203
    C0A818CB
  7. Create PXE Linux config file using the designated IPv4 address in hexadecimal format:
    $ sudo nano /var/lib/tftpboot/pxelinux.cfg/$(hxip 192.168.24.203)
    with the following content:
    DEFAULT bzImage
    APPEND initrd=initramfs.cpio.gz rw ip=dhcp shell
    
    or if you prefer to avoid the second round of DHCP issued by the Kernel:
    DEFAULT bzImage
    APPEND initrd=initramfs.cpio.gz rw ip=192.168.24.203:192.168.24.202:192.168.24.1:255.255.255.0:rambox:eth0:off shell
    
    where DEFAULT provides the Kernel file and APPEND the Kernel parameters passed on boot:
    • bzImage: the name of the compressed Kernel image
    • initrd=initramfs.cpio.gz: tells the Linux PXE boot loader to download this file and pass it to the Kernel, which will interpret it as a compressed ramdisk filesystem image
    • rw: the Kernel mounts the ramdisk filesystem in read-write mode
    • ip=dhcp: a Kernel-level IP parameter telling the Kernel to perform a DHCP request to obtain valid network parameters. Alternatively you can use a fixed network configuration:
    • ip=192.168.24.203:192.168.24.202:192.168.24.1:255.255.255.0:rambox:eth0:off
      • node IP address = 192.168.24.203
      • server IP address = 192.168.24.202
      • default gateway IP address = 192.168.24.1
      • network mask = 255.255.255.0 (matching the /24 subnet used in this guide)
      • node hostname = rambox
      • device = eth0
      • auto configuration protocol = off
    • shell: a custom parameter added by me to run a shell
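
To see which file names the boot loader will ask for, the IP-based part of the lookup can be reproduced in a few lines. This is a sketch of my own, assuming the documented PXELINUX behavior of dropping one hex digit per attempt and falling back to a file named default; pxe_cfg_names is a hypothetical helper. As far as I know, PXELINUX also tries names based on the client UUID and MAC address before these, which the sketch leaves out.

```shell
#!/bin/sh
# pxe_cfg_names IPV4
# Prints the IP-based config file names in the order they are tried:
# the full 8-digit uppercase hex address, then one hex digit shorter
# each time, and finally "default".
# Assumes plain decimal octets without leading zeros.
pxe_cfg_names() {
  # Split the dotted quad on "." into the four positional parameters
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  name=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
  while [ -n "$name" ]; do
    echo "$name"
    name=${name%?}   # drop the last hex digit
  done
  echo default
}

pxe_cfg_names 192.168.24.203
```

For the rambox address the first name printed is C0A818CB, the same string hxip returns, and a file called C0A818 would be picked up by every node in 192.168.24.0/24 that has no more specific file.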

Creating a compressed root filesystem

Kernel support for initramfs allows us to create a customizable boot process that loads modules and provides a minimalistic shell running in RAM. An initramfs is nothing more than a compressed cpio archive that is either embedded directly into the kernel image or stored as a separate file that can be loaded by the Linux PXE boot loader. Embedded or not, it should always contain at least:
  • a minimum set of directories:
    • /sbin -- Critical system binaries
    • /bin -- Essential binaries considered part of the system
    • /dev -- Device files, required to perform I/O
    • /etc -- System configuration files 
    • /lib, /lib32, /lib64 -- Shared libraries to provide run-time support 
    • /mnt -- A mount point for maintenance and use after the boot/root system is running
    • /proc -- Directory stub under which the proc filesystem is mounted
    • /root -- The root user's home directory 
    • /sys -- Directory stub under which the sysfs filesystem is mounted
    • /tmp -- Temporary directory 
    • /usr -- Additional utilities and applications 
    • /var -- Variable files whose content is expected to change continually during normal operation of the system, such as logs, spool files, and temporary e-mail files
  • basic set of utilities: sh, ls, cp, mv, etc
  • minimum set of config files: rc, inittab, fstab, etc
  • devices: /dev/hd*, /dev/tty*, etc
  • runtime libraries to provide basic functions used by utilities
Is there a simpler method to create the RAM disk? An initramfs can also be created by copying the contents of an already installed Linux distro into an empty directory and then packaging it, but beware of carrying along undesired and/or useless files. There are other methods, some simple and some not, but they are outside the scope of this guide, which aims to show you a handy approach to obtain a lightweight RAM disk and Kernel.
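
Whichever method you choose, the minimum directory set listed above is easy to check mechanically. The following is a minimal sketch: missing_dirs is a hypothetical helper of mine, and it only checks /lib, not the /lib32 and /lib64 variants.

```shell
#!/bin/sh
# missing_dirs ROOT
# Prints every directory from the minimum set that does not exist under
# ROOT. The list mirrors the bullets in this section (only /lib is
# checked, not /lib32 or /lib64).
missing_dirs() {
  for d in sbin bin dev etc lib mnt proc root sys tmp usr var; do
    [ -d "$1/$d" ] || echo "/$d"
  done
}

# Example against a throwaway tree that only has three of the directories
demo=$(mktemp -d)
mkdir "$demo/bin" "$demo/etc" "$demo/dev"
missing_dirs "$demo"
rm -rf "$demo"
```

Run against a finished initramfs root directory it should print nothing.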

Use the following steps to create the initramfs:

  1. Creating a download cache & working zone. Also defining a helper command to download and cache archives:
  2. $ mkdir -p /tmp/cache
    $ mkdir /tmp/wrk
    $ pushd /tmp/wrk
    
    #/** 
    # * Downloads a file to the cache if it doesn't exist there yet
    # *
    # * @param $1 the name of the file in the cache
    # * @param $2 the full URL of the file to download
    # */
    $ get() {
     [ -f "/tmp/cache/$1" ] || wget -t inf -w 5 -c "$2" -O "/tmp/cache/$1"
    }
    
  3. Creating and entering the initramfs root directory:
  4. $ mkdir initramfs
    $ pushd initramfs
    
  5. Creating filesystem's base directories:
  6. $ mkdir -p -m 0755 dev etc/{,init,sysconfig} mnt sys usr/{,local} var/{,www,log,lib,cache} run
    $ mkdir -p -m 0555 {,s}bin lib{,32,64} proc usr/{,s}bin
    $ mkdir -p -m 0700 root
    $ mkdir -p -m 1777 tmp
    $ pushd var
    $ ln -s ../run run
    $ popd
    
  7. Creating /etc/profile to export environment variables:
  8. $ dd of=etc/profile << EOT
    ## /etc/profile
    
    export PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin"
    EOT
  9. Creating /etc/fstab with various mount points:
  10. $ dd of=etc/fstab << EOT
    devpts  /dev/pts  devpts  nosuid,noexec,gid=5,mode=0620  0 0
    tmpfs   /dev/shm  tmpfs   nosuid,nodev,mode=0755  0 0
    sysfs   /sys      sysfs   nosuid,nodev,noexec  0 0
    proc    /proc     proc    nosuid,nodev,noexec  0 0
    EOT
    $ chmod 0644 etc/fstab
    
    
  11. Configure passwd & group settings:
  12. $ dd of=etc/passwd << EOT
    root:x:0:0:root:/root:/bin/sh
    nobody:x:99:99:NoBody:/none:/bin/false
    www:x:33:33:HTTP Server:/var/www:/bin/false
    EOT
    $ dd of=etc/group << EOT
    root:x:0:
    nobody:x:99:
    www:x:33:
    EOT
    
  13. Configure some host related settings:
  14. $ dd of=etc/host.conf <<< "multi on"
    $ dd of=etc/hostname <<< "rambox"
    $ dd of=etc/hosts << EOT
    127.0.0.1 localhost.localdomain localhost
    127.0.1.1 $(cat etc/hostname)
    EOT
    
  15. Configure timezone:
  16. $ dd of=etc/timezone <<< "America/New_York"
    $ cp /usr/share/zoneinfo/$(cat etc/timezone) etc/localtime
    
  17. Busybox is a handy tool used very often in ramdisks and small devices with very limited resources, providing a self-contained, minimal set of POSIX-compatible Unix tools in a single executable file. I'll be using busybox in this guide. Get busybox and create the sh symbolic link:
  18. $ pushd bin
    $ chmod +w .
    $ bb=busybox-x86_64 && get $bb http://www.busybox.net/downloads/binaries/latest/busybox-x86_64
    $ cp /tmp/cache/$bb busybox && chmod +x busybox
    $ ln -s busybox sh
    $ chmod -w . 
    $ popd
    
  19. Additionally we MAY need a DHCP configuration script, so we'll use busybox's udhcpc client and its simple.script. Then we'll create a script named renew_ip that does the whole job:
  20. $ pushd bin
    $ chmod +w .
    $ ss=simple.script
    $ get $ss http://git.busybox.net/busybox/plain/examples/udhcp/$ss
    $ cp /tmp/cache/$ss . && chmod +x $ss 
    
    $ dd of=renew_ip << EOT
    #!/bin/sh
    
    ifconfig eth0 up
    udhcpc -t 5 -q -s /bin/simple.script
    EOT
    
    $ chmod +x renew_ip
    $ chmod -w . 
    $ popd
    
  21. One of the most important phases is the execution of the /init script, a simple shell script that performs the whole initialization process on the ramdisk. It usually mounts all the filesystems listed in fstab, creates device nodes (like the udev device manager does), loads device firmware and finally mounts the real root (/) from another device and launches /sbin/init from it. This is the point where we intervene, by just launching the shell or by executing our own /sbin/init w/o switching the root (/). So edit the init script and add the following content:
  22. $ nano init
    
    w/ this content:
    #!/bin/sh
    
    # Make all core utils reachable 
    . /etc/profile
    
    # Create all busybox's symb links
    /bin/busybox --install -s
    
    # Create some devices statically
    
    # pts: pseudoterminal slave 
    mkdir dev/pts
    
    # shm
    mkdir dev/shm
    chmod 1777 dev/shm
    
    # Mount the fstab's filesystems.
    mount -av
    
    # Some things don't work properly without /etc/mtab.
    ln -sf /proc/mounts /etc/mtab
    
    # mdev is a suitable replacement of the udev device node creator for loading 
    # firmware
    touch /etc/mdev.conf 
    echo /sbin/mdev > /proc/sys/kernel/hotplug
    mdev -s
    
    # Only renew the IP address via DHCP if you need it. Not needed if Kernel-level 'ip=...' 
    # was used. 
    #renew_ip 
    
    # set hostname
    hostname $(cat /etc/hostname)
    
    # shell launcher 
    shell() {
     echo "${1}Launching shell..." && exec /bin/sh
    }
    
    # launch the shell if the 'shell' parameter was supplied
    grep -q 'shell' /proc/cmdline && shell
    
    # parse kernel command and obtain the init & root parameters
    # if not then use default values
    for i in $(cat /proc/cmdline); do
     par=$(echo $i | cut -d "=" -f 1)
     val=$(echo $i | cut -d "=" -f 2-)
     case $par in
      root)
       root=$val
       ;;
      init)
       init=$val
       ;;
     esac
    done
    init=${init:-/sbin/init}
    root=${root:-/dev/hda1}
    
    # if rambox parameter is supplied then keep the ramdisk mounted, ignore root parameter 
    # and run the other init script. Located at /sbin/init by default
    if grep -q 'rambox' /proc/cmdline ; then
      [ -e ${init} ] || shell "No init found on ramdisk at '${init}'... " 
    
      echo "Keeping the ramdisk since rambox param was supplied & executing init... "
      exec ${init}
      
      #This will only be run if the exec above failed
      shell "Failed keeping the ramdisk and executing '${init}'... "
    fi
    
    # Neither shell nor rambox parameters were supplied then, try to switch to the new
    # root and launch init 
    mkdir /newroot
    mount ${root} /newroot || shell "An error occurred mounting '${root}' at /newroot... "
    [ -e /newroot${init} ] || shell "No init found at '${init}'... " 
    
    echo "Resetting kernel hotplugging... "
    :> /proc/sys/kernel/hotplug
    echo "Umounting all... "
    umount -a
    echo "Switching to the new root and executing init... "
    exec switch_root /newroot ${init}
    
    #This will only be run if the exec above failed
    mount -av
    mdev -s
    shell "Failed to switch_root... "
    
    As you may notice, every single step is commented. Still, here is an overall explanation of the process:
    1. /etc/profile is sourced to export the PATH variable and make all executables reachable 
    2. All busybox's symbolic links are created
    3. Some special devices are created by hand
    4. All /etc/fstab filesystems are mounted
    5. The rest of the devices are discovered and created by busybox's mdev
    6. The Kernel command line located at /proc/cmdline is checked to see if the shell parameter was supplied. If so, the shell is launched immediately, replacing the current process, hence everything else is ignored
    7. The Kernel command line is checked again to see if the rambox parameter was supplied, indicating that we want to keep the ramdisk mounted at / and launch the normal /sbin/init process
    8. If neither the shell nor the rambox parameter was supplied, the script tries to mount the new root (/) and launch /sbin/init from this new location
    9. Finally, if the new root cannot be mounted or /sbin/init cannot be executed, a shell is launched reporting the situation
    10. If an error occurs in any of these shell launching steps, a Kernel panic is issued   
  23. Add execute permission to /init:
  24. $ chmod +x init
    
  25. Change the ownership of everything to root:
  26. $ sudo chown -R root:root *
    
  27. Create the initramfs.cpio.gz compressed archive and copy it to tftp's root directory:
  28. $ sudo find . -print0 | sudo cpio --null -ov --format=newc | gzip -9 > ../initramfs.cpio.gz
    $ sudo cp ../initramfs.cpio.gz /var/lib/tftpboot
    
  29. Go back to working directory:
  30. $ popd
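
The way /init picks parameters out of the kernel command line can be tried on any machine, without booting the ramdisk. This is a minimal sketch and not the code from /init: cmdline_get is a hypothetical helper, and the sample command line is made up for illustration. It keeps everything after the first "=", so a value that itself contains "=" comes through whole.

```shell
#!/bin/sh
# cmdline_get KEY [CMDLINE]
# Prints the value of KEY=value from a kernel command line string and
# returns 1 when KEY is absent. CMDLINE defaults to the contents of
# /proc/cmdline; passing it explicitly makes the function easy to try.
cmdline_get() {
  key=$1
  line=${2-$(cat /proc/cmdline)}
  for word in $line; do
    case $word in
      "$key"=*) echo "${word#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Sample command line (made up for illustration)
sample='initrd=initramfs.cpio.gz rw ip=dhcp root=UUID=1234 init=/sbin/init'
cmdline_get root "$sample"
cmdline_get init "$sample"
```

On the sample line the two calls print UUID=1234 and /sbin/init.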
    

Now the Kernel stuff:

What I am about to do with the Kernel is very simple: compile it with a minimal set of features that makes it boot and recognize MY hardware, mainly the NIC device. Hence, depending on your hardware, you will probably need a different selection of features. So I recommend first doing a one-time installation of any modern Linux distribution (like I did), such as CentOS, Gentoo, Fedora, Debian or Ubuntu, with a modern Kernel version, and checking the modules loaded on boot using /sbin/lsmod. Then, using this module list, look for the corresponding Kernel options and INCLUDE them all in the Kernel, making it rock solid! That's what I did.

NOTE: In our journey to make the Kernel simple and small, we should be careful not to omit critical Kernel features and lose hardware advantages, for example SMP support. So if we really want to use it in a production environment, deeper research and customization must be done first.

Start Kerneling...    
  1. Download the Kernel sources from the sky:
  2. $ krn=linux-2.6.39
    $ get $krn.tar.xz http://www.kernel.org/pub/linux/kernel/v2.6/$krn.tar.xz
    
  3. Uncompress it into a working directory named linux:
  4. $ tar xvf /tmp/cache/$krn.tar.xz -C .
    $ mv $krn linux
    $ pushd linux
    
  5. Clean all configuration settings and enter the menu:
  6. $ make clean
    $ make allnoconfig
    $ make menuconfig
    
  7. An ncurses menu dialog should open. Now check a "minimal" set of features and uncheck the unneeded ones. I'll only list what changes wrt the clean configuration settings. [*] means explicitly checked to be EMBEDDED into the Kernel, and [ ] means explicitly unchecked so it is not included
    1. General Setup (here the RAM filesystem is the most important feature)
    2. [*] Prompt for development and/or incomplete code/drivers
      (-minimal) Local version - append to kernel release
      [*] Initial RAM filesystem and RAM disk (initramfs/initrd) support
      
    3. Bus options (PCI etc.)  ---> (Enable support for PCI devices, you may add support for your PCI hardware here)
    4. [*] PCI support 
    5. Executable file formats / Emulations  ---> (An important piece!, you won't be able to execute almost anything if you don't check it)
    6. [*] Kernel support for ELF binaries
      
    7. [*] Networking support (Besides enabling TCP/IP and disabling Wireless, IPSec, etc., the most important feature to check here is IP: kernel level autoconfiguration with DHCP support)
    8.   [ ]   Wireless  ---> 
       Networking options 
          [*] Packet socket                                                                                 
          [*]   Packet socket: mmapped IO                                                                     
          [*] Unix domain sockets
          [*] Transformation sub policy support (EXPERIMENTAL)
          [*] Transformation migrate database (EXPERIMENTAL)
          [*] PF_KEY sockets
          [*]   PF_KEY MIGRATE (EXPERIMENTAL)                                                                                  
          [*] TCP/IP networking
          [*]   IP: multicasting                              
          [*]   IP: advanced router                           
              Choose IP: FIB lookup algorithm (choose FIB_HASH if unsure) (FIB_HASH)
          [*]   IP: policy routing                            
          [*]   IP: equal cost multipath                      
          [*]   IP: verbose route monitoring                                                                               
          [*]   IP: kernel level autoconfiguration                                                           
          [*]     IP: DHCP support
          [*]   IP: tunneling                                 
          [*]   IP: GRE tunnels over IP                       
          [*]     IP: broadcast GRE over IP                   
          [*]   IP: multicast routing                         
          [*]     IP: PIM-SM version 1 support                
          [*]     IP: PIM-SM version 2 support                
          [*]   IP: ARP daemon support                        
          [*]   IP: TCP syncookie support (disabled per default)                         
          [*]   IP: AH transformation                         
          [*]   IP: ESP transformation                        
          [*]   IP: IPComp transformation                                                                            
          [ ]   IP: IPsec transport mode                                                                     
          [ ]   IP: IPsec tunnel mode                                                                        
          [ ]   IP: IPsec BEET mode
          [*]   TCP: advanced congestion control  --->                                                                           
           [*]   CUBIC TCP (NEW) (only cubic)
          [*]   TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL) 
          [ ]   The IPv6 protocol  --->                                                                      
      
      
    9. Device Drivers  ---> (RAM block device support and Network device support + Ethernet are the most important things, the remaining stuff is related to my current hardware)
    10.   [*] Block devices  --->
          [*]   RAM block device support
        [*] Multiple devices driver support (RAID and LVM)  --->
          [*]   Device mapper support
        [*] Network device support  ---> 
          [*]   Ethernet (10 or 100Mbit)  --->
          [ ]   Wireless LAN  ---> 
       Character devices  --->
          [*] /dev/kmem virtual device support
          [*] Hardware Random Number Generator Core support
        [*] I2C support  --->
          [*]   I2C device interface
          I2C Hardware Bus support  ---> 
            [*] Intel PIIX4 and compatible (ATI/AMD/Serverworks/Broadcom/SMSC)
       Serial ATA (prod) and Parallel ATA (experimental) drivers (ATA [=n])
          [*]   ATA SFF support (NEW) 
            [*]    Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support
            [*]    Generic ATA support
      
    11. File systems  --->  (File systems are very important; which ones you need depends on your final goal: mount a remote NFS share for shared storage? Use a GlusterFS / Ceph filesystem on top of a NAS? The configuration I used is the simplest one, with support only for initramfs and other pseudo filesystems. I recommend starting with this one and then gradually embedding your filesystems) 
    12.   [ ] Network File Systems  --->
        Pseudo filesystems  --->
          [*] Virtual memory file system support (former shm fs)
          [*]   Tmpfs POSIX Access Control Lists 
      
    13. [*] Virtualization  ---> (As I mentioned earlier, I'm using KVM-virtualized hardware that makes wide use of Virtio paravirtualization. Virtio adds support for a paravirtual Ethernet card, a paravirtual disk I/O controller, a balloon device for adjusting guest memory usage, and a VGA graphics interface using SPICE drivers. Virtio drivers for guest machines are included in kernels >= 2.6.25, see details here)    
    14.   [*]   PCI driver for virtio devices (EXPERIMENTAL)
        [*]   Virtio balloon driver (EXPERIMENTAL)
      
    15. Device Drivers ---> [for virtualization]
    16.  
        [*] Block devices  --->
          [*]   Virtio block driver (EXPERIMENTAL)
        [*] Network device support  ---> 
          [*]   Virtio network driver (EXPERIMENTAL)
        Character devices  --->
          [*] Virtio console
          [*] Hardware Random Number Generator Core support
          [*]   VirtIO Random Number Generator support
      
    17. Exit the Kernel configuration menu and don't forget to save the settings file.
  8. Compile the kernel (-j4 runs 4 parallel compilation jobs) and copy it to TFTP's root directory:
  9. $ make -j4 bzImage
    $ sudo cp arch/x86/boot/bzImage /var/lib/tftpboot/
    
  10. Do cleanup:
  11. $ popd
    $ sudo rm -rf /tmp/wrk
    
  12. Power on the rambox and enjoy it! It should boot smoothly and launch the busybox's shell.
You will find the basic tools at /bin, /sbin, /usr/bin, /usr/sbin and /usr/local/sbin; all of these directories are in the PATH environment variable. To renew your IP address just run renew_ip. Finally, notice that no kernel module is loaded, since everything you need is built into the kernel.

Enjoy it!
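
Before the cleanup step removes the build tree, the selections made in the configuration menu can be double-checked against the generated .config file. The following is only a sketch: the function name missing_kconfig is made up for illustration, and the CONFIG_ symbol names are an assumed mapping of the menu labels above, so adjust them to your kernel version. It is meant to be run from the kernel source directory.

```shell
#!/bin/sh
# Print every symbol from the list that is NOT set to "y" in the given
# kernel config file. No output means all symbols are built in.
# Usage: missing_kconfig <config-file> SYMBOL...
missing_kconfig() {
    cfg="$1"; shift
    for sym in "$@"; do
        grep -q "^CONFIG_${sym}=y\$" "$cfg" || echo "$sym"
    done
}

# Assumed mapping of the menu entries selected above (not taken from the post)
REQUIRED="BLK_DEV_RAM TMPFS TMPFS_POSIX_ACL VIRTIO_PCI VIRTIO_BALLOON \
VIRTIO_BLK VIRTIO_NET VIRTIO_CONSOLE HW_RANDOM_VIRTIO"

# Only run the check when a .config is present in the current directory
if [ -f .config ]; then
    missing_kconfig .config $REQUIRED
fi
```

Any symbol printed by the script is either disabled or built as a module (=m), which would not work here because the rambox does not load modules.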

Post install

Perform some checks after install to ensure that everything is OK and to measure resource consumption:

  • Free memory; as you may notice, only about 12 MB are used:
    $ free -m
                    total    used    free    shared    buffers
    Mem:             1255      12    1243         0          0 
    -/+ buffers:               12    1243
    Swap:               0       0       0 
    
  • Network configuration/connectivity; both interfaces, eth0 and lo, should be listed:
    $ ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:07:26:c0:a5  
              inet addr:192.168.24.203  Bcast:192.168.24.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:378 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:43745 (42.7 KiB)  TX bytes:1180 (1.1 KiB)
              
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0 B)  TX bytes:0 (0 B)
    
    $ ping -c 3 192.168.24.202
    PING 192.168.24.202 (192.168.24.202) 56(84) bytes of data.
    64 bytes from 192.168.24.202: icmp_req=1 ttl=63 time=0.774 ms
    64 bytes from 192.168.24.202: icmp_req=2 ttl=63 time=0.639 ms
    64 bytes from 192.168.24.202: icmp_req=3 ttl=63 time=0.574 ms
    
    --- 192.168.24.202 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2001ms
    rtt min/avg/max/mdev = 0.574/0.662/0.774/0.085 ms
    
    
  • Mounted filesystems:
    $ mount
    rootfs on / type rootfs (rw)
    devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=0755)
    sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    
  • Disk usage, about 1.1 MB:
    $ du -chs /*
    1.1M    /bin
    0       /dev
    28.0K   /etc
    4.0K    /init
    0       /lib
    0       /lib32
    0       /lib64
    0       /linuxrc
    0       /mnt
    0       /proc
    0       /root
    36.0K   /sbin
    0       /sys
    0       /tmp
    20.0K   /usr
    0       /var
    1.1M    total
    
  • Loaded modules; since no module was loaded, either an empty list or a 'No such file or directory' message is printed:
    $ lsmod
    lsmod: can't open '/proc/modules': No such file or directory
    
  • Device nodes created by the device manager (mdev/udev). The result depends on many factors:
    $ find /dev | wc -l
    106
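
The memory and network checks above can also be scripted, which is handy once there are several nodes to verify. This is a minimal sketch, assuming the BusyBox build includes awk and grep and that /proc is mounted; the helper names mem_used_mb and has_iface are hypothetical. Used memory is computed as MemTotal minus MemFree, which is how the "used" column in the free output above is derived.

```shell
#!/bin/sh
# Print used memory in MB (MemTotal - MemFree) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
mem_used_mb() {
    awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2}
         END {printf "%d\n", (t - f) / 1024}' "${1:-/proc/meminfo}"
}

# Succeed when the named interface is listed in a /proc/net/dev-style file.
# Usage: has_iface <name> [file]
has_iface() {
    grep -q "^ *$1:" "${2:-/proc/net/dev}"
}

# Run the checks only where the proc files are readable
if [ -r /proc/meminfo ] && [ -r /proc/net/dev ]; then
    echo "used memory: $(mem_used_mb) MB"
    for i in eth0 lo; do
        if has_iface "$i"; then echo "$i: present"; else echo "$i: MISSING"; fi
    done
fi
```

Both helpers accept an optional file argument so they can be tried against a saved copy of the proc files from another node.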
    

6 comments:

  1. Thanks for this good howto.

    I would suggest a tiny improvement to hxip function to also resolve hostnames if needed :

    hxip() {
    [ "${1//[[:alpha:]]/}" = "$1" ] && ADDR="$1" || ADDR="$(host "$1" | sed -n 's/^.*address //p')"
    ( bc | sed '/^[[:xdigit:]]$/s/^/0/' | tr -d '\n' ) <<< "obase=16; ${ADDR//./;}"; }

    moreover, you should add a link to syslinux project for PXELinux downloads.

    Replies
    1. Thank you very much! All kind of feedback is appreciated. I'll consider all your suggestions and update the post later!
  2. Thanks much for this howto! I am a networking intern and I have had a similar idea though its implementation has been rough going... this is the 3rd howto I have tried to follow along these lines, but the only one that has gone fairly smooth! Thanks for the indepth explanations and for taking the time to write it all out. I will comment again when I am done (I'm compiling the kernel right now) and let you know if I could make it work for me. Thanks again for the time you have taken here, much appreciated!
  3. Thank you for the amazing howto.

    I think I found a typo (s/yun/yum/):
    # These are needed by kernel-2.6.32-220.el6
    $ sudo yun install xmlto asciidoc newt-devel python-devel perl-ExtUtils-Embed