In the first part of this article I explained which Atlantis ILIO products and components exist, which types of Session Hosts there are and how the Replication Host works. In this part I will dive a bit deeper into the boot process and how the hosts are managed.
So what happens when a Session or Replication Host starts up? What needs to happen to make the datastore available to the virtual machines?
The boot process of the hosts consists of booting the operating system (Ubuntu 8.10), switching to multi-user mode (runlevel 2) and finally running the /etc/rc.local script. Each type of Atlantis ILIO host has its own boot process, yet they are very similar.
The start-up script /etc/rc.local executes a number of actions. The table below lists and explains all of them.
|Network optimization||For each network interface [ethx], the TCP/IP stack is optimized for IP storage and maximum throughput by enabling TCP Segmentation Offload (TSO), increasing the receive (RX) ring buffer to 4096, increasing the transmit (TX) ring buffer to 4096 and setting the transmit queue length (txqueuelen) to 300000.|
|Set RAID minimum speed||Sets the minimum RAID speed equal to the maximum speed of the network (2 Gbit/s).
This command is only issued on the “Atlantis ILIO Persistent VDI – Session Host” (functiontype DLN), because that is the only machine that uses linux-raid (called “Atlantis FastReplication”), as explained in part 1 of this article.|
|Reconstruct ILIO cache||The ILIO memory cache (used for inline de-duplication and for combining small random blocks into large sequential blocks) is reconstructed to ensure a consistent state.
On a disk-based Session Host or a Replication Host, this ensures that the VMs are read in from persistent storage, recreated and re-populated.|
|Synchronize file systems||This is essentially a checkpoint: the ILIO host declares “we’re all good, start here”. It creates a synchronization point, which is the point in time that non-persistent machines roll back to when they are rebooted.|
|Start ILIO Center Agent||The ibert daemon (also known as the ILIO Center Agent) is started so that ILIO Center can manage and monitor the host.|
|Mount nfsd volume||Mounts the nfsd file system on /proc/fs/nfsd, which is required to provide access to the Linux NFS server. The file system consists of a single directory containing a number of files; these files are actually gateways into the NFS server.|
|Start NFS server||Starts the NFS server daemon on the host to enable remote access to the /exports/ILIO_VirtualDesktops mount point and all other mount points stored in the /etc/exports file.|
|Configure disk||For each disk [/dev/hdx], parameters are set to optimize caching, buffering, etc. to ensure data is written properly. *more info needed*|
|Restore SnapClone||Restores a snapshot from the folder /mnt/images (linked to /dev/sdb1) to /dev/(z)ram0 (the RAM disk) using the PartClone tool.
The snapshot is restored so that all files that were stored on the host, such as the configuration (.vmx) and virtual hard disk (.vmdk) files, are available again. Once the files are available, the hypervisor host can boot the virtual machines. If no snapshot is restored when the host is restarted, an empty datastore is presented to the hypervisor host.
This command is only issued on the “Atlantis ILIO Diskless VDI – Session Host”. Normally all files are stored on the RAM disk, which is volatile. Unlike on the “Atlantis ILIO Persistent VDI – Session Host”, constant replication to a SAN is not wanted, for two reasons: 1) files should only persist when the administrator explicitly requests it; 2) the goal of the diskless solution is to prevent ALL IOPS to the SAN.|
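To make the first two rows of the table more concrete, here is a minimal Python sketch that turns their values into common Linux commands. It assumes the tuning is done with `ethtool` and `ip link`, and that the RAID minimum speed is written to the md driver's `/proc/sys/dev/raid/speed_limit_min` (which is expressed in KiB/s). The article does not say which commands Atlantis actually uses, and the interface names `eth0`/`eth1` are only examples.

```python
# Hypothetical reconstruction of two rc.local tuning steps as Linux commands.
# The values come from the table; the ethtool/ip commands and the
# speed_limit_min path are ASSUMED equivalents, not Atlantis' actual script.

RING_SIZE = 4096        # RX and TX ring buffer size from the table
TXQUEUELEN = 300000     # transmit queue length from the table
NETWORK_GBIT = 2        # network speed quoted for the RAID minimum


def nic_tuning_commands(interfaces):
    """Build the per-interface commands: enable TSO, enlarge rings, set txqueuelen."""
    commands = []
    for nic in interfaces:
        commands.append(["ethtool", "-K", nic, "tso", "on"])
        commands.append(["ethtool", "-G", nic, "rx", str(RING_SIZE), "tx", str(RING_SIZE)])
        commands.append(["ip", "link", "set", "dev", nic, "txqueuelen", str(TXQUEUELEN)])
    return commands


def raid_speed_limit_min_kib(gbit_per_s):
    """Convert Gbit/s to KiB/s, the unit used by /proc/sys/dev/raid/speed_limit_min."""
    bytes_per_s = gbit_per_s * 1_000_000_000 // 8
    return bytes_per_s // 1024


if __name__ == "__main__":
    for cmd in nic_tuning_commands(["eth0", "eth1"]):
        print(" ".join(cmd))
    # Only relevant on the Persistent VDI Session Host (functiontype DLN).
    limit = raid_speed_limit_min_kib(NETWORK_GBIT)
    print(f"echo {limit} > /proc/sys/dev/raid/speed_limit_min")
```

For 2 Gbit/s this yields 244140 KiB/s (2 × 10^9 bits / 8 / 1024, rounded down).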
A special thanks to Andrew Wood (Solutions Engineer @ Atlantis Computing) for providing additional information.
Similarities and differences
As explained in part 1 of this article, there are different Atlantis ILIO hosts. To keep things simple, we’re focusing on the “Atlantis ILIO Diskless VDI” and the “Atlantis ILIO Persistent VDI” products.
Persistent – Replication Host versus Diskless – Session Host
The start-up process of the “Atlantis ILIO Persistent VDI – Replication Host” is identical to that of the “Atlantis ILIO Diskless VDI – Session Host”, with the exception of the SnapClone restore. As concluded in the first part, the Replication Host is basically an improved Session Host: just like a Session Host, it receives files via an NFS mount point, and it stores them on the SAN.
Because the “Atlantis ILIO Diskless VDI – Session Host” only stores files on a volatile RAM disk, all changes are lost after a power cycle. To retain the virtual machines’ configuration (.vmx) and hard disk (.vmdk) files, the administrator creates an “Atlantis SnapClone” once the virtual machines have been created; at boot, this snapshot is restored from disk [/dev/sdb1] to memory [/dev/(z)ram0].
Persistent – Replication Host versus Persistent – Session Host
The Session and Replication Hosts of the “Atlantis ILIO Persistent VDI” product again have a lot in common, but there are two differences:
- On the Session Host a minimum RAID speed is configured. This command is only issued on the Session Host because that machine uses linux-raid (called “Atlantis FastReplication”), as explained in part 1 of this article.
- On the Session Host the NFS server is started after the disks are configured, whereas on the other hosts it is started earlier in the process. To me it would make more sense to start the NFS server last on all hosts, but Atlantis Computing chose differently.
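These differences can also be checked mechanically. The Python sketch below encodes the rc.local steps of each host type as ordered lists, following the order of the table earlier in this article. The step labels are shorthand invented for this sketch, and the exact position of the RAID step on the Persistent Session Host is an assumption. `compare` reports the steps unique to each host and the shared steps that end up at a different position.

```python
# Boot steps per host type, reconstructed from the rc.local table in this
# article. Step labels are shorthand invented for this sketch; the position of
# the RAID step on the Persistent Session Host is an assumption.

COMMON = ["network", "reconstruct_cache", "sync_fs", "start_agent", "mount_nfsd"]

BOOT_STEPS = {
    "diskless_session": COMMON + ["start_nfs", "configure_disk", "restore_snapclone"],
    "persistent_replication": COMMON + ["start_nfs", "configure_disk"],
    # DLN: extra RAID step, and the NFS server starts after the disks are configured.
    "persistent_session": COMMON[:1] + ["raid_min_speed"] + COMMON[1:]
    + ["configure_disk", "start_nfs"],
}


def compare(a, b):
    """Return (steps only in a, steps only in b, shared steps at a different position)."""
    steps_a, steps_b = BOOT_STEPS[a], BOOT_STEPS[b]
    only_a = [s for s in steps_a if s not in steps_b]
    only_b = [s for s in steps_b if s not in steps_a]
    # Keep only the shared steps, in each host's own order, and compare position by position.
    shared_a = [s for s in steps_a if s in steps_b]
    shared_b = [s for s in steps_b if s in steps_a]
    moved = [x for x, y in zip(shared_a, shared_b) if x != y]
    return only_a, only_b, moved


if __name__ == "__main__":
    print(compare("persistent_replication", "diskless_session"))
    print(compare("persistent_session", "persistent_replication"))
```

The first comparison returns `([], ['restore_snapclone'], [])`. The second returns `(['raid_min_speed'], [], ['configure_disk', 'start_nfs'])`, which matches the two differences listed above.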
With the introduction of Atlantis ILIO Diskless VDI, centralized management of the Session Hosts became mandatory, primarily to create and restore SnapClones. Atlantis ILIO Center fulfilled this role and also introduced health monitoring and reporting. Nonetheless, the Session Hosts could still operate independently; they were isolated entities.
With the introduction of the Replication Host, the Session Hosts are no longer isolated entities; they are now related through a mapping. A Session Cluster maps a Replication Host to a number of Session Hosts. One Atlantis ILIO Center can manage multiple Session Clusters (if needed).
A Session Cluster increases complexity, because all components need to be aware of the mapping between the hosts:
- Each “Atlantis ILIO Persistent VDI” Session Host needs to know to which Replication Host it should connect;
- Each Replication Host needs to know which Session Hosts it should serve a dedicated NFS mount point to;
- The Atlantis ILIO Center needs to know about these relations so it can manage the cluster.
The more complex the setup, the higher the risk of a faulty configuration or an inconsistent state on one of the hosts. Knowing how the relations within a Session Cluster are managed helps you prevent these situations and troubleshoot them.
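One way to catch such inconsistencies is to compare what each component believes about the cluster. The Python sketch below is hypothetical. It assumes you have already collected three things into plain dictionaries: each Session Host’s `persistentnode` address, each Replication Host’s exported folders, and ILIO Center’s mapping. It then reports mismatches between those views. The function name, host names (such as `ilio-rh-01`) and addresses are examples, not part of the product.

```python
# Hypothetical Session Cluster consistency check. Collecting the input data
# (hosts files, /etc/exports, ILIO Center database) is out of scope here.

def check_cluster(center_mapping, persistentnode_ips, exports, replication_ips):
    """Compare three views of one or more Session Clusters.

    center_mapping:      {session_host: replication_host} as stored in ILIO Center
    persistentnode_ips:  {session_host: 'persistentnode' address in its /etc/hosts}
    exports:             {replication_host: set of session-host folders it exports}
    replication_ips:     {replication_host: its IP address}
    Returns a list of problems; an empty list means the views agree.
    """
    problems = []
    for session, replication in sorted(center_mapping.items()):
        expected = replication_ips.get(replication)
        actual = persistentnode_ips.get(session)
        if actual != expected:
            problems.append(f"{session}: persistentnode is {actual}, expected {expected}")
        if session not in exports.get(replication, set()):
            problems.append(f"{replication}: no export for {session}")
    # Exports that ILIO Center does not know about (e.g. a removed Session Host).
    for replication, folders in sorted(exports.items()):
        for session in sorted(folders):
            if center_mapping.get(session) != replication:
                problems.append(f"{replication}: stale export for {session}")
    return problems


if __name__ == "__main__":
    mapping = {"ilio-sh-001": "ilio-rh-01", "ilio-sh-002": "ilio-rh-01"}
    hosts = {"ilio-sh-001": "10.0.0.201", "ilio-sh-002": "10.0.0.202"}
    exported = {"ilio-rh-01": {"ilio-sh-001", "ilio-sh-002"}}
    ips = {"ilio-rh-01": "10.0.0.201"}
    for problem in check_cluster(mapping, hosts, exported, ips):
        print(problem)
```

With the example data, the check reports a single problem: `ilio-sh-002: persistentnode is 10.0.0.202, expected 10.0.0.201`.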
Atlantis ILIO Center
Session Clusters are created and managed via Atlantis ILIO Center. ILIO Center uses a database to store the configuration, health state and resource usage of all hosts.
To configure a Session Cluster, three tables are used:
|cluster||Contains a record for each Session Cluster with the internal cluster name (clusterName) and the ID of the Replication Host (persistent_entity_id).|
|entities||Contains a record for each Atlantis ILIO Session and Replication Host (as explained here).|
|entitiesMapping||Contains the mapping between the persistent entity (the Replication Host) and the diskless entities (the Session Hosts).|
The ILIO Center database is a relational database: each table contains unique data, each row is identified by an identifier (ID), and no data is stored redundantly (in other words, a value such as a hostname is stored in only one location).
In the diagram above you can see a visual representation of the related tables and columns (not all columns are shown). A Session Cluster is established as follows in the database:
- for each Session or Replication Host an entry is created in the [entities] table, uniquely identified by its identifier (id);
- for each Session Cluster an entry is created in the [cluster] table. The name of the cluster (clusterName) is an internal name such as “Cluster16”, and the cluster is mapped to the Replication Host’s entry in the [entities] table (persistent_entity_id);
- for each Session Host an entry is created in the [entitiesMapping] table (diskless_entity_id) to map it to the Replication Host (persistent_entity_id).
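The relations between the three tables can be reproduced in a small SQLite database. This is a minimal sketch. The table names and the `id`, `clusterName`, `persistent_entity_id` and `diskless_entity_id` columns come from the description above. The `hostname` column, the column types, the example host names and the choice of SQLite are assumptions; the actual ILIO Center database engine and schema may differ.

```python
# Minimal SQLite model of the three ILIO Center tables described above.
# Table names and id/clusterName/persistent_entity_id/diskless_entity_id come
# from the article; the hostname column, types and sample hosts are assumed.
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL UNIQUE            -- assumed column
);
CREATE TABLE cluster (
    id INTEGER PRIMARY KEY,
    clusterName TEXT NOT NULL,
    persistent_entity_id INTEGER NOT NULL REFERENCES entities(id)
);
CREATE TABLE entitiesMapping (
    persistent_entity_id INTEGER NOT NULL REFERENCES entities(id),
    diskless_entity_id INTEGER NOT NULL REFERENCES entities(id)
);
"""


def build_example():
    """One Replication Host (id 1) with two Session Hosts in 'Cluster16'."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO entities (id, hostname) VALUES (?, ?)",
                   [(1, "ilio-rh-01"), (2, "ilio-sh-001"), (3, "ilio-sh-002")])
    db.execute("INSERT INTO cluster (clusterName, persistent_entity_id) VALUES ('Cluster16', 1)")
    db.executemany("INSERT INTO entitiesMapping VALUES (1, ?)", [(2,), (3,)])
    return db


def session_hosts_per_cluster(db):
    """Join the tables: cluster name, Replication Host, Session Host."""
    query = """
        SELECT c.clusterName, r.hostname, s.hostname
        FROM cluster c
        JOIN entities r ON r.id = c.persistent_entity_id
        JOIN entitiesMapping m ON m.persistent_entity_id = c.persistent_entity_id
        JOIN entities s ON s.id = m.diskless_entity_id
        ORDER BY s.hostname
    """
    return db.execute(query).fetchall()


if __name__ == "__main__":
    for row in session_hosts_per_cluster(build_example()):
        print(row)
```

Because each hostname is stored only once, in [entities], the join is the only way to get a readable view of a cluster. This is the normalization described above.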
[Figure: an example of a Session Cluster stored in the database]
[Figure: the Session Cluster management page in Atlantis ILIO Center]
PS: The database can be accessed for monitoring purposes, though this is not officially supported. In this article it is explained how to gain access to the database.
On start-up a Session Host needs to know which Replication Host (or persistent node) it is mapped to in order to initiate FastReplication (synchronization of the data on the SAN with the RAM disk).
Session Hosts do not depend on an active connection with ILIO Center. This has a huge benefit: a Session Host can operate regardless of whether ILIO Center is available, which is great for scalability. The downside is that the mapping to the Replication Host (including address information) is not retrieved from the ILIO Center database but is stored on each Session Host separately.
As explained in part one, an “Atlantis ILIO Persistent VDI – Session Host” mounts <ip>:/exports/ILIO_VirtualDesktops/<session-host-name> via NFS on the folder /persistent/persistentnode. The location of the mapped Replication Host is stored in a simple but effective way: as an entry named “persistentnode” in the /etc/hosts file.
127.0.0.1 localhost ilio-sh-001
10.0.0.201 persistentnode
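Resolving `persistentnode` then works like any other hosts-file lookup. The sketch below mimics that lookup in Python on the example entries. It illustrates hosts-file parsing and is not code taken from the ILIO agent; `lookup_host` is a made-up name.

```python
# Illustrative hosts-file lookup; not taken from the ILIO agent.

HOSTS = """127.0.0.1 localhost ilio-sh-001
10.0.0.201 persistentnode
"""


def lookup_host(hosts_text, name):
    """Return the address of the first line that lists `name`, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


if __name__ == "__main__":
    print(lookup_host(HOSTS, "persistentnode"))  # 10.0.0.201
```

Because the Session Host only ever refers to the name `persistentnode`, pointing it at a different Replication Host comes down to changing this single line.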
When a Session Host is added to a Session Cluster, a process called “setupDisklessNode” is executed, which performs the following actions:
|setFunctionType||The host is configured as an “Atlantis ILIO Persistent VDI – Session Host” (DLN). This is set in the file /etc/ilio/iliofunctiontype.|
|setEntryForPersistentNodeInDisklessHostsFile||The persistentnode entry in the /etc/hosts file is added or replaced with the IP address of the Replication Host (or persistent node).|
|mountNfsExportFromPnOnDln||The NFS export <ip>:/exports/ILIO_VirtualDesktops/<session-host-name> is mounted on /persistent/persistentnode.|
|createDeviceOnDln||The loopback device loop0 is attached to the file /persistent/persistentnode/zerofile.|
|createAndExportCygnusVolumeOnDln||The de-duplication device /dev/md0 is created, formatted, mounted on /exports/ILIO_VirtualDesktops and shared via NFS / iSCSI (the NFS export is stored in /etc/exports). The device /dev/md0 is a RAID 1 (mirror) between the RAM disk and the persistent store on the Replication Host (see part one).|
|setRclocalFileForPersistentDisklessOnDln||The start-up script /etc/rc.local is changed from the “Atlantis ILIO Diskless VDI – Session Host” version to the “Atlantis ILIO Persistent VDI – Session Host” version.
Basically, the rc.local file is copied from /opt/ibert/agent/ to /etc. The following files are available:
For more information on your Session Host, see the log file /var/log/ibertagent.log.
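The setEntryForPersistentNodeInDisklessHostsFile step only needs to add or replace a single hosts-file line. Below is a minimal, hypothetical Python version of that behavior. The function name `set_persistentnode` is made up, and the real agent may handle comments, aliases and file locking differently.

```python
# Hypothetical add-or-replace of the 'persistentnode' hosts entry; the real
# setEntryForPersistentNodeInDisklessHostsFile implementation is not public.

def set_persistentnode(hosts_text, ip, name="persistentnode"):
    """Return hosts-file text in which `name` maps to `ip` exactly once."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            others = [f for f in fields[1:] if f != name]
            if not others:
                continue  # the old line only mapped `name`: drop it
            # `name` shared a line with other aliases: keep those aliases
            # (a trailing comment on that line is not preserved)
            line = " ".join([fields[0]] + others)
        kept.append(line)
    kept.append(f"{ip} {name}")  # (re)add the entry at the end of the file
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    old = "127.0.0.1 localhost ilio-sh-001\n10.0.0.201 persistentnode\n"
    print(set_persistentnode(old, "10.0.0.202"), end="")
```

Running the function twice with the same address produces the same text. This idempotence matters if a Session Host is re-added to a cluster.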
An “Atlantis ILIO Persistent VDI – Replication Host” exposes a separate NFS mount point for each Session Host (named after the Session Host). A dedicated mount point per Session Host offers the best performance (each mount point has a limit of 16 outstanding I/Os) and enables granular control of permissions.
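Since each Session Host gets its own export with identical options, the /etc/exports entries can be generated from the list of Session Hosts. The Python sketch below builds entries in the format used on the Replication Host. The option string matches the one in the listing shown in this article. This is an illustration, not Atlantis’ implementation, and `exports_lines` is a made-up name.

```python
# Illustrative generator for per-Session-Host /etc/exports entries.

EXPORT_ROOT = "/exports/ILIO_VirtualDesktops"
OPTIONS = "rw,no_root_squash,no_subtree_check,async,insecure,nohide"


def exports_lines(session_hosts):
    """Root export first, then one export per Session Host folder (sorted by name)."""
    paths = [EXPORT_ROOT] + [f"{EXPORT_ROOT}/{host}" for host in sorted(session_hosts)]
    # '*' exports to any client, as in the Replication Host's /etc/exports.
    return [f'"{path}" *({OPTIONS})' for path in paths]


if __name__ == "__main__":
    hosts = [f"ilio-sh-{n:03d}" for n in range(1, 6)]
    print("\n".join(exports_lines(hosts)))
```

After /etc/exports changes, running `exportfs -ra` makes the Linux NFS server re-read the file. Whether the Replication Host does exactly this is not confirmed.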
When a Session Host is added to the Session Cluster, the Replication Host creates a folder in /exports/ILIO_VirtualDesktops named after the Session Host (for example ilio-sh-001). That folder is then added as an NFS export in the /etc/exports file:
"/exports/ILIO_VirtualDesktops" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
"/exports/ILIO_VirtualDesktops/ilio-sh-001" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
"/exports/ILIO_VirtualDesktops/ilio-sh-002" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
"/exports/ILIO_VirtualDesktops/ilio-sh-003" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
"/exports/ILIO_VirtualDesktops/ilio-sh-004" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
"/exports/ILIO_VirtualDesktops/ilio-sh-005" *(rw,no_root_squash,no_subtree_check,async,insecure,nohide)
Do you have any questions that weren’t answered in part one or part two (this article)? Let me know and maybe I’ll write another part.