Tuesday, May 1, 2018

OCC/OCI-C Orchestration V2 Server Anti-Affinity

Introduction

The Oracle Cloud at Customer offering provides a powerful deployment platform for many different application types.  In the enterprise computing arena it is often necessary to deploy clustered applications so that the application stays highly available and continues to provide service even if there is a fault in the platform.  To this end we have to ensure that an application can be distributed over multiple virtual machines and that these virtual machines reside on different physical hardware.  Oracle Cloud Infrastructure - Classic (which is what runs on the Oracle Cloud at Customer racks) can achieve this when the placement requirement is specified in what is called an orchestration.

Orchestration Overview

In OCI-C it is possible to create complex environments using multiple networks and to apply firewall rules to the VMs that connect to these networks.  The VMs themselves have many configuration items that control their behaviour.  All of this is described in the documentation, but the diagram below shows the interactions between the various objects and a virtual machine instance.



[Figure: OCI-C object interactions (shared network only), showing the relationships between objects in Oracle Compute Cloud Service]

Using orchestrations (v2) you can define each of these objects in a single flat file (JSON formatted).  This file allows you to specify the attributes of each object and also to reference other objects.  From these references the system is able to work out the order in which the objects should be created, i.e. the dependencies.  For example, if a VM will use a storage volume, its definition can reference the volume, which means the storage volume must come on-line before the instance is created.

For example, the following snippet of a JSON file defines a boot disk, and the VM instance definition then references the name of the volume, using the volume's label to identify it.

 
 {
      "label": "df-simplevm01-bootdisk",
      "type": "StorageVolume",
      "persistent":true,
      "template":
        {
          "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm01-bootvol",
          "size": "18G",
          "bootable": true,
          "properties": ["/oracle/public/storage/default"],
          "description": "Boot volume for the simple test VM",
          "imagelist": "/oracle/public/OL_7.2_UEKR4_x86_64"
        }
...
 {
      "label": "df-simplevm01",
      "type": "Instance",
      "persistent":false,
      "name": "/Compute-500019369/don.forbes@oracle.com/df-simple_vms/df-simplevm01",
      "template": {
        "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm01",
        "shape": "oc3",
        "label": "simplevm01",
        "storage_attachments": [
          {
            "index": 1,
            "volume": "{{df-simplevm01-bootdisk:name}}"
          }
        ], 
...
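To make the reference mechanism a little more concrete, here is a minimal sketch of how creation order can be derived from {{label:attribute}} references like the one in the instance definition above.  This is only an illustration of the idea, not how OCI-C itself is implemented; the function names find_refs and creation_order are hypothetical, and the objects are cut down to just the fields needed.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Matches references such as {{df-simplevm01-bootdisk:name}} -> ("df-simplevm01-bootdisk", "name")
REF = re.compile(r"\{\{([^:}]+):([^}]+)\}\}")


def find_refs(value):
    """Recursively collect every label referenced anywhere inside an object."""
    if isinstance(value, str):
        return {label for label, _attr in REF.findall(value)}
    if isinstance(value, dict):
        return set().union(*(find_refs(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(find_refs(v) for v in value))
    return set()


def creation_order(objects):
    """Order labels so every referenced object comes before the object referencing it."""
    graph = {obj["label"]: find_refs(obj) for obj in objects}
    return list(TopologicalSorter(graph).static_order())


# Cut-down versions of the two objects from the snippet above.
objects = [
    {"label": "df-simplevm01", "type": "Instance",
     "template": {"storage_attachments": [
         {"index": 1, "volume": "{{df-simplevm01-bootdisk:name}}"}]}},
    {"label": "df-simplevm01-bootdisk", "type": "StorageVolume",
     "template": {"size": "18G", "bootable": True}},
]

print(creation_order(objects))  # ['df-simplevm01-bootdisk', 'df-simplevm01']
```

Note that the order in which objects appear in the file does not matter here; only the references do.  A circular reference would make TopologicalSorter raise a CycleError, which is the sketch's equivalent of an orchestration that can never be satisfied.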


There are a few other things in the JSON snippet that are worth pointing out.  Firstly, the naming convention for all objects is essentially /<OCI-Classic account identifier>/<User Name>/<Object name>.  Thus when defining the JSON file content the first part of the name will vary according to the data centre/Cloud at Customer environment you are deploying into, and the rest is determined by the user's setup and the naming convention you want to utilise.  When deploying between cloud regions/Cloud at Customer environments these values may change.
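Because the account identifier and user name are baked into every object name, moving an orchestration to another region or rack means rewriting those prefixes.  The sketch below shows one way to build names and swap the prefix in a parsed orchestration; object_name and retarget are hypothetical helpers, and the target account and user are made-up values.

```python
def object_name(account, user, *parts):
    """Build an OCI-C style name: /<account>/<user>/<object name...>."""
    return "/" + "/".join([account, user, *parts])


def retarget(value, old_prefix, new_prefix):
    """Recursively swap an /<account>/<user> prefix for another one.

    Only strings equal to old_prefix or starting with old_prefix + "/" are changed,
    so public names such as /oracle/public/... are left alone.
    """
    if isinstance(value, str):
        if value == old_prefix or value.startswith(old_prefix + "/"):
            return new_prefix + value[len(old_prefix):]
        return value
    if isinstance(value, dict):
        return {k: retarget(v, old_prefix, new_prefix) for k, v in value.items()}
    if isinstance(value, list):
        return [retarget(v, old_prefix, new_prefix) for v in value]
    return value


name = object_name("Compute-500019369", "don.forbes@oracle.com", "df-simplevm01-bootvol")
volume = {"name": name, "imagelist": "/oracle/public/OL_7.2_UEKR4_x86_64"}

# Hypothetical target account and user, e.g. a different Cloud at Customer rack.
moved = retarget(volume, "/Compute-500019369/don.forbes@oracle.com",
                 "/Compute-123456789/jane.doe@example.com")
print(moved["name"])  # /Compute-123456789/jane.doe@example.com/df-simplevm01-bootvol
```

Checking for the trailing "/" avoids accidentally rewriting a user whose name merely starts with the same characters.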

The other thing to notice is that I have deliberately given the boot disk a persistent property of true while the VM has a persistent property of false.  The reason for this becomes clear when we consider the lifecycle of objects managed by a V2 orchestration.


[Figure: Orchestration V2 lifecycle, depicting the states of an orchestration]



Once in the active state, an orchestration can be either suspended or terminated.  Suspension means that any non-persistent object is deleted while persistent objects remain on-line.  Termination deletes all objects defined in the orchestration.  By making the storage volume persistent we can suspend the orchestration, which stops the VM (its non-persistent instance object is deleted) while keeping the boot volume.  With the VM stopped we can update its definition to change many of its characteristics and then activate the orchestration again.  This re-creates the VM using the persistent storage volume, so the VM retains any data it has written to disk and acts as if it had simply been rebooted, except that it now happens to have more cores/memory etc.
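The effect of the persistent flag on suspend and terminate can be captured in a few lines.  This is a toy model of the behaviour described above, not the OCI-C API: the Orchestration and OrchObject classes are hypothetical, and the state strings are simplified labels rather than necessarily the exact values the service reports.

```python
from dataclasses import dataclass


@dataclass
class OrchObject:
    label: str
    persistent: bool


class Orchestration:
    """Toy model of which objects are on-line in each orchestration state."""

    def __init__(self, objects):
        self.objects = objects
        self.state = "inactive"
        self.online = set()

    def activate(self):
        # Activation (re-)creates every object defined in the orchestration.
        self.online = {o.label for o in self.objects}
        self.state = "active"

    def suspend(self):
        # Suspension deletes non-persistent objects; persistent ones stay on-line.
        self.online = {o.label for o in self.objects if o.persistent}
        self.state = "suspended"

    def terminate(self):
        # Termination deletes every object, persistent or not.
        self.online = set()
        self.state = "inactive"


orch = Orchestration([OrchObject("df-simplevm01-bootdisk", persistent=True),
                      OrchObject("df-simplevm01", persistent=False)])
orch.activate()
orch.suspend()
print(sorted(orch.online))  # ['df-simplevm01-bootdisk']
```

Flipping the VM to persistent=True in this model shows the problem discussed next: after suspend the VM is still on-line, so the only way to recreate it is terminate, which also removes the boot disk.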

Below is a screenshot taken from OCI-Classic (18.2.2) showing some of the VM attributes that can be changed.  Essentially all configuration items of the VM can be adjusted.

[Screenshot: Basic VM attributes that can be changed in an Orchestration V2]

These attributes cannot be changed on a running VM; changing them requires a shutdown.  If the VM had been marked as persistent, suspending the orchestration would have left the VM on-line, and therefore with an immutable configuration.  Any change would then require terminating the orchestration, which would also delete persistent objects such as the storage volumes.  That is probably not the desired behaviour.


Using Server Anti-Affinity

Having considered how orchestrations are used, we can now look at controlling the placement of a VM.  This is covered in the documentation, but here we consider two options: firstly the instruction to place VMs on "different nodes" and secondly to place them on the "same node".  The primary purpose of this blog posting is the different node approach.

different_node Relationship

One of the general attributes of an object in an orchestration is its relationship to other objects.  The documentation for V2 specifies that the only relationship is the "depends" relationship, which is included in an orchestration using the following format:

"relationships": [
  {
    "type": "depends",
    "targets": ["instance1"]
  }
]
 
Other relationships are possible, namely different_node and same_node.  For a different_node relationship we simply set the type to "different_node" and then list, in the instances array, the instances that must be on different nodes.  Note that, unlike depends, these relationships are placed inside the instance's template and use an instances array rather than targets.  Doing this also sets up a depends relationship, as this instance's placement depends on the placement of the other instances.  So, for example, with 4 VMs the objects in the orchestration have the following relationships to ensure they are placed on different physical nodes.

 ...
  {
      "label": "df-simplevm01",
      "type": "Instance",
      "persistent":false,
      "name": "/Compute-500019369/don.forbes@oracle.com/df-simple_vms/df-simplevm01",
      "template": {
        "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm01",
        "shape": "oc3",
        "label": "simplevm01",
        "relationships": [
          {
            "type": "different_node",
            "instances": [ "instance:{{df-simplevm02:name}}",
                           "instance:{{df-simplevm03:name}}",
                           "instance:{{df-simplevm04:name}}"
                         ]
          }
        ],
 ...
  {
      "label": "df-simplevm02",
      "type": "Instance",
      "persistent":false,
      "name": "/Compute-500019369/don.forbes@oracle.com/df-simple_vms/df-simplevm02",
      "template": {
        "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm02",
        "shape": "oc3",
        "label": "simplevm02",
        "relationships": [
          {
            "type": "different_node",
            "instances": [ "instance:{{df-simplevm03:name}}",
                           "instance:{{df-simplevm04:name}}"
                         ]
          }
        ],
 ...
  {
      "label": "df-simplevm03",
      "type": "Instance",
      "persistent":false,
      "name": "/Compute-500019369/don.forbes@oracle.com/df-simple_vms/df-simplevm03",
      "template": {
        "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm03",
        "shape": "oc3",
        "label": "simplevm03",
        "relationships": [
          {
            "type": "different_node",
            "instances": [ "instance:{{df-simplevm04:name}}" ]
          }
        ],
 ...
  {
      "label": "df-simplevm04",
      "type": "Instance",
      "persistent":false,
      "name": "/Compute-500019369/don.forbes@oracle.com/df-simple_vms/df-simplevm04",
      "template": {
        "name": "/Compute-500019369/don.forbes@oracle.com/df-simplevm04",
        "shape": "oc3",
        "label": "simplevm04",
 ...
 
So in this example the 4th VM has no relationships, but all the others depend on it and on where it is placed.  As such it is the first VM to be placed; then the 3rd VM is placed on a different node, then the 2nd and finally the 1st.  All the relationships use an object reference of the form {{<label>:name}}, i.e. the label of the object followed by :name, to identify the specific instance.
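Writing these chains by hand gets tedious as the cluster grows.  The following sketch generates the relationships blocks for any list of VM labels (each VM points at every VM listed after it) and derives the placement order described above by treating each reference as a dependency.  The helpers different_node_relationships and placement_order are hypothetical, and the ordering only mirrors the behaviour described here; it is not taken from the OCI-C scheduler.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def different_node_relationships(labels):
    """Give each VM a different_node relationship to every VM listed after it."""
    result = {}
    for i, label in enumerate(labels):
        later = labels[i + 1:]
        result[label] = [{
            "type": "different_node",
            # Plain concatenation keeps the literal {{label:name}} braces intact.
            "instances": ["instance:{{" + other + ":name}}" for other in later],
        }] if later else []
    return result


def placement_order(relationships):
    """Place each VM only after every VM it references (dependency order)."""
    graph = {
        label: {ref.split("{{")[1].split(":")[0]
                for rel in rels for ref in rel["instances"]}
        for label, rels in relationships.items()
    }
    return list(TopologicalSorter(graph).static_order())


vms = ["df-simplevm01", "df-simplevm02", "df-simplevm03", "df-simplevm04"]
rels = different_node_relationships(vms)
print(json.dumps(rels["df-simplevm03"], indent=2))
print(placement_order(rels))
# ['df-simplevm04', 'df-simplevm03', 'df-simplevm02', 'df-simplevm01']
```

The output of json.dumps can be pasted into the "relationships" key of each instance template.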

In the public cloud there is notionally infinite compute resource, so a great many VMs can be placed onto different nodes.  In the Cloud at Customer model there are only as many "model 40" compute units as have been subscribed to, which puts a physical limit on the number of VMs that can be placed on different nodes.  In the example above there are 4 VMs, but a typical OCC starter rack has only 3 nodes, so the obvious question is what happens in this scenario.  The answer is that the orchestration enters a "transient_error" state, because the fourth VM cannot be started on the rack, and the orchestration keeps trying to start the VM on a regular basis.  The error is reported as:


"cause": "error_state", 
"detail": "System object is in error state: Cannot satisfy both the placement and relationship requirements." 

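The arithmetic behind this failure is simple: if every VM in a group must avoid every other VM in the group, the group needs at least as many nodes as it has members.  The sketch below simulates a greedy placement to show where it breaks.  The place and all_pairs helpers are hypothetical, capacity (cores/memory) is deliberately ignored, and where OCI-C keeps retrying, the sketch simply raises an error.

```python
def all_pairs(labels):
    """Anti-affinity map in which every VM must avoid every other VM in the list."""
    return {vm: set(labels) - {vm} for vm in labels}


def place(vms, anti_affinity, node_count):
    """Greedily put each VM on the first node holding none of its anti-affinity peers.

    Returns {vm: node_index}; raises RuntimeError when a VM cannot be placed.
    Memory and core capacity are deliberately ignored here.
    """
    placement = {}
    for vm in vms:
        peers = anti_affinity.get(vm, set())
        for node in range(node_count):
            if all(placement.get(peer) != node for peer in peers):
                placement[vm] = node
                break
        else:
            raise RuntimeError(
                f"cannot place {vm}: placement and relationship requirements conflict")
    return placement


cluster = ["df-simplevm01", "df-simplevm02", "df-simplevm03", "df-simplevm04"]
print(place(cluster[:3], all_pairs(cluster[:3]), node_count=3))
try:
    place(cluster, all_pairs(cluster), node_count=3)
except RuntimeError as err:
    print(err)  # cannot place df-simplevm04: placement and relationship requirements conflict
```

For a group where every pair must be apart, greedy placement is exact: it succeeds whenever the group size does not exceed the node count and fails otherwise.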

So in a Cloud at Customer environment you should be aware of how many physical machines are in place, and if a cluster is larger than this, split the dependencies up accordingly.  For example, with 6 VMs and only 3 nodes, give 3 of the VMs a different_node relationship to each other and give the other three a separate different_node relationship among themselves.
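The splitting rule can be automated too.  This sketch (hypothetical helpers split_into_groups and grouped_relationships) chops the cluster into consecutive groups no larger than the node count and only generates different_node relationships within each group.

```python
def split_into_groups(labels, node_count):
    """Split a cluster into consecutive groups no larger than the number of nodes."""
    return [labels[i:i + node_count] for i in range(0, len(labels), node_count)]


def grouped_relationships(labels, node_count):
    """Build different_node relationships that only apply within each group."""
    rels = {}
    for group in split_into_groups(labels, node_count):
        for i, label in enumerate(group):
            later = group[i + 1:]
            rels[label] = [{
                "type": "different_node",
                "instances": ["instance:{{" + other + ":name}}" for other in later],
            }] if later else []
    return rels


vms = [f"df-simplevm{n:02d}" for n in range(1, 7)]  # df-simplevm01 .. df-simplevm06
rels = grouped_relationships(vms, node_count=3)
print(split_into_groups(vms, 3))
# [['df-simplevm01', 'df-simplevm02', 'df-simplevm03'], ['df-simplevm04', 'df-simplevm05', 'df-simplevm06']]
print(rels["df-simplevm04"][0]["instances"])
# ['instance:{{df-simplevm05:name}}', 'instance:{{df-simplevm06:name}}']
```

With 6 VMs on 3 nodes, each group of three occupies all three nodes, so every node ends up hosting exactly one VM from each group and losing a node takes out one member of each group.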

same_node Relationship

Same node relationships are configured in exactly the same way as the different_node relationship, the only difference being that in this situation the VMs are placed on the same physical node.  Obviously with this approach you need to be aware of the physical limits of a single node.  Currently the OCC uses an X6-2 server, which has 40 cores and 496 GB of memory available for use.  Trying to put more VMs onto a node than will fit in this space will result in a placement failure similar to the one seen when the VMs could not be placed on different nodes.
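Before grouping VMs with same_node it is worth doing a quick capacity check.  The sketch below sums the requested cores and memory of a group and compares them with the 40 cores and 496 GB quoted above.  The per-VM sizes in the example are hypothetical, not tied to specific OCI-C shapes, and the check ignores anything already running on the node.

```python
NODE_CORES = 40        # X6-2 figure quoted above
NODE_MEMORY_GB = 496   # X6-2 figure quoted above


def fits_on_one_node(vms, cores=NODE_CORES, memory_gb=NODE_MEMORY_GB):
    """Return True if a same_node group's summed size fits on a single node.

    vms is a list of (cores, memory_gb) tuples, one per VM.
    """
    total_cores = sum(c for c, _ in vms)
    total_memory = sum(m for _, m in vms)
    return total_cores <= cores and total_memory <= memory_gb


# Hypothetical VM size: 8 cores and 60 GB each.
print(fits_on_one_node([(8, 60)] * 4))  # True  (32 cores, 240 GB)
print(fits_on_one_node([(8, 60)] * 6))  # False (48 cores exceeds 40)
```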
