Azure provider design implementation notes
This page describes the Azure provider.
Azure currently has two "stacks": Service Management ("classic"), and Resource Manager ("ARM"). We talk here mostly about the ARM stack, which is replacing the classic stack. Juju has support for the legacy/classic stack, but it is in maintenance-only mode, and any new environments will be ARM-based.
Azure has a concept of "resource groups", which are containers for IaaS resources: machines, networks, disks, etc. Each Juju environment -- including hosted -- is represented by a resource group. Resource groups must be named uniquely within the subscription: we use the naming scheme "juju--environment-".
Some resources are shared by environments, and these will be contained within the "controller resource group" -- the resource group that contains the bootstrap/controller environment. In particular, all machines managed by a controller will be connected to a single "internal" virtual network, and that virtual network (vnet) exists in the controller resource group. Each environment is given its own subnet, but the subnets also exist in the controller resource group; it is a restriction of the Azure network model that subnets and vnets must be co-located.
To destroy an environment we must delete the subnet associated with the environment, and delete the environment's resource group.
Each controller manages a single vnet for internal communications, managing the 10.0.0.0/8 prefix. Each environment is assigned 10.0.0.0/16, 10.1.0.0/16, etc., according to availability.
Each environment is also given its own network security group, which manages firewalls for the environment. There are 100 network security groups in a subscription by default, so there is a default limit of 100 environments. This limit can be raised by contacting Microsoft Azure support.
Each machine is created with a single NIC, attached to the internal subnet. Each NIC also has a public IP assigned. We will probably want to only assign public IPs to controllers by default, and defer assignment of public IPs to machines until they are exposed (and then delete when all ports are unexposed), because public IP addresses are limited (60 public IPs per subscription by default). This should at least be made configurable.
Each environment resource group contains a storage account in which virtual machine images are stored. Storage accounts default to using the "Standard LRS" (Locally Redundant Storage) account type, but this is configurable.
The Azure storage provider has support for volumes. In the future we may extend the storage provider to support Azure File Storage, which would enable shared file systems.
The Azure volume source is dynamic, environment-scoped, and manages persistent volumes. Each Juju volume represents a VHD in the "datavhds" blob container of the environment's storage account. A volume attachment represents a "data disk".
Each service deployed to an environment will create an "availability set" for that service. When a machine is created to host a unit of the service, the machine will join that availability set. Azure ensures that machines in an availability set are (a) not automatically rebooted at the same time (e.g. for infrastructure upgrades); and (b) allocated to redundant hardware, to avoid faults bringing down all service units simultaneously.
Availability sets are similar to "availability zones" in AWS and elsewhere, but dissimilar enough that they do not fit into Juju's abstraction of zones. In particular, charms cannot query what "zone" they are in on Azure.
Azure Resource Manager uses a different system for selecting OS images than the classic stack, and the simplestreams data Canonical publishes is not relevant to ARM. However, Azure provides its own registry for images, which Juju will use.
Images are published with four identifying attributes:
- Publisher (e.g. "Canonical")
- Offer (e.g. "UbuntuServer")
- SKU (e.g. "14.04.3-LTS")
- Version (e.g. "14.04.201510200", or "latest")
Because SKUs do not map directly to series, we must list the SKUs for a publisher+offer, and then choose the best one. We only do this for Ubuntu for now. We have hard-coded the publisher/offer names for Ubuntu Server and Microsoft Windows Server 2012; CentOS should be trivial to do, once we have support for initialising CentOS machines in Azure.
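As a rough sketch of choosing a SKU for a series, the following assumes that "best" means the highest point release whose SKU starts with the series version, and that anything with another suffix (for example a beta) is skipped. The `seriesVersions` map and both function names are hypothetical, and no Azure API calls are shown; the SKU list is taken as already fetched.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seriesVersions is a hypothetical hard-coded series-to-version map.
var seriesVersions = map[string]string{
	"precise": "12.04",
	"trusty":  "14.04",
}

// pointRelease extracts the point release from a SKU such as
// "14.04.3-LTS" (3). It reports false if the SKU does not belong to
// the given version or has an unrecognised form.
func pointRelease(sku, version string) (int, bool) {
	if !strings.HasPrefix(sku, version) {
		return 0, false
	}
	rest := strings.TrimSuffix(strings.TrimPrefix(sku, version), "-LTS")
	if rest == "" {
		return 0, true // e.g. "14.04-LTS" or "14.04"
	}
	if !strings.HasPrefix(rest, ".") {
		return 0, false // e.g. "14.04-beta"
	}
	n, err := strconv.Atoi(rest[1:])
	if err != nil {
		return 0, false
	}
	return n, true
}

// bestSKU picks the SKU with the highest point release for the series.
func bestSKU(series string, skus []string) (string, error) {
	version, ok := seriesVersions[series]
	if !ok {
		return "", fmt.Errorf("unknown series %q", series)
	}
	best, bestPoint := "", -1
	for _, sku := range skus {
		if n, ok := pointRelease(sku, version); ok && n > bestPoint {
			best, bestPoint = sku, n
		}
	}
	if best == "" {
		return "", fmt.Errorf("no SKU found for series %q", series)
	}
	return best, nil
}

func main() {
	skus := []string{"12.04.5-LTS", "14.04.2-LTS", "14.04.3-LTS", "14.04-beta"}
	sku, err := bestSKU("trusty", skus)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sku) // prints 14.04.3-LTS
}
```

Comparing point releases numerically rather than as strings keeps a SKU like "14.04.10-LTS" ahead of "14.04.3-LTS".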
We currently query the image registry each time we create a machine, but this will be changed with the introduction of structured image metadata in state. We will change to having an Azure-specific data source that lists images in the registry; this will be periodically polled for updates, and fed into state. This data will then be presented to the machine provisioner, so it does not have to make the additional network query each time.
Instances naturally represent Virtual Machines in Azure, but there are additional resources for each instance. Each VM is given a single NIC with a static private IP and a dynamic public IP; later this will change with the introduction of extended networking support. Each VM may have zero or more network security rules associated with it.
Due to several restrictions, there are some peculiarities in the listing and deletion of instances that require some explanation. To prevent leaking resources, the provider must continue to report instances until all of the associated resources are deleted: VM, NIC, public IP, etc. The most obvious thing to do would be to delete the VM last, but this, unfortunately, is not possible.
A VM must have at least one NIC attached, and a NIC cannot be deleted while it is attached to a VM. At least one NIC must therefore be deleted after the VM, so we delete all of them after the VM. When we delete an instance, we first delete the VM and then the remaining resources. We delete the NICs last, and tag each NIC with the name (instance ID) of the machine it was created for, so that the presence of a NIC indicates the presence of an instance even when there is no corresponding Virtual Machine.
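The deletion order and the NIC-based listing can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `resource` type, its fields, and both functions are hypothetical and make no Azure API calls. It shows the ordering (VM first, NICs last) and why a leftover tagged NIC keeps an instance visible.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical, simplified view of an Azure resource.
type resource struct {
	Kind       string // "vm", "nic", or any other kind
	Name       string
	InstanceID string // the machine this resource was created for
}

// deletionOrder returns a copy of resources sorted into deletion order:
// the VM first, then other resources, and the NICs last.
func deletionOrder(resources []resource) []resource {
	rank := func(r resource) int {
		switch r.Kind {
		case "vm":
			return 0
		case "nic":
			return 2
		default:
			return 1
		}
	}
	ordered := append([]resource(nil), resources...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return rank(ordered[i]) < rank(ordered[j])
	})
	return ordered
}

// instanceIDs lists the instances that must still be reported: any
// instance with a VM, or with a NIC tagged with its instance ID.
func instanceIDs(resources []resource) []string {
	seen := map[string]bool{}
	var ids []string
	for _, r := range resources {
		if r.Kind != "vm" && r.Kind != "nic" {
			continue
		}
		if !seen[r.InstanceID] {
			seen[r.InstanceID] = true
			ids = append(ids, r.InstanceID)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	resources := []resource{
		{Kind: "nic", Name: "machine-0-primary", InstanceID: "machine-0"},
		{Kind: "publicip", Name: "machine-0-public-ip", InstanceID: "machine-0"},
		{Kind: "vm", Name: "machine-0", InstanceID: "machine-0"},
	}
	for _, r := range deletionOrder(resources) {
		fmt.Println("delete", r.Kind, r.Name)
	}
	// With only the NIC left, the instance is still reported.
	fmt.Println(instanceIDs(resources[:1]))
}
```

If deletion is interrupted after the VM is removed, a later listing still finds the instance through its NIC, so the remaining resources can be cleaned up rather than leaked.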