Manage Servers Design

Managing Remote Servers Design

Big open issue: maybe generalize to managing all kinds of groups of resources?

Managing a server must require nothing more than standard SSH
- Maybe upload any required parts automatically when a command needs them

Smart display of results, such as "OK on all T servers" or "OK on N servers, fail on M servers, out of total T servers"
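
A minimal sketch of such a summary, assuming results arrive as a mapping from host name to a success flag. The function name summarize and the "Fail on all" and "No servers" wordings are assumptions made for illustration, not part of the design.

```python
def summarize(results):
    """Summarize per-host results; `results` maps host name -> True (OK) / False (fail)."""
    total = len(results)
    ok = sum(1 for passed in results.values() if passed)
    failed = total - ok
    if total == 0:
        return "No servers"  # assumed wording for an empty group
    if failed == 0:
        return f"OK on all {total} servers"
    if ok == 0:
        return f"Fail on all {total} servers"  # assumed wording
    return f"OK on {ok} servers, fail on {failed} servers, out of total {total} servers"


print(summarize({"web-1": True, "web-2": True, "web-3": False}))
```

The point is that the common case (everything OK) collapses to one short line, and detail appears only when results differ.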

Automatic server groups, formed by:
- (In a cloud) a security group, a tag or other properties (a regex on the name, for example)
- (Configuration management: Chef/Puppet) properties
- (Locally) /etc/hosts remarks, or .ssh/config properties or remarks
- Naming conventions (for example, defined by a regex) for all cases above
- Dynamically, by a command's output or exit code. Think netstat -lpnt | grep -q :8000 or pgrep java or dpkg -l '*apache*' >/dev/null
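
Two of the grouping rules above can be sketched as follows: a static rule based on a name regex and a dynamic rule based on a command's exit code. This is only an illustration. group_by_name, group_by_command and the run(host, command) callable are hypothetical names; run stands in for whatever executes a command over SSH and returns its exit code, and is faked here so the example needs no network.

```python
import re


def group_by_name(hosts, pattern):
    """Static rule: keep hosts whose name matches the regex `pattern`."""
    rx = re.compile(pattern)
    return [h for h in hosts if rx.search(h)]


def group_by_command(hosts, command, run):
    """Dynamic rule: keep hosts on which `command` exits with status 0.

    `run(host, command)` is a hypothetical callable returning the exit code.
    """
    return [h for h in hosts if run(h, command) == 0]


hosts = ["web-1", "web-2", "db-1"]

# Fake runner for the example: pretend only web-2 has a java process.
def fake_run(host, command):
    return 0 if host == "web-2" else 1

print(group_by_name(hosts, r"^web-\d+$"))
print(group_by_command(hosts, "pgrep java", fake_run))
```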

Smart handling of failures: maybe divide the hosts into groups depending on command status / output, and then allow managing these groups. Consider dividing into several "fail" groups depending on the failure mode. Think of a deploy script that has to handle these conditions. Also make one large group for any failure (it contains all the fail sub-groups).
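
A rough sketch of that split, assuming the failure mode is identified by the exit code alone (the notes also mention output, which is not handled here). The group names "ok", "fail" and "fail:<code>" and the function name split_by_status are made up for the example.

```python
from collections import defaultdict


def split_by_status(exit_codes):
    """Split hosts into groups by exit code.

    `exit_codes` maps host -> exit code. Returns a dict mapping group name
    to a sorted list of hosts: "ok", one "fail:<code>" group per failure
    mode, and an umbrella "fail" group containing every failed host.
    """
    groups = defaultdict(list)
    for host, code in sorted(exit_codes.items()):
        if code == 0:
            groups["ok"].append(host)
        else:
            groups[f"fail:{code}"].append(host)  # sub-group per failure mode
            groups["fail"].append(host)          # umbrella group
    return dict(groups)


print(split_by_status({"web-1": 0, "web-2": 1, "web-3": 2, "web-4": 1}))
```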

A hosts group that is updated automatically should show its last update time
- ... and an update log showing when machines were added or removed
- A "current" hosts group to execute commands on.
- Whenever a group is formed, connectivity must be checked and any problems reported
- Each host should have a status, such as pending for EC2 machines (in the shell, a host can stay pending until the SSH connection is ready)
- Keep group history (snapshots of the list of hosts in the given group)
- When running a command on a group of hosts, the default behaviour is to run on one host first and then roll out to the rest. Maybe stop at a certain error rate.
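
The "one first, then rolling" default from the last item could look roughly like this. It is a sketch under several assumptions: hosts are tried one at a time rather than in batches, run(host) is a hypothetical callable returning True on success, and the threshold name max_error_rate and its default of 0.5 are invented for the example.

```python
def rolling_run(hosts, run, max_error_rate=0.5):
    """Run on the first host alone, then on the remaining hosts one by one.

    Stops early if the first host fails, or once the share of failed hosts
    among those already tried exceeds `max_error_rate`.
    Returns (results, completed), where results maps host -> bool for the
    hosts that were actually tried.
    """
    results = {}
    for index, host in enumerate(hosts):
        results[host] = run(host)
        failures = sum(1 for ok in results.values() if not ok)
        if index == 0 and failures:
            return results, False  # the first ("canary") host failed
        if failures / len(results) > max_error_rate:
            return results, False  # too many failures so far
    return results, True


# Fake runner for the example: h2 and h3 fail.
results, completed = rolling_run(["h1", "h2", "h3", "h4"],
                                 lambda h: h not in ("h2", "h3"))
print(results, completed)
```

In the example h4 is never touched: after h3 fails, two of the three hosts tried have failed, which is above the assumed 0.5 threshold.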

Allow running commands on remote hosts and connecting them with pipes
- Example: @web_servers { cat /var/log/messages } | @management_server { grep MY_EVENT | sort > /tmp/MY_EVENT } That's just for the sake of an example; it would probably be better to grep locally.
- Issue a warning if the output of cat cannot be pushed or pulled directly between the machines (and will therefore be transferred through the controlling host, where the shell runs)
  - Have a shortcut key to set up the SSH access required for direct transfer
  - Or... run temporary SSH daemons to allow this?
- Provide meaningful progress on that, including ETA. This won't be easy but it's worth it.
- Provide inspection of processing speeds, CPU and network usage, including graphs. It can help identify and show slow machines. If it's a cluster, the performance should be similar. If not, it can indicate a problem.
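
One possible way to flag slow machines in a cluster is to compare each host's processing rate with the median of the group. This is only a sketch: the function name slow_hosts, the tolerance parameter and its default of 0.5 are assumptions, and the notes do not say how speeds would actually be measured.

```python
from statistics import median


def slow_hosts(throughput, tolerance=0.5):
    """Return hosts whose rate is well below the typical rate of the group.

    `throughput` maps host -> units processed per second. A host counts as
    slow when its rate is below `tolerance` times the median rate.
    """
    if not throughput:
        return []
    typical = median(throughput.values())
    return sorted(h for h, rate in throughput.items() if rate < tolerance * typical)


print(slow_hosts({"node-1": 100, "node-2": 110, "node-3": 95, "node-4": 30}))
```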

For a remote host or hosts group, give an option to execute command(s) as soon as the host is available. Notify when done, failed or timed out.
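
A minimal polling sketch of "run as soon as the host is available", for a single host. All names here are hypothetical: is_available(host) would typically try an SSH connection, run(host) returns True on success, and the timeout and interval defaults are arbitrary. The clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


def run_when_available(host, is_available, run, timeout=300.0, interval=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until `is_available(host)` is true, then call `run(host)`.

    Returns one of "done", "failed" or "timed out", matching the three
    notifications mentioned in the notes.
    """
    deadline = clock() + timeout
    while not is_available(host):
        if clock() >= deadline:
            return "timed out"
        sleep(interval)
    return "done" if run(host) else "failed"


# Example with a host that is immediately reachable and a command that succeeds.
print(run_when_available("web-1", lambda h: True, lambda h: True))
```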

Define which commands will run where when using a hosts group. Think ec2... on a group of machines which includes all EC2 machines: "management" machine, web, app, etc. servers.

A hosts group will be ordered. When running commands, one can specify whether to run in order or async.
- When commands run in order, there should be an option to stop on the first failure.
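
The two modes could be sketched like this, assuming a hypothetical run(host) callable that returns True on success. The function name run_on_group, the in_order and stop_on_first_fail flags, and the use of threads for the async mode are choices made for the example, not decisions recorded in these notes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_group(hosts, run, in_order=True, stop_on_first_fail=False):
    """Run a command on an ordered hosts group.

    Returns a dict mapping host -> bool for the hosts the command ran on.
    """
    if not in_order:
        # Async: start on all hosts at once; results still follow group order.
        with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
            return dict(zip(hosts, pool.map(run, hosts)))
    results = {}
    for host in hosts:
        results[host] = run(host)
        if stop_on_first_fail and not results[host]:
            break  # hosts later in the order are left untouched
    return results


hosts = ["db-1", "app-1", "web-1"]
print(run_on_group(hosts, lambda h: h != "app-1", stop_on_first_fail=True))
print(run_on_group(hosts, lambda h: h != "app-1", in_order=False))
```

With stop_on_first_fail set, web-1 is skipped because app-1 failed before it. In async mode every host is attempted, so the option has no effect there.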

TODO: Overall description

TODO: List of components. For each: description of the role, API, etc.

NGS official website is at https://ngs-lang.org/