Dub/unify ansol #1 (Open)

wants to merge 42 commits into base: master

Commits (42)
655252b  Update README.md  (Oct 27, 2022)
8a8a699  Update README.md  (Oct 27, 2022)
f0cebe7  Update README.md  (Oct 29, 2022)
d91a957  Update README.md  (Oct 29, 2022)
f0a8326  Update README.md  (Oct 29, 2022)
4b143b5  Update README.md  (Oct 29, 2022)
8c95929  Update README.md  (Oct 29, 2022)
0bb54e1  Update README.md  (Oct 29, 2022)
60c4bf4  Update README.md  (Oct 29, 2022)
76cc198  Restructure Ansible role  (bitstan-earthyfrodo, Oct 29, 2022)
7abcd63  Update README.md  (Nov 3, 2022)
d8e2ea2  Merge pull request #1 from overclock-validator/overclock-validator-ad…  (Nov 3, 2022)
27f73b9  Update README.md  (Nov 3, 2022)
d8b3794  Update README.md  (Nov 3, 2022)
253e4f7  Update README.md  (Nov 4, 2022)
bec05c7  Update README.md  (Nov 4, 2022)
69816dd  Update snapshot-finder.py  (Nov 5, 2022)
11fa60c  Merge branch 'master' of https://github.com/bitstan-earthyfrodo/autoc…  (vovkman, Nov 24, 2022)
6d9e8cf  Merge branch 'bitstan' into helius  (vovkman, Nov 24, 2022)
d1deebf  fix merge conflicts  (vovkman, Nov 24, 2022)
4af7194  Merge pull request #3 from overclock-validator/bitstan-earthyfrodo-ma…  (Nov 24, 2022)
13dbcca  Remove systuner and format README file  (vovkman, Nov 24, 2022)
2eeff57  update tasks to work properly  (vovkman, Nov 26, 2022)
3e2b300  update readme  (vovkman, Nov 26, 2022)
ba460ed  add new files  (vovkman, Nov 26, 2022)
8b7b8f5  update defaults  (vovkman, Nov 26, 2022)
25d33ea  update docs  (vovkman, Nov 26, 2022)
d2c73a2  remove genesysgo  (vovkman, Nov 26, 2022)
88d55b4  update validator script  (vovkman, Nov 26, 2022)
c546345  Merge pull request #4 from overclock-validator/fix-tasks  (vovkman, Nov 27, 2022)
dca7de2  add configurable base path  (vovkman, Nov 27, 2022)
3811948  update defaults  (vovkman, Nov 27, 2022)
315448f  Merge pull request #5 from overclock-validator/base-path  (vovkman, Nov 27, 2022)
6c16c72  merge ansol. start simple  (Jan 14, 2023)
a5f4bad  Update README.md  (dubbelosix, Jan 14, 2023)
d08122e  fix  (Jan 14, 2023)
0cbf3c8  prepend v  (Jan 14, 2023)
b068d7e  some changes  (Jan 14, 2023)
c028424  some fstab  (Jan 14, 2023)
06f5ff6  some more changes  (Jan 14, 2023)
86576e2  download_start.sh  (Jan 14, 2023)
b555fa4  ansible options  (Jan 14, 2023)
104 changes: 64 additions & 40 deletions README.md
@@ -1,81 +1,105 @@
# ansol
# Autoclock RPC

### machine setting
* this process works best on latitude machines
* because the initial state of the machine is cleaner
* disks are named consistently (nvme01, nvme0n2)
* ubuntu installed (preferably ubuntu 20.04, 22.04) - this won't work with centos etc since they don't use aptitude by default
* the login user being ubuntu helps (all the solana operations are done using the solana user that the ansible playbook creates)
* ubuntu should be in the sudoers list
* clean unmounted disks. if your root is on one of partitions and you pass it as an argument, this could potentially be disastrous.
### What is it good for?

* all the above are satisfied by a fresh latitude launch
The goal of the Autoclock RPC ansible playbook is to have you caught up on the Solana blockchain within 15 minutes, assuming you have a capable server and your SSH key ready. It formats/raids/mounts disks, sets up swap and an optional ramdisk, downloads a snapshot and restarts everything. It is currently configured for a Latitude.sh s3.large.x86 (see "Optimal Machine Settings" below), but we hope to adapt it more widely later on. For a more catch-all ansible playbook and an in-depth guide on RPCs, refer to https://github.com/rpcpool/solana-rpc-ansible

* you can launch latitude machines here https://www.latitude.sh/pricing
* recommend the s3.large.x86 - it is one of the most performant nodes for staying at tip
### Optimal Machine Settings

* Specs
* 24 cores or more
* 512 GB ram if you want to use ramdisk/tmpfs and store the accounts db in RAM (we use 300 GB for ram disk). without tmpfs, the ram requirement can be significantly lower. ~256 GB
* 3-4 TB (multiple disks is ok i.e. 2x 1.9TB because the ansible playbook stripes them together)
- Our Latitude.sh s3.large.x86 server starts with the settings below, which we prefer because:

### step 1: ssh into your machine
- the initial state of the machine is cleaner than others that we have tried
- disks are named consistently (nvme0n1, nvme0n2)
- ubuntu installed (preferably ubuntu 20.04, 22.04) - this won't work with centos, etc. since they don't use aptitude by default
- the login user being ubuntu helps (all the solana operations are done using the solana user that the ansible playbook creates)
- ubuntu is in the sudoers list
- unmounted disks are clean - if your root is on one of the partitions and you pass it as an argument, this could be disastrous

- All the above are satisfied by a fresh s3.large.x86 launch found here: https://www.latitude.sh/pricing
- Zen3 AMD Epyc CPUs such as the 7443P are currently considered some of the most performant nodes for keeping up with the tip of the chain, and they support large amounts of RAM.

- Recommended RPC Specs
- 24 cores or more
- 512 GB RAM if you want to use ramdisk/tmpfs and store the accounts db in RAM (we use 300 GB for the ramdisk). Without tmpfs, the RAM requirement can be significantly lower (~256 GB)
- 3-4 TB (multiple disks is okay - i.e. 2x 1.9TB - because the ansible playbook stripes them together)

### Step 1: SSH into your machine

### Step 2: Start a screen session

### step 2: start a screen session
```
screen -S sol
```

### step 3: install ansible
### Step 3: Install ansible

```
sudo apt-get install ansible -y
sudo apt-get update && sudo apt-get install ansible -y
```

### step 4: clone the anssol repository
### Step 4: Clone the autoclock-rpc repository

```
git clone https://github.com/dubbelosix/ansol.git
git clone https://github.com/overclock-validator/autoclock-rpc.git
```

### step 5: cd into the ansol folder
### Step 5: cd into the autoclock-rpc folder

```
cd ansol
cd autoclock-rpc
```

### step 6: run the ansible command
* this command can take between 10-20 minutes based on the specs of the machine
* it takes long because it does everything necessary to start the validator (format disks, checkout the solana repo and build, download the latest snapshot etc)
### Step 6: Run the ansible command

- this command can take between 10 and 20 minutes depending on the specs of the machine
- it takes this long because it does everything necessary to start the validator (format disks, check out the solana repo and build it, download the latest snapshot, etc.)
- make sure that the solana_version is up to date (see below)
- check the values set in `defaults/main.yml` and update to the values you want

```
time ansible-playbook runner.yaml --extra-vars='{"solana_version": "v1.13.4", "swap_mb":100000,"raw_disk_list":["/dev/nvme0n1","/dev/nvme1n1"],"setup_disks":true,"download_snapshot":true,"ramdisk_size":300}'
time ansible-playbook runner.yaml
```

#### params explained
* solana_version: which version of solana do we want to run
* swap_mb: megabytes of swap. can set this to 50% of RAM or even lower. 100 GB is fine on a 512 GB RAM machine (variable value is in MB so 100000)
* raw_disk_list: the list of currently unmounted disks that will be wiped, raided, formatted with ext4 and then mounted to /mnt
* ramdisk_size: this is optional and only necessary if you want to use ramdisk for the validator - carves out a large portion of the RAM to store the accountsdb. On a 512 GB RAM instance, this can be set to 300 GB (variable value is in GB so 300)
#### ~ Parameters explained ~

- solana_version: the version of solana that we want to run. Check the Solana Tech discord’s mb-announcements channel for the recommended version.
- swap_mb: megabytes of swap. This can be set to 50% of RAM or even lower. 100 GB is fine on a 512 GB RAM machine (variable value is in MB so 100000)
- raw_disk_list: the list of currently unmounted disks that will be wiped, raided, formatted with ext4 and then mounted to /mnt
- ramdisk_size: this is optional and only necessary if you want to use ramdisk for the validator - carves out a large portion of the RAM to store the accountsdb. On a 512 GB RAM instance, this can be set to 300 GB (variable value is in GB so 300)
- solana_installer: whether to install solana from the installer. If set to false, it will build the solana cli from the solana github repo
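
For reference, here is a sketch of overriding these defaults on the command line instead of editing `defaults/main.yml` (the values below are illustrative; adjust the disk names, sizes and solana_version to your machine):

```
time ansible-playbook runner.yaml --extra-vars='{"solana_version":"v1.13.5","swap_mb":100000,"raw_disk_list":["/dev/nvme0n1","/dev/nvme1n1"],"setup_disks":true,"download_snapshot":true,"ramdisk_size":300}'
```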

### Step 7: Once ansible finishes, switch to the solana user with:

### step 7: after ansible finishes
switch to the solana user with
```
sudo su - solana
```
### step 8: and check the validator status with

### Step 8: Check the status

```
/mnt/solana/target/release/solana-validator --ledger /mnt/solana-ledger monitor
source ~/.profile
solana-validator --ledger /mnt/solana-ledger monitor
Ledger location: /mnt/solana-ledger
⠉ Validator startup: SearchingForRpcService...
```

### Initially the monitor should just show. this will last for a few minutes and is normal
#### Initially the monitor should just show the message below. This lasts for a few minutes and is normal:

```
⠉ Validator startup: SearchingForRpcService...
```
### after a while, the message at the terminal should change to

#### After a while, the message at the terminal should change to something similar to this:

```
⠐ 00:08:26 | Processed Slot: 156831951 | Confirmed Slot: 156831951 | Finalized Slot: 156831917 | Full Snapshot Slot: 156813730 |
```

If you see the message above, then everything is working fine! Gratz. you have a new RPC server and you can visit the URL at http://xx.xx.xx.xx:8899/
#### Check whether the RPC is caught up with the rest of the cluster with:

```
solana catchup --our-localhost
```

If you see the message above, then everything is working fine! Gratz. You have a new RPC server and you can visit the URL at http://xx.xx.xx.xx:8899/
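
As a quick sanity check (not part of the playbook), you can also hit the RPC port directly with a standard Solana JSON-RPC call such as getHealth:

```
curl -s http://localhost:8899 -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"getHealth"}'
```

A healthy, caught-up node responds with `{"jsonrpc":"2.0","result":"ok","id":1}`.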
9 changes: 9 additions & 0 deletions defaults/main.yml
@@ -0,0 +1,9 @@
---
# defaults file for Solana RPC
solana_version: "v1.13.5"
raw_disk_list: ["/dev/nvme1n1", "/dev/nvme2n1"]
setup_disks: "true"
download_snapshot: "true"
ramdisk_size: 300
swap_mb: "100000"

4 changes: 4 additions & 0 deletions files/download_start.sh
@@ -0,0 +1,4 @@
#!/bin/bash
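# Stop the validator, download a fresh snapshot into /mnt/solana-snapshots via snapshot-finder.py, then restart the service.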
sudo systemctl stop sol.service
python3 /mnt/snapshot-finder.py --snapshot_path /mnt/solana-snapshots
sudo systemctl start sol.service
4 changes: 2 additions & 2 deletions montip.py → files/montip.py
@@ -3,7 +3,7 @@
import json
import time

GENESYSGO = "https://ssc-dao.genesysgo.net/"
MAINNET = "https://api.mainnet-beta.solana.com"
LOCAL = "http://localhost:8899"
PAYLOAD = {"jsonrpc":"2.0","id":1, "method":"getSlot", "params":[{"commitment":"processed"}]}

@@ -22,7 +22,7 @@ def get_slot(req, jsondata, result, idx):
tlist = []
reqlist = []
resultlist = [0,0]
for c,u in enumerate([LOCAL, GENESYSGO]):
for c,u in enumerate([LOCAL, MAINNET]):
req = urllib.request.Request(u)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.add_header('Content-Length', content_len)
7 changes: 7 additions & 0 deletions files/restart.sh
@@ -0,0 +1,7 @@
#!/bin/bash
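# With no arguments, run snapcheck.py before restarting; passing any argument skips that check (the playbook's "restart without waiting" task calls this script with an argument).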
if [ $# -eq 0 ]
then
python3 /mnt/snapcheck.py
fi
sudo systemctl stop sol.service
sudo systemctl start sol.service
File renamed without changes.
25 changes: 12 additions & 13 deletions snapshot-finder.py → files/snapshot-finder.py
@@ -28,7 +28,7 @@
parser.add_argument('--max_download_speed', type=int,
help='Maximum snapshot download speed in megabytes - https://github.com/c29r3/solana-snapshot-finder/issues/11. Example: --max_download_speed 192')
parser.add_argument('--max_latency', default=40, type=int, help='The maximum value of latency (milliseconds). If latency > max_latency --> skip')
parser.add_argument('--version', type=str, help='version of the snapshot required')
# parser.add_argument('--version', type=str, help='version of the snapshot required')
parser.add_argument('--with_private_rpc', action="store_true", help='Enable adding and checking RPCs with the --private-rpc option.This slow down checking and searching but potentially increases'
' the number of RPCs from which snapshots can be downloaded.')
parser.add_argument('--measurement_time', default=7, type=int, help='Time in seconds during which the script will measure the download speed')
@@ -52,7 +52,7 @@
MAX_DOWNLOAD_SPEED_MB = args.max_download_speed
SPEED_MEASURE_TIME_SEC = args.measurement_time
MAX_LATENCY = args.max_latency
VERSION = args.version
# VERSION = args.version
SNAPSHOT_PATH = args.snapshot_path if args.snapshot_path[-1] != '/' else args.snapshot_path[:-1]
NUM_OF_MAX_ATTEMPTS = args.num_of_retries
SLEEP_BEFORE_RETRY = args.sleep
@@ -403,17 +403,16 @@ def main_worker():
if rpc_node["snapshot_address"] in unsuitable_servers:
logger.info(f'Rpc node already in unsuitable list --> skip {rpc_node["snapshot_address"]}')
continue

logger.info("checking version")
try:
if not version_check(rpc_node["snapshot_address"], VERSION):
logger.info("version check failed")
continue
except:
import traceback
print(traceback.format_exc())
print("broken")
logger.info("version check succeeded")
# logger.info("checking version")
# try:
# if not version_check(rpc_node["snapshot_address"], VERSION):
# logger.info("version check failed")
# continue
# except:
# import traceback
# print(traceback.format_exc())
# print("broken")
# logger.info("version check succeeded")
down_speed_bytes = measure_speed(url=rpc_node["snapshot_address"], measure_time=SPEED_MEASURE_TIME_SEC)
down_speed_mb = convert_size(down_speed_bytes)
if down_speed_bytes < MIN_DOWNLOAD_SPEED_MB * 1e6:
3 changes: 1 addition & 2 deletions sol.service → files/sol.service
@@ -10,9 +10,8 @@ RestartSec=1
User=solana
LimitNOFILE=1000000
LogRateLimitIntervalSec=0
Environment="PATH=/mnt/solana/target/release/:/mnt/solana/target/release/:/home/solana/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/snap/bin:/home/solana/.local/bin/:/home/solana/.local/bin/:/home/solana/.local/bin/"
Environment="PATH=/mnt/solana/target/release/:/home/solana/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/snap/bin:/home/solana/.local/bin/:/home/solana/.local/bin/:/home/solana/.local/bin/"
ExecStart=/home/solana/validator.sh

[Install]
WantedBy=multi-user.target

16 changes: 16 additions & 0 deletions files/validator.sh
@@ -0,0 +1,16 @@
#!/bin/bash
export SOLANA_METRICS_CONFIG="host=https://metrics.solana.com:8086,db=mainnet-beta,u=mainnet-beta_write,p=password"
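# Note: --no-snapshot-fetch assumes a snapshot is already present in /mnt/solana-snapshots (the playbook downloads one via snapshot-finder.py).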
exec /mnt/solana/target/release/solana-validator --identity /home/solana/rpc_node.json \
--full-rpc-api \
--rpc-port 8899 \
--entrypoint entrypoint.mainnet-beta.solana.com:8001 \
--limit-ledger-size \
--log /mnt/logs/solana-validator.log \
--accounts /mnt/solana-accounts \
--ledger /mnt/solana-ledger \
--snapshots /mnt/solana-snapshots \
--no-snapshot-fetch \
--known-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 \
--known-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ \
--known-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ \
--known-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S
7 changes: 0 additions & 7 deletions fstab.yaml

This file was deleted.

9 changes: 0 additions & 9 deletions restart.sh

This file was deleted.

62 changes: 5 additions & 57 deletions runner.yaml
@@ -1,58 +1,6 @@
---
- name: "playbook runner"
hosts: localhost
connection: local
tasks:

- name: var check
ansible.builtin.import_tasks: var_check.yaml

- name: setup disks
ansible.builtin.import_tasks: disks.yaml
when: setup_disks|default(false)|bool == true

- name: create user
ansible.builtin.import_tasks: user.yaml

- name: install dependencies
ansible.builtin.import_tasks: deps.yaml

- name: folders
ansible.builtin.import_tasks: dirs.yaml

- name: swap
ansible.builtin.import_tasks: swap.yaml

- name: ramdisk
ansible.builtin.import_tasks: ramdisk.yaml
when: ramdisk_size is defined

- name: logrotate
ansible.builtin.import_tasks: rotate.yaml

- name: git
ansible.builtin.import_tasks: git.yaml

- name: solana keygen
ansible.builtin.import_tasks: keygen.yaml

- name: file setup
ansible.builtin.import_tasks: file_setup.yaml

- name: snapshot download
ansible.builtin.import_tasks: snapshot_downloader.yaml
when: download_snapshot|default(true)|bool == true

- name: restart without waiting
become: true
become_user: root
shell: /home/solana/restart.sh 1
when: download_snapshot|default(true)|bool == true

- name: restart with waiting
become: true
become_user: root
shell: /home/solana/restart.sh
when: download_snapshot|default(true)|bool == false


- name: "playbook runner"
hosts: localhost
connection: local
roles:
- role: "./"
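
If you prefer to pin variables in the playbook itself rather than passing them with --extra-vars, one option (a sketch, not part of this PR) is to set them on the role include:

```
- name: "playbook runner"
  hosts: localhost
  connection: local
  roles:
    - role: "./"
      vars:
        solana_version: "v1.13.5"
        swap_mb: 100000
        ramdisk_size: 300
```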
2 changes: 1 addition & 1 deletion deps.yaml → tasks/deps.yaml
@@ -49,4 +49,4 @@
when: cargo_exists is failed
shell: /tmp/sh.rustup.rs -y
args:
executable: /bin/bash
executable: /bin/bash
2 changes: 1 addition & 1 deletion dirs.yaml → tasks/dirs.yaml
@@ -4,7 +4,7 @@
state: directory
owner: solana
group: solana
mode: '0777'
mode: "0777"
become: true
become_user: root

19 changes: 16 additions & 3 deletions disks.yaml → tasks/disks.yaml
@@ -4,12 +4,12 @@
apt:
update_cache: yes
pkg:
- mdadm
- mdadm

- name: check raid already exists
become: true
become_user: root
shell: fdisk -l | grep -e 'md0' -e 'md127'
shell: fdisk -l | grep -e 'md'
ignore_errors: yes
register: raid_exists

@@ -52,4 +52,17 @@
become_user: root
shell: mount /dev/{{ raid_name.stdout }} /mnt
when: mount_mnt.rc != 0
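
# The next two tasks persist the /mnt mount across reboots: blkid output for the md array is turned into an fstab entry and added to /etc/fstab.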


- name: extract raid params
shell: sudo blkid | grep md | grep -oh \".*\" | sed -e 's/"//g' | awk '{print "UUID="$1" /mnt ext4 defaults 1 1"}'
register: raid_config

- name: add raid to fstab
become: true
become_user: root
lineinfile:
dest: /etc/fstab
state: present
line: "{{ raid_config.stdout_lines[0] }}"
insertbefore: 'mnt'
firstmatch: yes