Commit

Add Linux kernel tuning: sysctl, huge pages, disable THP
anayrat committed Oct 7, 2023
1 parent 1be4014 commit 8609579
Showing 6 changed files with 94 additions and 11 deletions.
2 changes: 2 additions & 0 deletions roles/common/files/systemd-tmpfiles.conf
@@ -0,0 +1,2 @@
w /sys/kernel/mm/transparent_hugepage/enabled - - - - never
w /sys/kernel/mm/transparent_hugepage/defrag - - - - never
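
Each `w` line asks systemd-tmpfiles to write its last field (`never`) to the given sysfs path. As a minimal sketch, not part of this commit, a pair of Ansible tasks along these lines could confirm the result on a host where the file has been applied. The register name `thp_enabled` is hypothetical, and the check assumes the kernel marks the active mode with square brackets, as in `always madvise [never]`.

```yaml
# Illustrative check only; not part of the commit.
- name: Read the current THP mode
  command: cat /sys/kernel/mm/transparent_hugepage/enabled
  register: thp_enabled   # hypothetical variable name
  changed_when: false     # reading sysfs never changes the host

- name: Verify that THP is disabled
  assert:
    that:
      - "'[never]' in thp_enabled.stdout"
```

Because the `systemd-tmpfiles create` handler normally runs at the end of the play, such a check would only be meaningful after the handlers have run.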
2 changes: 2 additions & 0 deletions roles/common/handlers/main.yml
@@ -27,6 +27,8 @@

- include: sysstat.yml

- include: sysctl.yml

- name: restart systemd-hostnamed
  systemd:
    daemon_reload: yes
2 changes: 2 additions & 0 deletions roles/common/handlers/sysctl.yml
@@ -0,0 +1,2 @@
- name: systemd-tmpfiles create
  command: systemd-tmpfiles --create
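
Invoked with no file argument, `systemd-tmpfiles --create` applies every tmpfiles.d configuration on the host. A possible narrower variant, shown only as a sketch and not as what the commit does, passes the path of the file deployed by the `Disable Transparent Huge Pages` task so that only its directives are applied:

```yaml
# Sketch: apply only the THP directives instead of every tmpfiles.d entry.
- name: systemd-tmpfiles create
  command: systemd-tmpfiles --create /etc/tmpfiles.d/thp.conf
```

The trade-off is that the handler then has to be kept in sync with the destination path used by the task that notifies it.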
13 changes: 4 additions & 9 deletions roles/common/tasks/main.yml
@@ -670,14 +670,6 @@
  notify:
    - generate locales

# necessary to run a lot of containers, each which systemd launching several inotify
- name: increase fs.inotify.max_user_instances on host
  sysctl:
    name: fs.inotify.max_user_instances
    value: 1024
    sysctl_file: /etc/sysctl.d/ansible.conf
  when: "not 'vm' in group_names"

# configure lxfs so that VMs get their own load-average
- name: create systemd override directory for lxcfs
  file:
@@ -709,7 +701,10 @@
- include: munin-node.yml

- include: sysstat.yml
  when: "'proxmox' in group_names"

- include: sysctl.yml
  when: "'proxmox' in group_names"

- include: ntp.yml
  when: "not 'vm' in group_names"

81 changes: 81 additions & 0 deletions roles/common/tasks/sysctl.yml
@@ -0,0 +1,81 @@
# Necessary to run many containers, each with systemd launching several inotify instances
- name: increase fs.inotify.max_user_instances on host
  sysctl:
    name: fs.inotify.max_user_instances
    value: 1024
    sysctl_file: /etc/sysctl.d/ansible.conf

- name: Reduce swappiness to 1
  sysctl:
    name: vm.swappiness
    value: 1
    sysctl_file: /etc/sysctl.d/ansible.conf

# https://forum.proxmox.com/threads/increase-performance-with-sched_autogroup_enabled-0.41729/
# https://www.postgresql.org/message-id/[email protected]
#
# * sched_migration_cost
#
# The migration cost is the total time the scheduler will consider a
# migrated process "cache hot" and thus less likely to be re-migrated. By
# default, this is 0.5ms (500000 ns), and as the size of the process table
# increases, eventually causes the scheduler to break down. On our
# systems, after a smooth degradation with increasing connection count,
# system CPU spiked from 20 to 70% sustained and TPS was cut by 5-10x once
# we crossed some invisible connection count threshold. For us, that was a
# pgbench with 900 or more clients.
#
# The migration cost should be increased, almost universally on server
# systems with many processes. This means systems like PostgreSQL or
# Apache would benefit from having higher migration costs. We've had good
# luck with a setting of 5ms (5000000 ns) instead.
#
# When the breakdown occurs, system CPU (as obtained from sar) increases
# from 20% on a heavy pgbench (scale 3500 on a 72GB system) to over 70%,
# and %nice/%user is cut by half or more. A higher migration cost
# essentially eliminates this artificial throttle.
#
#
# * sched_autogroup_enabled
#
# This is a relatively new patch which Linus lauded back in late 2010. It
# basically groups tasks by TTY so perceived responsiveness is improved.
# But on server systems, large daemons like PostgreSQL are going to be
# launched from the same pseudo-TTY, and be effectively choked out of CPU
# cycles in favor of less important tasks.
#
# The default setting is 1 (enabled) on some platforms. By setting this to
# 0 (disabled), we saw an outright 30% performance boost on the same
# pgbench test. A fully cached scale 3500 database on a 72GB system went
# from 67k TPS to 82k TPS with 900 client connections.

- name: Set kernel.sched_autogroup_enabled to 0
  sysctl:
    name: kernel.sched_autogroup_enabled
    value: 0
    sysctl_file: /etc/sysctl.d/ansible.conf

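# Note: on kernels 5.13 and later this tunable moved to debugfs
# (/sys/kernel/debug/sched/migration_cost_ns), so this sysctl key may not exist there.
- name: Set kernel.sched_migration_cost_ns to 5000000
  sysctl:
    name: kernel.sched_migration_cost_ns
    value: 5000000
    sysctl_file: /etc/sysctl.d/ansible.conf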

# We use the systemd-tmpfiles mechanism to write to pseudo-filesystems such as sysfs
# https://sleeplessbeastie.eu/2022/11/18/how-to-create-persistent-sysfs-configuration-using-systemd/
# https://wiki.archlinux.org/title/Systemd#systemd-tmpfiles_-_temporary_files
- name: Disable Transparent Huge Pages
  copy:
    src: 'systemd-tmpfiles.conf'
    dest: '/etc/tmpfiles.d/thp.conf'
  notify:
    - systemd-tmpfiles create

# The memory is not allocated or reserved up front. The kernel tries to allocate the huge pages when possible; if it cannot, so be it.
# This works well at boot. Once the server is running and memory is used for cache or has become fragmented,
# the kernel has a harder time finding contiguous blocks.
- name: Allow 2MB huge pages up to 60% of the RAM
  sysctl:
    name: vm.nr_overcommit_hugepages
    value: "{{ ( ansible_memtotal_mb * 0.6 / 2)|int }}"
    sysctl_file: /etc/sysctl.d/ansible.conf
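
The `value` expression above converts 60% of total RAM (reported in MB) into a count of 2 MB pages. For example, a host reporting `ansible_memtotal_mb` = 64000 would get 64000 × 0.6 / 2 = 19200 pages. As a minimal sketch, not part of this commit, a task using the standard `debug` module could print the computed figure for the current host; it assumes facts have been gathered so that `ansible_memtotal_mb` is defined.

```yaml
# Illustrative only: show the page count that the sysctl task would set.
- name: Show the computed overcommit huge page limit
  debug:
    msg: >-
      {{ ansible_memtotal_mb }} MB of RAM allows up to
      {{ (ansible_memtotal_mb * 0.6 / 2) | int }} overcommitted 2 MB huge pages
```

The `int` filter truncates the result, so the limit never exceeds 60% of the reported memory.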
5 changes: 3 additions & 2 deletions roles/common/tasks/sysstat.yml
@@ -1,8 +1,9 @@
-- name: install packages for sysstat
+- name: install packages for sysstat and atop
   apt: pkg={{ item }} update_cache=yes
   with_items:
     - sysstat
-    - xz
+    - xz-utils
+    - atop
   when: ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'

- name: Enable sysstat
