Login nodes of Discoverer HPC clusters¶
This document summarises the login nodes for each Discoverer HPC cluster, their network endpoints, and the access policy (including VPN requirements). It also describes the per-user resource limits on the login nodes (for users and administrators).
List of login nodes (per cluster)¶
| Cluster | Login node host name | SSH port | VPN required | Connectivity | Notes |
|---|---|---|---|---|---|
| Discoverer (CPU) | login.discoverer.bg | 22 | Yes | IPv4 | Access only through VPN to the Discoverer HPC internal network (see below). |
| Discoverer (CPU) | login.bg.discoverer.bg | 2222 | No | IPv4 (BG only) | Alternative entry point for users on Bulgarian academic/research IP networks; no VPN needed. See Logging Into login.bg.discoverer.bg (CPU cluster, from BG Academic IP Networks). |
| Discoverer+ (GPU) | login-plus.discoverer.bg | 22 | No | IPv4/IPv6 | Direct access from the Internet; VPN is not required. |
Note
The only user authentication mechanism supported on the login nodes is key-based SSH (pre-registered OpenSSH public keys). SSH password-based authentication is not supported. See SSH Access.
IP addresses (reference)¶
If your DNS service does not resolve the host names of the login nodes, you can connect via SSH using the IP addresses below (with the same port numbers as for the host names). Prefer host names when DNS resolution works. When using an IP address, still verify the server key fingerprint before accepting the connection; see login.discoverer.bg and login.bg.discoverer.bg - Key Fingerprints and login-plus.discoverer.bg - Key Fingerprints.
| Host name | IP address |
|---|---|
| login.discoverer.bg | 10.101.1.1 (IPv4, internal; reachable only via VPN) |
| login.bg.discoverer.bg | 82.119.91.4 (IPv4, port 2222; reachable from Bulgarian academic/research IP networks only, no VPN required) |
| login-plus.discoverer.bg | 82.119.91.189 (IPv4); 2001:678:924:dddd::2 (IPv6); reachable from the Internet, no VPN required |
Addresses may be updated by the site; when DNS works, use the host names above for SSH.
Access policy: login.discoverer.bg and VPN¶
The CPU cluster has two SSH entry points:
- login.discoverer.bg (port 22) is on the Discoverer HPC internal network and is accessible only through a VPN tunnel to that network. See VPN Access to the SSH server on the CPU cluster login node for VPN setup (Linux, macOS, Windows), then Logging Into login.discoverer.bg (CPU cluster) to connect.
- login.bg.discoverer.bg (port 2222) is the entry point for users whose client is on a Bulgarian academic or research institution IP network; no VPN is needed. See Logging Into login.bg.discoverer.bg (CPU cluster, from BG Academic IP Networks).
Access policy: login-plus.discoverer.bg (GPU cluster)¶
The GPU cluster login node login-plus.discoverer.bg is reachable from the Internet; no VPN is required. Connect with SSH on port 22. See Logging Into login-plus.discoverer.bg (GPU cluster) for step-by-step instructions.
How to connect (quick reference)¶
- Discoverer (CPU), general: use VPN, then Logging Into login.discoverer.bg (CPU cluster) (e.g. ssh login.discoverer.bg -l username).
- Discoverer (CPU), Bulgarian academic network: see Logging Into login.bg.discoverer.bg (CPU cluster, from BG Academic IP Networks); no VPN required (e.g. ssh login.bg.discoverer.bg -p 2222 -l username).
- Discoverer+ (GPU): see Logging Into login-plus.discoverer.bg (GPU cluster) (e.g. ssh login-plus.discoverer.bg -l username).
Per-user resources on the login nodes¶
Resource limits are enforced on the login nodes so that no single user can consume the whole node. The following limits apply to your session:
- CPU: 400% (two physical cores on SMT/hyperthreaded systems; 100% = one hardware thread, 200% = one physical core).
- Memory: 4G (usage above this is throttled).
- Process count: up to 5000 tasks (processes) per user.
- Block I/O: 10 MB/s read and write on the login node’s local disk (if applicable).
- Concurrent logins: up to 4 simultaneous SSH (and SCP/SFTP) sessions per user. Additional login attempts are refused until a session is closed.
These limits apply to interactive use on the login node. To run larger workloads, submit jobs to the compute nodes via Slurm; see Running jobs or Where and how to compile code. Details for administrators (how limits are configured, how to change them, and how to verify them) are in the section below.
How per-user resources are set on the login nodes¶
Note
For advanced users and administrators.
This section describes how per-user resource limits on the login nodes are implemented (systemd slice drop-ins and cgroups), how to deploy or change them, and how to verify them.
Introduction¶
This document describes the use of systemd slice drop-ins and cgroups to
enforce per-user limits on CPU, memory, process count, and block I/O on
cluster login nodes. Limits are applied to each user’s systemd slice
(user-<UID>.slice) and take effect when the user has an active
session (e.g. SSH, user@<UID>.service).
Linux cgroups (control groups) are a kernel feature that limits and
accounts for resource use (CPU, memory, I/O, number of tasks) of a set
of processes. systemd organises units (slices, services, scopes) into a
cgroup hierarchy and applies resource-control settings from unit files.
Two cgroup versions exist: cgroup v1 uses separate controller
hierarchies (each controller such as cpu, memory, blkio has its own
mount and tree); cgroup v2 uses a single unified hierarchy where all
controllers are attached to one tree. The same systemd configuration
works with both; only the underlying kernel paths and interface files
differ (e.g. v1 uses cpu.cfs_quota_us in the cpu controller; v2 uses
cpu.max in the unified tree). The Verification section below describes
how to tell which version a system uses and how to verify limits on each.
At Discoverer: the login node of the Discoverer CPU cluster runs RHEL 8 with cgroup v1. The login node of the Discoverer+ GPU cluster uses cgroup v2. When deploying or verifying limits, use the paths and steps that match the cgroup version of the node.
The implementation supersedes an earlier approach that set pids.max
directly under /sys/fs/cgroup/pids/. The process limit is now
expressed as TasksMax in the slice unit; no separate pids cgroup
script is required.
Resource limits¶
The following limits are configurable per user slice:
- CPU: CPUQuota (default 400%, i.e. two physical cores per user on SMT/hyperthreaded systems). The quota is per logical CPU: 100% = one hardware thread, 200% = one full physical core, 400% = two full physical cores.
- Memory: MemoryHigh (default 4G); memory use above this value is throttled.
- Process count: TasksMax (default 5000).
- Block I/O: IOReadBandwidthMax and IOWriteBandwidthMax on selected block devices (default 10M, i.e. 10 MB/s).
Limits are applied via a drop-in file under
/etc/systemd/system/user-<UID>.slice.d/. When the slice is created
(e.g. at login), systemd applies these settings to the cgroup hierarchy.
If you run the apply script when users already have open sessions, the
script applies CPU, memory and task limits to those existing sessions
immediately (via systemctl set-property --runtime); I/O limits for
already-logged-in users take effect at their next login. For
CPUQuota to be enforced, systemd must have CPU accounting enabled
(see Prerequisites below).
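With the default limits, a generated drop-in might look like the sketch below (illustrative; the UID 5032 is hypothetical, and the real file is produced by the apply script):

```ini
# /etc/systemd/system/user-5032.slice.d/50-login-limits.conf (illustrative)
[Slice]
CPUQuota=400%
MemoryHigh=4G
TasksMax=5000
IOReadBandwidthMax=/dev/nvme0n1p3 10M
IOWriteBandwidthMax=/dev/nvme0n1p3 10M
```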
Prerequisites¶
- CPU quota: for CPUQuota to be enforced, systemd must have default CPU accounting enabled. Add or uncomment in /etc/systemd/system.conf:

  DefaultCPUAccounting=yes

  Then run systemctl daemon-reexec (or reboot). Without this, CPU limits in the drop-ins are ignored. The apply script checks this and exits with instructions if it is not set.
- The method uses systemd unit files and works with both cgroup v1 and v2; systemd applies the limits to the appropriate cgroup controllers. On cgroup v1, controllers are mounted separately (e.g. cpu,cpuacct, memory, blkio, pids); on v2, a single unified hierarchy is used.
- TasksMax is enforced by systemd and does not require writing to /sys/fs/cgroup/pids/ directly.
- The apply and revert scripts must be executed with root privileges.
I/O limits apply only to local block devices. The variable LOGIN_LIMITS_IO_TARGETS may list block device paths and/or mount points (default: /dev/nvme0n1p3). Mount points may be included if they resolve to a local block device; only a result that is a path under /dev/ and refers to a block device is used. NFS and Lustre mounts do not resolve to a local block device and are therefore skipped. Examples:

- findmnt -n -o SOURCE --target /home yields e.g. 10.101.0.247:/home_nfs (NFS); not under /dev/, so skipped.
- findmnt -n -o SOURCE --target /valhalla yields e.g. 10.106.13.3@o2ib,...:/cstor1 (Lustre); not under /dev/, so skipped.
- /dev/nvme0n1p3 is a block device; the limit is applied.

The cgroup I/O controller throttles only I/O to local block devices. I/O to NFS or Lustre is not attributed to a local device and cannot be limited by this mechanism on the login node. To inspect the source of a mount: findmnt -n -o SOURCE --target <path> or lsblk.
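The resolution rule above can be sketched as a small shell function (an assumption about the logic, not the actual apply script). Device paths are kept only if they are block devices; absolute paths are resolved with findmnt; anything else (e.g. an NFS or Lustre source string) is skipped:

```shell
# Sketch of the I/O target resolution rule: keep only local block devices.
resolve_io_targets() {
    for t in "$@"; do
        case "$t" in
            /dev/*) dev="$t" ;;                                      # already a device path
            /*)     dev=$(findmnt -n -o SOURCE --target "$t" 2>/dev/null) ;;  # mount point
            *)      dev="" ;;                                        # e.g. host:/export -> skip
        esac
        # Keep only results under /dev/ that are real block devices
        case "$dev" in
            /dev/*) [ -b "$dev" ] && printf '%s\n' "$dev" ;;
        esac
    done
}

# Example: an NFS source string and a character device are both skipped,
# so this prints nothing.
resolve_io_targets "10.101.0.247:/home_nfs" /dev/null
```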
Default configuration¶
The script uses the following defaults unless overridden by environment variables (see Deployment). The UID range has no default and must be set to cover the POSIX UIDs of the HPC users; it should be updated as the user base grows. Whenever a new user is added to the system, the apply script must be run again (and the UID range extended if the new user’s UID falls outside the current range) so that a drop-in is created for that user and the resource limits apply to them.
| Parameter | Default | Environment variable |
|---|---|---|
| UID range | (required) | LOGIN_LIMITS_UID_START, LOGIN_LIMITS_UID_END |
| CPUQuota | 400% | LOGIN_LIMITS_CPU_QUOTA |
| MemoryHigh | 4G | LOGIN_LIMITS_MEMORY_HIGH |
| TasksMax | 5000 | LOGIN_LIMITS_TASKS_MAX |
| I/O read/write | 10M | LOGIN_LIMITS_IO_READ, LOGIN_LIMITS_IO_WRITE |
| I/O targets | /dev/nvme0n1p3 | LOGIN_LIMITS_IO_TARGETS |
CPUQuota is per logical CPU: 100% = one hardware thread, 200% = one physical core, 400% = two physical cores (default).
The default 10M (10 MB/s) is sufficient because the login-node local
disk (/dev/nvme0n1p3) is not used for user jobs; limits may be
adjusted to match site policy.
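The percentage-to-kernel mapping can be checked with a few lines of shell arithmetic. This assumes systemd's default CFS period of 100000 microseconds (100 ms); the quota written to the cgroup is percent × period / 100:

```shell
# How CPUQuota=400% maps onto the kernel quota/period pair
# (cgroup v1: cpu.cfs_quota_us / cpu.cfs_period_us; cgroup v2: "quota period" in cpu.max).
percent=400                                  # CPUQuota=400%
period_us=100000                             # systemd's default CFS period (100 ms)
quota_us=$(( percent * period_us / 100 ))
echo "cpu.cfs_quota_us=${quota_us} cpu.cfs_period_us=${period_us}"
# prints: cpu.cfs_quota_us=400000 cpu.cfs_period_us=100000
```

A quota of 400000 over a 100000 microsecond period allows four hardware threads' worth of CPU time per period.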
Deployment¶
Apply script¶
The script apply-login-user-limits.sh:

- Iterates over UIDs in the configured range (set via LOGIN_LIMITS_UID_START and LOGIN_LIMITS_UID_END; there is no default); for each UID present on the system (as determined by id), creates /etc/systemd/system/user-<UID>.slice.d/50-login-limits.conf with the configured [Slice] settings.
- Resolves LOGIN_LIMITS_IO_TARGETS to a list of local block devices and adds IOReadBandwidthMax and IOWriteBandwidthMax lines for each.
- Runs systemctl daemon-reload.
- For UIDs that have an active login session, applies CPU, memory, and task limits to the already-running slice with systemctl set-property --runtime, so those limits take effect in the current session without logging out. I/O limits for existing sessions take effect at next login; the drop-in ensures all limits apply for every future login. The script does not start user@ for every UID in the range, so it does not spawn hundreds of systemd --user processes (e.g. on RHEL 9 with cgroup v2).
Run the apply script after adding new users to the system so that
drop-ins are created for their UIDs and the resource limits apply to
them. If the new user’s UID is outside the current range, set
LOGIN_LIMITS_UID_START and/or LOGIN_LIMITS_UID_END to include it
(or extend the range and re-run).
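The drop-in generation step can be sketched as follows. This is not the actual apply-login-user-limits.sh; the ROOT parameter is an addition here so the sketch can be exercised against a scratch directory instead of /etc:

```shell
# Sketch of drop-in generation: write_dropins ROOT UID_START UID_END
# (on a real node ROOT would be empty, writing under /etc/systemd/system/).
write_dropins() {
    root="$1"; uid="$2"; uid_end="$3"
    while [ "$uid" -le "$uid_end" ]; do
        if id "$uid" >/dev/null 2>&1; then            # only UIDs present on the system
            d="${root}/etc/systemd/system/user-${uid}.slice.d"
            mkdir -p "$d"
            cat > "${d}/50-login-limits.conf" <<EOF
[Slice]
CPUQuota=${LOGIN_LIMITS_CPU_QUOTA:-400%}
MemoryHigh=${LOGIN_LIMITS_MEMORY_HIGH:-4G}
TasksMax=${LOGIN_LIMITS_TASKS_MAX:-5000}
EOF
        fi
        uid=$(( uid + 1 ))
    done
}

# Exercise in a scratch directory using UID 0 (root), which always exists:
tmp=$(mktemp -d)
write_dropins "$tmp" 0 0
cat "${tmp}/etc/systemd/system/user-0.slice.d/50-login-limits.conf"
```

The real script additionally resolves the I/O targets, runs systemctl daemon-reload, and applies runtime limits to active sessions.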
Installation and execution (as root):
cp apply-login-user-limits.sh /usr/local/sbin/
chmod +x /usr/local/sbin/apply-login-user-limits.sh
/usr/local/sbin/apply-login-user-limits.sh
To override defaults, set the environment variables before running the script. The UID range must always be set; it can be passed as environment variables or as arguments:
# UID range as arguments (no export needed)
/usr/local/sbin/apply-login-user-limits.sh LOGIN_LIMITS_UID_START=5000 LOGIN_LIMITS_UID_END=5500
# Or set env and optionally other limits
export LOGIN_LIMITS_UID_START=5000
export LOGIN_LIMITS_UID_END=5500
export LOGIN_LIMITS_IO_TARGETS="/dev/nvme0n1p3"
export LOGIN_LIMITS_CPU_QUOTA=400%
export LOGIN_LIMITS_MEMORY_HIGH=4G
export LOGIN_LIMITS_TASKS_MAX=5000
export LOGIN_LIMITS_IO_READ=10M
export LOGIN_LIMITS_IO_WRITE=10M
/usr/local/sbin/apply-login-user-limits.sh
To disable I/O limits, set LOGIN_LIMITS_IO_TARGETS="".
New sessions receive the limits when their slice is created. Existing sessions retain the previous limits until the user logs out or the slice is restarted.
Revert script¶
The script revert-login-user-limits.sh removes the drop-in files and
runs systemctl daemon-reload. By default it also stops
user@<UID>.service for each UID in the range that has no active
login session; this cleans up “ghost” systemd –user processes (e.g. if
login session; this cleans up "ghost" systemd --user processes (e.g. if
are not affected. To only remove drop-ins without stopping any user@
services, set LOGIN_LIMITS_SKIP_STOP_IDLE_USER_SERVICES=1. The same
UID range as used for the apply script must be set via
LOGIN_LIMITS_UID_START and LOGIN_LIMITS_UID_END (no default).
Installation and execution (as root):
cp revert-login-user-limits.sh /usr/local/sbin/
chmod +x /usr/local/sbin/revert-login-user-limits.sh
export LOGIN_LIMITS_UID_START=5000
export LOGIN_LIMITS_UID_END=5500
/usr/local/sbin/revert-login-user-limits.sh
Manual removal (replace the UID range with the range used when applying limits):
for i in $(seq 5000 5500); do
rm -f /etc/systemd/system/user-${i}.slice.d/50-login-limits.conf
rmdir /etc/systemd/system/user-${i}.slice.d 2>/dev/null || true
done
systemctl daemon-reload
After reverting, existing user sessions continue under the old limits until they end; new sessions have no limits from this mechanism.
Persistence across reboots¶
The limit definitions are stored in drop-in files under
/etc/systemd/system/. These files persist across reboots. At boot,
systemd loads them and applies the limits when each user’s slice is
created (e.g. at login). It is not necessary to re-run the apply script
after rebooting the login node; the limits remain in effect. Re-run the
apply script only when changing the limits or when extending the UID
range to include new HPC users.
Verification¶
- Confirm the drop-in was written
cat /etc/systemd/system/user-<UID>.slice.d/50-login-limits.conf
You should see CPUQuota=..., MemoryHigh=..., TasksMax=....
Replace <UID> with the numeric UID (e.g. 5032).
- Confirm CPU accounting is enabled
systemctl show -p DefaultCPUAccounting --value
This must print yes. If it prints no, CPUQuota is not enforced;
set DefaultCPUAccounting=yes in /etc/systemd/system.conf and run
systemctl daemon-reexec.
- Check MemoryHigh and TasksMax
systemctl show user-<UID>.slice -p MemoryHigh -p TasksMax
How to tell whether cgroup v1 or v2 is used
- Cgroup v1: Controllers are mounted as separate hierarchies. There is no file cgroup.controllers under /sys/fs/cgroup/. The test below prints nothing on v1; mount lines show type cgroup, not cgroup2:
test -f /sys/fs/cgroup/cgroup.controllers && echo "v2 or hybrid"
# v1: no output (file does not exist)
mount | grep cgroup
# v1: multiple lines with type cgroup, e.g. cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (...)
ls /sys/fs/cgroup/
# v1: directories such as cpu,cpuacct, memory, blkio, pids, systemd (no cgroup.controllers)
- Cgroup v2 (unified): A single cgroup2 mount; the root has
cgroup.controllers. The test prints “v2 or hybrid”; mount shows type cgroup2:
test -f /sys/fs/cgroup/cgroup.controllers && echo "v2 or hybrid"
# v2: prints "v2 or hybrid"
mount | grep cgroup
# v2: one line with type cgroup2, e.g. cgroup2 on /sys/fs/cgroup
cat /sys/fs/cgroup/cgroup.controllers
# v2: lists controllers (e.g. cpuset cpu io memory hugetlb pids rdma misc)
- Hybrid: Some systems have both: a cgroup2 mount (e.g. at /sys/fs/cgroup) and legacy v1 controllers (e.g. under /sys/fs/cgroup/cpu,cpuacct); systemd may use v1 for resource control. Use the verification paths that exist on your system (see the next step).
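The detection rule boils down to one file test. The function below is a sketch; the cgroup root is a parameter so the rule can be tried against scratch directories, while on a real node you would call it as cgroup_version /sys/fs/cgroup:

```shell
# Classify a cgroup mount root: the presence of cgroup.controllers at the
# root marks a unified (v2) or hybrid hierarchy; its absence marks pure v1.
cgroup_version() {
    if [ -f "$1/cgroup.controllers" ]; then
        echo "v2 or hybrid"
    else
        echo "v1"
    fi
}

# Example against a scratch directory standing in for a v2 root:
fake=$(mktemp -d)
touch "$fake/cgroup.controllers"
cgroup_version "$fake"      # prints: v2 or hybrid
```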
- Check CPU quota in the cgroup
Get the slice’s cgroup path:
CG=$(systemctl show user-<UID>.slice -p ControlGroup --value)
echo "ControlGroup=$CG"
Then read the CPU limit from the kernel. The path depends on whether the system uses cgroup v1 or v2.
Cgroup v1: Controllers are mounted separately (e.g. cpu,
cpu,cpuacct, memory, blkio, pids). The CPU limit is in
the cpu controller. Try:
# Cgroup v1: path under the cpu controller (may be named "cpu" or "cpu,cpuacct")
cat /sys/fs/cgroup/cpu${CG}/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu${CG}/cpu.cfs_period_us
If those files do not exist, the controller may be mounted as
cpu,cpuacct:
cat /sys/fs/cgroup/cpu,cpuacct${CG}/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu,cpuacct${CG}/cpu.cfs_period_us
- cpu.cfs_quota_us = 100000 and cpu.cfs_period_us = 100000 means 100% of one logical CPU (one hardware thread); -1 means no limit.
- On v1, cpu.max does not exist; that interface is cgroup v2 only.
Cgroup v2 (unified hierarchy): A single tree under
e.g. /sys/fs/cgroup/:
cat /sys/fs/cgroup${CG}/cpu.max
- A line like 100000 100000 means 100% of one logical CPU (one hardware thread); max alone means no limit.
If you are unsure which hierarchy you have, run
find /sys/fs/cgroup -name "user-<UID>.slice" -type d and look for
the slice under a path containing cpu (v1) or only under the root
cgroup path (v2). Then read cpu.cfs_quota_us and
cpu.cfs_period_us (v1) or cpu.max (v2) in that directory.
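To compare a kernel reading with the configured CPUQuota, the quota/period pair can be converted back into a percentage. The helper below is a sketch; it accepts the values read from cpu.max (v2, where the quota may be the literal max) or from cpu.cfs_quota_us and cpu.cfs_period_us (v1, where -1 means no limit):

```shell
# quota_percent QUOTA PERIOD -> CPUQuota percentage (or "no limit")
quota_percent() {
    quota="$1"; period="$2"
    case "$quota" in
        max|-1) echo "no limit" ;;                     # v2 "max" / v1 "-1"
        *)      echo "$(( quota * 100 / period ))%" ;;
    esac
}

# The values expected for the default CPUQuota=400%:
quota_percent 400000 100000    # prints: 400%
quota_percent max 100000       # prints: no limit
```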
Limiting the number of parallel SSH logins¶
The number of concurrent SSH (and other login) sessions per user is
limited using PAM (Pluggable Authentication Modules) and the
maxlogins setting in /etc/security/limits.conf. This limit
applies to all session types: interactive SSH, SCP, and SFTP (each
connection counts as one session). There is no separate limit for SCP or
SFTP; standard PAM and sshd do not support different caps per session
type. MaxSessions in sshd_config does not limit per-user
logins; it limits multiplexed channels over a single TCP connection.
Configuration¶
In /etc/security/limits.conf add a line such as:

# Limit the number of concurrent login sessions per user:
*    hard    maxlogins    4

This limits all users (except root) to 4 concurrent logins. The limit is checked at login time; changing it does not terminate existing sessions.

Ensure sshd uses PAM and that pam_limits.so is loaded for session setup:

- On the Discoverer CPU cluster login node (login.discoverer.bg, RHEL 8): UsePAM yes is the default in sshd_config. No edit or restart is required.
- On the Discoverer+ login node (login-plus.discoverer.bg): create /etc/ssh/sshd_config.d/02-use_pam.conf with UsePAM yes, then run systemctl restart sshd.

In /etc/pam.d/sshd there must be a session line that loads pam_limits.so. On RHEL the default sshd PAM file does not include it directly. Add:

session    required    pam_limits.so

Place it with the other session rules (e.g. after session required pam_selinux.so open env_params and before session include password-auth). If you add or change this line, restart sshd on that node.
Verification (parallel SSH logins)¶
Show the running sshd configuration (as root):
sshd -T
Options appear as key-value pairs (e.g. usepam yes). To check only UsePAM:

sshd -T | grep -i usepam

Example output: usepam yes.

Check that the limit is in place:
grep maxlogins /etc/security/limits.conf
Check that PAM applies limits for sshd:
grep pam_limits /etc/pam.d/sshd
If nothing is found, pam_limits.so may be loaded via an include (e.g. password-auth or postlogin). Run grep -r pam_limits /etc/pam.d/ to see where it is loaded.

To see how many login sessions a user has (on a systemd host):

loginctl list-sessions
loginctl show-user <username>
When the limit is exceeded, the user sees an error such as “Too many logins for ‘username’” and the login is refused.
References for parallel login limits: limits.conf(5),
pam_limits(8); Red Hat solutions 7003283 (limit concurrent login
sessions) and 59562 (restrict maximum simultaneous ssh logins per user).
I/O controller availability¶
On some systems the cgroup I/O controller is not enabled or systemd does
not expose IOReadBandwidthMax and IOWriteBandwidthMax. If
systemctl daemon-reload or the script reports an error related to
these options, the IOReadBandwidthMax and IOWriteBandwidthMax
lines in the drop-in may be removed or commented out. CPU and memory
limits remain in effect.
References¶
- systemd.resource-control(5):
CPUQuota,MemoryHigh,TasksMax,IOReadBandwidthMax,IOWriteBandwidthMax. - findmnt(8): resolution of mount sources.
- cgroup v2 I/O controller: throttling of block device I/O only; network filesystem I/O is not subject to these limits on the client.
Getting help¶
See Getting help for support and contact information.