Resource limits on login-plus.discoverer.bg
===========================================

.. contents:: Table of contents
   :depth: 3

Overview
--------

``login-plus.discoverer.bg`` is the login node of the Discoverer+ GPU
cluster. It is a shared gateway used by all users of the cluster. It is not
a compute node. Its purpose is strictly limited to the following tasks:

- connecting to the cluster via SSH;
- submitting, monitoring, and cancelling SLURM jobs;
- checking project resource allocation and quota;
- managing files and directories under the project's storage space.

Any workload beyond these lightweight operations — including IDEs, AI coding
agents, interactive notebooks, long-running scripts, or any process that
sustains significant CPU or memory consumption — does not belong on the
login node. Such workloads degrade the experience for every other user
sharing the node and, as described in the sections below, are subject to
automatic throttling.

Per-user resource limits
------------------------

To protect the shared environment, the login node enforces per-user resource
limits via *systemd cgroup v2 user slices*.
Every user session is placed inside a slice named ``user-<UID>.slice``, and
the following limits apply to the aggregate of all processes in that slice:

+--------------------------+------------------------+------------------------------------------+
| Parameter                | Default limit          | Description                              |
+==========================+========================+==========================================+
| ``CPUQuota``             | 200%                   | Maximum CPU allocation (2 logical        |
|                          |                        | threads equivalent)                      |
+--------------------------+------------------------+------------------------------------------+
| ``MemoryHigh``           | 4.0 GB                 | Soft memory ceiling; throttling begins   |
|                          |                        | above this threshold                     |
+--------------------------+------------------------+------------------------------------------+
| ``TasksMax``             | 5000                   | Maximum number of concurrent processes   |
|                          |                        | and threads                              |
+--------------------------+------------------------+------------------------------------------+
| ``IOReadBandwidthMax`` / | 10 MB/s per direction  | Local block device I/O throttle          |
| ``IOWriteBandwidthMax``  |                        |                                          |
+--------------------------+------------------------+------------------------------------------+

These limits apply to all users within the designated UID range and are
enforced continuously. They are not negotiable and will not be raised to
accommodate unsupported workloads.

.. note::

   The ``MemoryHigh`` parameter is a *soft* ceiling, not a hard kill
   threshold. When a user slice exceeds it, the kernel begins throttling
   memory allocation and aggressively reclaiming pages. Processes are not
   immediately terminated; instead they stall in uninterruptible sleep
   (kernel state ``D``), which increments the system load average regardless
   of CPU utilisation.
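The configured values can be inspected directly with ``systemctl show``
(``CPUQuotaPerSecUSec``, ``MemoryHigh``, and ``TasksMax`` are the standard
systemd property names; the ``bytes_to_gb`` helper below is illustrative and
not part of the system):

```shell
# Show the limits applied to your own slice. systemd reports CPUQuota as
# CPUQuotaPerSecUSec (200% corresponds to 2s of CPU time per wall-clock second).
systemctl show "user-$(id -u).slice" \
    --property=CPUQuotaPerSecUSec,MemoryHigh,TasksMax

# MemoryHigh is reported in bytes; an illustrative converter for comparing
# against the 4.0 GB figure in the table above (4294967296 bytes -> 4.0):
bytes_to_gb() { awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024^3) }'; }
bytes_to_gb 4294967296   # prints 4.0
```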
The soft-ceiling behaviour is intentional: it preserves enough system
headroom for the affected user to log in via a new SSH session and terminate
the offending process themselves.

Checking your own resource consumption
--------------------------------------

Users are encouraged to monitor their own slice before attributing login
node slowness to other accounts. The following command shows the current
state of your cgroup slice:

.. code:: bash

   systemctl status user-$(id -u).slice

The relevant line in the output is::

   Memory: X.XG (high: 4.0G available: YB)

As long as the *available* figure is non-zero, your session is within the
limit and is not contributing to system load. If *available* reads ``0B``,
your slice is at or above the ``MemoryHigh`` threshold and your processes
are being throttled.

The raw byte values can be read directly from the cgroup filesystem:

.. code:: bash

   cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/memory.current
   cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/memory.high

.. important::

   When the login node feels slow or unresponsive, the first step is always
   to check your *own* account using the commands above. Do not assume that
   another user's processes are the cause before verifying that your own
   slice is within its limits. Each user's slice is accounted and throttled
   independently; a throttled slice belonging to another user cannot
   directly cause throttling in yours.

Effect of throttling on system load average
-------------------------------------------

The system load average reported by ``uptime`` counts all processes in
runnable or uninterruptible sleep state. A process stalled in state ``D`` —
waiting on memory reclaim, I/O, or a kernel lock — contributes 1.0 to the
load average for every second it remains in that state, irrespective of how
much CPU it is consuming.
In practice, a user slice sitting at ``MemoryHigh`` with ``available: 0B``
can produce a load average contribution of 10–20 or more from a handful of
processes, even though those processes show only modest CPU percentages in
``top``. The discrepancy between the CPU figures visible in ``top`` and the
load average reported by ``uptime`` is the diagnostic signature of
memory-pressure throttling.

Example: a user running a persistent AI coding agent (``claude --resume``)
consuming 4.23 GB against a 4.0 GB ``MemoryHigh`` limit produced the
following system state::

   $ uptime
   17:51:49 up 19 days, 5:38, 1 user, load average: 40.85, 39.37, 39.04

   $ systemctl status user-5046.slice
   Memory: 4.2G (high: 4.0G available: 0B)

   $ cat /sys/fs/cgroup/user.slice/user-5046.slice/memory.current
   4546756608

   $ cat /sys/fs/cgroup/user.slice/user-5046.slice/memory.high
   4294967296

The load average of 40.85 on a 32-thread machine — indicating the node was
heavily overloaded — arose not from high CPU usage but from a single process
holding the slice roughly 240 MB above its ``MemoryHigh`` threshold, with
zero available headroom.

On the use of VSCode, AI coding agents, and similar tools
---------------------------------------------------------

Some users connect VSCode, Claude Code, OpenHands, or similar development
environments to the login node via Remote-SSH. This is not explicitly
blocked, but the following conditions apply without exception:

1. **Use is at the user's own risk.** Such tools are not a supported use
   case on the login node.

2. **All processes are subject to the cgroup limits described above.**
   VSCode and its background processes — language servers, file indexers,
   extension workers — will be throttled as soon as the user's slice
   approaches the ``MemoryHigh`` or ``CPUQuota`` ceiling. This is by design
   and will not be changed.

3. **No obligation to provide additional resources arises.** The login node
   is not a compute resource.
   No EuroHPC resource allocation policy, national allocation agreement, or
   any other instrument governing access to this system creates an
   entitlement to additional login node capacity for the purpose of running
   development tools.

4. **Performance degradation is not a support issue.** If a VSCode or agent
   session becomes slow or unresponsive, the cause is cgroup throttling as
   described in this document. Users experiencing this should terminate the
   offending processes and migrate their workflow to a local workstation
   (see `Recommended workflow`_).

The same conditions apply equally to all users. A user whose VSCode or
``node`` processes are predominantly in state ``D`` is already being
throttled by their own cgroup limits. Those processes are not running freely
and are not the cause of performance problems in other users' sessions. Each
user slice is independent.

Why CPU-only SLURM jobs are not an appropriate alternative
----------------------------------------------------------

Moving a development tool such as Claude Code or a Jupyter server to a SLURM
interactive job on a compute node is not an acceptable workaround and is
actively discouraged, for two reasons.

First, Discoverer+ does not permit unlimited wall time. Every job must
declare a wall time, and no job may exceed the maximum wall time enforced by
the Slurm account's QoS — which does not exceed 2 hours for most projects. A
development tool running as an interactive job will therefore be
unconditionally terminated by SLURM when the wall time expires, making it
unsuitable for any persistent development workflow.

Second, Discoverer+ is a GPU cluster. Its compute nodes are DGX systems
equipped with H200 GPUs, and project allocations are denominated in GPU
hours. The SLURM billing model charges all host resources consumed by a job,
not only GPU time:

::

   billing/min = (CPU_threads × 0.035714) + (MemoryGB × 0.25) + (GPUs × 1.0)

Memory is the dominant billing term.
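As a quick sanity check, the formula can be evaluated for a given job shape.
The ``billing_per_min`` helper below is illustrative, with the coefficients
taken verbatim from the expression above:

```shell
# Illustrative evaluation of the billing formula quoted above.
billing_per_min() {
    awk -v cpu="$1" -v mem="$2" -v gpu="$3" \
        'BEGIN { printf "%.2f\n", cpu * 0.035714 + mem * 0.25 + gpu * 1.0 }'
}

# 4 GB of host RAM and no GPUs (the CPU term is omitted here to mirror the
# example in the text): 4 x 0.25 = 1.00 billing unit per minute.
billing_per_min 0 4 0   # prints 1.00
```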
A job that allocates 4 GB of host RAM and zero GPUs still consumes billing
units at a rate of 1.0 billing unit per minute. A tool running continuously
for 98 hours at this memory footprint would consume approximately 5,880
billing-minutes from the project allocation, producing no scientific output.

More critically, a CPU-only job occupies host memory and CPU threads that
are physically co-located with GPUs on the DGX node. The billing fairness
mechanism penalises exactly this pattern: a project whose jobs consistently
over-allocate CPU threads or memory relative to GPU count will exhaust the
``billing`` counter before the ``gres/gpu`` counter is spent, rendering the
remaining GPU-hours permanently unreachable for that allocation period.

For a full explanation of the billing model and its implications, see
:doc:`slurm-gpu-billing-explainer`.

Recommended workflow
--------------------

The following workflow is correct, supported, and consistent with how
EuroHPC resource allocations on Discoverer+ are intended to be consumed:

- Run VSCode, Claude Code, OpenHands, Jupyter, or any other development tool
  on a **local workstation or laptop**. These tools have no business running
  on shared HPC infrastructure.

- Connect from those local tools to the login node via SSH *only* for the
  following purposes:

  - submitting and monitoring SLURM jobs;
  - checking project resource allocation and storage quota;
  - managing files and directories under the project's storage space.

- For interactive workloads that genuinely require cluster compute
  resources, request an **interactive SLURM job allocation on a compute
  node**. Local tools may then connect to the allocated node directly via
  SSH for the duration of the job. The login node is never the target for
  such connections.
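Sketched as a transcript, that last step might look like the following; the
``--gres`` and ``--time`` values are illustrative, the actual flags,
partitions, and wall-time ceilings depend on the project's QoS, and
``<nodename>`` stands for whatever node ``squeue`` reports:

```
$ salloc --gres=gpu:1 --time=02:00:00    # request an interactive allocation
$ squeue -u $USER -t RUNNING -o '%N'     # find the node the job landed on
$ ssh <nodename>                         # connect local tooling to that node
```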
This model ensures that the login node remains responsive for all users,
that project billing budgets are consumed by productive GPU workloads rather
than development tooling, and that users retain full access to the resources
their project has been allocated.

Getting help
------------

See :doc:`help`.