This page introduces Quality of Service (QoS) classes in Kubernetes, and explains how Kubernetes assigns a QoS class to each Pod as a consequence of the resource constraints that you specify for the containers in that Pod. Kubernetes relies on this classification to make decisions about which Pods to evict when there are not enough available resources on a Node.
Kubernetes classifies each Pod that you run into a specific quality of service (QoS)
class, based on the resource requests
of the Containers in that Pod, along with
how those requests relate to resource limits.
Kubernetes uses the QoS class to influence how different Pods are handled; in
particular, to decide which Pods to evict from a Node experiencing node pressure.
The possible QoS classes are Guaranteed, Burstable, and BestEffort. When a Node runs
out of resources, Kubernetes first evicts BestEffort Pods running on that Node,
followed by Burstable and finally Guaranteed Pods. When this eviction is due to
resource pressure, only Pods exceeding their resource requests are candidates for
eviction.
Pods that are Guaranteed have the strictest resource limits and are least likely
to face eviction. They are guaranteed not to be killed until they exceed their limits
or there are no lower-priority Pods that can be preempted from the Node. They may
not acquire resources beyond their specified limits. These Pods can also make
use of exclusive CPUs using the
static CPU management policy.
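As an illustration, here is a minimal Pod manifest (the Pod name and image are placeholders) whose single container sets memory and CPU limits equal to its requests, which would place it in the Guaranteed class:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "200Mi"
        cpu: "700m"
      limits:
        memory: "200Mi"   # limit equals request
        cpu: "700m"       # limit equals request
```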
For a Pod to be given a QoS class of Guaranteed:

- Every Container in the Pod must have a memory limit and a memory request.
- For every Container in the Pod, the memory limit must equal the memory request.
- Every Container in the Pod must have a CPU limit and a CPU request.
- For every Container in the Pod, the CPU limit must equal the CPU request.

Kubernetes v1.34 [beta] (enabled by default)

If instead the Pod uses Pod-level resources:

- The Pod must have a memory limit and a memory request, and the Pod-level memory limit must equal the Pod-level memory request.
- The Pod must have a CPU limit and a CPU request, and the Pod-level CPU limit must equal the Pod-level CPU request.

Pods that are Burstable have some lower-bound resource guarantees based on the request, but
do not require a specific limit. If a limit is not specified, it defaults to a
limit equivalent to the capacity of the Node, which allows the Pods to flexibly increase
their resources if resources are available. In the event of Pod eviction due to Node
resource pressure, these Pods are evicted only after all BestEffort Pods are evicted.
Because a Burstable Pod can include a Container that has no resource limits or requests, a Pod
that is Burstable can try to use any amount of node resources.
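For example (the Pod name and image are placeholders), a Pod whose container sets a memory request lower than its limit and no CPU resources at all is classified as Burstable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "200Mi"   # limit differs from request, so not Guaranteed
```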
A Pod is given a QoS class of Burstable if:
- The Pod does not meet the criteria for QoS class Guaranteed.
- At least one Container in the Pod has a memory or CPU request or limit.

Pods in the BestEffort QoS class can use node resources that aren't specifically assigned
to Pods in other QoS classes. For example, if you have a node with 16 CPU cores available to the
kubelet, and you assign 4 CPU cores to a Guaranteed Pod, then a Pod in the BestEffort
QoS class can try to use any amount of the remaining 12 CPU cores.
The kubelet prefers to evict BestEffort Pods if the node comes under resource pressure.
A Pod has a QoS class of BestEffort if it doesn't meet the criteria for either Guaranteed
or Burstable. In other words, a Pod is BestEffort only if none of the Containers in the Pod have a
memory limit or a memory request, and none of the Containers in the Pod have a
CPU limit or a CPU request, and the Pod does not have any Pod-level memory or CPU limits or requests.
Containers in a Pod can request other resources (not CPU or memory) and still be classified as
BestEffort.
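A minimal sketch of a BestEffort Pod (the Pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    # no resources block: no CPU or memory requests or limits,
    # so the Pod is classified as BestEffort
```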
Kubernetes v1.22 [alpha] (disabled by default)

Memory QoS uses the memory controller of cgroup v2 to manage memory throttling and protection in Kubernetes. It uses the Pod's QoS class to decide which cgroup settings to apply, but it is a separate opt-in feature. Disabling Memory QoS does not change how Pods are classified.
For Burstable pods, the kubelet sets memory.high to throttle memory allocation
before the workload hits its hard limit (memory.max). The throttling threshold
is calculated as:
memory.high = requests + memoryThrottlingFactor * (limits - requests)
where memoryThrottlingFactor defaults to 0.9. For example, a container with a
256 MiB request and a 1 GiB limit gets memory.high set to roughly 947 MiB.
If a Burstable container has no memory limit, node allocatable memory is used in
place of the limit.
Guaranteed pods do not get memory.high because their requests equal their
limits. BestEffort pods do not get memory.high because they have no requests
or limits.
Memory reservation is controlled via the kubelet configuration field
memoryReservationPolicy:
- None (default): the kubelet does not set memory.min or memory.low for
  containers and pods. No memory is hard-locked by the kernel.
- TieredReservation: the kubelet sets tiered memory protection based on the
  Pod's QoS class:
  - memory.min is set to memory requests. The kernel will not reclaim this
    memory under any circumstances.
  - memory.low is set to memory requests. The kernel preferentially retains
    this memory but may reclaim it under extreme pressure.

Memory QoS requires Linux with cgroup v2. Kernel 5.9 or higher is recommended
because memory.high throttling on older kernels can trigger a known
livelock bug.
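As a sketch, a kubelet configuration fragment that opts in to Memory QoS and enables tiered reservation might look like the following; the memoryReservationPolicy field name is taken from the description above (verify it against the KubeletConfiguration reference for your kubelet version):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true               # alpha feature, disabled by default
memoryReservationPolicy: TieredReservation   # field name as described above
memoryThrottlingFactor: 0.9     # default value, shown for clarity
```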
If the MemoryQoS feature gate is enabled on an older kernel, the kubelet logs
a warning at startup.
Certain behavior is independent of the QoS class assigned by Kubernetes. For example:
Any Container exceeding a resource limit will be killed and restarted by the kubelet without affecting other Containers in that Pod.
If a Container exceeds its resource request and the node it runs on faces resource pressure, the Pod it is in becomes a candidate for eviction. If this occurs, all Containers in the Pod will be terminated. Kubernetes may create a replacement Pod, usually on a different node.
The resource request of a Pod is equal to the sum of the resource requests of its component Containers, and the resource limit of a Pod is equal to the sum of the resource limits of its component Containers.
The kube-scheduler does not consider QoS class when selecting which Pods to preempt. Preemption can occur when a cluster does not have enough resources to run all the Pods you defined.
The QoS class is determined when the Pod is created and remains unchanged for the lifetime of the Pod. If you later attempt an in-place resize that would result in a different QoS class, the resize is rejected by admission.