Reconciling High Server Utilization and Sub-millisecond Quality-of-Service

Time: January 3, 2017
Category: colocation


In this paper, we analyze the challenges of maintaining high QoS for low-latency workloads when sharing servers with other workloads.

The additional workloads can interfere by competing for shared resources such as processing cores, cache space, and memory or I/O bandwidth.

The goal of this work is to investigate whether workload colocation and good quality-of-service for latency-critical services are fundamentally incompatible in modern systems, or whether we can instead reconcile the two.


The notes below cover the three sources of QoS degradation under colocation:

  1. queuing delay: queuing delay increases because of interference on shared resources
  2. scheduling delay: scheduling delays grow long when processor cores are timeshared
  3. load imbalance: tail latency suffers when threads are unevenly balanced across cores


1. Queuing delay

What: Queuing delay occurs due to coincident or rapid request arrivals. Interference from co-located workloads worsens queuing delay by increasing service time and thus decreasing the service rate. Even if the co-located workload runs on separate processor cores, its footprint on shared caches, memory channels, and I/O channels slows down the service rate of the latency-critical workload.
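
To see why a small drop in service rate matters so much, here is a minimal sketch that assumes a single-server M/M/1 queue. That model, the function name mm1_latency, and the request rates are my own simplifications for illustration, not numbers from the paper. In an M/M/1 queue the time a request spends in the system is exponentially distributed with rate (service rate - arrival rate), so both the mean and the tail grow quickly as the gap shrinks.

```python
import math


def mm1_latency(arrival_rate, service_rate, percentile=0.95):
    """Return (mean, percentile) time in system for an M/M/1 FCFS queue.

    Rates are in requests per second; results are in seconds.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    slack = service_rate - arrival_rate
    mean = 1.0 / slack
    tail = -math.log(1.0 - percentile) / slack
    return mean, tail


# Hypothetical numbers: same offered load, service rate cut 10% by interference.
alone = mm1_latency(80_000, 100_000)
shared = mm1_latency(80_000, 90_000)
print(f"alone:  mean {alone[0] * 1e6:.0f} us, p95 {alone[1] * 1e6:.0f} us")
print(f"shared: mean {shared[0] * 1e6:.0f} us, p95 {shared[1] * 1e6:.0f} us")
```

Under these assumed numbers, a 10% loss in service rate halves the slack between service and arrival rates and therefore doubles both the mean and the 95th-percentile latency.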

How: We propose that load be provisioned to services in an interference-aware manner that takes into account the reduction in throughput that a service might experience when deployed on servers with co-located workloads.
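
One simple way to read "interference-aware provisioning" is capacity derating: measure the peak load a server sustains within its QoS target when running alone, measure the fraction of that throughput lost under colocation, and size the fleet against the reduced figure. The sketch below assumes exactly that. The function name servers_needed and all numbers are hypothetical, and the paper's actual provisioning method may differ.

```python
import math


def servers_needed(total_qps, peak_qps_alone, throughput_loss):
    """Servers required once each server's capacity is derated for interference.

    throughput_loss is the measured fraction of throughput lost under
    colocation (0.0 means no interference).
    """
    if not 0.0 <= throughput_loss < 1.0:
        raise ValueError("throughput_loss must be in [0, 1)")
    usable_qps = peak_qps_alone * (1.0 - throughput_loss)
    return math.ceil(total_qps / usable_qps)


# Hypothetical fleet: 1M requests/s, 100k requests/s per server when alone.
print(servers_needed(1_000_000, 100_000, 0.0))  # 10
print(servers_needed(1_000_000, 100_000, 0.2))  # 13
```

Provisioning against the uncontended figure (10 servers here) would leave each server overloaded once co-located work arrives; the derated figure keeps per-server load below what the service can sustain with interference.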

2. Scheduling delay

What: Scheduling delay has two main components:

  • scheduler wait time
  • context switch latency

The biggest problem with CFS, the Linux kernel's default scheduler, is that CFS’s wakeup placement algorithm allows sporadic tasks to induce long wait times on latency-sensitive tasks like memcached.

How: Fortunately, there are several strategies one can employ to mitigate this wait time for latency-sensitive services, including

  1. adjusting task share values in CFS,
  2. using Linux’s POSIX real-time scheduling disciplines instead of CFS,
  3. using a general-purpose scheduler with support for latency-sensitive tasks, like BVT, or
  4. applying CPU bandwidth limits to enforce fairness.
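
As a rough illustration of the second strategy, the sketch below asks Linux to run a process under the SCHED_FIFO real-time policy using Python's os.sched_setscheduler. The helper name make_realtime and the priority value 1 are assumptions made for this example, not settings from the paper. The call normally requires root or CAP_SYS_NICE, so the helper reports failure instead of raising.

```python
import os


def make_realtime(pid=0, priority=1):
    """Try to switch `pid` (0 = the caller) to the SCHED_FIFO policy.

    Returns True on success and False if the kernel refuses, which is
    typically a PermissionError when the caller lacks CAP_SYS_NICE.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        return False
```

A service would call make_realtime() once at startup and fall back to the default policy when it returns False. Real-time tasks preempt ordinary CFS tasks on the same core, so a busy-looping SCHED_FIFO task can starve co-located work; that trade-off is worth weighing before relying on this approach.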

3. Load imbalance

What: A latency-sensitive service’s vulnerability to load imbalance can be easily ascertained by purposefully putting it in a situation where threads are unbalanced.

How: One solution to this problem is particularly straightforward and effective: threads can be pinned explicitly to distinct cores, so that Linux can never migrate them on top of each other.
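
The pinning idea can be sketched with Python's os.sched_setaffinity on Linux (Python 3.8+ for threading.get_native_id). The helper pin_workers and its work callback are hypothetical names for this example. Python threads still serialize on the interpreter lock, so this only demonstrates the mechanism; a native service such as memcached would more likely use pthread_setaffinity_np from C.

```python
import os
import threading


def pin_workers(num_workers, work):
    """Run `work(i)` in num_workers threads, each pinned to its own CPU.

    Returns {worker index: (cpu, result)}. Refuses to start more workers
    than there are CPUs available, so no two workers share a core.
    """
    cpus = sorted(os.sched_getaffinity(0))
    if num_workers > len(cpus):
        raise ValueError("more workers than available CPUs")
    results = {}

    def run(index):
        cpu = cpus[index]
        # On Linux the affinity call accepts a thread ID, so this pins
        # only the calling worker thread, not the whole process.
        os.sched_setaffinity(threading.get_native_id(), {cpu})
        results[index] = (cpu, work(index))

    threads = [threading.Thread(target=run, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    count = min(2, len(os.sched_getaffinity(0)))
    print(pin_workers(count, lambda i: i * 10))
```

Capping the worker count at the number of available CPUs is a deliberate choice here: wrapping around would place two workers on one core, which is the imbalance the pinning is meant to prevent.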
