Article Archive

CPI2: CPU performance isolation for shared compute clusters

Original paper: http://research.google.com/pubs/pub40737.html

This paper describes CPI2, a system that builds on the useful properties of CPI measures to automate all of the following:

  1. observe the run-time performance of hundreds to thousands of tasks belonging to the same job, and learn to distinguish normal performance from outliers
  2. identify performance interference within a few minutes by detecting such outliers
  3. determine which antagonist applications are the likely cause with an online cross-correlation analysis
  4. (if desired) ameliorate the bad behavior by throttling or migrating the antagonists.
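The first three steps can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: CPI is taken as cycles per instruction, the outlier rule (more than `k` standard deviations above the job's mean) and the use of plain Pearson correlation as a stand-in for the cross-correlation analysis are assumptions, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def cpi(cycles, instructions):
    """Cycles per instruction for one sampling window."""
    return cycles / instructions


def learn_profile(samples):
    """Summarize a job's normal CPI as (mean, standard deviation)."""
    return mean(samples), pstdev(samples)


def is_outlier(value, profile, k=2.0):
    """Flag a CPI sample that sits more than k std devs above the mean.

    The k-sigma rule is an assumed threshold for illustration.
    """
    mu, sigma = profile
    return value > mu + k * sigma


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_antagonists(victim_cpi, suspects_usage):
    """Rank co-located tasks by how closely their CPU usage tracks the
    victim's CPI over the same time windows (highest correlation first)."""
    scores = {
        name: pearson(victim_cpi, usage)
        for name, usage in suspects_usage.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A task whose CPU usage rises and falls together with the victim's CPI ends up at the top of the ranking and becomes the first candidate for throttling or migration.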


Heracles: Improving Resource Efficiency at Scale

Original paper:

  1. http://csl.stanford.edu/~christos/publications/2015.heracles.isca.pdf
  2. https://cs.stanford.edu/~davidlo/resources/2015.heracles.isca.slides.pdf

Average server utilization in most datacenters is low, ranging from 10% to 50%. It is difficult to consolidate latency-critical (LC) services onto a subset of highly utilized servers. Heracles instead increases server utilization by launching best-effort (BE) tasks on the same server as a latency-critical job.

Goal: Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.
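One way to picture this goal is a feedback rule that compares the LC job's measured tail latency against its SLO and decides how much room BE tasks may take. The sketch below is a simplified illustration under stated assumptions: the threshold values, the action names, and the function `be_action` are hypothetical and are not taken from the excerpt above.

```python
def be_action(tail_latency_ms, slo_ms, load, *,
              load_high=0.85, load_low=0.80,
              slack_tight=0.10, slack_critical=0.05):
    """Decide what to do with best-effort (BE) tasks on one server.

    slack is the fraction of the latency SLO still unused by the
    latency-critical (LC) job; load is the LC job's load as a fraction
    of its peak. All thresholds are illustrative assumptions.
    """
    slack = (slo_ms - tail_latency_ms) / slo_ms
    if slack < 0:
        return "disable_be"   # SLO already violated: stop BE work
    if load > load_high:
        return "disable_be"   # LC load too high to share safely
    if slack < slack_critical:
        return "shrink_be"    # very little headroom: take resources back
    if slack < slack_tight:
        return "hold_be"      # some headroom: do not grow BE further
    if load < load_low:
        return "grow_be"      # ample headroom: give BE more resources
    return "hold_be"          # between load thresholds: leave unchanged
```

For example, `be_action(5, 10, 0.5)` returns `"grow_be"` because half of the latency budget is unused and load is low, while `be_action(12, 10, 0.5)` returns `"disable_be"` because the SLO is already violated.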


Reconciling High Server Utilization and Sub-millisecond Quality-of-Service

Original paper: http://csl.stanford.edu/~christos/publications/2014.mutilate.eurosys.pdf

In this paper, we analyze the challenges of maintaining high QoS for low-latency workloads when sharing servers with other workloads.

The additional workloads can interfere through shared resources such as processing cores, cache space, and memory or I/O bandwidth.

The goal of this work is to investigate whether workload colocation and good quality-of-service for latency-critical services are fundamentally incompatible in modern systems, or whether the two can instead be reconciled.
