Average server utilization in most datacenters is low, ranging from 10% to 50%. It is difficult to consolidate latency-critical (LC) services onto a subset of highly utilized servers. An alternative is to raise server utilization by launching best-effort (BE) tasks on the same server as a latency-critical job.
Goal: Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.
1. What is Heracles?
A real-time, feedback-based controller system
- Enables the safe co-location of best-effort (BE) tasks alongside a latency-critical (LC) service
- Ensures that the LC job meets its latency target while maximizing the resources given to BE tasks
Four hardware and software isolation mechanisms:
- Core isolation: Pin each workload to a set of cores using cpuset cgroups.
- Network traffic control: Limit the outgoing bandwidth of BE tasks using Linux traffic control. No limit is placed on the LC job.
- Shared cache partitioning: Cache Allocation Technology (CAT) in Intel Haswell and later CPUs uses way-partitioning to define non-overlapping partitions on the last-level cache (LLC).
- A software monitor tracks the memory bandwidth usage of the LC and BE jobs and scales down the number of cores given to BE jobs if the LC job does not receive sufficient bandwidth.
- Fine-grained power/frequency setting: CPU frequency monitoring, Running Average Power Limit (RAPL), and per-core DVFS.
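
The core and cache mechanisms above both come down to splitting a fixed resource into two non-overlapping parts. A minimal sketch of that bookkeeping follows. It is not Heracles code: the function names `split_cores` and `split_llc_ways`, the choice to give the low-numbered cores to the LC job, and the choice to give the low-order cache ways to BE tasks are all assumptions made for illustration. The functions only compute the core-list strings and way bitmasks; writing them to the kernel is left out.

```python
from typing import Tuple


def split_cores(total_cores: int, be_cores: int) -> Tuple[str, str]:
    """Return cpuset-style core lists (LC list, BE list), e.g. ("0-11", "12-15").

    Hypothetical helper: LC gets the low-numbered cores, BE gets the rest.
    """
    if not 0 <= be_cores < total_cores:
        raise ValueError("the LC job must keep at least one core")
    lc_cores = total_cores - be_cores
    lc = "0" if lc_cores == 1 else f"0-{lc_cores - 1}"
    if be_cores == 0:
        be = ""                                   # BE tasks are disabled
    elif be_cores == 1:
        be = str(total_cores - 1)
    else:
        be = f"{lc_cores}-{total_cores - 1}"
    return lc, be


def split_llc_ways(total_ways: int, be_ways: int) -> Tuple[int, int]:
    """Return non-overlapping, contiguous way bitmasks (LC mask, BE mask).

    Hypothetical helper: the low-order ways go to BE, the remaining ways to LC.
    """
    if not 0 < be_ways < total_ways:
        raise ValueError("each partition needs at least one way")
    be_mask = (1 << be_ways) - 1
    lc_mask = ((1 << total_ways) - 1) & ~be_mask
    return lc_mask, be_mask
```

For example, `split_cores(16, 4)` returns `("0-11", "12-15")`, and `split_llc_ways(20, 2)` returns the masks `0xffffc` for the LC job and `0x3` for BE tasks. The two masks share no bits, which is the non-overlapping property that way-partitioning relies on.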
2. The Design Approach
What: Maximize utilization subject to the constraint that the SLO must be met.
How: Decompose the high-dimensional optimization problem into many smaller, independent problems by decoupling the interference sources, and adjust the BE job allocation by monitoring latency, latency slack, and load.
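
The feedback loop described above can be sketched as a single decision step that looks at latency, latency slack, and load. This is a sketch under stated assumptions, not the controller from the paper: the `Action` names, the `decide` function, and the threshold defaults (`high_load`, `shrink_slack`, `hold_slack`) are illustrative values chosen for this example. Slack is taken here to be the fraction of the SLO target that is still unused.

```python
from enum import Enum


class Action(Enum):
    DISABLE_BE = "disable_be"   # stop BE tasks entirely
    SHRINK_BE = "shrink_be"     # take resources away from BE tasks
    HOLD_BE = "hold_be"         # keep the current BE allocation
    GROW_BE = "grow_be"         # BE tasks may receive more resources


def latency_slack(latency_ms: float, slo_target_ms: float) -> float:
    """Fraction of the SLO target still unused; negative means a violation."""
    return (slo_target_ms - latency_ms) / slo_target_ms


def decide(latency_ms: float, slo_target_ms: float, load: float,
           high_load: float = 0.85,      # assumed threshold
           shrink_slack: float = 0.05,   # assumed threshold
           hold_slack: float = 0.10      # assumed threshold
           ) -> Action:
    """Pick the next BE adjustment from the measured latency and load."""
    slack = latency_slack(latency_ms, slo_target_ms)
    if slack < 0 or load > high_load:
        return Action.DISABLE_BE         # SLO violated or LC load too high
    if slack < shrink_slack:
        return Action.SHRINK_BE          # very little headroom left
    if slack < hold_slack:
        return Action.HOLD_BE            # some headroom, but do not grow
    return Action.GROW_BE                # ample headroom for BE tasks
```

With a 10 ms target, `decide(12.0, 10.0, 0.5)` returns `Action.DISABLE_BE` because the SLO is already violated, while `decide(5.0, 10.0, 0.5)` returns `Action.GROW_BE`. Keeping the decision as a pure function of the three monitored signals makes it easy to test separately from the isolation mechanisms that carry out each action.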