# SILT: Small Index, Large Table

August 24, 2012

storage


*(Figure: SILT's multi-store design)*

The most common approach to building a high-performance key-value storage system uses two components: a LogStore and a SortedStore. SILT's implementation adds a third store, the HashStore, to solve the high write amplification that merging the LogStore directly into the SortedStore in the traditional way would cause. How the SortedStore itself is managed isn't the key to performance, I think; it has only a small influence.

In fact, SILT isn't a generic key-value store, because its keys must be uniformly distributed 160-bit hash values (e.g., keys pre-hashed with SHA-1). But fixing the key length, as SILT does, is very convenient for compressing the in-memory index data, as we will see later.

## LogStore

Append-only: all data updates and inserts are written to the end of the LogStore file, and deletes work the same way, appending a delete entry instead of performing a real delete. SILT designs a new hash algorithm based on _cuckoo hashing_ to make the in-memory index more efficient:

- Partial-key: to keep the index compact, the hash table does not store the entire key in memory, only a short "tag" derived from the actual key. It loses some accuracy and gains memory efficiency.
- Moving a key to its alternative bucket requires knowing its other hash value, but reading the key back from disk is too expensive, and worse, displacements cascade. To solve this problem, the algorithm stores the index of the alternative bucket as the tag, so keys can be moved without any disk reads.
- Standard cuckoo hashing only allows about 50% occupancy. SILT improves this by increasing the associativity of the cuckoo hash table: each bucket holds 4 entries.
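The bullets above can be sketched as a toy partial-key cuckoo table. This is not the paper's code: the bucket count, the tag scheme (storing the alternative bucket index as the tag), and the `MAX_KICKS` limit are illustrative assumptions.

```python
import hashlib
import random

NUM_BUCKETS = 256   # small for illustration; SILT's table is much larger
BUCKET_SIZE = 4     # 4-way associativity lifts occupancy well beyond 50%
MAX_KICKS = 128     # give up (SILT would freeze the store) after this many moves

def bucket_indices(key: bytes):
    """Derive the two candidate bucket indices from a SHA-1 of the key."""
    h = hashlib.sha1(key).digest()
    i1 = int.from_bytes(h[0:2], "big") % NUM_BUCKETS
    i2 = int.from_bytes(h[2:4], "big") % NUM_BUCKETS
    return i1, i2

class PartialKeyCuckoo:
    def __init__(self):
        # Each slot holds (tag, log_offset). The tag is the *other* bucket's
        # index, so a victim can be relocated without re-reading its key.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key: bytes, offset: int) -> bool:
        i1, i2 = bucket_indices(key)
        idx, tag = i1, i2
        for _ in range(MAX_KICKS):
            if len(self.buckets[idx]) < BUCKET_SIZE:
                self.buckets[idx].append((tag, offset))
                return True
            # Bucket full: evict a random victim into its alternative bucket,
            # which is exactly the tag we stored for it -- no disk read needed.
            v = random.randrange(BUCKET_SIZE)
            vtag, voff = self.buckets[idx][v]
            self.buckets[idx][v] = (tag, offset)
            idx, tag, offset = vtag, idx, voff
        return False  # table too full: SILT would freeze this LogStore

    def lookup(self, key: bytes):
        i1, i2 = bucket_indices(key)
        for idx, tag in ((i1, i2), (i2, i1)):
            for t, off in self.buckets[idx]:
                if t == tag:
                    return off  # a candidate only: the real key is verified on disk
        return None
```

Note the accuracy trade-off from the first bullet: `lookup` may return a wrong candidate on a tag collision, which the real system catches by verifying the full key on disk.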

## HashStore

Once a LogStore is full, SILT freezes it and converts it into a more memory-efficient data structure called a HashStore. A HashStore is an on-disk cuckoo hash table, storing the key-value entries in hash order rather than insertion order, so the full in-memory index is no longer needed.
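As a tiny illustration of the conversion (my own sketch, not the paper's on-disk layout): if the in-memory cuckoo table records, for each slot, the log offset of the entry it indexes, then writing the log's entries out in slot order yields a file whose layout mirrors the hash table, and the per-slot offsets no longer need to stay in memory.

```python
def convert_to_hashstore(logstore, cuckoo_slots):
    # logstore: entries in insertion order (the on-disk log)
    # cuckoo_slots: for each hash-table slot, the log offset it points to,
    #               or None for an empty slot
    # The result holds the same entries, but in hash (slot) order.
    return [None if off is None else logstore[off] for off in cuckoo_slots]
```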

Thanks to this design, merging a HashStore into the SortedStore is fast: it only needs random reads from the HashStore and sequential writes to the SortedStore.
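A merge of this kind can be sketched as a streaming merge of two key-sorted sequences, where newer entries win on duplicate keys and tombstones drop deleted keys. The tuple layout and the `None`-as-tombstone convention here are my assumptions, not SILT's format.

```python
import heapq

def merge_streams(old_entries, new_entries):
    # Both inputs: iterables of (hashed_key, value), each sorted by key.
    # A value of None marks a tombstone (a logged delete); newer entries win.
    merged = heapq.merge(
        ((k, 0, v) for k, v in new_entries),  # priority 0: newer data
        ((k, 1, v) for k, v in old_entries),  # priority 1: older data
    )
    last_key = object()  # sentinel that equals no real key
    for k, _, v in merged:
        if k == last_key:
            continue         # an older duplicate, superseded above
        last_key = k
        if v is not None:    # drop deleted keys from the new store
            yield k, v
```

Because both inputs are already sorted, the output is produced in one sequential pass, which is what makes the sequential-write pattern possible.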

## SortedStore

SortedStore is a static on-disk key-value store with very low memory overhead, and it is the main optimization in SILT. As mentioned previously, keys in SILT are fixed at 160 bits. Since the position pointers at each node make the typical representation of a prefix tree expensive in memory, SILT uses a compact recursive representation that eliminates the pointers.

The basic idea and what the figure above illustrates may differ slightly, because SILT applies some optimizations. Each internal node stores the number of leaf nodes in its left subtree, so the representation of a prefix tree T with left and right subtrees L and R is defined as follows:

**Repr(T) := |L| Repr(L) Repr(R)**

If the current subtree is a leaf node, Repr() returns -1, so after the recursion the tree in figure 6(a) becomes: 3 2 1 -1 -1 -1 3 1 -1 1 -1 -1 1 -1 -1. It is easy to tell which part is the left subtree, because the first number belongs to the root. But how do we locate the root of the right subtree in this string?
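Building this flat representation from a set of sorted, fixed-length hashed keys can be sketched as follows. This is my own illustration with the blog's -1-for-leaf convention; SILT additionally entropy-codes the counts.

```python
def construct_repr(keys, depth=0):
    # keys: sorted, distinct bit-strings of equal length (the hashed keys)
    if len(keys) == 0:
        return []            # an empty subtree contributes nothing
    if len(keys) == 1:
        return [-1]          # a leaf node
    left = [k for k in keys if k[depth] == '0']
    right = [k for k in keys if k[depth] == '1']
    # emit |L| first, then the representations of both subtrees
    return ([len(left)]
            + construct_repr(left, depth + 1)
            + construct_repr(right, depth + 1))
```

For example, the four 2-bit keys 00, 01, 10, 11 produce `[2, 1, -1, -1, 1, -1, -1]`: the root says two leaves sit on its left, and each child is an internal node over two leaves.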

Define an operation, discard_subtree(T), which returns the right subtree of T. At first glance, it might look like this:

```python
def discard_subtree(T):
    (thead, ttail) = (T[0], T[1:])
    if thead == -1:
        return ttail
    else:
        ttail = discard_subtree(ttail)  # (*)
        return ttail
```

In practice, it should return only the part after the left subtree (the green part in the figure), with the left subtree (the red part) thrown away.

This attempts to get rid of the left subtree recursively, and it is wrong (look at the line marked *): once the recursion bottoms out (i.e., the first number is -1, representing a leaf node), discard_subtree(T) simply returns and does nothing more.

When the recursion returns, we still need to get rid of the right subtree of the left subtree. So the right way looks like this:

```python
def discard_subtree(T):
    (thead, ttail) = (T[0], T[1:])
    if thead == -1:
        return ttail
    else:
        ttail = discard_subtree(ttail)  # skip the left child's subtree
        ttail = discard_subtree(ttail)  # then skip the right child's subtree
        return ttail
```

Now we get the right result.
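This subtree-skipping is exactly what SILT's SortedStore index needs: given a key's bit string, we can walk the flat representation and compute the key's position in sorted order, accumulating the left-leaf counts along the way. Here is a sketch using an iterative skip instead of the recursive discard (my own variant):

```python
def skip_subtree(repr_seq, pos):
    # advance pos past one complete subtree representation
    need = 1                      # subtrees still to consume
    while need:
        if repr_seq[pos] == -1:   # a leaf closes one subtree
            need -= 1
        else:                     # an internal node opens two, closes one
            need += 1
        pos += 1
    return pos

def rank(repr_seq, bits):
    # index of the key (given by its bit string) among the sorted keys
    pos, idx = 0, 0
    for b in bits:
        head = repr_seq[pos]
        if head == -1:            # reached the key's leaf
            break
        if b == '0':
            pos += 1              # descend into the left subtree
        else:
            idx += head           # all left-subtree leaves precede this key
            pos = skip_subtree(repr_seq, pos + 1)
    return idx
```

For the four keys 00, 01, 10, 11 with representation `[2, 1, -1, -1, 1, -1, -1]`, `rank` returns 0, 1, 2, 3 respectively: the sorted position of each key, without storing a single pointer.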

## References

SILT PPT

Paper: *SILT: A Memory-Efficient, High-Performance Key-Value Store*

## Comments

The LogStore algorithm in the paper does not clearly explain how `put` handles the same key. Since the in-memory index does not store the full key, there is no way to tell whether a match is the same key or a hash collision. The `delete` method is not well explained either. Do you happen to know about this?

I'm not sure what you mean by "the full key is not stored in memory".

The NoSQL pattern here is much like logging: whether it is an insert or a delete, data is appended to the LogStore. If an old record exists and a new one is inserted, the new record overrides the old one, because a lookup always finds and returns the newest record.

This kind of conflict can also be resolved at the application layer, for example by querying whether a record exists before inserting it.

It is not quite the same as logging, though, because a lookup may require a disk I/O. So this paper really only suits SSDs; otherwise write efficiency would certainly suffer.

Right, it may need one I/O per lookup, but it also uses many techniques to reduce the probability of that I/O, and it is very memory-efficient.

That said, I am still not entirely clear on this paper's design.

I looked up some related articles by the original authors. It seems one of them came up with ECT and another with partial-key cuckoo hashing, and SILT is the combination of the two. It really does save a lot of memory.