Replication, Checkpoint, Logging, and Recovery

Discussion

03/25/18:
- Revisit RAMCloud, which has a goal very similar to Lego's. It keeps a full copy of data in DRAM and uses disk to ensure crash consistency. The key assumption of RAMCloud is battery-backed DRAM or PM on its disk (backup) side.
- We don’t need to provide a 100% recoverable model. Our goal here is to offset the higher failure probability introduced by having more components. Suppose Lego persists in batches rather than per page. Then we are unable to recover if and only if a failure happens while a batch persist is in progress; we are safe if the failure happens between batched persists.
- This also means we need to checkpoint process state on the Processor side. We have to save the entire process context along with the persisted memory log! Otherwise the memory content is useless: we don’t know the exact IP (instruction pointer) and other execution state.
- I’m wrong. :-)
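
A minimal sketch of the batched-persist idea, assuming a hypothetical append-only log in which each batch of dirty pages is sealed by a commit record. `BatchedPersistLog`, `persist_batch`, `crash_after`, and `recover` are illustrative names, not Lego code. Recovery replays only sealed batches, so a crash between batches restores the last batch, while a crash mid-batch loses only the in-flight batch.

```python
class BatchedPersistLog:
    """Hypothetical log: dirty pages are persisted in batches, each sealed by a commit record."""

    def __init__(self):
        self.durable = []   # simulated persistent log of ("page", n, data) / ("commit", id)
        self.dirty = {}     # dirty pages that so far live only in DRAM
        self.batch_id = 0

    def write(self, page, data):
        self.dirty[page] = data

    def persist_batch(self, crash_after=None):
        """Append all dirty pages, then a commit record.

        crash_after=N simulates a crash after N page records reach the log,
        i.e. before the commit record is written.
        """
        for i, (page, data) in enumerate(self.dirty.items()):
            if crash_after is not None and i == crash_after:
                return False  # crash: this batch never gets its commit record
            self.durable.append(("page", page, data))
        self.durable.append(("commit", self.batch_id))
        self.batch_id += 1
        self.dirty.clear()
        return True


def recover(durable):
    """Rebuild memory from committed batches only; an unsealed tail is discarded."""
    memory, staged = {}, {}
    for record in durable:
        if record[0] == "page":
            _, page, data = record
            staged[page] = data
        else:  # commit record: the staged batch is now safe to apply
            memory.update(staged)
            staged = {}
    return memory
```

The design choice is that the commit record, not the page records, decides durability, so the unrecoverable window shrinks to "updates since the last sealed batch".
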
03/20/18: When memory is plentiful, use pessimistic replication; when demand is high, switch to optimistic replication to save memory components.
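
The 03/20/18 policy could be expressed as a mode switch driven by free memory. A sketch, assuming a hypothetical `ReplicationPolicy` class fed with the fraction of cluster memory that is free; it uses two watermarks (the 0.2 and 0.4 defaults are arbitrary placeholders) so the mode does not flap when free memory hovers around a single threshold.

```python
class ReplicationPolicy:
    """Hypothetical policy: pessimistic while memory is plentiful, optimistic under pressure."""

    def __init__(self, low: float = 0.2, high: float = 0.4):
        if not 0.0 <= low < high <= 1.0:
            raise ValueError("need 0 <= low < high <= 1")
        self.low, self.high = low, high
        self.mode = "pessimistic"

    def update(self, free_fraction: float) -> str:
        """free_fraction: share of cluster memory currently free (0.0 to 1.0)."""
        if self.mode == "pessimistic" and free_fraction < self.low:
            self.mode = "optimistic"   # memory is tight: stop keeping every replica in sync
        elif self.mode == "optimistic" and free_fraction > self.high:
            self.mode = "pessimistic"  # memory recovered: return to eager replication
        return self.mode
```

Between the two watermarks the policy keeps whatever mode it is already in.
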

Replication

Before starting, I spent some time recapping and found that the Wiki pages¹²³ are actually very good.

Two main approaches:

Optimistic (Lazy, Passive) Replication⁴, in which replicas are allowed to diverge
- Eventual consistency⁵⁶⁷, meaning that replicas are guaranteed to converge only when the system has been quiesced for a period of time
Pessimistic (Active, Multi-master⁸) Replication, which tries to guarantee from the beginning that all replicas are identical to each other, as if there were only a single copy of the data all along.
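
A toy model contrasting the two approaches, assuming a single primary (replica 0) and a hypothetical `ReplicatedMemory` class keyed by address. The pessimistic write updates every replica before returning; the optimistic write returns after updating the primary, lets the backups diverge, and they converge only when `propagate()` runs during a quiet period.

```python
class ReplicatedMemory:
    """Toy replicated memory: replica 0 is the primary, the rest are backups."""

    def __init__(self, n_replicas: int):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # optimistic updates not yet applied to backups

    def write_pessimistic(self, addr: int, value: int) -> None:
        self.propagate()  # drain earlier optimistic updates so ordering is preserved
        for replica in self.replicas:  # every copy updated before we return
            replica[addr] = value

    def write_optimistic(self, addr: int, value: int) -> None:
        self.replicas[0][addr] = value  # return after the primary only
        self.pending.append((addr, value))

    def propagate(self) -> None:
        # Run when the system is quiescent: replay updates in primary order.
        for addr, value in self.pending:
            for replica in self.replicas[1:]:
                replica[addr] = value
        self.pending.clear()

    def converged(self) -> bool:
        return all(r == self.replicas[0] for r in self.replicas[1:])
```

With a single primary, lazily replaying its update order is enough to converge; a multi-master optimistic scheme would additionally need conflict resolution, which this sketch omits.
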

Lego leans more toward memory replication than storage replication. Maybe we can borrow ideas from DSM replication (MRSW, MRMW) or from in-memory databases such as RAMCloud and VoltDB?
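
A sketch of the MRSW (multiple-reader, single-writer) invariant as a per-page directory, assuming a hypothetical `MRSWDirectory` class and node names: a page has either one writable copy or any number of read-only copies, and granting a write invalidates every other copy.

```python
class MRSWDirectory:
    """Hypothetical per-page directory enforcing multiple-reader, single-writer access."""

    def __init__(self):
        self.readers = {}  # page -> set of nodes holding read-only copies
        self.writer = {}   # page -> the one node holding a writable copy

    def read(self, page, node):
        readers = self.readers.setdefault(page, set())
        owner = self.writer.pop(page, None)
        if owner is not None:
            readers.add(owner)  # the writer is downgraded to a read-only copy
        readers.add(node)

    def write(self, page, node):
        """Grant `node` exclusive write access; return the nodes whose copies are invalidated."""
        holders = self.readers.pop(page, set())
        owner = self.writer.get(page)
        if owner is not None:
            holders.add(owner)
        holders.discard(node)  # the requester keeps (and upgrades) its own copy
        self.writer[page] = node
        return holders
```

An MRMW variant would drop the invalidation step and instead need a consistency protocol for concurrent writers.
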

Checkpointing

Some nice reading⁹.

Application types:

Long-running vs. Short-lived
Built-in checkpoint/journaling vs. no built-in checkpoint/journaling

Two main approaches:

Coordinated
- 2PC
Uncoordinated
- Prone to the domino effect (cascading rollbacks)
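
The domino effect can be reproduced with rollback propagation: after a failure, any process that received a message whose send was undone (an orphan message) must roll back to an earlier checkpoint, which may orphan further messages. A minimal sketch, assuming integer timestamps, a checkpoint at time 0 for every process, every receive time greater than 0, and hypothetical processes `P` and `Q`.

```python
def recovery_line(checkpoints, messages, failed, now):
    """Roll back processes until no orphan message remains.

    checkpoints: process -> list of local checkpoint times (must include 0).
    messages: (sender, send_time, receiver, recv_time) tuples, recv_time > 0.
    failed: the process that crashed at time `now`.
    Returns process -> time it restarts from (`now` means no rollback).
    """
    cut = {p: now for p in checkpoints}
    cut[failed] = max(t for t in checkpoints[failed] if t <= now)
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan: the receipt survives in the cut but the send was undone.
            if sent > cut[sender] and received <= cut[receiver]:
                cut[receiver] = max(t for t in checkpoints[receiver] if t < received)
                changed = True
    return cut


if __name__ == "__main__":
    ckpts = {"P": [0, 10, 30], "Q": [0, 20, 40]}
    msgs = [("Q", 45, "P", 47), ("P", 35, "Q", 38), ("Q", 25, "P", 28),
            ("P", 15, "Q", 18), ("Q", 5, "P", 8)]
    print(recovery_line(ckpts, msgs, "Q", 50))  # {'P': 0, 'Q': 0}
```

Each message is sent just after one process's checkpoint and received just before the other's, so a single failure of `Q` drags both processes back one checkpoint at a time to the initial state. Dropping the earliest message (`Q` to `P` at time 5) stops the cascade with `P` at 10.
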

We should favor [long-running && no built-in checkpoint/journaling] applications. Normally these are not distributed systems, right? Even if one is, it might be running as a single-node version. Based on this, I think we should favor coordinated checkpointing: it yields one consistent global snapshot and avoids the domino effect that uncoordinated checkpointing risks.
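
A sketch of coordinated checkpointing in the 2PC shape listed above, assuming hypothetical `Component` participants (e.g. a processor and a memory component) and a `coordinated_checkpoint` coordinator. Phase 1 pauses every participant and collects votes on a tentative snapshot; phase 2 commits all snapshots or discards all of them. Real protocols also handle in-flight messages and coordinator failure, which this omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical checkpoint participant, e.g. a processor or memory component."""
    name: str
    state: dict
    healthy: bool = True
    paused: bool = False
    tentative: Optional[dict] = None
    committed: Optional[dict] = None

    def prepare(self) -> bool:
        # Phase 1: stop issuing new work so nothing crosses the checkpoint,
        # then take a tentative snapshot and vote.
        self.paused = True
        if not self.healthy:
            return False
        self.tentative = dict(self.state)  # shallow copy suffices for this toy state
        return True

    def commit(self) -> None:
        # Phase 2, all voted yes: the tentative snapshot becomes the checkpoint.
        self.committed, self.tentative = self.tentative, None
        self.paused = False

    def abort(self) -> None:
        # Phase 2, some vote was no: keep the previous committed checkpoint.
        self.tentative = None
        self.paused = False


def coordinated_checkpoint(components) -> bool:
    votes = [c.prepare() for c in components]  # list, not all(...): every component is asked and paused
    if all(votes):
        for c in components:
            c.commit()
        return True
    for c in components:
        c.abort()
    return False
```

Because every participant is paused in the same round, the committed snapshots form one consistent global checkpoint, and a failed round leaves the previous one intact.
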

The HPC community¹⁰¹¹¹² has many publications on checkpoint/recovery (e.g., from Lawrence National Laboratory).

MISC

Some other interesting topics:

Erasure Coding
- Less space overhead than full replication
- Parity calculation is CPU-intensive
- Increased latency
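
The simplest erasure code, a single XOR parity block per stripe (as in RAID-5), shows the space/CPU trade-off. A sketch with hypothetical helpers `xor_blocks`, `encode`, and `rebuild`; this is not Reed-Solomon and tolerates only one lost block per stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: k data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover any single lost block by XOR-ing all the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

With k = 3 data blocks, the stripe stores 4 blocks (about 1.33x) and survives the loss of any one of them, whereas surviving one loss by replication stores 2 full copies (2x). The price is a pass over every byte on each write and reads of all surviving blocks on rebuild; codes that tolerate multiple losses, such as Reed-Solomon, need Galois-field arithmetic and cost more CPU than this XOR.
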

–
Yizhou Shan
Created: Mar 19, 2018
Last Updated: Mar 19, 2018