
Replication, Checkpoint, Logging, and Recovery

Discussion

  • 03/25/18:

    • Revisit RAMCloud, which has a goal very similar to Lego's. It keeps a full copy of data in DRAM and uses disk to ensure crash consistency. The key assumption RAMCloud makes is battery-backed DRAM or PM on its disk side.
    • We do not need to provide a 100% recoverable model. Our goal here is to reduce the failure probability introduced by having more components. Say Lego does the persist in a batching fashion instead of per-page: we are unable to recover if and only if a failure happens while we are doing a batch persist, but we are safe if the failure happens between batched persists.
    • That also means we need to checkpoint process state on the Processor side. We have to save the full process context along with the persisted memory log! Otherwise the memory contents are useless, since we do not know the exact IP and other register state. (A rough sketch of this batched persist follows this list.)
    • I’m wrong. :-)
  • 03/20/18: when memory is plentiful, use pessimistic replication; when demand is high, use optimistic replication to save memory components.
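
A minimal C sketch of the batched-persist idea above, assuming a crash-atomic append/commit log on the storage side. log_append(), log_commit(), and the struct layouts are hypothetical illustrations, not LegoOS interfaces.

```c
/*
 * Sketch: persist a batch of dirty memory pages together with the owning
 * process context (IP, registers), so the persisted log is actually usable
 * at recovery time. All names here are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>

#define BATCH_SIZE 64
#define PAGE_SIZE  4096

struct process_context {
    uint64_t ip;            /* instruction pointer at checkpoint time */
    uint64_t regs[16];      /* general-purpose registers */
    uint64_t page_table;    /* root of the address space */
};

struct dirty_page {
    uint64_t vaddr;         /* virtual address of the page */
    void *data;             /* PAGE_SIZE bytes of page contents */
};

/* Assumed storage-side primitives. */
extern int log_append(const void *buf, size_t len);  /* stage data in the log */
extern int log_commit(void);                         /* make staged data durable atomically */

/*
 * If we crash before log_commit(), the whole batch is discarded and we fall
 * back to the previous consistent checkpoint; if we crash after it, the batch
 * is fully recoverable. That is the "unsafe only during the commit window,
 * safe between batches" argument from the notes above.
 */
int batched_persist(const struct process_context *ctx,
                    const struct dirty_page *pages, size_t npages)
{
    /* 1. Context first: memory contents are useless without the IP etc. */
    if (log_append(ctx, sizeof(*ctx)))
        return -1;

    /* 2. Then the dirty pages, batched instead of per-page. */
    for (size_t i = 0; i < npages && i < BATCH_SIZE; i++) {
        if (log_append(&pages[i].vaddr, sizeof(pages[i].vaddr)) ||
            log_append(pages[i].data, PAGE_SIZE))
            return -1;
    }

    /* 3. A single commit makes the whole batch durable at once. */
    return log_commit();
}
```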

Replication

Before starting, I spent some time recapping and found the Wikipedia pages [1, 2, 3] are actually very good.

Two main approaches:

  • Optimistic (Lazy, Passive) Replication [4], in which replicas are allowed to diverge
    • Eventual consistency [5, 6, 7], meaning that replicas are guaranteed to converge only when the system has been quiesced for a period of time
  • Pessimistic (Active, Multi-master [8]) Replication, which tries to guarantee from the beginning that all of the replicas are identical to each other, as if there were only a single copy of the data all along

Lego is more about memory replication than storage replication. We may want to borrow some ideas from DSM replication (MRSW, MRMW), or from in-memory DBs such as RAMCloud or VoltDB? The sketch below contrasts the two replication modes for a single memory-page write.
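
To make the two modes concrete for memory replication, here is a minimal C sketch of a single page write under each policy; send_to_backup(), wait_for_ack(), and enqueue_lazy_sync() are hypothetical transport helpers, not any real Lego API.

```c
/*
 * Sketch: replicating a memory page to a backup memory component.
 * Pessimistic: the write is not acknowledged until the backup has the page,
 * so replicas never diverge. Optimistic: acknowledge immediately and
 * propagate lazily, so replicas may diverge and only converge eventually.
 * All helper functions are hypothetical.
 */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

extern int send_to_backup(uint64_t vaddr, const void *page);      /* push page to backup */
extern int wait_for_ack(uint64_t vaddr);                          /* block until backup confirms */
extern void enqueue_lazy_sync(uint64_t vaddr, const void *page);  /* background propagation */

static uint8_t primary_page[PAGE_SIZE];

/* Pessimistic (active) replication: backup is up to date before we return. */
int write_page_pessimistic(uint64_t vaddr, const void *data)
{
    memcpy(primary_page, data, PAGE_SIZE);
    if (send_to_backup(vaddr, primary_page))
        return -1;
    return wait_for_ack(vaddr);     /* replicas identical on return */
}

/* Optimistic (lazy) replication: return immediately, sync in the background. */
int write_page_optimistic(uint64_t vaddr, const void *data)
{
    memcpy(primary_page, data, PAGE_SIZE);
    enqueue_lazy_sync(vaddr, primary_page);  /* a background thread drains this */
    return 0;                       /* backup may lag for a while */
}
```

This also lines up with the 03/20/18 note: the pessimistic path pays a synchronous round trip and keeps a full standby copy in lockstep, while the optimistic path accepts temporary divergence in exchange for lower write latency and lighter load on the memory components.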

Checkpointing

Some nice reading [9].

Application types:

  • Long-running vs. short-lived
  • Built-in checkpoint/journaling vs. no built-in checkpoint/journaling

Two main approaches:

  • Coordinated
    • 2PC
  • Un-coordinated
    • Domino effect

We should favor [Long-running && no built-in checkpoint/journaling] applications. Normally they are not distributed systems, right? Even if one is, it might be running as a single-node version. Based on this, I think we should favor coordinated checkpointing (a rough sketch follows).
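
A rough sketch of what a 2PC-style coordinated checkpoint could look like, with one coordinator driving all the components that host a process; send_prepare(), send_commit(), and send_abort() are hypothetical RPCs, not an existing interface.

```c
/*
 * Sketch: coordinated checkpoint in the spirit of two-phase commit.
 * Every component first saves a tentative checkpoint; only if all of them
 * succeed does the coordinator tell them to make it permanent. Otherwise
 * everyone discards the tentative state and keeps the previous checkpoint.
 * All names are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>

struct component;   /* processor, memory, or storage component of a process */

extern int send_prepare(struct component *c);   /* save tentative checkpoint */
extern int send_commit(struct component *c);    /* make tentative checkpoint permanent */
extern int send_abort(struct component *c);     /* discard tentative checkpoint */

int coordinated_checkpoint(struct component **comps, size_t n)
{
    bool all_prepared = true;

    /* Phase 1: every component writes a tentative checkpoint. */
    for (size_t i = 0; i < n; i++) {
        if (send_prepare(comps[i])) {
            all_prepared = false;
            break;
        }
    }

    /* Phase 2: commit everywhere, or abort everywhere. */
    for (size_t i = 0; i < n; i++) {
        if (all_prepared)
            send_commit(comps[i]);
        else
            send_abort(comps[i]);
    }

    return all_prepared ? 0 : -1;
}
```

The global commit point is what rules out the domino effect: no component ever has to roll back past its last committed checkpoint, at the cost of a synchronization barrier across components.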

The HPC community [10, 11, 12] has a lot of publications on checkpoint/recovery (e.g., Lawrence Livermore National Laboratory).

MISC

Some other interesting topics:

  • Erasure Coding (a parity sketch follows this list)
    • Less space overhead
    • Parity calculation is CPU-intensive
    • Increased latency
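
As a concrete example of the trade-off, a RAID-5-style XOR parity over k data blocks stores k + 1 blocks instead of the 2k a full replica would need, but every write has to recompute the parity, which is where the CPU cost and the extra latency come from. A minimal sketch (block size and names are illustrative):

```c
/* Sketch: XOR parity over k data blocks, the simplest erasure code. */
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096

/* parity = data[0] XOR data[1] XOR ... XOR data[k-1] */
void compute_parity(uint8_t *parity, uint8_t *const *data, size_t k)
{
    for (size_t off = 0; off < BLOCK_SIZE; off++) {
        uint8_t p = 0;
        for (size_t i = 0; i < k; i++)
            p ^= data[i][off];
        parity[off] = p;
    }
}

/* Recover one lost block by XOR-ing the parity with all surviving blocks. */
void recover_block(uint8_t *lost, const uint8_t *parity,
                   uint8_t *const *survivors, size_t nsurvivors)
{
    for (size_t off = 0; off < BLOCK_SIZE; off++) {
        uint8_t p = parity[off];
        for (size_t i = 0; i < nsurvivors; i++)
            p ^= survivors[i][off];
        lost[off] = p;
    }
}
```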


Yizhou Shan
Created: Mar 19, 2018
Last Updated: Mar 19, 2018