Terminology #
- External consistency
- If a transaction \(T_1\) commits before another transaction \(T_2\) starts, then \(T_1\)’s commit timestamp is smaller than \(T_2\)’s.
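The definition above can be turned into a small checker. This is only a sketch for building intuition: the `Txn` record and the `violations` helper are names made up for these notes, and the real-time `start` and `commit` values are assumed to be known exactly, which a real system cannot observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    start: float      # real time at which the transaction starts (assumed known)
    commit: float     # real time at which the commit completes (assumed known)
    commit_ts: float  # commit timestamp assigned by the system


def violations(history):
    """Return (T1, T2) name pairs that break external consistency.

    A pair is a violation when T1 commits before T2 starts in real time,
    but T1's commit timestamp is not smaller than T2's.
    """
    bad = []
    for t1 in history:
        for t2 in history:
            if t1 is t2:
                continue
            if t1.commit < t2.start and not (t1.commit_ts < t2.commit_ts):
                bad.append((t1.name, t2.name))
    return bad
```

Transactions that overlap in real time are never reported, since the definition only constrains pairs where one commits before the other starts.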
What’s Special #
- Cross datacenter.
- TrueTime API.
- Provides external consistency at global scale.
How does it work? #
Structure #


- Zone: the rough analog of a deployment of Bigtable servers.
- zonemaster: assigns data to spanservers.
- spanserver: serves data to clients.
- location proxy: clients use it to locate the spanservers assigned to serve their data.
- universemaster: a console that displays status information about all the zones for interactive debugging.
- placement driver: handles automated movement of data across zones on the timescale of minutes.
Spanserver #
Transaction #
Each transaction is assigned a timestamp to preserve linearizability.
Read #
For a read-only transaction, the Paxos group’s leader assigns \(s_{read}\). If the read is served by a single Paxos group (single-site), Spanner just assigns LastTS() to \(s_{read}\) (when there are no prepared transactions), where LastTS() is the timestamp of the last committed write at that Paxos group. For a read that spans multiple Paxos groups, Spanner makes a simple choice: it has its reads execute at \(s_{read} = TT.now().latest\) (which may wait for safe time to advance).
- What is safe time? - Safe time exists to preserve linearizability. Each replica tracks \(t_{safe}\), the maximum timestamp at which it is up-to-date. A replica can satisfy a read at a timestamp \(t\) if \(t \le t_{safe}\).
A snapshot read is much simpler than a read-only transaction: it can execute at any replica that is sufficiently up-to-date.
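The timestamp choice and the safe-time check described in this section can be sketched as follows. Everything here is illustrative: `TTInterval`, `tt_now`, `choose_s_read` and `can_serve` are hypothetical names, the clock uncertainty `epsilon` is passed in by hand rather than coming from a real TrueTime service, and the single-group shortcut assumes there are no prepared transactions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now(clock: float, epsilon: float) -> TTInterval:
    """Toy stand-in for TT.now(): an interval of width 2 * epsilon around a clock reading."""
    return TTInterval(clock - epsilon, clock + epsilon)


def choose_s_read(last_ts_per_group: Sequence[float], now: TTInterval) -> float:
    """Pick s_read for a read-only transaction.

    One Paxos group: use that group's LastTS() (assumes no prepared transactions).
    Several groups: fall back to the simple choice TT.now().latest.
    """
    if len(last_ts_per_group) == 1:
        return last_ts_per_group[0]
    return now.latest


def can_serve(t: float, t_safe: float) -> bool:
    """A replica can satisfy a read at timestamp t only if t <= t_safe."""
    return t <= t_safe
```

When `can_serve` is false the replica has to wait for its safe time to advance, which is the wait mentioned for the `TT.now().latest` case.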
What about Read-Write Transactions?
Directory #
A directory is the unit of data placement: a set of contiguous keys that share a common prefix. When data is moved between Paxos groups, it is moved directory by directory. Each spanserver implements a single Paxos state machine on top of each tablet, and the replicas of a tablet together form a Paxos group.
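- How is a directory moved? By using movedir, a background task. It moves the data in the background and then uses a transaction to atomically move the small amount of data that remains (for example, data that changed in the meantime) and update the metadata of the two Paxos groups.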
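A toy model of the two phases described above, with plain dicts standing in for the source and destination Paxos groups. The function names are hypothetical, the "transaction" is just an ordinary function call here (so nothing is actually atomic), and metadata updates are left out.

```python
def directory_keys(store, prefix):
    """Keys of the directory identified by a common prefix, in sorted order."""
    return sorted(k for k in store if k.startswith(prefix))


def background_copy(src, dst, prefix):
    """Phase 1: copy the directory in the background; the source keeps serving writes."""
    for key in directory_keys(src, prefix):
        dst[key] = src[key]


def final_transaction(src, dst, prefix):
    """Phase 2: move whatever is new or changed, then drop the directory from the source.

    Returns the keys that still had to be moved in this last step.
    """
    moved = []
    for key in directory_keys(src, prefix):
        if key not in dst or dst[key] != src[key]:
            dst[key] = src[key]
            moved.append(key)
    for key in directory_keys(src, prefix):
        del src[key]
    return moved
```

The point of splitting the move this way is that the last step touches only a small amount of data, so the bulk transfer does not block ongoing reads and writes.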
Some further information:
- https://www.scs.stanford.edu/17au-cs244b/notes/spanner.txt
- https://quizlet.com/blog/quizlet-cloud-spanner
Shortcomings of SSTable #
It is self-describing and therefore highly redundant, and traversal of individual columns within the same locality group is particularly inefficient.
- Why does traversing the individual columns within the same locality group cause inefficiency?
What?? #
- In straight-up Paxos, both reads and writes go through the same protocol.
  - The leader must wait another round trip to hear from a quorum.
  - Why not just handle reads locally at the leader (no data to replicate)? A later leader could have externalized writes, violating linearizability.
- How do we fix Paxos to handle reads at the leader?
  - Nodes grant the leader a lease: a promise not to ack other leaders for time T.
  - Given leases from a quorum, the leader knows there are no other leaders and can read locally.
  - This assumes bounded clock drift.
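The lease idea can be sketched like this. It is a minimal model, not how Spanner or any particular Paxos library implements it: the `Leader` class and its fields are hypothetical, time is passed in explicitly, and clock drift is modelled as a fixed fraction by which the leader conservatively shortens every lease.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    cluster_size: int
    lease_duration: float  # T: how long a node promises not to ack other leaders
    max_drift: float       # assumed bound on relative clock drift, e.g. 0.1 for 10%
    grants: dict = field(default_factory=dict)  # node id -> leader-local time the request was sent

    def record_grant(self, node: str, requested_at: float) -> None:
        """Remember that `node` granted a lease for a request sent at `requested_at`."""
        self.grants[node] = requested_at

    def can_read_locally(self, now: float) -> bool:
        """True while leases from a majority are still valid on the leader's clock.

        Each lease is counted from the time the request was sent and shortened
        by the drift bound, so the leader gives up before any node's promise
        can have expired.
        """
        safe_duration = self.lease_duration * (1 - self.max_drift)
        live = sum(1 for sent in self.grants.values() if now < sent + safe_duration)
        return live > self.cluster_size // 2
```

If the drift bound is wrong, a node may ack a new leader while the old one still believes its lease is live, which is exactly the stale local read the lease was meant to rule out.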