HsiaoAvailabilityReplication

Page history last edited by PBworks 16 years, 10 months ago

 Paper

  • data redundancy over disks is accomplished by striping or checksumming
  • in normal mode, mirrored disks have best performance
  • in failure mode, mirrored disks also fastest because data doesn't need to be reconstructed
  • mirrored disk is also the best for availability, since it is easy to recover data (it is copied instead of reconstructed)
  • in a shared nothing environment, how do you deal with a downed site?
    • spreading data back out may be difficult and if done improperly significantly degrades performance
  • tandem: 2 IO connections to each disk, each disk is mirrored, relations clustered across disks
  • failure of a disk is OK since disks are redundant, but failure of a CPU means all of its work is spread to another CPU, potentially doubling that CPU's workload
  • Teradata interleaved clustering: within a cluster of N nodes, each fragment's backup is broken into N-1 pieces, each stored on a different node of the cluster
  • ROWA (read one, write all) incurs a write cost, since every update must be applied to all copies
  • chained declustering: primary and backup copies are spread on sites, never on the same disk
  • optionally, disks themselves can be subdivided and chained (the backup of the piece on disk p is stored on disk (p+1) mod M)
  • when there's a failure, the load of the (subdivided) disk is split evenly among the backup copies
  • TODO understand difference in backup and chained backup
  •  Experiment 1: find a single tuple using partitioning value
    • normal: with the elevator disk scheduling algorithm, ID and CD outperform MD at higher utilization levels and match it at low utilization
    • elevator (SCAN) scheduling: service pending requests in cylinder order, sweeping the head in one direction and then reversing, which reduces seek time
    • failure: disks become major bottlenecks when 1 disk is serving multiple nodes; MD suffers, while ID and CD better distribute the load
    • load balancing effect: data is more evenly spread across survivors after a failure
    • disk scheduling effect: improved disk seek time for ID and CD
  • Experiment 2: 1% selection query on partitioning value
    • normal: the CPU becomes the bottleneck for CD and ID (TODO why), while MD is disk-bound; MD barely outperforms the others because of this
    • because CD and ID distribute their backup pages across machines, they must send the query to more places; however, this same behavior makes intraquery parallelism easier.  MD reads more pages to send data
    • failure: again, spreading of the failure enables CD and ID to operate much better
  • Experiment 3: 0.1% selection on non-partitioning value
    • ID and CD's parallelism really helps the normal-mode query here, especially since the query is not CPU-bound; disk scheduling effects also boost throughput
    • with no contention, MD works better in failure mode since it doesn't need to redistribute scans, but CD and ID win everywhere else
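The chained declustering placement rule described above (primary and backup never on the same disk, backup on the next disk mod M) can be sketched in a few lines; `chained_placement` and the node numbering are illustrative, not from the paper:

```python
# Sketch of chained-declustering placement for M nodes: fragment i's
# primary lives on node i, its backup on node (i + 1) mod M, so no node
# ever holds both copies of the same fragment.

def chained_placement(num_nodes):
    """Return {fragment: (primary_node, backup_node)} for M fragments."""
    return {i: (i, (i + 1) % num_nodes) for i in range(num_nodes)}

placement = chained_placement(4)
# The last fragment's backup wraps around to node 0.
assert placement[3] == (3, 0)
# No fragment's primary and backup share a node.
assert all(p != b for p, b in placement.values())
```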

 
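The elevator scheduling that drives Experiment 1's disk scheduling effect can be sketched as the classic SCAN policy (the function name and cylinder numbers here are illustrative assumptions):

```python
# Sketch of elevator (SCAN) disk scheduling: the head services pending
# requests in cylinder order while sweeping in one direction, then
# reverses, cutting total seek distance versus arrival (FIFO) order.

def elevator_order(head, requests, direction=1):
    """Return requests in SCAN service order from `head`, moving `direction`."""
    # Requests in the direction of travel, nearest first.
    ahead = sorted((r for r in requests if (r - head) * direction >= 0),
                   reverse=(direction < 0))
    # Requests behind the head, serviced on the return sweep.
    behind = sorted((r for r in requests if (r - head) * direction < 0),
                    reverse=(direction > 0))
    return ahead + behind

# Head at cylinder 50 sweeping upward: serve 60, 80, 95, then turn back.
assert elevator_order(50, [95, 10, 60, 30, 80]) == [60, 80, 95, 30, 10]
```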

Lecture

  • High availability in PDBs: motivations
    • they wanted a HA fault tolerant DBMS (eg Tandem, how many 9s can you give customers?)
    • when scaling, DBs have many parts, which dramatically decreases mean time to failure, so you need some way to overcome that
  • predecessors:
    • RAID - striping + checksum across multiple disks
      • good for high bandwidth filesystem reads
      • writes can be costly, esp DB writes
      • good cost point for reliability
      • poor performance post failure
    • mirrored disks (tandem)
      • 2 disks, 2 controllers, 2 CPUs connected to both
      • reads can go to either disk (replica or main)
      • write to both (synchronously)
      • one failure, data remains available (no recovery needed)
      • a second failure *can* be OK, too (as long as it isn't the mirror of the first)
      • performance issue: 1 failure means twice the load on its backup/replica
    • interleaved clustering
      • goal: better failure load balancing characteristics
      • one node is a primary node for a fragment
      • the other nodes in its cluster of N nodes each own a piece (1/(N-1)) of the backup fragment
      • issue: a second failure within the same cluster makes data unavailable
      • tension: big clusters for load balancing, but small clusters for availability under multiple failures
    • chained declustering
      • normal operation: traffic goes to primaries; backups are not used for reading
      • failure mode: the neighbor takes up the slack, but offloads part of its primary fragment to its own neighbor, and so on down the chain
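The down-the-chain offloading above can be sketched numerically. This is a sketch under the assumption of uniform load (1.0 per node); each surviving node keeps a growing fraction of its own primary and passes the remainder to its neighbor, so every survivor ends up at M/(M-1):

```python
# Sketch of chained-declustering load balancing after one failure,
# assuming uniform load of 1.0 per node in an M-node system.
from fractions import Fraction

def loads_after_failure(num_nodes, failed):
    """Per-node load (normal = 1) for the survivors after `failed` goes down."""
    m = num_nodes
    loads = {}
    carry = Fraction(1)        # backup work arriving from up the chain
    for step in range(1, m):   # walk the survivors in chain order
        node = (failed + step) % m
        keep = Fraction(step, m - 1)   # fraction of own primary kept locally
        loads[node] = carry + keep
        carry = 1 - keep               # rest is offloaded to the next node
    return loads

loads = loads_after_failure(5, failed=2)
# Every survivor carries the same load: 5/4 of normal.
assert all(load == Fraction(5, 4) for load in loads.values())
```

With mirrored disks the failed node's whole load (2x) lands on one neighbor; here it is spread evenly, which is the load balancing advantage the lecture highlights.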
