DewittSkewProcessing

Page history last edited by PBworks 17 years, 1 month ago

 Paper

  • join small segments of the data using different techniques, find the best one, then use it for the rest
  • skew in DBs can hurt performance, but handling skew usually hurts performance in the typical (non-skewed) case
  • traditional hashing: hash the values you want to join on, mod by the number of CPUs, send each bucket to its CPU, and each CPU joins its own bucket
  • but one CPU can end up with too many tuples because of repeated values
  • partitioning: split the data into k ranges that differ in width but ideally hold the same number of values
    • should we evenly distribute R, S, or R&&S?
    • we know we should evenly distribute the building tuples (rather than the probing tuples) so we don't have to store extras on disk
  • subset replication: what if there are too many tuples with one value? send the building tuples to all sites, probing tuples to one
  • weighted: split up certain values that are repeated frequently TODO: how is this different?
  • virtual processor partitioning: split the load into more than k buckets, then divvy up the work by:
    • round robin
    • processor scheduling: given estimates of the number of joins needed per bucket, equalize the times required by the CPUs using "LPT" (longest processing time first)
  • hybrid hash: follow the basic hashing algorithm
  • simple range partitioning
    • sample the building table,
    • redistribute values based on computed ranges
    • redistribute probing values
  • weighted range: use the number of occurrences of a value to determine how many tuples go where
  • virtual processor partitioning - round robin
  • extent sampling: pick a random page in an extent, then pick a random tuple. requires at most inverse of fill-factor TODO: what is this?
  • to implement in Gamma, modify the current (Map HashBucket ProcessorNumber) into (Map Range ProcessorNumber)
    • use these tables to take a tuple and map it to a processor, then send that tuple to the outgoing buffer for that processor
  • for virtual processor ranges, use a two-level table to map from tuple to virtual processor, then from virtual processor to physical processor
  • hybrid hash (HH) clear winner for x1 J x1 distributions because it doesn't have to sample
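
The contrast between traditional hashing and sampled range partitioning in the notes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's code: the function names are hypothetical, splitters are taken at evenly spaced positions of the sorted sample, and a value that shows up as several splitters has its building tuples spread round robin over the processors it spans (the other relation's tuples with that value would then have to reach all of those processors).

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def hash_loads(values, k):
    """Traditional scheme: bucket = hash(value) mod number of CPUs."""
    return Counter(hash(v) % k for v in values)


def splitters_from_sample(sample, k):
    """Pick k-1 splitters at evenly spaced positions of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // k] for i in range(1, k)]


def candidate_processors(value, splitters):
    """Processors whose range can hold `value`.

    A value that is repeated as a splitter spans several processors;
    any other value maps to exactly one.
    """
    lo = bisect_left(splitters, value)
    hi = bisect_right(splitters, value)
    return list(range(lo, max(hi, lo + 1)))


def building_loads(values, splitters):
    """Count building tuples per processor (assumed round-robin spreading)."""
    loads, seen = Counter(), Counter()
    for v in values:
        candidates = candidate_processors(v, splitters)
        loads[candidates[seen[v] % len(candidates)]] += 1
        seen[v] += 1
    return loads


if __name__ == "__main__":
    # 100 distinct values plus 300 extra copies of the value 50.
    values = list(range(1, 101)) + [50] * 300
    splitters = splitters_from_sample(values, 4)
    print("splitters:", splitters)
    print("hash loads:", dict(hash_loads(values, 4)))
    print("range loads:", dict(building_loads(values, splitters)))
```

In this toy input the value 50 takes all three splitter slots, so its building tuples are shared by three processors instead of piling onto one hash bucket. The remaining imbalance is the kind of thing a weighted scheme would try to even out.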
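
The two ways of handing virtual processor buckets to physical processors (round robin versus "LPT" scheduling) can be compared with a small sketch. The assumptions here are loud ones: the per-bucket cost estimates are simply given as a dictionary, and the helper names are made up for illustration. LPT is the standard greedy heuristic: take buckets from most to least expensive and always give the next one to the currently least-loaded processor.

```python
import heapq


def round_robin_assign(bucket_costs, num_processors):
    """Deal buckets out in order, ignoring their estimated cost."""
    return {bucket: i % num_processors for i, bucket in enumerate(bucket_costs)}


def lpt_assign(bucket_costs, num_processors):
    """Longest processing time first: costliest bucket to least-loaded processor."""
    heap = [(0, p) for p in range(num_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {}
    by_cost = sorted(bucket_costs.items(), key=lambda kv: kv[1], reverse=True)
    for bucket, cost in by_cost:
        load, proc = heapq.heappop(heap)
        assignment[bucket] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment


def processor_loads(assignment, bucket_costs, num_processors):
    """Total estimated cost landing on each physical processor."""
    totals = [0] * num_processors
    for bucket, proc in assignment.items():
        totals[proc] += bucket_costs[bucket]
    return totals


if __name__ == "__main__":
    # Hypothetical cost estimates for six virtual processor buckets.
    costs = {0: 9, 1: 1, 2: 8, 3: 1, 4: 7, 5: 1}
    print(processor_loads(round_robin_assign(costs, 2), costs, 2))
    print(processor_loads(lpt_assign(costs, 2), costs, 2))
```

Round robin needs no estimates at all, which is its appeal. LPT only helps to the extent that the estimates are good.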
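
The table lookups described for Gamma (range to processor, and the two-level variant for virtual processors) might look something like the sketch below. This is not Gamma's actual code. The class, the method names, and the per-processor outgoing buffers modelled as plain lists are all assumptions made for illustration.

```python
from bisect import bisect_left


class RangeRouter:
    """Two-level table: join key -> virtual processor -> physical processor."""

    def __init__(self, splitters, virt_to_phys):
        # k-1 sorted splitters define k ranges, one per virtual processor.
        if len(virt_to_phys) != len(splitters) + 1:
            raise ValueError("need one virtual processor per range")
        self.splitters = list(splitters)
        self.virt_to_phys = list(virt_to_phys)

    def virtual_processor(self, key):
        """First level: index of the range that contains `key`."""
        return bisect_left(self.splitters, key)

    def physical_processor(self, key):
        """Second level: physical processor that owns that range."""
        return self.virt_to_phys[self.virtual_processor(key)]


def route(tuples, router, key_of, num_physical):
    """Append each tuple to the outgoing buffer of its destination processor."""
    buffers = [[] for _ in range(num_physical)]
    for t in tuples:
        buffers[router.physical_processor(key_of(t))].append(t)
    return buffers


if __name__ == "__main__":
    # Six virtual processors dealt round robin onto three physical ones.
    router = RangeRouter([10, 20, 30, 40, 50], [0, 1, 2, 0, 1, 2])
    rows = [(5, "a"), (25, "b"), (35, "c"), (99, "d")]
    print(route(rows, router, key_of=lambda row: row[0], num_physical=3))
```

Keeping the two levels separate means the virtual-to-physical table can be rebuilt (for example by a scheduling step) without touching the range boundaries.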
