• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • Whenever you search in PBworks or on the Web, Dokkio Sidebar (from the makers of PBworks) will run the same search in your Drive, Dropbox, OneDrive, Gmail, Slack, and browsed web pages. Now you can find what you're looking for wherever it lives. Try Dokkio Sidebar for free.

View
 

HaasDataFederation

Page history last edited by Jim Blomo 15 years, 1 month ago

Paper

  • companies may have data stored in a diverse set of places, need to access it uniformly somehow
  • you can write customized combiners, but they are fragile and hard to extend
  • application-integration or component-based frameworks are better because they offer a known interface, but data schemes may change from under them
  • workflow-systems provide some consistancy, but limited support for data comparison and manipulation
  • portals can grab data from difference places, but in order to do things more complex than aggregation, a developer is needed
  • data warehousing moves data from different sources to one relational DB, but may be hard because of cost.  and it doesn't support functions of the original systems
  • a "gateway" allows a federated DBMS to route a query to another source (DB)
  • "transparancy": masking the difference between data sources
  • support heterogeneity: support for hardware, software, data model, interface, protocols
  • high degree of function: use functions on the native sources
  • extensibility

 

  • scalar UDF: user defined function, for example SELECT db2mq.mqsend(a.headline) WHERE headline.date >= blah
    • sends a message to MQSeries
  • table UDF: functions that act like tables: eg in SELECT file.name from TABLE(dir(/etc, cron*))
  •  wrapers: most flexible, wrap parts of an SQL statement and push them to an external data source (eg oracle), then do the rest of the processing (eg relevance) locally
    • offer multi data source abstraction
    • operations including update and delete that get translated into the foriegn commands
    • operations can be on either end (eg remote "similarity" function, or local "group by" function)
    • optimization
    • transaction symantics: the query keeps track of used wrappers, notifies them of abort/commit
  • how to decide which model to use, if any are yes use wrapper:
    • reach out to multiple data sources/servers?
    • transactional consistancy required?
    • are there multiple distince operations or data sets? UDF tables are OK for combining one set, but wrappers need to be used for nicknames and more complex operations
  • why use a DB to back federated data?
    • SQL is great (declarative, well used, has extensions)
    • transactional guarantees
    • easy place to store data ("local store")
    • use of all the tools designed for DBs (reporting, web connection, etc)
  • can be used for a variety of situations
    • national HQ needs to know total sales of all offices (national DB2 queries individuals oracles, then puts them in a view)
    • putting together nonrelational data (store takes messages from queue, gets bids from service, puts them into a DB with XML)
    • semirelational (store makes reports using excel data and sales data)
    • heterogeneous replication (have backup server and dataware house. use triggers to insert data into different places)
    • cached data (some tables are "read only" (only udpated by merchants), those can be cached on other machines)
    •  

 

Lecture

  • motivations:
    • data creation excelerating, but not formally managed
    • DBs know how to store large amounts of data very well
    • good research, commercial projects, etc.
    • companies have different DBs, different vendors, for different reasons
    • take DB ideas, extend it to other data
  • styles of data integration
    • inside out: SQL user defined functions, user defined tables,
    • outside in: midleware to combine data from different sources
    • SOA: have the DB provide a service

Comments (1)

Jim Blomo said

at 11:21 am on Oct 3, 2007

word to your data

You don't have permission to comment on this page.