• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!

View
 

HaasDataFederation

Page history last edited by Jim Blomo 16 years, 2 months ago

Paper

  • companies may have data stored in a diverse set of places, need to access it uniformly somehow
  • you can write customized combiners, but they are fragile and hard to extend
  • application-integration or component-based frameworks are better because they offer a known interface, but data schemes may change from under them
  • workflow-systems provide some consistancy, but limited support for data comparison and manipulation
  • portals can grab data from difference places, but in order to do things more complex than aggregation, a developer is needed
  • data warehousing moves data from different sources to one relational DB, but may be hard because of cost.  and it doesn't support functions of the original systems
  • a "gateway" allows a federated DBMS to route a query to another source (DB)
  • "transparancy": masking the difference between data sources
  • support heterogeneity: support for hardware, software, data model, interface, protocols
  • high degree of function: use functions on the native sources
  • extensibility

 

  • scalar UDF: user defined function, for example SELECT db2mq.mqsend(a.headline) WHERE headline.date >= blah
    • sends a message to MQSeries
  • table UDF: functions that act like tables: eg in SELECT file.name from TABLE(dir(/etc, cron*))
  •  wrapers: most flexible, wrap parts of an SQL statement and push them to an external data source (eg oracle), then do the rest of the processing (eg relevance) locally
    • offer multi data source abstraction
    • operations including update and delete that get translated into the foriegn commands
    • operations can be on either end (eg remote "similarity" function, or local "group by" function)
    • optimization
    • transaction symantics: the query keeps track of used wrappers, notifies them of abort/commit
  • how to decide which model to use, if any are yes use wrapper:
    • reach out to multiple data sources/servers?
    • transactional consistancy required?
    • are there multiple distince operations or data sets? UDF tables are OK for combining one set, but wrappers need to be used for nicknames and more complex operations
  • why use a DB to back federated data?
    • SQL is great (declarative, well used, has extensions)
    • transactional guarantees
    • easy place to store data ("local store")
    • use of all the tools designed for DBs (reporting, web connection, etc)
  • can be used for a variety of situations
    • national HQ needs to know total sales of all offices (national DB2 queries individuals oracles, then puts them in a view)
    • putting together nonrelational data (store takes messages from queue, gets bids from service, puts them into a DB with XML)
    • semirelational (store makes reports using excel data and sales data)
    • heterogeneous replication (have backup server and dataware house. use triggers to insert data into different places)
    • cached data (some tables are "read only" (only udpated by merchants), those can be cached on other machines)
    •  

 

Lecture

  • motivations:
    • data creation excelerating, but not formally managed
    • DBs know how to store large amounts of data very well
    • good research, commercial projects, etc.
    • companies have different DBs, different vendors, for different reasons
    • take DB ideas, extend it to other data
  • styles of data integration
    • inside out: SQL user defined functions, user defined tables,
    • outside in: midleware to combine data from different sources
    • SOA: have the DB provide a service

Comments (1)

Jim Blomo said

at 11:21 am on Oct 3, 2007

word to your data

You don't have permission to comment on this page.