|
HaasDataFederation
Page history
last edited
by Jim Blomo 16 years, 2 months ago
Paper
- companies may have data stored in a diverse set of places, need to access it uniformly somehow
- you can write customized combiners, but they are fragile and hard to extend
- application-integration or component-based frameworks are better because they offer a known interface, but data schemes may change from under them
- workflow-systems provide some consistancy, but limited support for data comparison and manipulation
- portals can grab data from difference places, but in order to do things more complex than aggregation, a developer is needed
- data warehousing moves data from different sources to one relational DB, but may be hard because of cost. and it doesn't support functions of the original systems
- a "gateway" allows a federated DBMS to route a query to another source (DB)
- "transparancy": masking the difference between data sources
- support heterogeneity: support for hardware, software, data model, interface, protocols
- high degree of function: use functions on the native sources
- extensibility
- scalar UDF: user defined function, for example SELECT db2mq.mqsend(a.headline) WHERE headline.date >= blah
- sends a message to MQSeries
- table UDF: functions that act like tables: eg in SELECT file.name from TABLE(dir(/etc, cron*))
- wrapers: most flexible, wrap parts of an SQL statement and push them to an external data source (eg oracle), then do the rest of the processing (eg relevance) locally
- offer multi data source abstraction
- operations including update and delete that get translated into the foriegn commands
- operations can be on either end (eg remote "similarity" function, or local "group by" function)
- optimization
- transaction symantics: the query keeps track of used wrappers, notifies them of abort/commit
- how to decide which model to use, if any are yes use wrapper:
- reach out to multiple data sources/servers?
- transactional consistancy required?
- are there multiple distince operations or data sets? UDF tables are OK for combining one set, but wrappers need to be used for nicknames and more complex operations
- why use a DB to back federated data?
- SQL is great (declarative, well used, has extensions)
- transactional guarantees
- easy place to store data ("local store")
- use of all the tools designed for DBs (reporting, web connection, etc)
- can be used for a variety of situations
- national HQ needs to know total sales of all offices (national DB2 queries individuals oracles, then puts them in a view)
- putting together nonrelational data (store takes messages from queue, gets bids from service, puts them into a DB with XML)
- semirelational (store makes reports using excel data and sales data)
- heterogeneous replication (have backup server and dataware house. use triggers to insert data into different places)
- cached data (some tables are "read only" (only udpated by merchants), those can be cached on other machines)
-
Lecture
- motivations:
- data creation excelerating, but not formally managed
- DBs know how to store large amounts of data very well
- good research, commercial projects, etc.
- companies have different DBs, different vendors, for different reasons
- take DB ideas, extend it to other data
- styles of data integration
- inside out: SQL user defined functions, user defined tables,
- outside in: midleware to combine data from different sources
- SOA: have the DB provide a service
HaasDataFederation
|
Tip: To turn text into a link, highlight the text, then click on a page or file from the list above.
|
|
|
Comments (1)
Jim Blomo said
at 11:21 am on Oct 3, 2007
word to your data
You don't have permission to comment on this page.