
HaasDataFederation

Page history last edited by Jim Blomo 16 years, 2 months ago

Paper

companies may have data stored in a diverse set of places and need to access it uniformly somehow
you can write customized combiners, but they are fragile and hard to extend
application-integration or component-based frameworks are better because they offer a known interface, but data schemas may change out from under them
workflow systems provide some consistency, but limited support for data comparison and manipulation
portals can grab data from different places, but in order to do things more complex than aggregation, a developer is needed
data warehousing moves data from different sources into one relational DB, but can be costly, and it doesn't support the functions of the original systems
a "gateway" allows a federated DBMS to route a query to another source (DB)
"transparency": masking the differences between data sources
support heterogeneity: support for hardware, software, data model, interface, protocols
high degree of function: use functions on the native sources
extensibility

scalar UDF: user defined function, for example SELECT db2mq.mqsend(a.headline) FROM ... a WHERE a.date >= blah
- sends a message to MQSeries
table UDF: functions that act like tables, e.g. SELECT file.name FROM TABLE(dir('/etc', 'cron*')) AS file
wrappers: most flexible, wrap parts of an SQL statement and push them to an external data source (e.g. Oracle), then do the rest of the processing (e.g. relevance) locally
- offer multi data source abstraction
- operations including update and delete that get translated into the foreign commands
- operations can be on either end (e.g. remote "similarity" function, or local "group by" function)
- optimization
- transaction semantics: the query keeps track of the wrappers it used and notifies them of abort/commit
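
A minimal sketch of the wrapper idea above, not the paper's implementation: each hypothetical SqliteWrapper stands in for one remote source (an in-memory SQLite DB instead of Oracle), the filter is pushed down to the source, the "group by" runs locally, and the federated query remembers which wrappers it touched so it can notify them of commit or abort. The class names, the sales table, and its columns are all assumptions made for illustration. The notification loop is a plain broadcast, not a real two-phase commit.

```python
import sqlite3


class SqliteWrapper:
    """Hypothetical wrapper around one remote source (here: in-memory SQLite)."""

    def __init__(self, name, rows):
        self.name = name
        self.events = []  # commit/abort notifications received so far
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        self.conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        self.conn.commit()

    def fetch(self, min_amount):
        # Pushed-down part of the query: the source evaluates the filter.
        cur = self.conn.execute(
            "SELECT region, amount FROM sales WHERE amount >= ? ORDER BY rowid",
            (min_amount,),
        )
        return cur.fetchall()

    def insert(self, region, amount):
        # A federated INSERT translated into the source's own command.
        self.conn.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))

    def commit(self):
        self.conn.commit()
        self.events.append("commit")

    def abort(self):
        self.conn.rollback()
        self.events.append("abort")


class FederatedQuery:
    """Tracks the wrappers a query used and notifies them at the end."""

    def __init__(self):
        self.used = []

    def _touch(self, wrapper):
        if wrapper not in self.used:
            self.used.append(wrapper)

    def totals_by_region(self, wrappers, min_amount):
        totals = {}
        for w in wrappers:
            self._touch(w)
            for region, amount in w.fetch(min_amount):
                # Local part of the query: the "group by" happens here.
                totals[region] = totals.get(region, 0) + amount
        return totals

    def insert(self, wrapper, region, amount):
        self._touch(wrapper)
        wrapper.insert(region, amount)

    def commit(self):
        for w in self.used:
            w.commit()

    def abort(self):
        for w in self.used:
            w.abort()


east = SqliteWrapper("east", [("north", 10), ("south", 3)])
west = SqliteWrapper("west", [("north", 7), ("south", 20)])

query = FederatedQuery()
totals = query.totals_by_region([east, west], min_amount=5)

query.insert(east, "south", 50)  # pending write on one source
query.abort()                    # every used wrapper is told to roll back
```

After the abort, the pending row is gone from the east source and both wrappers have recorded the notification, because both were used by the query.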
how to decide which model to use (if the answer to any of these is yes, use a wrapper):
- reach out to multiple data sources/servers?
- transactional consistency required?
- are there multiple distinct operations or data sets? table UDFs are OK for combining one set, but wrappers are needed for nicknames and more complex operations
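
The checklist above can be written as a tiny decision function. This is only a sketch of the rule as stated in these notes; the function name and parameter names are made up, and the fallback to a UDF when every answer is "no" is an assumption read off the last bullet rather than something the notes state outright.

```python
def choose_mechanism(multiple_sources: bool,
                     needs_transactions: bool,
                     multiple_operations: bool) -> str:
    """Return "wrapper" if any checklist question is answered yes."""
    if multiple_sources or needs_transactions or multiple_operations:
        return "wrapper"
    # Assumption: with a single source, no transactional needs, and one
    # simple data set, a scalar or table UDF is enough.
    return "udf"


single_file_listing = choose_mechanism(False, False, False)
cross_office_report = choose_mechanism(True, False, True)
```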
why use a DB to back federated data?
- SQL is great (declarative, well used, has extensions)
- transactional guarantees
- easy place to store data ("local store")
- use of all the tools designed for DBs (reporting, web connection, etc)
can be used for a variety of situations
- national HQ needs to know total sales of all offices (national DB2 queries the individual Oracle DBs, then puts the results in a view)
- putting together nonrelational data (store takes messages from queue, gets bids from service, puts them into a DB with XML)
- semirelational (store makes reports using Excel data and sales data)
- heterogeneous replication (have a backup server and a data warehouse; use triggers to insert data into the different places)
- cached data (some tables are "read only" (only updated by merchants); those can be cached on other machines)
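
The national HQ scenario from the list above can be imitated on a small scale. This is a sketch under heavy assumptions: SQLite's ATTACH DATABASE stands in for the federation layer, three in-memory databases stand in for the offices' Oracle servers, and the office names, table name, and amounts are invented. One connection at "HQ" runs a single query that spans all offices.

```python
import sqlite3

# Invented sample data: office name -> individual sale amounts.
offices = {"boston": [100, 250], "denver": [75], "seattle": [300, 25]}

hq = sqlite3.connect(":memory:")
for office, amounts in offices.items():
    # Office names come from the hard-coded dict above, so interpolating
    # them into SQL identifiers is safe in this sketch.
    hq.execute(f"ATTACH DATABASE ':memory:' AS {office}")
    hq.execute(f"CREATE TABLE {office}.sales (amount INTEGER)")
    hq.executemany(
        f"INSERT INTO {office}.sales VALUES (?)", [(a,) for a in amounts]
    )
    # Commit before the next ATTACH; SQLite refuses ATTACH inside a transaction.
    hq.commit()

# One "virtual table" over all offices, built as a UNION ALL subquery.
all_sales = " UNION ALL ".join(
    f"SELECT '{office}' AS office, amount FROM {office}.sales"
    for office in offices
)

per_office = dict(
    hq.execute(f"SELECT office, SUM(amount) FROM ({all_sales}) GROUP BY office")
)
national_total = hq.execute(
    f"SELECT SUM(amount) FROM ({all_sales})"
).fetchone()[0]
```

In the setup the notes describe, the UNION ALL would live in a view defined over the remote tables, so report writers at HQ never see the individual offices.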

Lecture

motivations:
- data creation is accelerating, but the data is not formally managed
- DBs know how to store large amounts of data very well
- good research, commercial projects, etc.
- companies have different DBs, different vendors, for different reasons
- take DB ideas, extend them to other data
styles of data integration
- inside out: SQL user defined functions, user defined tables
- outside in: middleware to combine data from different sources
- SOA: have the DB provide a service
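
The inside-out style can be sketched with a scalar UDF, loosely mirroring the mqsend example from the paper notes. Assumptions: Python's sqlite3 create_function stands in for DB2's UDF registration, a plain list stands in for the MQSeries queue, and the articles table, its columns, and the dates are invented for illustration.

```python
import sqlite3

outbox = []  # stand-in for a message queue


def mqsend(message):
    """Hypothetical scalar UDF: 'sends' a message, returns 1 on success."""
    outbox.append(message)
    return 1


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call mqsend(x).
conn.create_function("mqsend", 1, mqsend)

conn.execute("CREATE TABLE articles (headline TEXT, published TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("old news", "2001-01-01"), ("fresh news", "2002-06-01")],
)

# The UDF runs once per row that survives the WHERE clause.
sent = conn.execute(
    "SELECT mqsend(headline) FROM articles WHERE published >= ?",
    ("2002-01-01",),
).fetchall()
```

The database stays in charge of filtering, and the external system is reached from inside the query, which is what makes this "inside out".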
