Friday, July 4, 2008

From data warehousing to data mining?


Data Warehouse Usage


1) Three kinds of data warehouse applications
i) Information processing

a) supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs

ii) Analytical processing

a) multidimensional analysis of data warehouse data

b) supports basic OLAP operations: slice-and-dice, drilling (drill-down and roll-up), and pivoting

iii) Data mining

a) knowledge discovery by finding hidden patterns

b) supports finding associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
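
The basic OLAP operations listed under analytical processing can be sketched on a tiny in-memory fact table. This is only an illustration using the Python standard library: the table `FACTS`, its values, and the helper names `slice_`, `dice`, `roll_up`, and `pivot` are all assumptions made up for this example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: (item, region, month, sales)
DIMS = ("item", "region", "month")
FACTS = [
    ("tv", "east", "jan", 100),
    ("tv", "west", "jan", 150),
    ("tv", "east", "feb", 120),
    ("radio", "east", "jan", 40),
    ("radio", "west", "feb", 60),
]

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]

def dice(facts, **allowed):
    """Dice: keep rows whose dimension values fall in the given sets."""
    return [f for f in facts
            if all(f[DIMS.index(d)] in vals for d, vals in allowed.items())]

def roll_up(facts, keep):
    """Roll-up: sum sales over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[i] for i in idx)] += f[3]
    return dict(totals)

def pivot(facts, row_dim, col_dim):
    """Pivot: cross-tabulate sales, row_dim as rows and col_dim as columns."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)
```

Drill-down is the reverse of roll-up here: calling `roll_up` with more dimensions in `keep` returns the finer-grained totals.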

Further development of data cube technology?


Discovery-Driven Exploration of Data Cubes


1) Hypothesis-driven: exploration by user, huge search space

2) Discovery-driven

a) pre-computes measures that indicate exceptions, guiding the user's data analysis at all levels of aggregation

b) Exception: a cell value that is significantly different from the value anticipated, based on a statistical model

c) Visual cues such as background color are used to reflect the degree of exception of each cell

d) Computation of exception indicators (model fitting and computing SelfExp, InExp, and PathExp values) can be overlapped with cube construction
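
The idea of an exception as a cell that differs significantly from its anticipated value can be sketched as follows. This is a deliberately simplified stand-in, not the SelfExp, InExp, or PathExp computation itself: it assumes a plain additive model (row effect plus column effect) on a 2-D slice of the cube, and the function names and the threshold of 2.0 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def exception_scores(table):
    """Standardized residuals of a 2-D table under an additive model.

    Anticipated value of cell (i, j) = row mean + column mean - grand mean.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([table[i][j] for i in range(n_rows)])
                 for j in range(n_cols)]
    residuals = [[table[i][j] - (row_means[i] + col_means[j] - grand)
                  for j in range(n_cols)]
                 for i in range(n_rows)]
    spread = pstdev([r for row in residuals for r in row])
    if spread == 0:
        # Every cell matches the model exactly: nothing is exceptional.
        return [[0.0] * n_cols for _ in range(n_rows)]
    return [[r / spread for r in row] for row in residuals]

def flag_exceptions(table, threshold=2.0):
    """Return (row, col) positions whose score exceeds the threshold."""
    scores = exception_scores(table)
    return [(i, j)
            for i, row in enumerate(scores)
            for j, s in enumerate(row)
            if abs(s) > threshold]
```

In a viewer, the magnitude of each score could drive the background color of the corresponding cell, which is the kind of visual cue described above.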

Complex Aggregation at Multiple Granularities: Multi-Feature Cubes


1) Example: grouping by all subsets of {item, region, month}, find the maximum price in 1997 for each group and the total sales among all maximum-price tuples
select item, region, month, max(price), sum(R.sales)
from purchases
where year = 1997
cube by item, region, month: R
such that R.price = max(price)
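
The same multi-feature cube query can be sketched procedurally. This is a minimal illustration using the Python standard library, assuming each purchase is a dict with the keys `item`, `region`, `month`, `year`, `price`, and `sales`; the function name `multi_feature_cube` and the result layout are assumptions for this example, not the syntax or semantics of any particular system.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("item", "region", "month")

def multi_feature_cube(purchases, year=1997):
    """For every subset of DIMS, group the given year's purchases and return
    (max price, total sales of the max-price tuples) per group.

    Keys of the result are (grouping dimensions, group values).
    """
    rows = [p for p in purchases if p["year"] == year]
    result = {}
    for k in range(len(DIMS) + 1):
        for dims in combinations(DIMS, k):        # one cuboid per subset
            groups = defaultdict(list)
            for p in rows:
                groups[tuple(p[d] for d in dims)].append(p)
            for key, members in groups.items():
                top = max(p["price"] for p in members)
                # "such that R.price = max(price)": only max-price tuples
                total = sum(p["sales"] for p in members if p["price"] == top)
                result[(dims, key)] = (top, total)
    return result

# Hypothetical sample data
PURCHASES = [
    {"item": "tv", "region": "east", "month": "jan", "year": 1997, "price": 500, "sales": 2000},
    {"item": "tv", "region": "west", "month": "jan", "year": 1997, "price": 500, "sales": 1500},
    {"item": "tv", "region": "east", "month": "feb", "year": 1997, "price": 450, "sales": 900},
    {"item": "radio", "region": "east", "month": "jan", "year": 1997, "price": 80, "sales": 160},
    {"item": "tv", "region": "east", "month": "jan", "year": 1996, "price": 700, "sales": 700},
]
```

The sum of sales depends on the maximum price found for the same group, which is why the query is more than a simple pair of independent aggregates.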