Oracle Data Mining (ODM)

Developer(s)	Oracle Corporation
Stable release	10gR2 / October 2006
Type	Data mining and analytics
License	Proprietary
Website	[1]

Oracle Data Mining (ODM) is a software product distributed as an option to the Enterprise Edition (EE) of Oracle Corporation's Relational Database Management System (RDBMS). It contains a collection of data mining, machine learning and data analysis algorithms for classification, prediction, regression, clustering, association, feature selection, feature extraction and sequence alignment (BLAST).

Overview and Design Principles

Oracle Data Mining (ODM) implements data mining algorithms inside the Oracle relational database. Execution tasks run asynchronously, independently of any specific user interface, as part of standard database processing pipelines and applications. Model building, scoring and metadata management are accessible through a GUI (Oracle Data Miner) and through PL/SQL and Java APIs.

The main design principle is to let data mining algorithms operate natively on relational database tables, which eliminates the need to extract data or transfer it into a standalone server. The design takes advantage of the database environment, which can efficiently execute large queries and analyze large volumes of data. The results of data mining operations are stored in database tables, where generic SQL queries and database-based reporting tools and applications can access them.

ODM is organized around a few generic operations that provide a unified interface for all data mining functions. These operations create, apply, test and manipulate data mining models. A model is built with a “create model” function whose parameters specify, for example, the model name, the function type (e.g., classification), the input table(s), the target field and the algorithm settings. Other functions provide descriptive information about a model, as well as testing and management capabilities.

History

ODM was introduced in 20..... It is the successor to the Darwin data mining toolset developed by Thinking Machines Corporation in the 1990s. Thinking Machines was acquired by Oracle in 1999.

Functionality

ODM supports the following data mining functions:

Data transformations and model analysis:
- Data sampling, data transformation, binning and discretization.
- Model exploration, evaluation and analysis, including Receiver Operating Characteristic (ROC) analysis for classification and residual plots for regression.
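ROC analysis plots the true positive rate against the false positive rate as the decision threshold on a classifier's score is lowered; the area under that curve (AUC) summarizes how well the scores rank positive cases above negative ones. The following Python sketch shows only the general computation, not ODM's implementation; the function names `roc_points` and `auc` are hypothetical.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, sweeping the
    decision threshold from the highest score down to the lowest.
    labels: 1 for the positive class, 0 for the negative class."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sort cases by descending score; cases with tied scores are processed together.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For instance, `auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))` returns 0.75: three of the four positive/negative pairs are ranked correctly.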

Feature selection:
- Attribute importance by Minimum Description Length (MDL).

Classification:
- Naive Bayes (NB). It makes predictions using Bayes' theorem, assuming that each attribute is conditionally independent of the others.
- Adaptive Bayes Network (ABN). Unlike Naive Bayes, it relaxes the conditional independence assumption by grouping one or more predictors into features that are conditionally independent of each other, and it determines the number and structure of these features with a greedy, recursive procedure adapted to the data. It provides two kinds of rules: aggregate rules, for a global understanding of the model's decision process, and detailed rules, for insight into why the model made a specific prediction.
- Support Vector Machines (SVM). An implementation of SVM for binary and multi-class classification.
- Decision Trees (DT). It implements Classification and Regression Trees, recording the confidence, support, splitting criterion and surrogate attributes for each node.
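To illustrate the conditional independence assumption behind Naive Bayes, here is a minimal Python sketch for categorical attributes. It scores each class as log P(class) plus the sum of log P(attribute value | class), and uses Laplace (add-alpha) smoothing so that unseen values do not zero out a class. The smoothing scheme and the function names `train_nb` and `predict_nb` are assumptions for illustration; this is not ODM's implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, targets, alpha=1.0):
    """rows: list of tuples of categorical attribute values; targets: class labels.
    alpha: hypothetical add-alpha smoothing constant."""
    class_counts = Counter(targets)
    # counts[(attribute_index, class)][value] = number of training rows with that value
    counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values seen for each attribute
    for row, y in zip(rows, targets):
        for j, v in enumerate(row):
            counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest posterior score for one row."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, nc in class_counts.items():
        # log P(c) + sum_j log P(x_j | c): attributes treated as independent given c
        score = math.log(nc / n)
        for j, v in enumerate(row):
            k = len(values[j])
            score += math.log((counts[(j, c)][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Working in log space avoids numeric underflow when many attribute probabilities are multiplied together.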

Anomaly detection:
- One-class Support Vector Machines (SVM). The model builds a profile of a single class and flags input records that differ from that profile as abnormal or rare.

Regression:
- Support Vector Machines (SVM) implementation for the prediction of a continuous target attribute.

Clustering:
- Enhanced k-means (EKM). It uses a distance-based (Euclidean) similarity measure and tessellates the data space. It can create either balanced or unbalanced hierarchies and handles large data volumes through summarization.
- Orthogonal Partitioning Clustering (O-Cluster). Instead of a distance metric, it uses a density-based approach to find natural data clusters. It creates unbalanced hierarchical trees using active sampling and orthogonal projections.
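The assignment/update loop at the core of Euclidean k-means can be sketched as follows. This is plain (flat) Lloyd's k-means in Python, assumed here for illustration; ODM's Enhanced k-means additionally builds a cluster hierarchy and summarizes large data volumes, which the sketch omits. The function name and the random initialization from input points are hypothetical choices.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Flat Lloyd's k-means with Euclidean distance (not ODM's EKM).
    Returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups such as (0,0), (0,1), (1,0) and (10,10), (10,11), (11,10), the loop converges to one centroid per group.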

Association models:
- Apriori algorithm (AM). Association rules capture the co-occurrence of items or events in large volumes of “transactional” data, as in market basket analysis.
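The Apriori idea is that every subset of a frequent itemset must itself be frequent, so candidate k-itemsets are built only from frequent (k-1)-itemsets; rules are then read off the frequent itemsets by their confidence, support(A ∪ B) / support(A). A minimal Python sketch, with hypothetical function names and thresholds expressed as fractions (not ODM's implementation):

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {i for b in baskets for i in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules A -> B."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence
```

With the baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, a minimum support of 0.5 and a minimum confidence of 0.9, the only rule produced is {butter} -> {bread}, with confidence 1.0.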

Feature extraction: creates a new set of features by decomposing the original data into a number of features far smaller than the number of dimensions (attributes).
- Non-Negative Matrix Factorization (NMF). It decomposes the data matrix into the product of two lower-rank matrices.
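NMF approximates a non-negative data matrix V (m × n) as the product W H of a non-negative m × r matrix W and a non-negative r × n matrix H, with r much smaller than n. The sketch below uses Lee and Seung's multiplicative update rules for squared error, in pure Python; the update rule, iteration count and random initialization are assumptions and not necessarily what ODM uses. Because each update multiplies by a ratio of non-negative quantities, W and H stay non-negative.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W (m x rank) times H (rank x n)
    with multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return W, H
```

For a matrix that is exactly rank 1, such as [[1, 2], [2, 4], [3, 6]], `nmf(V, 1)` recovers factors whose product reproduces V.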

Text and spatial mining:
- Models that combine text and non-text columns of data.
- Spatial data.

Specialized analytics:
- Sequence similarity searches and alignment (BLAST).

Data Preparation

Oracle Data Mining accepts one or more tables as input and executes the joins and transformations necessary for model building. It supports both transactional and nested data tables.
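In transactional (multi-record) form, each case is spread over several (case ID, attribute, value) rows; a nested representation collects those rows into one record per case, which can then be flattened into a fixed set of columns. The Python sketch below only illustrates the data layouts involved; it is not how ODM performs the conversion, and the function names are hypothetical.

```python
from collections import defaultdict


def to_nested(transactions):
    """Group (case_id, attribute_name, value) rows into one record per case,
    each holding its attributes as a nested name -> value mapping."""
    cases = defaultdict(dict)
    for case_id, attribute, value in transactions:
        cases[case_id][attribute] = value
    return dict(cases)


def to_single_record(nested, attributes):
    """Flatten nested cases into fixed-width rows in the given attribute order;
    attributes a case lacks become None (treated here as missing)."""
    return {cid: [rec.get(a) for a in attributes] for cid, rec in nested.items()}
```

The nested form suits sparse data, where most cases have values for only a few of many possible attributes.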

Graphical User Interface: Oracle Data Miner

Oracle Data Mining can be accessed through Oracle Data Miner, a GUI “client” that provides access to the data mining functions and to structured templates called Activity Guides. Activity Guides prescribe the order of operations, perform all algorithm-required data transformations, and provide intelligent settings and optimizations for model parameters. The user interface can also automatically generate the Java and/or SQL code associated with the data mining activities.

Text mining

Oracle Data Mining allows text (unstructured data) to be used as an input attribute. The Support Vector Machine, Association Rules, K-Means Clustering and Non-negative Matrix Factorization algorithms can all operate on text.
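Before a text column can feed algorithms such as SVM or NMF, it has to be turned into numeric features, typically a sparse vector of term weights per document. As an illustration only, the following Python sketch computes TF-IDF weights with a naive tokenizer; the tokenizer, the weighting scheme and the function name `tfidf` are assumptions, not ODM's text processing.

```python
import math
import re
from collections import Counter


def tfidf(documents):
    """Turn each document into a sparse {term: weight} vector.
    weight = (term count / document length) * log(N / document frequency)."""
    # Naive tokenizer (assumption): lowercase alphanumeric runs.
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in documents]
    n = len(documents)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors
```

Terms that occur in every document, such as "the" and "tumor" in `tfidf(["the tumor grew", "the tumor shrank"])`, receive weight 0, so only the distinguishing terms contribute.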

PL/SQL Interface

Oracle Data Mining provides a native PL/SQL interface whose procedures are invoked from PL/SQL blocks. The interface consists of two PL/SQL packages, DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM. For example, the block below builds a classification model:

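BEGIN
  -- Build a classification model from the training table multitumor_train.
  -- svm_settings must already exist and hold (setting_name, setting_value) rows,
  -- e.g. ALGO_NAME set to ALGO_SUPPORT_VECTOR_MACHINES.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'SVM_model',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'multitumor_train',
    case_id_column_name => 'id',
    target_column_name  => 'class',
    settings_table_name => 'svm_settings');
END;
/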

Java Interface

Oracle Data Mining also provides a Java API that enables integration with Web and J2EE applications and facilitates portability across platforms.

Predictive Analytics and the EXPLAIN and PREDICT Procedures

Oracle Data Mining contains two self-contained SQL procedures, PREDICT and EXPLAIN, in the DBMS_PREDICTIVE_ANALYTICS package; they automate prediction and feature selection respectively. The results, either the predicted scores (PREDICT) or the ranked list of features (EXPLAIN), can be used as part of an operational pipeline or displayed on the command line or in a spreadsheet.

Spreadsheet Add-In for Predictive Analytics

This is an add-in for Microsoft Excel that gives users access to the fully automated PL/SQL PREDICT and EXPLAIN routines. The data may reside either in Excel or in the database.

Sequence Alignment (BLAST)

Besides its data mining functions, ODM also provides sequence similarity searches based on NCBI's BLAST (release 2.0) algorithm.
