User:PabloTamayo/sandbox

Source: Wikipedia, the free encyclopedia.

Oracle Data Mining (ODM)

Oracle Data Mining (ODM) is a software product distributed as an option to the Enterprise Edition (EE) of Oracle Corporation's relational database management system (RDBMS). It contains a collection of data mining, machine learning and data analysis algorithms for classification, prediction, regression, clustering, associations, feature selection, feature extraction and sequence alignment (BLAST).

Oracle Data Mining
Developer(s): Oracle Corporation
Stable release: 10gR2 / October 2006
Type: data mining and analytics
License: proprietary
Website: [1]

Overview and Design Principles

Oracle Data Mining (ODM) implements data mining algorithms inside the Oracle relational database. Execution tasks run asynchronously and independently of any specific user interface, as part of standard database processing pipelines and applications. Model building, scoring, and metadata management operations are accessed through a GUI (Oracle Data Miner) or through a PL/SQL or Java API. The main design principle is to let data mining algorithms operate natively on relational database tables, eliminating the need to extract or transfer data to a standalone server. The design takes advantage of the database environment, which provides the means to execute large queries efficiently and analyze large volumes of data. The results of data mining operations are stored in database tables, where they can be accessed by generic SQL queries and by database reporting tools and applications.

ODM is organized around a few generic operations that provide a unified interface to all data mining functions. These operations create, apply, test and manipulate data mining models. Models are built with a “create model” function whose parameters specify, for example, the model name, the function type (e.g., classification), the input table(s), the target field, and the algorithm settings. Other functions provide descriptive information about a model, as well as testing and management capabilities.
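The algorithm settings mentioned above are supplied as rows in an ordinary database table. The following is a minimal sketch, not a definitive recipe: the table name svm_settings is borrowed from the PL/SQL example later in this article, the column sizes are an assumption based on the 10g documentation, and the constants are those exposed by the DBMS_DATA_MINING package.

```sql
-- Settings table read by the model-building call.
-- Column sizes are an assumption; check them against your release.
CREATE TABLE svm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128)
);

BEGIN
  -- Select the Support Vector Machine algorithm; all other settings keep their defaults.
  -- Package constants must be referenced from PL/SQL, hence the anonymous block.
  INSERT INTO svm_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.algo_name,
          DBMS_DATA_MINING.algo_support_vector_machines);
  COMMIT;
END;
/
```

Because the settings live in a table, they can be inspected, versioned and reused with the same SQL tools as any other data.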

History

ODM was introduced in 20..... It is the successor to the Darwin data mining toolset developed by Thinking Machines Corporation in the 1990s. Thinking Machines was acquired by Oracle in 1999.


Functionality

ODM supports the following data mining functions:

  • Association models:
    • Apriori algorithm (AM). Association rules capture the co-occurrence of items or events in large volumes of “transactional” data, as in market basket analysis.
  • Feature extraction. Feature extraction creates a new set of features by decomposing the original data into a number of features far smaller than the number of original dimensions (attributes).
  • Specialized analytics:

Data Preparation

Oracle Data Mining accepts one or more tables as input and executes the joins and transformations necessary for model building. It supports both transactional and nested data tables.

Graphical User Interface: Oracle Data Miner

Oracle Data Mining can be accessed using Oracle Data Miner, a GUI “client” that provides access to the data mining functions and to structured templates called Activity Guides. The Activity Guides prescribe the order of operations, perform all algorithm-required data transformations, and provide intelligent settings and optimizations for model parameters. The user interface can also automatically generate the Java and/or SQL code associated with the data mining activities.

Text mining

Oracle Data Mining allows text (unstructured data) to be used as an input attribute. The Support Vector Machine, Association Rules, K-Means Clustering, and Non-negative Matrix Factorization algorithms can all operate on text.

PL/SQL Interface

Oracle Data Mining provides a native PL/SQL interface as a set of SQL primitives invoked in program blocks. The interface consists of two PL/SQL packages. For example, the code below illustrates a call to build a classification model:

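begin
  -- Build a classification model from the training table.
  -- Settings (e.g. the algorithm) are read from the svm_settings table.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'SVM_model',
    mining_function     => DBMS_DATA_MINING.classification,
    data_table_name     => 'multitumor_train',
    case_id_column_name => 'id',
    target_column_name  => 'class',
    settings_table_name => 'svm_settings');
end;
/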
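
Once a model such as SVM_model has been built, it can be applied to new rows from plain SQL. The query below is a minimal sketch under stated assumptions: multitumor_test is a hypothetical table with the same predictor columns as the training table, and the query relies on the PREDICTION and PREDICTION_PROBABILITY SQL functions available as of release 10gR2.

```sql
-- Score new cases with the model built above.
-- multitumor_test is a hypothetical table shaped like the training data.
SELECT id,
       PREDICTION(SVM_model USING *)             AS predicted_class,
       PREDICTION_PROBABILITY(SVM_model USING *) AS probability
FROM   multitumor_test
ORDER  BY id;
```

Because scoring is an ordinary SQL expression, it can be embedded in views, joins and reporting queries without moving data out of the database.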

Java Interface

Oracle Data Mining also provides a Java API that enables integration with Web and J2EE applications and facilitates portability across platforms.

Predictive Analytics and the Explain and Predict Packages

Oracle Data Mining contains two self-contained routines, PREDICT and EXPLAIN (provided in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package), which automatically build classification and feature selection models, respectively. The results are the predicted scores (PREDICT) or a ranked list of features (EXPLAIN), which can be used as part of an operational pipeline or displayed on the command line or in a spreadsheet.
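
As an illustration, both routines can be invoked from a single PL/SQL block. This is a sketch rather than a reference: it assumes the DBMS_PREDICTIVE_ANALYTICS procedure signatures documented for release 10gR2, reuses the multitumor_train table from the PL/SQL example above, and the result table names multitumor_explain and multitumor_predict are hypothetical.

```sql
DECLARE
  v_accuracy NUMBER;  -- set by PREDICT to its estimate of predictive accuracy
BEGIN
  -- Rank the attributes of the table by their influence on the CLASS column.
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'multitumor_train',
    explain_column_name => 'class',
    result_table_name   => 'multitumor_explain');  -- hypothetical result table

  -- Learn from rows where CLASS is known and write predictions to a result table.
  DBMS_PREDICTIVE_ANALYTICS.PREDICT(
    accuracy            => v_accuracy,
    data_table_name     => 'multitumor_train',
    case_id_column_name => 'id',
    target_column_name  => 'class',
    result_table_name   => 'multitumor_predict');  -- hypothetical result table

  DBMS_OUTPUT.PUT_LINE('Predictive accuracy: ' || v_accuracy);
END;
/
```

Neither call takes a model name or a settings table: data preparation, algorithm choice and model cleanup are handled internally, and only the result tables remain.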

Spreadsheet Add-In for Predictive Analytics

This is an add-in to Microsoft Excel that gives users access to the fully automated PL/SQL PREDICT and EXPLAIN packages. The data may reside in either Excel or the database.

Sequence Alignment (BLAST)

In addition to data mining functions, ODM provides sequence similarity searches using NCBI's BLAST release 2.0 algorithm.


References

Further Readings

External Links