MPIL Help Contents
Introduction
The instance-based learner MPIL differs from most previous IBL algorithms in that it 
supports multiple passes over the complete set of instances, as opposed to a single pass. 
Furthermore, every instance has an associated neighborhood (i.e., the region spanned by the 
instance), which can be either a sphere or a K-dimensional box. The purpose of the 
neighborhood is to define a degree of similarity between instances describing the same 
concept. Storage reduction is achieved by, first, appropriately defining the neighborhood of 
every instance and, second, removing instances that fall within the neighborhood of another 
instance describing the same concept. Instances 
associated with different concepts are used to define valid neighborhoods. A neighborhood 
is considered valid if the instances enclosed within it describe the same concept as the 
instance that spans the neighborhood. 

MPIL provides for two different learning mechanisms: MPIL-1 and MPIL-2.
MPIL-1
The simplest approach to building a neighborhood for an instance is based directly on the 
nearest neighbor algorithm. Every instance I_j of K features (i.e., the features of an instance 
describe a concept) is represented by a point in R^K. By associating a radius R with every 
instance, we can define a neighborhood sphere around it. The instance (i.e., a point in 
K-space) is at the center of the sphere. The radius of the sphere is determined by the 
instance's nearest neighbor I_k, chosen such that the concepts described by I_k and 
I_j are different and the distance between the two is minimal. The similarity measure used 
is the well-known Euclidean distance. MPIL-1 guarantees that all training instances, unless 
contradictory, are correctly classified after training. In this sense, MPIL-1 is very similar to 
the nearest neighbor algorithm.
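
The radius rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the data are the even-parity-2 instances, and the Alpha and Epsilon parameters are ignored.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mpil1_radius(index, inputs, labels):
    """Distance from instance `index` to its nearest neighbor of a different class."""
    distances = [
        euclidean(inputs[index], other)
        for other, label in zip(inputs, labels)
        if label != labels[index]
    ]
    # Assumption: with no differently labeled instance, nothing bounds the sphere.
    return min(distances) if distances else math.inf

# Even-parity-2 instances: inputs and their output classes.
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 1, 1, 0]
radii = [mpil1_radius(i, inputs, labels) for i in range(len(inputs))]
print(radii)
```

For this data set every instance has a differently labeled neighbor at distance 1.0, so no neighborhood can enclose another instance and nothing would be removed.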
MPIL-2
MPIL-2 extends the K-dimensional neighborhood sphere by associating a radius with every 
feature of an instance. The overhead involves storing K radii besides the instance itself. The 
gain is twofold: First, a more dynamic neighborhood can be constructed in the form of a K-
dimensional box structure as opposed to the uniformly expanding sphere. This allows the 
creation of fewer and more precise overlaps of neighborhoods to approximate concepts. 
Second, and even more important, it provides a means to determine whether a feature is 
relevant. In other words, it supports knowledge extraction in the form of rules, an important 
feature when applying multi-pass instance-based learning to the task of knowledge 
acquisition in an expert system environment (requiring both consultation and explanation 
facilities). 

In MPIL-2, all radii of a given instance are initialized to infinity. The motivation for this is to 
later place constraints on an instance's neighborhood by dynamically adjusting its radii 
which only depend on its nearest neighbors. The dynamic adjustment of the radii is based 
on a greedy approach in which as few radii as possible are modified. The rationale behind 
this procedure is to keep as many radii of size infinity as possible, so as to increase the 
detection of irrelevant attributes. After the radii have been determined for every instance, 
features with infinite radii can be deemed irrelevant. A feature with an infinite radius cannot by 
itself exclude any other instance of the input space from its neighborhood, and may 
therefore be ignored. Determining whether the radius of a feature needs to be constrained (i.e., 
reduced in size) requires examining all other instances (belonging to different 
concepts) and checking them for identical feature values (i.e., seeing if they have the same 
distance from the origin). The MPIL-2 algorithm is greedy in the sense that at every 
iteration, a feature is selected for which the majority of neighbor instances belonging to 
different concepts can be excluded. This process is continued until either all radii of the 
current instance have been constrained, or all different class neighbors have been excluded 
from the current instance's neighborhood. The motivation for adopting this strategy is to 
iteratively remove as many neighboring instances belonging to different classes as possible by 
assigning a finite radius to the most suitable feature (i.e., the feature considered most important).
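
The greedy selection described above might look roughly like the following Python sketch. Several details are assumptions and not taken from this text: a finite radius is set to the smallest gap to an excluded neighbor minus a small margin `epsilon`, ties between features go to the lowest feature index, and all names are hypothetical.

```python
import math
from itertools import product

def mpil2_radii(instance, enemies, epsilon=0.001):
    """Greedily constrain as few per-feature radii as possible.

    `enemies` are the instances belonging to a different class than `instance`.
    """
    radii = [math.inf] * len(instance)
    remaining = list(enemies)
    while remaining and any(math.isinf(r) for r in radii):
        best_feature, best_excluded = None, []
        for f in range(len(instance)):
            if not math.isinf(radii[f]):
                continue  # this radius has already been constrained
            # Only enemies with a different value on feature f can be excluded by it.
            excluded = [e for e in remaining if e[f] != instance[f]]
            if len(excluded) > len(best_excluded):
                best_feature, best_excluded = f, excluded
        if best_feature is None:
            break  # the remaining enemies are identical on all free features
        gap = min(abs(e[best_feature] - instance[best_feature]) for e in best_excluded)
        radii[best_feature] = gap - epsilon  # assumed margin, see lead-in
        remaining = [e for e in remaining if e[best_feature] == instance[best_feature]]
    return radii

# 4-input OR: (0,0,0,0) is class 0, every other input combination is class 1.
all_inputs = list(product((0.0, 1.0), repeat=4))
zero = (0.0, 0.0, 0.0, 0.0)
radii_zero = mpil2_radii(zero, [x for x in all_inputs if x != zero])
radii_one = mpil2_radii((0.0, 0.0, 0.0, 1.0), [zero])
print(radii_zero)
print(radii_one)
```

Under these assumptions the all-zero instance needs all four radii constrained, while the instance (0, 0, 0, 1) keeps three infinite radii, which marks the first three features as irrelevant for it.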

Procedures
Learning Instances
To learn in MPIL means to create a set of abstracted instances from a given instance file 
(i.e., extension *.net).  
The following steps need to be executed:


1. Go to the Instances Menu and select the Create option.
2. The standard File Input Dialog Box will appear.  Simply enter any instance file (*.net) 
and select OK. MPIL will read in the specified instance file.
3. Select the appropriate learning mode.  Go to the Parameters Menu and select the 
Learn Mode option.  Choose either MPIL-1 or MPIL-2.
4. Finally, open the Instances Menu and select the Train option.  
MPIL will now create the abstracted instances from the loaded instances.

After learning is completed, a Message Box will appear reporting several important facts.  
An example follows:

[Image: train-in.bmp]

The message box indicates how much time has elapsed since training 
commenced.  Also listed are the number of inputs, outputs and instances loaded.

Removed Instances details how many of the earlier loaded instances have been removed 
(deemed redundant).

Remaining Features refers to the total number of input features which are contained in all 
abstracted instances.

The Savings Ratio is simply calculated as 

Savings Ratio = Removed Instances / Total Instances.

Total Features is calculated as 

Total features =  Inputs * Total Instances

Features Savings Ratio is calculated as

Features Savings Ratio = Remaining Features / Total Features.
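
As a worked illustration of the three formulas above, the following Python sketch computes them for a made-up training run. The function name and the input values are hypothetical and are not taken from an actual MPIL message box.

```python
def training_ratios(inputs, total_instances, removed_instances, remaining_features):
    """Apply the Savings Ratio, Total Features and Features Savings Ratio formulas."""
    total_features = inputs * total_instances
    return {
        "savings_ratio": removed_instances / total_instances,
        "total_features": total_features,
        "features_savings_ratio": remaining_features / total_features,
    }

# Illustrative values only: 4 inputs, 16 instances loaded, 11 removed,
# and 8 input features left across all abstracted instances.
stats = training_ratios(inputs=4, total_instances=16,
                        removed_instances=11, remaining_features=8)
print(stats)
```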
Format of Training File (Extension *.net)
Every training (i.e., instance) file consists of a header and a body.  The header contains an 
optional comment delimited by the special character "#".  If the user does not wish to 
specify a comment, then only the "#" character is placed on a separate line.
The optional comment is followed by the number of inputs and outputs each instance is 
composed of and the number of instances (i.e., training patterns).

Note that the content of the string preceding each of the three quantities is irrelevant.

The body of the training file consists of a collection of instances.  Each instance is 
composed of Ninputs input values followed by Noutputs output values.

Example:

Even-Parity-2 Data Set
#
Ninputs 2 Noutputs 1
NTrainingPatterns 4
0.0  0.0     0.0
0.0  1.0     1.0
1.0  0.0     1.0
1.0  1.0     0.0
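
To make the layout concrete, here is a minimal Python reader for the format described above. It is a sketch, not MPIL's own parser: it assumes that everything up to the first "#" is comment text, that the three quantities appear in the order inputs, outputs, patterns, and that each is preceded by exactly one label token. The name `parse_net` is hypothetical.

```python
EXAMPLE = """Even-Parity-2 Data Set
#
Ninputs 2 Noutputs 1
NTrainingPatterns 4
0.0  0.0     0.0
0.0  1.0     1.0
1.0  0.0     1.0
1.0  1.0     0.0
"""

def parse_net(text):
    """Return (n_inputs, n_outputs, instances) from the text of a *.net file."""
    _comment, separator, rest = text.partition("#")
    if not separator:
        raise ValueError("missing '#' comment delimiter")
    tokens = rest.split()
    # Header: label value label value label value (the labels themselves are ignored).
    n_inputs, n_outputs, n_patterns = int(tokens[1]), int(tokens[3]), int(tokens[5])
    values = [float(t) for t in tokens[6:]]
    width = n_inputs + n_outputs
    if len(values) != n_patterns * width:
        raise ValueError("body does not match the sizes given in the header")
    instances = []
    for i in range(n_patterns):
        row = values[i * width:(i + 1) * width]
        instances.append((row[:n_inputs], row[n_inputs:]))
    return n_inputs, n_outputs, instances

n_inputs, n_outputs, instances = parse_net(EXAMPLE)
print(n_inputs, n_outputs, len(instances))
```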

Format of Testing File
Test Files

Test files contain complete instances, i.e., instances composed of both inputs and output 
classes.  These files have the extension *.tes.

Example 1:

NTestPatterns 4
0.0  0.0    0.0
0.0  1.0    1.0
1.0  0.0    1.0
1.0  1.0    0.0
Classification Files

Classification files contain only the input component of instances.  The outputs are 
determined by MPIL using the currently loaded abstracted instances.

Example 2:

NTestPatterns 4
0.0  0.0
0.0  1.0
1.0  0.0
1.0  1.0
Examples
Training and Testing Instances
This example shows how to create an instance file, train on it to create a set of 
abstracted instances and, finally, test the abstracted instances to measure predictive 
accuracy.

The user needs to perform the following steps:


1. Start the MPIL program and select menu option Parameters->Setup->Load.
At the prompt locate the directory c:\MPIL\EXAMPLES\EX3.  Select the file "ex3.stp".
2. Now select the menu option Instances->Create.  You will be prompted to locate a 
training file containing instances.  Select the file "iris2.net" and load it.
3. To train the just loaded instances, select Instances->Train from the menu.
After some time (you will see several graphical temperature-like bars display training 
information), a Message Box will appear indicating that training has completed. The 
box will also contain pertinent training information.
Hit OK.  At this point MPIL will load the file "iris.log" into your editor.  This file 
contains some basic training information (i.e., which instances have been removed 
during training).
4. Next, we will measure how good the abstracted instances are in predicting all iris 
patterns.  Select from the menu Test->Check Accuracy.  At the first prompt, select the 
file "iris.tes".  At the second prompt input "iris.out".
MPIL will now determine the accuracy of the abstracted instances in predicting the 
instances contained in the file "iris.tes" (test file).
Once completed all statistics are displayed.  Hit OK.  MPIL will load the file "iris.out" 
into the editor.
This completes this simple example of how to train on an instance file (extension *.net) and 
test the resulting abstracted instances against a test file (extension *.tes).
Rule Extraction and Refinement
The following example will illustrate how the user can extract rules from a set of trained 
instances and how these rules themselves can be refined through further training.  The 
following steps need to be executed:

1. From the program menu select Parameters->Setup->Load.  Locate the directory 
C:\MPIL\EXAMPLES\EX1 and load the file "ex1.stp".
2. Select the menu option Instances->Create and select the file "or4-1.net".  Hit OK.
3. Next, train the loaded file.  From the main menu select Instances->Train.
4. Now, extract the rules by selecting Rules->Extract from the menu.  Save the rules to 
the file "or4-1.rul".
5. Select Instances->Create and load the file "or4-2.net".  Hit OK.
6. Now load the previously extracted rules.  Select from the menu Rules->Import and 
load the file "or4-1.rul".
7. From the main menu select Instances->Train.

Congratulations!

You just performed the first steps in knowledge refinement by loading a file of 8 instances 
(describing in part the 4-input OR function), creating a set of abstracted instances (through 
training), extracting these as rules, loading the 8 remaining instances and using them to 
refine the earlier extracted rules.  

You can test the newly created abstracted instances to measure their predictive accuracy.
Training Instances with Continuous Outputs
This example will illustrate how continuous outputs can be used in MPIL.  The example uses 
a data set containing instances for controlling a simple DC motor.
The instances consist of a single input and one output (both are continuous).
The user needs to perform the following steps:

1. Select Parameters->Setup->Load from the main menu, change to directory 
C:\MPIL\EXAMPLES\EX2 and select the file "ex2.stp".
2. Next, select Instances->Create and load the file "dc-motor.net".
3. Select Instances->Train to train on the loaded instances.  After training, you will be 
informed that 2 instances (of a total of 10) have been removed.
4. Check the accuracy of the abstracted instances by selecting Test->Check Accuracy 
from the menu.  At the input prompt select the file "dc-motor.tes" and choose 
"dc-motor.out" as the output file.  The abstracted instances should provide 100% recognition.
5. Let us next relax the training conditions.  Select Parameters->Learning and set the 
parameter Delta = 0.1.
6. Repeat steps 2 - 3.  This time MPIL will inform you that 5 instances have been 
removed (from a total of 10).  By increasing the parameter Delta you have made it 
easier for MPIL to represent all 10 instances.  Naturally, the quality of the output has 
also suffered.  If you look at the output file generated during testing, you will notice 
that the CF (Certainty Factors) for some of the outputs have dropped.  The CF 
measures how certain MPIL is that the actual output is correct (values are in [0,1]). 
Also, the actual outputs differ more from the expected outputs than before, but the 
difference between the two is still within the maximum amount stipulated by Delta.

Note: MPIL v. 1.02 can only handle a single continuous output at the moment.
Contacts
Universal Problem Solvers, Inc.
If you have any comments to make or suggestions for improvement, you can contact us at:

Universal Problem Solvers, Inc.
610 South Duncan Ave.
Clearwater, FL 34616
USA
Phone: (813) 441-1857
e-mail: zlxx69a@prodigy.com
http://pages.prodigy.com/upso
Other Products
Universal Problem Solvers is proud to present the following software products:

TDL - Trans-Dimensional Learning

Neural Network software for Windows 3.* and Windows 95 that allows for quick, automatic 
construction of neural networks.  Supports incremental learning and trans-dimensional 
learning - learning data sets of various specifications (i.e., inputs and outputs) within one 
coherent network.  Includes use of weighted as well as weightless neurons (i.e., constructs 
semi-weighted networks).  
There is simply no easier way to construct efficient NNs.

fSC-Net - Fuzzy Symbolic Connectionist Network

Combines fuzzy logic, connectionist and traditional symbolic features into one coherent 
system.  Supports both incremental, single-pass learning, feature pruning, rule-extraction, 
and automatic learning of fuzzy membership functions.  A very fast learning system with 
plenty of power.

You can find out about all our products (newest releases and prices) by either requesting a 
price sheet or visiting our WWW site.  Contact us today!
References
Menus
File Menu
Edit
Loads the resident word-processor.  Can be used to view or edit MPIL created files.
Exit
Quits the MPIL application.
Instances Menu
Create
Allows the user to read a training file, which contains the instances MPIL will utilize during 
learning.  The user will be prompted to enter the name of a training file (i.e., extension 
*.net), after which the file is consulted.
Load
Lets the user load a file of saved abstracted instances (i.e., extension *.sav).
Abstracted instances consist of groups of instances and their respective radii. 
The file must have been created earlier using the Save option.
Save
Allows the user to save the abstracted instances which were created by MPIL after the 
learning process completed.
Train
Creates (i.e., learns) a set of abstracted instances from the originally loaded instance file 
(*.net).  Training uses either mode MPIL-1 or MPIL-2. The learn mode can be 
set from within the Parameters Menu.
Test Menu
Check Accuracy
Allows the user to determine the predictive accuracy of the currently loaded set of 
abstracted instances.  First, the system will require the user to enter the name of the file 
containing the test instances (i.e., file extension *.tes). Second, the user is required to 
supply an output file name to which all results and messages are written.  Once testing 
has completed successfully, all pertinent information can be retrieved by selecting the Test 
Menu (results are stored as separate entries underneath the double-bar).
Classify
Similar to Check Accuracy, except that the first file is a classification file (i.e., extension 
*.cls).  Once this file has been read, MPIL will classify the patterns contained in the 
classification file by comparing them to the stored abstracted instances.  All results are 
written to a user-supplied output file.
Batch Menu
Random Testing
Allows the user to create random partitions of an instance file.  Two separate files are 
created: a training file and a testing file.  The user sets the starting and ending partition sizes 
as well as the delta stepsize (default values are 10%, 90% and 10%, respectively).  Once 
these parameters have been set, MPIL will perform n-fold cross-validation for the various 
partition sizes.  All results are written to a user-specified output file. The parameter n 
(which controls the number of random tests performed) is set by the user.  It is located in the 
Parameters Menu as the option Num. Batch Runs.


Example:

Let us make the following assumptions:
Starting Partition Size is 10%.
Ending Partition Size is 90%.
Delta stepsize is 10%.
Num. Batch Runs is 5.
Instance input file is "ex.net".
Result output file is "ex.out".

MPIL will partition the instance file "ex.net" into two random partitions. Initially, the partitions 
will be such that 10% of all the instances stored in the file "ex.net" are utilized for training 
and the remaining 90% are used for testing.  This step is repeated 5 times and the average 
predictive accuracy is recorded.  Once the above step is completed, it is again repeated but 
this time 20% of the instances listed in the file "ex.net" are used for training and the other 
80% are utilized during testing.  This procedure is repeated for the remaining training 
partition sizes (i.e., 30%, 40%, 50%, 60%, 70%, 80% and 90%).

All partitions are based on a uniform random distribution and the created partition files (i.e., 
training and testing) are disjoint.
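
The schedule in the example above can be sketched in Python as follows. This only illustrates the partitioning loop; training and accuracy measurement are left out, rounding of the partition size is an assumption, and the function names and the fixed random seed are hypothetical.

```python
import random

def partition_sizes(start=10, end=90, step=10):
    """Training partition sizes in percent, e.g. 10, 20, ..., 90."""
    return list(range(start, end + 1, step))

def random_split(instances, train_percent, rng):
    """Split instances into disjoint training and testing parts, uniformly at random."""
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]

def batch_schedule(instances, runs=5, seed=0):
    """Yield (percent, run, train, test) for every partition size and batch run."""
    rng = random.Random(seed)
    for percent in partition_sizes():
        for run in range(runs):
            train, test = random_split(instances, percent, rng)
            yield percent, run, train, test

# 20 stand-in instances, 9 partition sizes, 5 runs each.
splits = list(batch_schedule(range(20), runs=5))
print(len(splits))
```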
Rules Menu
Extract
Selecting this option will cause MPIL to convert the stored abstracted instances into 
easy-to-read rules.  The rules are written to a user-specified output file (i.e., extension *.rul).  

This option only works in conjunction with learning mode MPIL-2.

Example:

Below follows a set of rules that was generated for the 4-input boolean OR function:

RULE 1:

       IF (-0.999000 < Feature_0 < 0.999000) &
           (-0.999000 < Feature_1 < 0.999000) &
           (-0.999000 < Feature_2 < 0.999000) &
           (-0.999000 < Feature_3 < 0.999000) THEN Output_0 ;


RULE 2:

       IF (0.001000 < Feature_3 < 1.999000) THEN Output_1 ;


RULE 3:

       IF (0.001000 < Feature_2 < 1.999000) THEN Output_1 ;


RULE 4:

       IF (0.001000 < Feature_1 < 1.999000) THEN Output_1 ;


RULE 5:

       IF (0.001000 < Feature_0 < 1.999000) THEN Output_1 ;


Explanation:

All inputs are labeled as FEATURE_X, where X refers to the input's position in the instance 
file (i.e., extension *.net).  In the current example we are dealing with the 4-input boolean 
OR function, hence there are 4 features (inputs): FEATURE_0, FEATURE_1, FEATURE_2 
and FEATURE_3.

The term 

-0.999000 < Feature_0 < 0.999000 

is true if the value of the first input (i.e., FEATURE_0) lies strictly between -0.999 and 
+0.999, i.e., roughly within (-1,+1). Since we are dealing with boolean inputs in this example, 
the only two values allowed are 0 and 1.  In other words, FEATURE_0 has to have a value 
of 0 for this term to be true.  

The action part of the first rule states Output_0.  The _0 refers to the output class 0 (as 
stated in the instance file).  Analogously, Output_1  refers to the output class 1.

MPIL always supplies rules for all output classes (i.e., categories).
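
To check how the five rules above behave, they can be transcribed into Python and applied to an input vector. This is a hand-written transcription for illustration only: the data layout and the function name `fired_outputs` are hypothetical, and MPIL's own *.rul file format is not parsed here.

```python
# Each rule: ({feature index: (lower bound, upper bound)}, output class).
RULES = [
    ({0: (-0.999, 0.999), 1: (-0.999, 0.999),
      2: (-0.999, 0.999), 3: (-0.999, 0.999)}, 0),   # RULE 1
    ({3: (0.001, 1.999)}, 1),                         # RULE 2
    ({2: (0.001, 1.999)}, 1),                         # RULE 3
    ({1: (0.001, 1.999)}, 1),                         # RULE 4
    ({0: (0.001, 1.999)}, 1),                         # RULE 5
]

def fired_outputs(features, rules=RULES):
    """Return the set of output classes whose rule premises all hold."""
    outputs = set()
    for conditions, output in rules:
        if all(low < features[i] < high for i, (low, high) in conditions.items()):
            outputs.add(output)
    return outputs

print(fired_outputs((0, 0, 0, 0)))  # only RULE 1 fires
print(fired_outputs((0, 1, 0, 0)))  # only RULE 4 fires
```

Features that do not appear in a rule's premise are the ones MPIL-2 left unconstrained, so rules 2 to 5 each test a single input.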

Import
Allows the user to load rules (either generated by MPIL or by one or more knowledge 
engineers) as abstracted instances.  By utilizing MPIL's train, rule extract and import 
functions, it is possible to perform iterative knowledge refinement.

For example, one can start with an initial set of rules, load these into MPIL and refine them 
by adding a file of instances for training.  Once a new set of abstracted instances has been 
created, the user can export these as rules (and possibly modify them by hand).  These 
steps can be repeated any number of times by the knowledge engineer(s).

Note:

In order to import rules into MPIL, the user needs to inform MPIL of the number of inputs 
and outputs used before importing the rules.  This can be achieved by loading a dummy file 
of training instances (containing only the header and no instances) prior to importing the 
rules.  The following may serve as an example:

OR-4 Header File
#
Ninputs 4 Noutputs 1
NtrainingPatterns 0


Parameters Menu
Learn Mode
The user can select either multi-pass instance-based learning algorithm MPIL-1 or MPIL-2.
Distance Measure
MPIL supports three different distance measures to decide how similar two instances are:
1. Hamming
2. Euclidian
3. Square
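
The three measures can be sketched in Python as follows. Hamming and Square follow the definitions given in this help file's glossary (sum of absolute differences and maximum difference over all inputs); the Euclidean measure uses the standard definition, the square root of the sum of squared differences. The function names are hypothetical.

```python
import math

def hamming(a, b):
    """Sum of the absolute differences over all inputs."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of the squared differences over all inputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def square(a, b):
    """Maximum absolute difference over all inputs."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0, 0.0), (1.0, 2.0, 2.0)
print(hamming(a, b), euclidean(a, b), square(a, b))
```

For boolean inputs the Hamming measure simply counts the inputs on which two instances differ.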
Display Info
If this flag is set then additional information (in the form of temperature-like bars) is 
displayed during training, testing and several other MPIL operations.
Show Files
Setting this flag will result in the display of MPIL generated information files after training, 
testing and rule extraction.
Learning
Allows the user to modify three different learning parameters:
1. Alpha

This parameter decides by how much the neighborhood spheres utilized by MPIL-1 
are allowed to overlap.  If this parameter is set to 1.0 the overlap is at its maximum.  A 
value close to zero allows for almost no overlap.  
Note: a value of zero is not supported!

2. Delta

This parameter is used to decide if the difference between two input values is 
significant enough to consider the associated outputs to be different.  

Note: If the outputs of a training file are continuous, then the Delta parameter decides 
how many discrete output classes are formed.  In general, the closer the value is to 
zero, the more output classes are created.  At the same time fewer instances 
will be removed during training (less redundancy and therefore less storage savings).
Conversely, the larger the Delta value, the fewer output classes are created and the 
more instances are removed.

3. Epsilon

The Epsilon parameter determines how far the radius of an instance stops short of 
its nearest neighbor, where the nearest neighbor is the closest instance belonging 
to a different class.

Num. Batch Runs
This parameter is equivalent to the n parameter in n-fold cross-validation.  It is used in 
conjunction with the Random Testing option located under the Batch Menu.
Setup
Load

Allows the user to load a setup file containing all MPIL parameter settings.
Save

Allows the user to save all current parameter and menu settings to a setup file (extension 
*.stp).
MPIL automatically saves the current menu and parameter settings when the user shuts 
down (i.e., Exits) the software and automatically recalls the settings upon program start.  
The information is maintained in the setup file mpil.stp.
Resources
[Image: resouces.bmp]
Contains information about current MPIL resources.  The example message box 
indicates the maximum number of inputs, outputs and examples (i.e., instances) 
and how many of each remain.
Help Menu
Search for ...
Opens this help viewer.
Questions and Comments ...

Indicates how to contact the creators of MPIL.

About MPIL
Informs the user who created MPIL, states the copyright status and the current version 
number.
[Image: memory.bmp]

Informs the user of the currently available amount of memory and the stack size.
Index
Glossary
A
abstracted: An abstracted instance consists of a regular instance and one or more radii.  It is 
created by employing learning mode MPIL-1 or MPIL-2.
B
Batch Menu: <Batch Menu>
C
Certainty Factors: Certainty Factors in MPIL indicate how good a prediction is.  CF values are within 
the range [0,1], where 1 indicates complete certainty.
concept: The terms concept, output and class are used interchangeably within the context of 
MPIL.
Contact: <contact>
contradictory: Two or more instances are considered to be contradictory if all their inputs are 
the same but their outputs are different.
Create: <Instances Menu>
E
euclidian: Well-known distance measure employed by many IBL algorithms.  It is calculated as 
the square root of the sum of the squared differences of all of the instances' inputs.
expert system: Expert systems are computer programs that attempt to emulate the performance 
of human experts in specific fields of expertise (e.g., medicine, engineering, mathematics, 
etc.)
F
features: The terms feature and input of an instance are used interchangeably within MPIL.
H
Hamming: Measures the similarity of two instances by determining the sum of the absolute 
differences for all inputs.
I
IBL: The general class of instance-based learning algorithms.  The MPIL-n algorithms are members of this class.
instance file: <Training>
Instance Menu: <Instances Menu>
instance-based: Instance-based learning algorithms generally do not make use of necessary and 
sufficient features for representing a concept.  A concept's properties are a function or blend 
of its instances' properties.  This stands in contrast to other machine learning paradigms 
(e.g., decision trees, version spaces, statistical models, neural networks, etc.)
Instances Menu: <Instances Menu>
irrelevant attributes: Irrelevant attributes or inputs are those deemed unnecessary by a learning 
algorithm.  In other words, their presence or absence does not affect the accuracy of the 
classification process.
K
K-dimensional box: A hyperbox neighborhood is formed by learn mode MPIL-2 for loaded 
instances.
knowledge acquisition: Refers to an important step - and in general considered a bottleneck - in 
the construction of expert systems.  It deals with the question of how to acquire the 
knowledge that describes some field of human expertise.
knowledge extraction: Refers to the ability of an algorithm to present its gained knowledge in a 
form which is readily accessible to humans.
L
learn mode: <Parameters Menu>
Learning Mode: <Parameters Menu>
M
MPIL: Multi-Pass Instance-Based Learning is an instance-based learning (IBL) algorithm and 
comes in two flavors: MPIL-1 and MPIL-2.
multiple passes: Refers to the fact that both learn modes MPIL-1 and MPIL-2 make more than 
one pass over all instances.  This stands in contrast to other IBL algorithms which may only 
perform a single pass.
N
nearest neighbor: The simplest of all IBL algorithms. During learning all instances are simply stored.  
During testing instances are compared against all stored instances and a distance measure 
(e.g., Euclidean) is utilized to determine which stored instance is most similar to the one 
being tested.  K-NN is a more general and flexible version of the simple nearest neighbor 
algorithm.
neighborhood: A neighborhood is defined by an instance and one or more radii.  The 
instance acts as a point in n-dimensional space, whereas a single radius defines a 
hypersphere-like neighborhood and multiple radii define a hyperbox.
Num. Batch Runs: <Parameters Menu>
P
Parameters menu: <Parameters Menu>
predictive accuracy: A measure used to decide how well a set of abstracted instances is 
performing as a predictor of new instances.  Generally, it is calculated as the ratio of 
correct classifications over all classifications performed.
R
Random Testing: <Batch Menu>
rules: Rules are both popular and convenient methods for expressing knowledge.  A rule consists 
of both a premise and an action part.  If the premise part of a rule evaluates to the boolean 
value TRUE, the action part is executed.  MPIL-2 allows for the extraction of rules from 
abstracted instances.  MPIL-1 does not!
S
similarity measure: A measure used to determine how similar two instances are.  In general, a 
distance measure (e.g., Euclidean) is applied to the two instances.
similarity: Similarity attempts to measure how similar two instances are by utilizing a distance 
measure like the Euclidean distance measure.
sphere: A spherical neighborhood is created by applying learn mode MPIL-1 to a given set of 
instances.
Square: Measures the similarity of two instances by determining the maximum difference between all 
inputs.
T
testing file: <Testing>
Train: <Instances Menu>
training file: <Training>
