Wine recognition data set
Task: classification
Number of instances: 178
Number of attributes: 13 (numerical)
Type of attribute to be predicted: discrete with 3 classes
Download the data: DataWine
These data come from the chemical analysis of 178 wines from 3 different producers in the same area of Italy. The objective is to extract models that identify the producer from the content of the following components: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline.
Sources: Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. Data found in the UCI Machine Learning Repository.
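As a minimal sketch of how such a file could be read, the snippet below assumes the layout of the UCI `wine.data` file: comma-separated records with the class label first, followed by the 13 attributes in the order listed above. The DataWine file offered here may be laid out differently, and the name `parse_wine_rows` is hypothetical.

```python
import csv
import io

# Attribute names in the order assumed for each record (after the class label).
ATTRIBUTES = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]


def parse_wine_rows(text):
    """Parse comma-separated records into (class_label, {attribute: value}) pairs."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        label = int(fields[0])
        values = [float(v) for v in fields[1:]]
        if len(values) != len(ATTRIBUTES):
            raise ValueError("expected 13 attribute values, got %d" % len(values))
        rows.append((label, dict(zip(ATTRIBUTES, values))))
    return rows


# One sample record in the assumed layout.
sample = "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
rows = parse_wine_rows(sample)
label, wine = rows[0]
print(label, wine["Flavanoids"], wine["Proline"])
```

Returning a dictionary keyed by attribute name keeps later rules readable, since they can refer to "Flavanoids" or "Proline" directly rather than to column positions.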
Model with 1 variable
The most accurate model that uses a single explanatory variable is based on the Flavanoids content:
* If (Flavanoids is lower than 1) then (Class is rather 3)
* If (Flavanoids is higher than 2.5) then (Class is rather 1)
* Otherwise (Class is rather 2)
It correctly classifies 148 of the 178 samples (83%). It can be represented graphically (red curve) together with the experimental data (green points):
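The three rules above can be sketched as a simple threshold function. This is a crisp simplification: the wording "Class is rather ..." suggests graded rules, whereas the code below treats the thresholds as hard cut-offs. The function names and the input values in the usage lines are hypothetical and are not taken from the data set.

```python
def classify_by_flavanoids(flavanoids):
    """Crisp reading of the one-variable model (hard thresholds assumed)."""
    if flavanoids < 1.0:
        return 3
    if flavanoids > 2.5:
        return 1
    return 2


def accuracy(predicted, actual):
    """Fraction of predictions that match the known class labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical Flavanoids values, for illustration only.
predictions = [classify_by_flavanoids(x) for x in (0.6, 1.8, 3.1)]
print(predictions)

# The score quoted in the text: 148 correct out of 178.
print(round(100 * 148 / 178))
```

With real data, `accuracy` would be called with the predicted classes and the class labels of all 178 records.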
Model with 2 variables
This model involves a second variable: the Proline content. It is similar to the first model, but the rule for class 1 now tests Proline instead of Flavanoids:
* If (Flavanoids is lower than 1) then (Class is rather 3)
* If (Proline is higher than 800) then (Class is rather 1)
* Otherwise (Class is rather 2)
It correctly classifies 163 of the 178 samples (92%). The following graph illustrates this model (the experimental data are the white triangles):
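A minimal crisp sketch of the two-variable rules follows. It assumes that the rules are checked in the order listed, so a wine with low Flavanoids is assigned to class 3 even if its Proline is above 800; the source does not state how such conflicts are resolved. The function name and the input values are hypothetical.

```python
def classify_two_variables(flavanoids, proline):
    """Crisp reading of the two-variable model; earlier rules take priority (assumed)."""
    if flavanoids < 1.0:
        return 3
    if proline > 800:
        return 1
    return 2


# Hypothetical inputs, for illustration only.
print(classify_two_variables(flavanoids=0.7, proline=600))
print(classify_two_variables(flavanoids=2.9, proline=1100))
print(classify_two_variables(flavanoids=2.0, proline=500))
```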
Model with 3 variables
* If (Flavanoids is lower than 1) then (Class is rather 3)
* If (Proline is higher than 800) then (Class is rather 1)
* If (Color intensity is lower than 2) then (Class is rather 2)
It correctly classifies 175 of the 178 samples (98%). The following graph is a "4D" representation of this model:
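Because the three rules above have no "otherwise" branch, a crisp reading would leave some wines unclassified. One possible graded reading is sketched below: each rule receives an activation between 0 and 1 from a logistic membership function, and the class with the highest activation is chosen. The membership shape, the `width` values and all function names are assumptions made for illustration; the source gives only the thresholds.

```python
import math


def higher_than(x, threshold, width):
    """Degree (0..1) to which x is higher than threshold; width sets the softness."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / width))


def lower_than(x, threshold, width):
    """Degree (0..1) to which x is lower than threshold."""
    return 1.0 - higher_than(x, threshold, width)


def classify_three_variables(flavanoids, proline, color_intensity):
    """Pick the class whose rule has the highest activation (widths are hypothetical)."""
    activations = {
        3: lower_than(flavanoids, 1.0, 0.25),
        1: higher_than(proline, 800.0, 100.0),
        2: lower_than(color_intensity, 2.0, 0.5),
    }
    return max(activations, key=activations.get)


# Hypothetical inputs, for illustration only.
print(classify_three_variables(0.6, 500.0, 7.0))
print(classify_three_variables(3.0, 1100.0, 5.0))
print(classify_three_variables(2.0, 500.0, 1.5))
```

Choosing the maximum activation is only one way to combine graded rules; the software that produced the models may use a different inference scheme.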
Model with 4 variables (full classification)
The following model correctly classifies all 178 instances of the data set:
* If (Flavanoids is lower than 0.5) and (Color intensity is higher than 4) then (Class is rather 3)
* If (Alcohol is higher than 12.5) and (Color intensity is higher than 4) and (Proline is higher than 600) then (Class is rather 1)
* If (Alcohol decreases) then (Class is rather 2)
