Mushroom dataset

 

Task: classification

Number of instances: 8124

Number of attributes: 21 (categorical)

Type of attribute to be predicted: categorical with 2 classes

Download the data: Mushroom Dataset

 

The following description is drawn from the UCI Machine Learning Repository. The data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family (pp. 500-525 of the field guide cited below). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poison oak and ivy.

Sources: The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf.

 

Model with 1 variable

The simplest model uses a single explanatory variable, odor:

* If (odor is a) then (Class is rather edible)

* If (odor is not n) then (Class is rather poisonous)

* If (odor is l) then (Class is rather edible)

 

This model correctly classifies 8004 of the 8124 instances in the sample (98.5%).
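
The rules above are fuzzy ("rather edible", "rather poisonous"), and the page does not say how they are combined. As a minimal sketch, assuming a crisp reading in which a mushroom is edible when its odor code is a, l or n and poisonous otherwise, the one-variable model could be written as follows. The function names, the odor code f and the toy records are hypothetical and serve only as an illustration.

```python
# Crisp reading of the one-variable model (an assumption, not the vendor's method).
EDIBLE_ODORS = {"a", "l", "n"}  # the three odor codes named in the rules


def predict_one_variable(odor: str) -> str:
    """Return "edible" or "poisonous" from the odor code alone."""
    return "edible" if odor in EDIBLE_ODORS else "poisonous"


def accuracy(samples) -> float:
    """Fraction of (odor, true_class) pairs that the rule classifies correctly."""
    correct = sum(predict_one_variable(odor) == label for odor, label in samples)
    return correct / len(samples)


# Hypothetical records for illustration only; these are not rows of the dataset.
toy_samples = [
    ("a", "edible"),
    ("l", "edible"),
    ("f", "poisonous"),
    ("n", "poisonous"),  # an odorless poisonous sample, which this rule gets wrong
]
print(accuracy(toy_samples))  # 3 of the 4 toy records are classified correctly
```

Under this reading, the only possible errors are poisonous mushrooms whose odor code is a, l or n, which is the kind of case the additional variables in the later models are meant to catch.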

 

Model with 2 variables

Accuracy improves when a second variable, spore-print-color, is added to the model:

* If (odor is not l) then (Class is rather poisonous)

* If (odor is a) then (Class is rather edible)

* If (odor is n) and (spore-print-color is not r) then (Class is rather edible)

 

This model correctly classifies 8076 of the 8124 instances in the sample (99.4%).
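
A minimal sketch of the two-variable rules, assuming they are read as crisp conditions: edible when the odor is a or l, edible when the odor is n and the spore-print-color is not r, and poisonous in every other case. The function name and the example codes f and k are hypothetical; how the software actually combines its fuzzy rules is not stated here.

```python
def predict_two_variables(odor: str, spore_print_color: str) -> str:
    """Crisp reading of the two-variable model (an assumption)."""
    if odor in ("a", "l"):
        return "edible"
    # Odorless mushrooms are accepted unless the spore print has color code r.
    if odor == "n" and spore_print_color != "r":
        return "edible"
    return "poisonous"


# Illustrative calls with hypothetical attribute values.
print(predict_two_variables("n", "k"))  # edible
print(predict_two_variables("n", "r"))  # poisonous
print(predict_two_variables("f", "k"))  # poisonous
```

Compared with a model based on odor alone, the only change is the extra test on odorless samples.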

 

Model with 3 variables

The third variable that improves the accuracy of the model is stalk-surface-below-ring. The rules closely resemble those of the 2-variable model; only the last rule gains a condition:

* If (odor is not l) then (Class is rather poisonous)

* If (odor is a) then (Class is rather edible)

* If (odor is n) and (stalk-surface-below-ring is not y) and (spore-print-color is not r) then (Class is rather edible)

 

This model correctly classifies 8100 of the 8124 instances in the sample (99.7%).
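
The same crisp reading can be extended to three variables. The sketch below assumes each mushroom is a dictionary keyed by the attribute names used in the rules; the function name and the sample record are hypothetical, and the crisp interpretation of the fuzzy rules is again an assumption.

```python
def predict_three_variables(record: dict) -> str:
    """Crisp reading of the three-variable model (an assumption)."""
    odor = record["odor"]
    if odor in ("a", "l"):
        return "edible"
    # Odorless samples must pass both the stalk-surface and spore-print tests.
    if (
        odor == "n"
        and record["stalk-surface-below-ring"] != "y"
        and record["spore-print-color"] != "r"
    ):
        return "edible"
    return "poisonous"


# Hypothetical record, not a row taken from the dataset.
sample = {"odor": "n", "stalk-surface-below-ring": "y", "spore-print-color": "k"}
print(predict_three_variables(sample))  # poisonous
```

Keeping the attribute names as dictionary keys makes it easy to compare the code with the rule text line by line.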

 

Model with a full classification (5 variables)

Classifying all 8124 instances of the dataset correctly (100%) requires 5 variables:

* If (bruises is not f) and (odor is not l) and (gill-size is n) then (Class is rather poisonous)

* If (odor is a) then (Class is rather edible)

* If (odor is n) and (stalk-surface-below-ring is not y) and (spore-print-color is not r) then (Class is rather edible)
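
The accuracy figures quoted for the four models can be checked with a few lines of arithmetic. The sketch below uses only the counts stated on this page; the dictionary labels and the function name are hypothetical.

```python
TOTAL_INSTANCES = 8124

# Correctly classified instances reported for each model on this page.
correct_by_model = {
    "1 variable": 8004,
    "2 variables": 8076,
    "3 variables": 8100,
    "5 variables": 8124,
}


def summarize(correct: dict, total: int = TOTAL_INSTANCES) -> dict:
    """Map each model to (misclassified count, percent correct rounded to 0.1)."""
    return {name: (total - n, round(100 * n / total, 1)) for name, n in correct.items()}


for name, (errors, percent) in summarize(correct_by_model).items():
    print(f"{name}: {errors} misclassified, {percent}% correct")
```

The misclassified counts fall from 120 to 48 to 24 to 0 as variables are added, which matches the reported 98.5%, 99.4%, 99.7% and 100%.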

 

 
 

BLIASoft Knowledge Discovery - Data mining & predictive analytics software - Fuzzy logic & artificial intelligence

2007-2017 BLIASOLUTIONS - All rights reserved