Flat Files
Regression Datasets

Swedish Auto Insurance Claims
Status: Clean dataset ready for modeling
Columns (n=2):
Claims: Total number of insurance claims
Payments (Outcome): Total sum of claims (in thousands of Swedish kronor)
Rows (n=63):
Individual Swedish provinces
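
With a single predictor (Claims) and a numeric outcome (Payments), this dataset suits simple linear regression. The following is a minimal sketch of an ordinary least squares fit. The function name fit_simple_linear and the sample values are illustrative assumptions, not rows from the dataset.

```python
from typing import Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for illustration only (not taken from the dataset)
claims = [10.0, 20.0, 30.0, 40.0]
payments = [35.0, 65.0, 95.0, 125.0]
slope, intercept = fit_simple_linear(claims, payments)
predicted = slope * 25.0 + intercept  # predicted Payments for 25 claims
```

In practice the same fit would be run on the 63 rows of the file, with Claims as x and Payments as y.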

Abalone Snail Age
Status: Clean dataset ready for modeling. Contains both categorical and numeric variables, some of which interact.
Columns (n=9):
Sex: (M)ale, (F)emale, (I)nfant
Length: Measure of longest dimension of shell
Diameter: Measure perpendicular to the Length measure
Height: Vertical measure of shell
Whole Weight: Weight of whole snail, including shell
Shucked Weight: Weight of snail without shell
Viscera Weight: Snail gut weight
Shell Weight: Weight of dried shell
Rings (Outcome): Number of shell rings. Adding 1.5 to the ring count gives the approximate age in years.
Rows (n=4,177):
Individual Abalone snails
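
Because Sex is categorical while the other columns are numeric, most regression models need it converted to numbers first. One common option is one-hot encoding, sketched below. The function name one_hot_sex and the column order (M, F, I) are assumptions made for illustration.

```python
from typing import List

# Category codes as listed for the Sex column; the order is an arbitrary choice
SEX_CATEGORIES = ("M", "F", "I")


def one_hot_sex(value: str) -> List[int]:
    """Encode a Sex code as a list with a single 1 at the matching category."""
    if value not in SEX_CATEGORIES:
        raise ValueError(f"unexpected Sex code: {value!r}")
    return [1 if value == category else 0 for category in SEX_CATEGORIES]


# Each row's Sex value becomes three numeric indicator columns
encoded = [one_hot_sex(code) for code in ["M", "I", "F"]]
```

For linear models, one of the three indicator columns is often dropped to avoid perfect collinearity with the intercept.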

Classification Datasets

White Wine Quality Testing
Status: Clean dataset, but has interacting variables and imbalanced Outcomes. Used for multi-class classification or regression modeling.
Columns (n=12):
Fixed Acidity: Amount of non-volatile acids (acids that do not readily evaporate). Gives wine a beneficial sour or tart taste.
Volatile Acidity: Amount of acetic acid. High levels negatively affect wine by giving it a vinegar taste.
Citric Acid: Can beneficially affect flavors of wine
Residual Sugar: Amount of sugar remaining after fermentation. Higher levels give wine a sweet taste.
Chlorides: Amount of salt in wine
Free Sulfur Dioxide: Prevents oxidation of wine
Total Sulfur Dioxide: At high levels, affects smell and taste of wine
Density: Density of wine, affected by alcohol and sugar content
pH: Acidity of the wine on the pH scale (0 = most acidic, 7 = neutral, 14 = most basic)
Sulphates: Added to wine to moderate sulfur dioxide levels (Free Sulfur Dioxide, Total Sulfur Dioxide)
Alcohol: Percent of alcohol content in wine
Quality (Outcome): Judge's score between 0 and 10
Rows (n=4,898):
Individual wine samples
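
Since the Quality outcomes are imbalanced, it can help to inspect class counts and derive per-class weights before fitting a classifier. The sketch below uses a common inverse-frequency heuristic. The function name class_weights and the sample labels are hypothetical and do not reflect the dataset's actual distribution.

```python
from collections import Counter
from typing import Dict, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes below 1
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical Quality scores for illustration only
quality = [5, 5, 5, 5, 6, 6, 6, 7]
weights = class_weights(quality)
```

Here the rarest score (7) receives the largest weight, so errors on it count more during training.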

Banknote Authenticity Testing
Status: Clean dataset with balanced Outcomes.
Columns (n=5):
Variance: Measure of variability in image
Skewness: Measure of asymmetry of distribution
Kurtosis: Measure of 'tailedness' of distribution
Entropy: Measure of randomness in image
Authenticity (Outcome): 0 for authentic, 1 for inauthentic
Rows (n=1,372):
Individually photographed banknotes
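
The four input columns are summary statistics. As a rough illustration of what each one measures, the sketch below computes them for a plain list of values. It assumes population (biased) moment formulas, excess kurtosis (a normal distribution scores 0), and entropy in bits. The dataset's exact feature extraction is not described here and may differ.

```python
import math
from collections import Counter
from typing import Sequence


def central_moment(values: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def variance(values: Sequence[float]) -> float:
    return central_moment(values, 2)


def skewness(values: Sequence[float]) -> float:
    # 0 for a symmetric distribution; raises ZeroDivisionError if all values are equal
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    # Negative for lighter tails than a normal distribution, positive for heavier
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def shannon_entropy(values: Sequence[float]) -> float:
    """Entropy in bits of the empirical distribution of distinct values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, the symmetric list [1, 2, 3, 4, 5] has zero skewness, and a list split evenly between two distinct values has an entropy of 1 bit.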

Iris Flowers
Status: Clean dataset with balanced Outcomes.
Columns (n=5):
Sepal Length: Length of leaf that encases flower bud
Sepal Width: Width of leaf that encases flower bud
Petal Length: Average length of petal
Petal Width: Average width of petal
Flower (Outcome): Iris Setosa, Iris Versicolour, Iris Virginica
Rows (n=150):
Individual flower samples
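
Each of these flat files can be read with the same pattern: numeric input columns first, Outcome last. The sketch below assumes a comma-delimited file with no header row and numeric inputs only (so it would not apply unchanged to the Abalone Sex column). The function name load_flat_file and the sample rows are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def load_flat_file(text: str) -> Tuple[List[List[float]], List[str]]:
    """Split each row into numeric input columns and a trailing Outcome label."""
    features: List[List[float]] = []
    labels: List[str] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        *inputs, label = row
        features.append([float(value) for value in inputs])
        labels.append(label.strip())
    return features, labels


# Hypothetical rows for illustration only (4 measurements + Flower)
sample = "5.0,3.4,1.5,0.2,Iris Setosa\n6.1,2.9,4.6,1.4,Iris Versicolour\n"
features, labels = load_flat_file(sample)
```

To read from disk, pass the contents of the opened file instead of the inline sample string.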