Flat Files

Regression Datasets

Sweden Auto Insurance Claims.xlsx

Swedish Auto Insurance Claims

Status: Clean dataset ready for modeling

Columns (n=2):

  • Claims: Total number of insurance claims

  • Payments (Outcome): Total payment for all claims (in thousands of Swedish Kronor)

Rows (n=63):

  • Individual Swedish provinces
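With one predictor and one outcome, this dataset suits simple linear regression of Payments on Claims. The sketch below is a minimal ordinary-least-squares fit using only the standard library. The function names are hypothetical and the code is illustrative, not a prescribed workflow for this file.

```python
from statistics import mean


def fit_simple_linear_regression(claims, payments):
    """Ordinary least squares fit of: payments = intercept + slope * claims."""
    if len(claims) != len(payments) or len(claims) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = mean(claims)
    y_bar = mean(payments)
    # Slope = sum of centered cross-products / sum of squared centered x values
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(claims, payments))
    sxx = sum((x - x_bar) ** 2 for x in claims)
    if sxx == 0:
        raise ValueError("Claims values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, claims):
    """Predicted Payments for each Claims value."""
    return [intercept + slope * x for x in claims]
```

To apply it to the spreadsheet, one option is pandas: `df = pd.read_excel("Sweden Auto Insurance Claims.xlsx")` (this needs the openpyxl engine), then `fit_simple_linear_regression(df["Claims"], df["Payments"])`. The exact column headers in the file are an assumption here. On Python 3.10+, `statistics.linear_regression` computes the same slope and intercept.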

Abalone.xlsx

Abalone Snail Age

Status: Clean dataset ready for modeling. Contains both categorical and numeric variables, some of which interact.

Columns (n=9):

  • Sex: (M)ale, (F)emale, (I)nfant

  • Length: Measure of longest dimension of shell

  • Diameter: Measure perpendicular to the Length measure

  • Height: Vertical measure of shell

  • Whole Weight: Weight of the whole abalone (shell and snail)

  • Shucked Weight: Weight of snail without shell

  • Viscera Weight: Snail gut weight

  • Shell Weight: Weight of dried shell

  • Rings (Outcome): Number of shell rings; adding 1.5 gives the approximate age in years

Rows (n=4,177):

  • Individual Abalone snails
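Because Sex is categorical, most regression models need it converted to numeric indicator columns first. The sketch below is one way to do that with the standard library, assuming the column holds the codes M, F, and I. The function name and the `Sex_<code>` column naming are hypothetical choices for illustration.

```python
SEX_LEVELS = ("M", "F", "I")  # codes listed for the Sex column


def one_hot_sex(sex_values, drop_first=True):
    """Convert Sex codes to 0/1 indicator columns, one dict per row.

    With drop_first=True, 'M' is the reference level and is encoded as
    all zeros, which avoids perfect collinearity with an intercept term.
    """
    levels = SEX_LEVELS[1:] if drop_first else SEX_LEVELS
    rows = []
    for value in sex_values:
        code = str(value).strip().upper()
        if code not in SEX_LEVELS:
            raise ValueError(f"unexpected Sex code: {value!r}")
        rows.append({f"Sex_{level}": int(code == level) for level in levels})
    return rows
```

pandas offers `pd.get_dummies` (with a `drop_first` option) for the same purpose; note that it picks the reference level by its own ordering, which may differ from the choice of M made here.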

Classification Datasets

Wine Quality.xlsx

White Wine Quality Testing

Status: Clean dataset, but has interacting variables and imbalanced Outcomes. Used for multi-class classification or regression modeling.

Columns (n=12):

  • Fixed Acidity: Amount of non-volatile acids (does not evaporate). Gives wines a beneficial sour or tart taste.

  • Volatile Acidity: Amount of acetic acid. High levels negatively affect wine by giving it a vinegar taste.

  • Citric Acid: Found in small quantities; can add freshness and flavor to wine

  • Residual Sugar: Amount of sugar remaining after fermentation. Higher levels give wine a sweet taste.

  • Chlorides: Amount of salt in wine

  • Free Sulfur Dioxide: Free form of SO2; prevents microbial growth and oxidation of wine

  • Total Sulfur Dioxide: Free and bound forms of SO2; at high levels, affects smell and taste of wine

  • Density: Density of wine, which depends on alcohol and sugar content

  • pH: Acidity of the wine on the pH scale (0 = very acidic, 7 = neutral, 14 = very basic)

  • Sulphates: Wine additive that can contribute to sulfur dioxide levels (Free Sulfur Dioxide, Total Sulfur Dioxide)

  • Alcohol: Percent of alcohol content in wine

  • Quality (Outcome): Score between 0 and 10 assigned by wine experts

Rows (n=4,898):

  • Individual wine samples
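Since the Quality outcomes are imbalanced, one common remedy is to weight each class inversely to its frequency during training. The sketch below uses the formula n_samples / (n_classes * class_count), the same one scikit-learn applies for `class_weight="balanced"`. Whether weighting is the right remedy here (rather than resampling or merging rare scores) is a modeling choice, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes below 1, so every
    class contributes the same total weight (n_samples / n_classes).
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

The resulting dictionary can be passed to estimators that accept a per-class weight mapping, or expanded into per-row sample weights.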

Banknote Authenticity.xlsx

Banknote Authenticity Testing

Status: Clean dataset with balanced Outcomes.

Columns (n=5):

  • Variance: Variance of the wavelet-transformed image (variability)

  • Skewness: Skewness of the wavelet-transformed image (asymmetry)

  • Kurtosis: Kurtosis of the wavelet-transformed image ('tailedness' of the distribution)

  • Entropy: Measure of randomness in image

  • Authenticity (Outcome): 0 for authentic, 1 for inauthentic

Rows (n=1,372):

  • Individually photographed banknotes
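The four predictors are summary statistics of each banknote image. The sketch below shows the standard textbook definitions of these statistics for a list of values, purely to illustrate what each column measures. The exact estimators the dataset's creators used (population versus sample variance, plain versus excess kurtosis, how intensities were binned for entropy) are not stated here, so this code is an assumption and will not necessarily reproduce the stored values.

```python
import math
from collections import Counter


def moment_features(values):
    """Population variance, skewness, and excess kurtosis of a list of numbers."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    sd = math.sqrt(var)
    if sd == 0:
        raise ValueError("values must not all be equal")
    # Third and fourth standardized moments
    skew = sum(((v - mu) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mu) / sd) ** 4 for v in values) / n - 3.0  # excess kurtosis
    return var, skew, kurt


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of discrete values."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

For example, a symmetric input such as `[1, 2, 3, 4]` has zero skewness, and two equally frequent values give an entropy of 1 bit.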

Iris.xlsx

Iris Flowers

Status: Clean dataset with balanced Outcomes.

Columns (n=5):

  • Sepal Length: Length of the leaf-like sepal that encases the flower bud (cm)

  • Sepal Width: Width of the sepal (cm)

  • Petal Length: Length of the petal (cm)

  • Petal Width: Width of the petal (cm)

  • Flower (Outcome): Iris Setosa, Iris Versicolour, Iris Virginica

Rows (n=150):

  • Individual flower samples
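To keep the classes balanced after splitting into training and test sets, the split can be stratified: each class is shuffled and divided separately in the same proportion. The sketch below is a standard-library version; the function name, the 0.2 default test fraction, and the seed are hypothetical choices. scikit-learn's `train_test_split` provides the same behavior through its `stratify` parameter.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class in the same proportion."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # seeded for a reproducible split
    train, test = [], []
    for label in sorted(by_class):  # fixed class order so the seed fully determines the result
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

With three equally sized classes, a 0.2 test fraction puts the same number of samples from each class in the test set.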

Clustering Datasets