The two datasets cover the red and white varieties of the Portuguese "Vinho Verde" wine. They can be treated as either regression or classification tasks. The classes are ordered but not balanced (for instance, there are more average wines than excellent or poor ones). Outlier detection algorithms could be used to identify the few excellent or poor wines. Additionally, it is unclear whether every input variable is relevant, so testing feature selection methods could be worthwhile. A few values were removed at random from the two datasets before they were combined.
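The sketch below shows one way to frame the quality score as an ordered, imbalanced classification target. It is a rough illustration, not the method used for this dataset. The cut-offs `POOR_MAX` and `EXCELLENT_MIN`, the band names, and the helper functions are assumptions made for the example, and the scores passed in at the end are made-up values.

```python
from collections import Counter

# Hypothetical cut-offs chosen for illustration only;
# the dataset description does not define them.
POOR_MAX = 4
EXCELLENT_MIN = 7
BANDS = ("poor", "average", "excellent")  # ordered from worst to best


def quality_band(score: int) -> str:
    """Map a 0-10 quality score to an ordered band."""
    if not 0 <= score <= 10:
        raise ValueError(f"quality score out of range: {score}")
    if score <= POOR_MAX:
        return "poor"
    if score >= EXCELLENT_MIN:
        return "excellent"
    return "average"


def band_shares(scores):
    """Return the fraction of wines that fall in each band."""
    scores = list(scores)
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(quality_band(s) for s in scores)
    return {band: counts.get(band, 0) / len(scores) for band in BANDS}


# Made-up scores, used only to show the call.
print(band_shares([3, 5, 5, 6, 8]))
```

Keeping the raw score gives a regression target, while the bands give a classification target whose shares make the imbalance visible before any model is trained.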
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
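As a starting point for the feature selection question raised above, the following sketch ranks the eleven inputs by the absolute Pearson correlation with quality. This is just one simple filter, swapped in for illustration. The `FEATURES` names mirror the list above, while `pearson`, `rank_features`, and the row format (one dict per wine) are assumptions of the example. Because some values are missing, blank or absent entries are skipped per feature.

```python
import math

# Column names taken from the variable list above.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide",
    "density", "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def _to_float(value):
    """Convert a cell to float, treating None or '' as missing."""
    return None if value in (None, "") else float(value)


def rank_features(rows, features=FEATURES, target=TARGET):
    """Sort features by |correlation with target|, strongest first."""
    scores = {}
    for name in features:
        pairs = [(_to_float(r.get(name)), _to_float(r.get(target))) for r in rows]
        pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        scores[name] = abs(pearson(xs, ys))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The rows could come from `csv.DictReader`, assuming the file headers match the names above. Correlation only captures linear, one-variable-at-a-time effects, so a low rank here does not prove a variable is irrelevant.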
Best Model Accuracy: 88.4