Which country offers the best value wines?

10 min readMay 19, 2021

Introduction

Every time, I decide to buy a wine in the supermarket, I am standing in front of the wine shelf, taking pictures and checking reviews on Vivino app. However, since there are around 20–50 bottles on a typical shelf, it takes a lot of time to evaluate each of them (and looks a bit weird too)…
So, I decided to run a research to help people make quick and dependable decisions which country offered best value wines based on the wine type (red/white/sparkling/etc) and budget.
Specifically, my goal was to come up with easy-to-remember suggestions by comparing how different countries performed in terms of value for each wine type and price category.

So, by the end of 2020, I and a friend of mine Rodolfo Briones, ran a research to find which country produced the best value wines across different price categories and wine types.

But first things first...we mentioned ‘Value’ several times. How did we measure Value? In this research, we defined Value as a function of Average User Rating divided by the Price (more below).

Here goes the data journey:

Step 1: Get the data

See Python’s scraper job on Github

At first, I got data by scraping from the Vivino website using Selenium. In total, around 30,000 were scraped— almost their whole wine catalog.

Step 2: Define The Research

See full code on Google Colab for Introduction

Then we defined the research question and the variables.

Research question: Which country offers the best value wines for each wine type and price range?

Target Variable: Value

Value is derived from the formula: (Avg. User Rating / Price) * 100
(Higher the value, the better)

For instance:

If a bottle had a Price: 10€ & Rating: 4.6, Value: (4.6 / 10) * 100 = 46
If a bottle had a Price: 20€ & Rating: 4.0, Value: (4.0 / 20) * 100 = 20
If a bottle had a Price: 100€ & Rating: 3.9, Value: (3.9 / 100) * 100 = 3.9

Features:

Wine type: Red / White / Sparkling / Rose
Producer country: France / Chile / Italy /etc
Price in €
The price buckets:

Low: 𝑃rice <10€
Medium: 10€≤ 𝑃rice <25€,
Premium: 25€≤𝑃rice <100€,
Luxury: 𝑃rice ≥100€

5. Category: Wine Type + Price bucket
e.g. Red-Low, White-Medium

6. Group: Country + Wine Type + Price bucket
e.g. France-Red-Low, Argentina-White-Medium

7. Average User Rating: from 1. to 5. (5 being the highest)

8. Count Review Per Wine: number of User Ratings per bottle

9. Apart from those above: Harvest Year, Grape Type, Name of the Winery, Link on the website, etc.

Step 3: Data Wrangling

See full code on Google Colab for Data Wrangling

Then I went for the heaviest duty task of all Data Science: Cleaning, Tidying, Remapping, Solving Duplicates & Missing values. 👷

In the beginning, we had the following outlook of unclean dataset:

While fixing duplicates was a straightforward task (dropping rows), solving for missing values was an interesting part as I went on a non-linear route… To fix missing values for the price, I checked if the category the bottle belonged to showed low Relative Standard Deviation (aka. Coefficient of Variation) compared to its mean price. The coefficient of variation (CV) is a relative measure of variability that indicates the size of a standard deviation in relation to its mean (more here).

If category’s Coefficient of Variation was low (<= 0.5), I replaced missing values with average prices per category
If category’s CV’s were high (> 0.5), I dropped the associated wines

As a result:

2050 duplicate and 1014 missing values have been dropped
140 missing observations have been fixed using Low CV method

Finally, after data-cleaning has been completed, the snapshot looked like this:

Step 4: Describing the Clean Data

See full code on Google Colab on descriptive statistics

After the Data Wrangling, we ended up with around 25,000 bottles and statistics looked like below:

Country representation for each wine type

Quick Inferences
Upfront, without statistical testing, dataset confirmed general world-wide assumptions (see the first point below), as well as, intrigued us about some new, interesting pieces of information:

Low priced wines show highest Value.
Rose wines have the highest Value and Dessert wines show the lowest Value 👀

Step 5: Modeling — Sigmoid Scaling Factor

See full code on Google Colab on Modeling

So far, the rating of a bottle was simply a mean of all ratings associated to it. However, it didn’t account for the amount of reviewers that contributed to the average score.

We believe that the confidence of bottle’s rating is strengthened as it receives more reviews and weakened, if vice versa. For example, without modeling, two different wines that are sold for 10 euros each and have a rating of 4.0, would both have a value of 40 ((4/10) * 100) even though the first bottle was rated by 5,000 reviewers and the second one by 50. Therefore, in this chapter, we created a statistical model that scales up ratings of those wines that have high number of reviews and scaled down those with a low number of reviews.

To scale the ratings, we used a Sigmoid function that considers bottle’s Review Count — Z score compared to the category that it belongs to. Z-score is a measure of how many standard deviations below or above the population mean, a raw score is.

More specifically, our goal was to devalue wines that have less reviews than the population mean more than rewarding those wines that have more reviews than the population mean; therefore, we came up with the following scaling factor formula:

scaled_factor = 5/(4 + np.exp(-z_value))

Once applying the formula to the dataset, it behaved in the following manner (value multiplier representing a percentage on which we multiplied Rating & Value fields):

Here is the frequency table of Z-score values from the dataset. Here we see that most values fall between -0.5 and 0.5 Z scores.

Here we see the result, how values changed after applying the function:

The difference between original and scaled Values

dark green represents that values were kept the same
yellow represents that values decreased
light green represents that values increased

Frequency Table confirmed what we intended to achieve — to reward highly reviewed wines and vice versa: penalize less reviewed wines but not to the extent that the overall distribution changed. Here, we see that the distribution is still right skewed and the modifications played a refinement purpose as opposed to completely altering the distribution.

Part 5: Statistical Testing

See full code on Google Colab for Statistical Testing

Before we started statistical testing, we had to choose the right model.

Looking at Histograms, we see that categories generally do not follow normal distributions: most are right-skewed, some are uniformal and some are somewhat normally distributed.

After further analysis, we came to conclusion that we had to use non-parametric tests (read more here) such as Kruskal–Wallis test for initially filtering categories that generally did not have statistically significant differences between their groups(countries). Later, we applied Post-Hoc analysis using Conover’s Test to see which groups(countries) within the categories showed statistically significant differences to each other.

We ran these tests separately for all categories and results were similar to this (read more here):

After running Kruskal’s and Conover’s tests, we followed the following strategy:

filtering categories as a whole: to keep only those categories that showed at least 3 statistically significant differences with other countries
filtering groups(countries) inside categories: to keep only those countries that showed statistically significant differences with at least 50% of other countries per category (P < 0.05).

This brought us to the following results:

Categories (and their respective countries) that passed our filtering criteria:

Red — Low for All Countries
Red — Medium for All Countries
Red — Premium for all countries except Italy, Spain, Portugal, Germany, Argentina and USA
White — Low for All Countries
White — Medium for all countries except USA
White — Premium for all countries except Australia
Sparkling — Medium for all countries except France and Portugal

Categories that were fully dismissed based in the filtering criteria:

White — Luxury
Red — Luxury
Sparkling — Low
Sparkling — Premium
Sparkling — Luxury
Rose — Low
Rose — Medium
Rose — Premium
Rose — Luxury

Part 6: Findings

See full code on Google Colab for Findings

Finally, we came to the end —to see which countries offered the best value wines for each wine type and price category.
In the following table, you see categories and countries with average values of Rating, Price, Value and Count.

Places were distributed as such:

🏆 was given to a country that produced the highest value and showed statistically significant differences with the 2nd and 3rd place countries, as well as, to other countries
🥇 was given to a country that produced the highest value but didn’t show statistically significant differences with the 2nd and 3rd place countries but still displayed high statistical significance with the rest of the countries
🥈 was given to a country that had the second best value proposition
🥉 was given to a country that had the third best value proposition

1. White-Low (0-10€)
   Country            Rating  Price  Value  Count  Place
      Portugal          3.7    7.1   55.2    282     🏆
      Spain             3.6    7.5   51.0    323     🥈
      France            3.6    7.7   48.5    389     🥉
      Italy             3.6    7.8   48.0    415   
      Argentina         3.5    7.7   46.7     73
      Chile             3.4    7.8   45.6    143
      Germany           3.6    8.5   43.8    270
      Australia         3.4    8.1   42.8     59
      Austria           3.6    8.7   41.4    1282. White-Medium (10€ to 25€)     
   Country            Rating  Price  Value  Count  Place
      Spain             3.9   15.1   27.6    299     🥇
      Argentina         3.7   14.1   27.2     66     🥇
      Portugal          3.8   15.2   26.8    247     🥇
      Chile             3.6   14.2   26.2     82     🥈
      Germany           3.6   15.6   24.4    714     🥉
      Italy             3.8   16.4   24.4    872
      USA               3.7   16.4   24.3    188
      France            3.7   16.4   24.0    684
      Austria           3.5   15.8   23.3    454
      Australia         3.5   15.9   22.9     803. White-Premium (25€ to 100€)   
   Country            Rating  Price  Value  Count  Place
      Portugal          4.2   39.4   11.8     65    🥇
      Spain             4.2   39.0   11.8     76    🥇
      Italy             4.2   39.6   11.5    310    🥇
      Germany           3.7   40.5   10.0    328    🥈
      Austria           3.6   39.8   10.0    288    🥉
      USA               4.0   46.4    9.7    205
      France            4.0   52.4    8.8    6104. Red-Low (0-10€)             
   Country            Rating  Price  Value  Count  Place
      Portugal          3.7    7.4   55.0    274    🏆
      Italy             3.6    7.8   48.2    875    🥈
      Spain             3.5    7.6   48.2    453    🥈
      Argentina         3.6    8.0   47.1    114    🥉
      Chile             3.5    8.1   44.6    329
      France            3.4    8.0   44.0    890
      Australia         3.3    8.0   43.6    130
      Germany           3.3    8.0   43.1    127
      Austria           3.4    8.7   39.7    1015. Red-Medium (10€ to 25€)
   Country            Rating  Price  Value  Count  Place
      USA               4.2   15.6   28.1     76    🥇     
      Argentina         4.0   15.5   27.2    176    🥇
      Portugal          3.9   16.0   25.8    273    🥈
      Chile             3.7   15.1   25.6    356    🥈
      Spain             3.9   16.6   24.6    618    🥉
      Italy             3.7   16.1   24.4   1232    🥉
      Austria           3.4   15.9   23.0    359
      Germany           3.4   15.9   22.6    319
      France            3.6   16.7   22.5    979
      Australia         3.5   17.3   21.5    3646. Red-Premium (25€ to 100€) 
   Country            Rating  Price  Value  Count  Place
      Chile             4.1   49.0    9.7    142    🏆
      Austria           3.7   45.2    9.0    270    🥈
      Australia         3.9   50.0    8.8    347    🥉
      France            3.9   54.5    8.1   1073
      
8. Sparkling-Medium (10€ to 25€) 
   Country            Rating  Price  Value  Count  Place
      Italy             3.8   15.6   26.1    362    🏆
      Spain             3.5   14.9   24.9    296    🥈
      Portugal          3.5   16.4   22.8     56    🥉
      Germany           3.4   15.7   22.6    145
      Austria           3.4   17.0   21.4     65

Part 7: Final Thoughts & Q/A

What are the limitations of this research?
Due to data limitations, we could not account for slight sugar deviations in wines; thus further sub-grouping by dry and semi-sweet wines would have been beneficial (dessert wines have been excluded due to low sample size) and one might argue that Portuguese dominance in Low budget Red-Wines could be a result of this.
However, as this paper suggests, sweeter wines are more preferred by ‘less well‐educated, less involved’ wine drinkers, so they would be under-represented in such niche product review segment as is Vivino and based on this, we could infer that our findings are mostly oriented towards purely dry-wines.

Second limitation would be that in the Low budget category, wines that were very cheap (<5€) gained very high boost in values; therefore, ideally, we would subgroup Low price further to Very Low price category (<5€) to account for this effect. Another way to tackle this would have been to take logarithm of the prices.

Third limitation would be that this sample was sourced from the EU market. This means that those wines that were shipped to the EU (the biggest market for wines) were captured by this research. Therefore, an average picture might be slightly different, if one visits the USA, China or Chile. Outside the EU, we would expect slightly reduced prices due to absence of EU import tax, thus further increasing their wines’ Value proposition.

As has always been thought, is France still the ultimate go-to country if you want to choose the best wine for most occasions?

The conclusions of this research challenges this belief. Many other producer countries are catching up if not providing a better value e.g.: Portugal, Italy, Spain and Chile. But the most expensive (not part of this research due to low sample size) and exquisite still come from France and therefore remain the reference point. 👑

What is a quick cheat sheet that we can remember to pick the best value wine?

          Budget               White                     Red
      Low     (0-10€)      Portugal, Spain          Italy, Spain, PT                 
      Medium  (10€-25€)    Spain, Argentina          USA, Argentina
      Premium (25€-100€)   Portugal, Spain              Chile