Deserved Goals 2.1

In my last post (which I recommend you read before this), I improved my original Deserved Goals model, which resulted in the following formula:

Deserved Goals 2.0 = (( Average Shots + A% of the variance ) x ( Average Combined xG per Shot + B% of the variance )) x ( Average Goals per Combined xG + C% of the variance )

I estimated figures for A, B and C, and showed that in-sample, it performed well against other metrics at predicting the Premier League.

However, there are a number of issues with this approach. Firstly, to properly test a metric it is important to separate your data into “training” and “testing”. You should only use the training data to develop the metric, and then you should test it on the unseen testing data to see how it performs. Also, a metric should be able to perform well in other top leagues, not just the Premier League.

I have therefore widened the scope to encompass the top 5 European leagues (England, Spain, Germany, France and Italy), and split the data as follows:

Training Data: 2016/17 & 2017/18, a total of 3,652 games

Testing Data: 2018/19, a total of 1,826 games

Ignoring the testing data for now, I will develop the metric using only information from the training data.
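As a rough illustration of the split described above, here is a minimal Python sketch. The match records and field names are made up for the example (they are not the real data set); the only thing taken from the post is which seasons go where.

```python
# Hypothetical match records; field names and teams are assumptions for illustration.
matches = [
    {"season": "2016/17", "home": "Team A", "away": "Team B"},
    {"season": "2017/18", "home": "Team B", "away": "Team C"},
    {"season": "2018/19", "home": "Team C", "away": "Team A"},
]

# Seasons as described in the post.
TRAIN_SEASONS = {"2016/17", "2017/18"}
TEST_SEASONS = {"2018/19"}


def split_by_season(all_matches):
    """Return (training, testing) lists, split purely on the season label."""
    training = [m for m in all_matches if m["season"] in TRAIN_SEASONS]
    testing = [m for m in all_matches if m["season"] in TEST_SEASONS]
    return training, testing


train, test = split_by_season(matches)
print(len(train), len(test))
```

Splitting on whole seasons, rather than on random games, keeps every input the metric is tuned on strictly separate from the season it is later judged on.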

To do this I need to find values for the following variables. These need to work well in the training data, have some basis in theory, and be general enough to avoid over-fitting.

Deserved Goals = (( Average Shots + A% of the variance ) x ( Average Combined xG per Shot + B% of the variance )) x ( Average Goals per Shot Based xG + C% of the variance )

Let’s work through these one at a time.

Average Shots is easy, as we can just see what the average number of shots taken by a team was in our training data. There were 90,858 shots taken in 3,652 games, with 2 teams in each game, which is 12.4 shots per game per team.

Average Combined xG per Shot should theoretically be the same as the average conversion rate (assuming xG accurately reflects the chance of a goal being scored). In the training data, there were 10,121 goals scored from 90,858 shots, which is a conversion rate of 11.1%.

Average Goals per Shot Based xG should theoretically be 100%, again assuming xG is a good measure of the chance of a goal being scored.
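The arithmetic behind these three baseline figures can be checked with a few lines of Python. The totals are the ones quoted above; the variable names are just for this sketch.

```python
# Totals quoted in the post for the training data.
TOTAL_SHOTS = 90_858
TOTAL_GOALS = 10_121
GAMES = 3_652
TEAMS_PER_GAME = 2

# Average shots per team per game.
avg_shots = TOTAL_SHOTS / (GAMES * TEAMS_PER_GAME)

# Average conversion rate, used as the average Combined xG per Shot.
avg_xg_per_shot = TOTAL_GOALS / TOTAL_SHOTS

# Assumed to be 100% if xG is a good measure of the chance of a goal.
avg_goals_per_xg = 1.0

print(round(avg_shots, 1))              # 12.4
print(round(avg_xg_per_shot * 100, 1))  # 11.1
```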

Filling in those figures gives the following:

Deserved Goals = (( 12.4 + A% of the variance in shots ) x ( 11.1% + B% of the variance in Combined xG per Shot )) x ( 100% + C% of the variance in Goals per xG )

As for A, B and C, we need to look at how these 3 components regress to the mean within a season. Again, we will only use the training data.

Let’s start with Shots. In the charts below, on the left is the correlation between the shot rate at each stage in the season, and the shot rate in future games, for both Shots For and Shots Against. We can see that Shots For are more repeatable, indicating that taking shots is more of a skill than not allowing your opponent to take shots. This tells me that to get better predictions, I need to have different input values for Attack and Defence.

The other thing to notice is that the data is noisy. This reflects what happened in the training data, but if we want it to tell us something about football in general, we need to look at the trend.
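One way to compute this kind of "to date versus future" correlation is sketched below in Python. This is my own reading of the approach, not necessarily the exact calculation behind the charts: for a given number of games played, it compares each team's average shots so far with its average in the remaining games, then takes the Pearson correlation across teams. The shot counts are invented toy numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def past_future_correlation(shots_by_team, games_played):
    """Correlate average shots in the first N games with the average in the rest."""
    past, future = [], []
    for shots in shots_by_team.values():
        if len(shots) <= games_played:
            continue  # no future games left to compare against
        earlier = shots[:games_played]
        later = shots[games_played:]
        past.append(sum(earlier) / len(earlier))
        future.append(sum(later) / len(later))
    return pearson(past, future)


# Toy per-game shot counts for three hypothetical teams (not real data).
toy_shots = {
    "Team A": [15, 16, 14, 15, 16, 15],
    "Team B": [10, 9, 11, 10, 10, 9],
    "Team C": [12, 13, 12, 11, 12, 13],
}

r_after_3 = past_future_correlation(toy_shots, games_played=3)
print(round(r_after_3, 2))
```

Running the same function on Shots For and Shots Against, for every value of games_played, would give one noisy line per chart.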

The right-hand side shows the trend I will use in my metric. For the first half of the season (19 games), I have plotted a logarithmic trend line. I have then held the value constant at its 19-game level for the remainder of the season. This is because I believe the drop-off in correlation reflects the diminishing number of games remaining, not an actual drop-off in predictive power.

Shot Correl

Put simply, the value for A will vary depending on how many games have been played. After only a few games we are much less sure that the variation reflects ability rather than luck; after a larger number of games we can be more confident.
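A minimal sketch of that trend, assuming an ordinary least-squares fit of the form y = a + b x ln(games), which is how a logarithmic trend line is normally defined. The function names and the example coefficients are hypothetical; the real coefficients come from the training data and are not given in this post.

```python
from math import log


def fit_log_trend(games, correlations):
    """Least-squares fit of correlation = a + b * ln(games)."""
    xs = [log(g) for g in games]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(correlations) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, correlations)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


def regression_weight(games_played, a, b, cap_games=19):
    """Value of the trend at a given game, held constant after cap_games."""
    g = min(max(games_played, 1), cap_games)
    return a + b * log(g)


# Illustrative points lying exactly on a made-up curve (not real correlations).
example_games = list(range(1, 20))
example_corr = [0.1 + 0.2 * log(g) for g in example_games]

a_fit, b_fit = fit_log_trend(example_games, example_corr)
print(round(a_fit, 3), round(b_fit, 3))
print(regression_weight(30, a_fit, b_fit) == regression_weight(19, a_fit, b_fit))
```

Only the first 19 games are passed to the fit, so the noisy late-season points do not pull the curve down.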

Repeating this for the other components produces these results, which will be the inputs for A, B and C.

Correls

For all 3 components, it seems like it is easier to rate Attack than Defence. Using these values should improve predictions, as we will regress defensive statistics towards the mean more than the attacking statistics.

OK, so this is the new form of the metric (v2.1), which uses the figures we calculated and the inputs from the above chart:

Deserved Goals For = (( 12.4 + A% of the variance in shots for ) x ( 11.1% + B% of the variance in Combined xG per Shot for )) x ( 100% + C% of the variance in Goals per xG for )

Deserved Goals Against = (( 12.4 + A% of the variance in shots against ) x ( 11.1% + B% of the variance in Combined xG per Shot against )) x ( 100% + C% of the variance in Goals per xG against )

Deserved Goals Ratio = Deserved Goals For / (Deserved Goals For + Deserved Goals Against)
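Here is how the three formulas above might look in Python. One assumption to flag: I am reading "A% of the variance" as A% of the gap between a team's own figure and the league average, so a weight of 0 returns the league average and a weight of 1 returns the team's raw figure. The weights a, b and c are fractions between 0 and 1, and the input numbers in the example are invented.

```python
# League-average baselines from the training data, as quoted in the post.
AVG_SHOTS = 12.4
AVG_XG_PER_SHOT = 0.111
AVG_GOALS_PER_XG = 1.0


def regress(value, mean, weight):
    """Keep only `weight` of the gap between a team's value and the mean."""
    return mean + weight * (value - mean)


def deserved_goals(shots, xg_per_shot, goals_per_xg, a, b, c):
    """Deserved goals per game from three regressed components."""
    return (
        regress(shots, AVG_SHOTS, a)
        * regress(xg_per_shot, AVG_XG_PER_SHOT, b)
        * regress(goals_per_xg, AVG_GOALS_PER_XG, c)
    )


def deserved_goals_ratio(dg_for, dg_against):
    """Share of total deserved goals that belong to the team."""
    return dg_for / (dg_for + dg_against)


# Hypothetical team figures and weights; attack and defence get separate weights.
dg_for = deserved_goals(15.0, 0.12, 1.10, a=0.8, b=0.5, c=0.2)
dg_against = deserved_goals(11.0, 0.10, 0.95, a=0.7, b=0.4, c=0.1)
print(round(deserved_goals_ratio(dg_for, dg_against), 3))
```

Passing different weights for the "for" and "against" calls is what lets the defensive figures be regressed towards the mean more heavily than the attacking ones.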

Now that we have a metric, it’s time to work out the relationship between the metric and future points, again using only the training data. I used the Slope and Intercept functions in Excel for each week of the season, then averaged the results across the middle 11 games, where the values should be more stable. This gives the following formula:

Future Points per Game = (4.48 x Deserved Goals Ratio) – 0.88

As a quick sense check, the average team should have a ratio of around 0.500.

4.48 x 0.500 – 0.88 = 1.36 points per game

1.36 points per game over a 38 game season is around 52 points, which is about right for an average team.
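For anyone working outside Excel, the sketch below shows a plain Python equivalent of a slope and intercept calculation, plus the sense check above. The sample ratios and points are invented to lie on a straight line; only the 4.48 and -0.88 coefficients come from the post. In practice you would run slope_intercept once per week and average the results over the weeks you trust.

```python
def slope_intercept(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def future_points_per_game(ratio, slope=4.48, intercept=-0.88):
    """Predicted future points per game from a Deserved Goals Ratio."""
    return slope * ratio + intercept


# Made-up ratios and future points per game for a single week (not real data).
toy_ratios = [0.4, 0.5, 0.6]
toy_points = [0.9, 1.35, 1.8]
toy_slope, toy_intercept = slope_intercept(toy_ratios, toy_points)
print(round(toy_slope, 2), round(toy_intercept, 2))

# Sense check with the coefficients from the post.
average_ppg = future_points_per_game(0.5)
print(round(average_ppg, 2), round(average_ppg * 38))
```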

OK, so we have arrived at a complete model. Here are the results for each metric within the training data:

Training Results

As expected, the model does well within the training data. However, the real test is to see how it gets on with the testing data, which it hasn’t seen yet.

Here are the results with the testing data:

Testing Results

Whilst the difference is not as large as in the training data, Deserved Goals 2.1 still races into an early lead, and is the best overall metric.

Looking at correlation instead of average errors, we get a similar picture in the testing data:

Correl Testing
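The two ways of scoring a metric mentioned here can be sketched as follows. I am assuming "average error" means the mean absolute difference between predicted and actual future points per game; the post does not spell out the exact definition, and the numbers below are invented.

```python
from math import sqrt


def mean_absolute_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical predicted and actual future points per game for three teams.
predicted_ppg = [1.0, 1.5, 2.0]
actual_ppg = [1.2, 1.4, 2.3]

mae = mean_absolute_error(predicted_ppg, actual_ppg)
r = pearson(predicted_ppg, actual_ppg)
print(round(mae, 2), round(r, 2))
```

The two measures answer slightly different questions: the average error shows how far off the predictions are in points, while the correlation shows how well the metric ranks teams against each other.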

This is an encouraging result, and shows that Deserved Goals 2.1 would be a good metric to choose when trying to predict future performance in top level club football.

If you have any questions, comments or suggestions, please let me know. I am on Twitter @8Yards8Feet

Data from: http://www.football-data.co.uk/ and https://projects.fivethirtyeight.com/soccer-predictions/
