Thursday, November 16, 2017

Thinking about a Passing Ability Stat

I am very happy with my passing value added stat. I think it adds a lot to the measurement of attacking passing, but I think that it is missing something as an overall passing statistic.

So today while I was procrastinating writing a stats preview for the North London Derby (Look for it tonight/tomorrow, it's got some good stuff in there!) I was thinking about passing.

One of the thoughts that popped into my head was looking at the completion percentage above average based on the three thirds of the pitch and also long passing (maybe short passing? I didn't include this but maybe I should, that is why I am writing this out).

The simplest way to do this is:

player pass completion for third / league average pass completion for third * 100

Take Mesut Ozil for Final 3rd Passing:

0.727 (his completion %) / 0.614 (Lg Avg) * 100 and you get 118, which helpfully translates to 18% better than league average.

For this stat I did that for Defensive 3rd passing, Middle 3rd Passing, Final 3rd passing and Long Ball passing.

To combine them into one stat they are all weighted by the total number of attempts in each category for an overall number. This creates the stat that for the time being I am labeling Pass+.
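
A rough sketch of that calculation in Python might look like the following. The function names are made up for illustration, and only the final third rates (0.727 and 0.614) come from this post. Every other rate and all of the attempt counts are invented placeholders.

```python
def plus_index(player_rate, league_rate):
    """Scale a rate so that 100 equals league average."""
    return player_rate / league_rate * 100


def pass_plus(categories):
    """Attempt-weighted average of the per-category indices.

    `categories` maps a category name to a tuple of
    (player completion rate, league average rate, player attempts).
    """
    total_attempts = sum(attempts for _, _, attempts in categories.values())
    weighted = sum(
        plus_index(rate, league) * attempts
        for rate, league, attempts in categories.values()
    )
    return weighted / total_attempts


# Only the final-third rates are from the post (Ozil: 0.727 vs 0.614).
# All other rates and every attempt count are hypothetical.
example = {
    "defensive_third": (0.950, 0.900, 20),
    "middle_third": (0.900, 0.850, 300),
    "final_third": (0.727, 0.614, 400),
    "long_ball": (0.600, 0.550, 40),
}

final_third_index = plus_index(0.727, 0.614)
overall = pass_plus(example)
print(round(final_third_index), round(overall, 1))
```

Weighting by attempts means a category a player rarely uses (say, long balls for a number 10) barely moves the overall number.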

As a quick aside, the reason I am comparing everything to average and setting average to 100 is that I come from a baseball background where this is common, and I believe it is easier to grasp that numbers over 100 are good, 100 is average and below 100 is bad. It also has the added benefit that each point above or below 100 can be described pretty easily as that many percent above or below average.

The next thing that I did with this stat is bring in my passing value added stat (PPVA). The reason for this is that I think a passing stat shouldn't just measure completing passes but should also include attacking value, which I think PPVA does well.

So similar to the other stats I got this on the same scale, but for this I made an adjustment to use the 75th percentile value instead of the average because otherwise things got really screwy. Maybe someone smarter than me can give pointers on a better way to work this out but this is what I did to make the scale work out better with the other stats.

So once I had PPVA+ I combined them into one stat that for the time being I am calling Passing Ability. I don't love this name and would like to think of something else. The method for combining them was that the Pass+ stats are weighted 4 times the value of PPVA+ (I did this because Pass+ is made up of 4 stats and it seemed about right, again all a work in progress).
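
Here is a minimal sketch of how the two pieces could fit together in Python. It assumes "weighted 4 times" means a weighted average (so the result stays on the 100 scale), and the function names and league PPVA values are hypothetical.

```python
import statistics


def ppva_plus(player_ppva, league_ppva_values):
    """Index PPVA against the league's 75th percentile instead of the mean."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # the last one is the 75th percentile.
    p75 = statistics.quantiles(league_ppva_values, n=4)[2]
    return player_ppva / p75 * 100


def passing_ability(pass_plus, ppva_plus_value):
    """Weighted average with Pass+ counted 4 times and PPVA+ once.

    Dividing by 5 is an assumption made to keep 100 as the baseline.
    """
    return (4 * pass_plus + ppva_plus_value) / 5


# Hypothetical league PPVA values, purely for illustration.
league = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
player_index = ppva_plus(0.45, league)
combined = passing_ability(110.0, player_index)
print(round(player_index, 1), round(combined, 1))
```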

And here is the Tableau for the Premier League to play with:

Thursday, October 12, 2017

Creating Radar Charts in Tableau - A How To

During the international breaks I like to try creating something new. It is nice to get away from the day to day of the club schedule and play around with data.

During this international break I have worked on figuring out a way to replicate Ted Knutson's very popular Radar charts.

My knock off seems to be fairly popular as well so I wanted to give a bit of a rundown on how to create them should you ever want to try it yourself.

This is based on this template that was posted on the Tableau blog.

The first step is getting your data ready. For this walk through I am using the creation of my midfield template.

To have the data play nicely with the radar it all must be normalized to the same scale. To do this I will make everything run between 0 and 100. So I identify each set of data and the minimum and maximum value to be used to calculate the value for the radar.

Stat Name M            Min    Max
Pass% Value M          74     90
Key Passes Value M     0.7    2.5
PPVA Value             0.03   0.55
xG Buildup Value       0.1    0.6
xG+xA Value            0.1    0.5
Drib Value M           0.5    2.1
Disp Value M           2.4    0.5
Fouls Value            2.4    0.6
DribPast% Value        60     20
Suc Tackles Value      1      3
Int Value              1      3
Suc Long Balls Value   0.53

Narrowing the list down to 12 values (I even threw in one more than Ted!) is very tough but I feel that this gives a good overview of the different things you want to measure from a midfielder. It has passing, chance creation, overall scoring contribution, dribbles and ball retention, and some defense. It isn't perfect because nothing is, and when you make your own you can go crazy with whatever you want to include. The minimum and maximum are set at the 5th and 95th percentiles of each stat (reversed for stats where lower is better, such as dispossessions and fouls).


Above is an example of normalizing Passing% to 100. First I have an IF statement for values greater than 0.9 (these are capped at 100), then one for values below 0.74 (these are set to 0) and finally one for values in between. For the values in between you subtract the minimum value, divide by the spread between minimum and maximum and multiply by 100.
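
The same logic can be sketched outside of Tableau. This is a Python illustration rather than the actual calculated field, and the function name is made up. It also shows one way to handle the stats where lower is better: pass the larger number as the minimum.

```python
def normalize(value, minimum, maximum):
    """Rescale `value` to 0-100 and clamp anything outside the range.

    `minimum` maps to 0 and `maximum` maps to 100. For stats where lower
    is better, pass the larger number as `minimum` (e.g. 2.4 and 0.5).
    """
    scaled = (value - minimum) / (maximum - minimum) * 100
    return max(0.0, min(100.0, scaled))


# Pass% using the 0.74 and 0.90 bounds from the walkthrough.
mid_passer = normalize(0.82, 0.74, 0.90)
elite_passer = normalize(0.95, 0.74, 0.90)
poor_passer = normalize(0.70, 0.74, 0.90)

# Dispossessions run from 2.4 (bad) down to 0.5 (good).
rarely_dispossessed = normalize(0.5, 2.4, 0.5)
often_dispossessed = normalize(3.0, 2.4, 0.5)
```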

When done it should look something like this:

So you go through and do this for all of the stats you want to include in the radar. Once you are done with that, go to Analysis -> View Data and select all of the data to copy into an Excel sheet.


In the Excel sheet you will add a column for the order that you want each stat to show up on the radar. For this each stat will have this order:

Stat Name M            Order
Pass% Value M          1
Key Passes Value M     2
PPVA Value             3
xG Buildup Value       4
xG+xA Value            5
Drib Value M           6
Disp Value M           7
Fouls Value            8
DribPast% Value        9
Suc Tackles Value      10
Int Value              11
Suc Long Balls Value   12

The next step is to create the radar in Tableau. Create a new worksheet and add the newly created Excel data to the data connection.

Next is to create the X value we will use. We will create a calculated field (I am naming mine X-M) and enter the values for the X coordinates for each stat:

CASE [Stat Name M]
WHEN "Pass% Value M" THEN 0
WHEN "Key Passes Value M" THEN [Per90 Stat Value] *(1/2)
WHEN "PPVA Value" THEN [Per90 Stat Value] *(sqrt(3)/2)
WHEN "xG Buildup Value" THEN [Per90 Stat Value]
WHEN "xG+xA Value" THEN [Per90 Stat Value] *(sqrt(3)/2)
WHEN "Drib Value M" THEN [Per90 Stat Value] *(1/2)
WHEN "Disp Value M" THEN 0
WHEN "Fouls Value" THEN [Per90 Stat Value] *(-1/2)
WHEN "DribPast% Value" THEN [Per90 Stat Value] *(-sqrt(3)/2)
WHEN "Tackles Value" THEN [Per90 Stat Value] *-1
WHEN "Int Value" THEN [Per90 Stat Value] *(-sqrt(3)/2)
WHEN "Suc Long Balls Value" THEN [Per90 Stat Value] *(-1/2)
END


And then the same with the Y Values:

CASE [Stat Name M]
WHEN "Pass% Value M" THEN [Per90 Stat Value]
WHEN "Key Passes Value M" THEN [Per90 Stat Value] *(sqrt(3)/2)
WHEN "PPVA Value" THEN [Per90 Stat Value] *(1/2)
WHEN "xG Buildup Value" THEN 0
WHEN "xG+xA Value" THEN [Per90 Stat Value] *(-1/2)
WHEN "Drib Value M" THEN [Per90 Stat Value] *(-sqrt(3)/2)
WHEN "Disp Value M" THEN [Per90 Stat Value] *(-1)
WHEN "Fouls Value" THEN [Per90 Stat Value] *(-sqrt(3)/2)
WHEN "DribPast% Value" THEN [Per90 Stat Value] *(-1/2)
WHEN "Tackles Value" THEN 0
WHEN "Int Value" THEN [Per90 Stat Value] *(1/2)
WHEN "Suc Long Balls Value" THEN [Per90 Stat Value] *(sqrt(3)/2)
END
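
The multipliers in the two CASE statements are just the sine and cosine of each spoke's angle, with the first stat pointing straight up and the rest going clockwise in 30 degree steps. A small Python sketch of the same geometry (the function name is made up, this is not something Tableau runs):

```python
import math


def radar_point(order, value, spokes=12):
    """Return (x, y) for a stat drawn on spoke `order` (1-based).

    Spoke 1 points straight up and the rest proceed clockwise, which is
    the layout the CASE statements hard-code for 12 stats.
    """
    angle = 2 * math.pi * (order - 1) / spokes
    return value * math.sin(angle), value * math.cos(angle)


# Key Passes is order 2: x = value * 1/2, y = value * sqrt(3)/2.
x2, y2 = radar_point(2, 100)
# Tackles is order 10: x = -value, y = 0.
x10, y10 = radar_point(10, 100)
```

Writing it this way makes it easier to build a radar with a different number of stats, since only `spokes` changes.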

Next you will add X-M to Columns, aggregated as an average, and Y-M to Rows, also aggregated as an average.

Next is to drag Stat Name M Dimension to the marks section.

Then you will convert the mark type from Automatic to Polygon.

Then drag the Order Measure to the path section in mark to fix the weird shapes.


Now we have something that looks like a Radar! However right now it is showing all of the values and we need to fix it to show only one player at a time. To do this drag the Players Dimension to the filters section. I then add it to the window as well and change it to select only single values and I also customize the filter to not show the all value.

It should look something like this now:


We are very close now. Now we will add the background image for the radar. For this I took the blank template from here and then added the values around the circle for the stats. It looks like this:

Next it will be added into Tableau. To do this go to Map -> Background Images and select your data sheet. Then select where you saved the image and put in the matching coordinates.

Now we should have a real radar with some minor formatting stuff to make it look pretty to finish!

The formatting that I like to do is to change the color opacity to 65% to be able to see the numbers on the image behind, have each team be assigned a color, remove the axis labels and the center lines. I then put them all in a dashboard with the Per 90 Stats and Minutes information to complete everything. The final product should look like this:


You can find the dashboard and play around with it here

Friday, September 8, 2017

Introducing Passing Progression Value Added

For a while I have been wanting to create a way to measure passing value added.

I have added stats called xG Chain and xG Buildup, which were created by Ted Knutson and Thom Lawrence for Statsbomb Services. Mine might be a bit different but I have tried to follow the same general guidelines laid out in their introductory post on Statsbomb Services.

This is cool and helpful but it misses a lot of passes that don't lead to shots so I wanted to see about figuring out a way to include those. I really liked the way that Nils Mackay went about analyzing the problem and decided to use that as the starting point for my model. Mackay has gone even further in refining his model but for now I focused on making this simple for my first attempt.

What I am setting out to measure is the value added (xG in this case) between the starting point of a pass and where the pass ends.

The sporting logic behind this is that to be able to win you must score goals. Your team is better able to score goals the closer to the opponent's goal they are able to take shots. Getting closer to the opponent's goal through passing increases the likelihood of taking high quality shots. This last part is what I am looking to measure.

To accomplish this I use a very simple xG model to assign a value for every position on the pitch.

The equation for the xG model is this:

xG = 1 - 1 / (1 + e^(-1.56335793278499 + 0.0000564550258161941 * (Distance from Center) - 0.0693321731182481 * sqrt((Distance from Endline)^2 + (Distance from Center)^2)))

Essentially it looks at how far you are from the center of the pitch (closer to the center is better) and how far you are from the center of the goal (closer to the goal is better).
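
Written as a function, the model could look like the sketch below. The coefficients are the ones from the equation, but the function and constant names are made up, and the post does not spell out the coordinate units, so treat the inputs as being in whatever units the model was fitted on.

```python
import math

INTERCEPT = -1.56335793278499
CENTER_COEF = 0.0000564550258161941
GOAL_DISTANCE_COEF = -0.0693321731182481


def location_xg(distance_from_endline, distance_from_center):
    """Value of a pitch location under the simple logistic model."""
    # Straight-line distance to the center of the goal.
    goal_distance = math.hypot(distance_from_endline, distance_from_center)
    z = (
        INTERCEPT
        + CENTER_COEF * distance_from_center
        + GOAL_DISTANCE_COEF * goal_distance
    )
    return 1 - 1 / (1 + math.exp(z))


close_central = location_xg(5, 0)
far_central = location_xg(40, 0)
far_wide = location_xg(40, 25)
```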

Here is what the values for areas of the field look like up and down the pitch:

To determine the value added for a successful pass this model takes the value of the ending point for the pass and then subtracts the value for the starting point for the pass.

So for example, a completed pass starts at the point (60,10) and ends at the point (40,0). The end value of 0.01591 minus the starting value of 0.00608 gives a simple value added for this pass of 0.00983. I have also made the decision to give a completed pass a bonus of 0.003 (the reasoning is that keeping possession to be able to continue to attack is valuable and this seems like a reasonable amount to assign, I am open to changing this), and if it starts and stays within the attacking final third an additional 0.015 is added (same reasoning as above but the attacking final third is even more important). So the total value added with this pass is 0.01283.

For an incomplete pass a player is penalized for the value to the opponent of taking over at the end point of the pass. This is what the values of the pitch look like:

The values are pretty similar to above but they include the following penalties in addition to the value of where the opponent takes over: -0.01 for losing possession (the reasoning is that your team cannot attack any longer once they do not have the ball, this seems like a value that is about right but I would be open to changing it) and if the pass is lost in the defensive third an additional 0.015 is subtracted.

An example again: let's say that we again try to pass from the point (60,10) to the point (40,0) but the pass is intercepted. The value for this pass would be -0.01327: -0.00327 for the opponent taking over and -0.01 for losing possession.
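
Both cases can be sketched in a few lines of Python. The bonuses and penalties are the ones described in this post, the function and constant names are made up, and the location values are passed in directly rather than computed.

```python
COMPLETION_BONUS = 0.003
FINAL_THIRD_BONUS = 0.015
TURNOVER_PENALTY = 0.01
DEFENSIVE_THIRD_PENALTY = 0.015


def completed_pass_value(start_value, end_value, stays_in_final_third=False):
    """Value added by a completed pass between two pitch locations."""
    value = end_value - start_value + COMPLETION_BONUS
    if stays_in_final_third:
        value += FINAL_THIRD_BONUS
    return value


def incomplete_pass_value(opponent_value, lost_in_defensive_third=False):
    """Value lost on a failed pass.

    `opponent_value` is the (positive) value of the spot where the
    opponent takes over, measured from the opponent's point of view.
    """
    value = -opponent_value - TURNOVER_PENALTY
    if lost_in_defensive_third:
        value -= DEFENSIVE_THIRD_PENALTY
    return value


# The two worked examples from the post.
completed = completed_pass_value(start_value=0.00608, end_value=0.01591)
intercepted = incomplete_pass_value(opponent_value=0.00327)
print(round(completed, 5), round(intercepted, 5))
```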

These calculations are done for every pass attempt in the game.

I have gone back and done this for all of the games in the 2017-18 season thus far and added this stat to the Tableau database under the passing tab.

Also here are the top 25 in raw Value Added from the Premier League through the first 3 weeks:

Thursday, September 7, 2017

Premier League Week 4 Projections

After a long international break it is time to get back to club football. As an Arsenal fan and American it has been a pretty crappy couple of weeks with the results but according to the odds it looks like things should hopefully get better.

Without any further rambling here are the odds for week 4 of the Premier League:

After several weeks with lots of heavy favorites, this week only Arsenal breaks the 50% odds barrier. As a pessimist, this also likely means that of the big teams Arsenal will have their result go against them.

The Manchester City vs Liverpool match up looks really good and even with a 4:30 am kick off I will probably still do my best to wake up to watch it. I have City as the favorite but with Liverpool's track record and how they just dominated Arsenal it could definitely be closer than the odds suggest.

A new thing that I tweeted out this week is the relative strength rankings for each team that feed into these projections.

This week we have a 1 vs 3 match up and a 4 vs 7 match up, which might end up as one of the better weekends for top teams going against each other.

On the title projection front, things have narrowed down to a few favorites with the rest of the big teams falling back:

Manchester City is still the title favorite but Manchester United and Liverpool have both moved to over 20% for the title odds.* Arsenal's season might be collapsing but I still have them with the 4th highest title odds, so maybe things aren't as bad as they seem after two losses.

Another new thing I tweeted out today was the team performance compared to the expected points based on odds and based on expected goals:

This is pretty interesting. I wouldn't expect Huddersfield to keep up this performance but the points that they have already banked are very valuable and can't be taken away. Also, even with the only perfect record, Manchester United haven't over performed by a crazy amount. Jose might be doing his second year magic.
-
This article was written with the aid of StrataData, which is the property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.

*Note: If teams tie on points this assigns both to that position, I have not added a goal differential to break ties. Here is a further explanation for how these projections work.

Wednesday, September 6, 2017

Explaining My Simulation Methodology

I have been meaning to get around to this for a while now and with a break in fixtures for international team games this seems like a good time to go over my simulation methodology.

 

Basics:

The model is built on this logic: a soccer match result is determined by goals, and goals are determined by the number of shots and the quality of those shots. So I have built the model to 1) estimate the number of shots each team will have and 2) get a rough idea of where these shots will be taken and the quality of the shots.

For simplicity I group shots into three location buckets, Danger Zone (6 yard box + Middle of 18 yard box), Wide Box (wide areas of the box) and Outside the Box.

I also estimate the number of headers that will occur in the game and I assume that all headers will be from the danger zone (about 95% of headers occur in this area). Shots in all other areas are treated as shots from feet.

Lastly I estimate the number of Big Chances each team will have per game. For simplicity I also assume that all of these will occur in the danger zone (about 75% of big chances occur in the danger zone).

 

Determining Values:

To arrive at the values for each I have taken data from the last two seasons plus the current season. I then weight the data to get to a single value. The current weighting is 1 for 2016-17, 0.5 for 2015-16 and Games Played/19 for the current season. There have been 3 matches so far, so this week the current season weighting is 3/19 = 0.16, and this will go up every week.

You could certainly pick different weights for this but my thinking is that I would use last season as the baseline for each team. Two years ago is half as important because there can be quite a bit of turnover in a squad in that time, but it can still provide information. The current season gets a sliding scale that puts it on equal terms with the previous season at the halfway point (19 games) and gives it the largest weighting after that.
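
As a sketch, the weighting might be applied like this in Python. The post only gives the weights, so dividing by the sum of the weights (a standard weighted average) is an assumption, and the function names and sample values are made up.

```python
def current_season_weight(games_played):
    """Sliding weight that reaches 1.0 at the halfway point of the season."""
    return games_played / 19


def blended_value(last_season, two_seasons_ago, current, games_played):
    """Weighted average of a per-game stat across three seasons.

    Weights: 1 for last season, 0.5 for two seasons ago and
    games_played / 19 for the current season.
    """
    weights = (1.0, 0.5, current_season_weight(games_played))
    values = (last_season, two_seasons_ago, current)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


week_4_weight = current_season_weight(3)
# Hypothetical danger zone shots per game for one team.
blended = blended_value(last_season=6.0, two_seasons_ago=7.0,
                        current=5.0, games_played=3)
print(round(week_4_weight, 2), round(blended, 2))
```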

I use this weighting on the data for both offensive and defensive statistics as well as overall and home and away.

These values then feed into the simulation model to determine the number and quality of shots for each team.

Using Arsenal vs Bournemouth as an example:

Arsenal have 6.42 Danger zone shots for overall, 7.41 at home while Bournemouth allow 5.59 overall and allow 3.61 Danger Zone shots on the road. Averaging all of these I have Arsenal with a raw value of 5.69 Danger Zone shots. Taking out the expected headers and Big Chances (same methodology as above) Arsenal are left with 1.58 regular danger zone shots from feet. The decimal portion of the shot is then compared to a random number and if the decimal is higher than the random number the shots total is rounded up.
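
The rounding step is the part that is easiest to show in code. Below is a minimal Python sketch of it, using the Arsenal shot estimates from this example. The function name and the seed are arbitrary choices for illustration.

```python
import math
import random


def stochastic_round(expected, rng):
    """Round up when the decimal part beats a random number, else down."""
    whole = math.floor(expected)
    fraction = expected - whole
    return whole + (1 if fraction > rng.random() else 0)


# Arsenal's estimated shots by category vs Bournemouth.
ARSENAL = {"DZ": 1.58, "WB": 4.37, "OB": 4.97, "H": 2.3, "BC": 1.81}

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
totals = [
    sum(stochastic_round(expected, rng) for expected in ARSENAL.values())
    for _ in range(1000)
]
print(min(totals), max(totals))
```

Because each category is rounded separately, 1.58 danger zone shots becomes 2 about 58% of the time and 1 otherwise, so the long-run average stays at 1.58.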

Doing this for all the different shot categories Arsenal are estimated to have 15.03 (1.58 DZ, 4.37 WB, 4.97 OB, 2.3 headers, 1.81 BC) shots but that can vary between 12 and 17 shots, compared to 10.65 (0.98 DZ, 2.29 WB, 4.32 OB, 1.78 headers, 1.27 BC) shots but can vary between 8 and 13 shots for Bournemouth.

Once the number of shots is determined each one is assigned an xG value. Danger Zone shots are 0.17 xG, Wide Box 0.06 xG, Outside the Box 0.024 xG, headers 0.08 xG and big chances 0.45 xG.

 

Simulating the match:

These values are assigned to each shot and compared to a random number. Again to our example:

Arsenal
Shot Type   Value   Random     Result
DZ          0.17    0.610036   0
DZ          0.17    0.172277   0
WB          0.06    0.303131   0
WB          0.06    0.267087   0
WB          0.06    0.068808   0
WB          0.06    0.6799     0
OB          0.024   0.715029   0
OB          0.024   0.012071   1
OB          0.024   0.577135   0
OB          0.024   0.657269   0
OB          0.024   0.936911   0
H           0.08    0.356094   0
H           0.08    0.022968   1
BC          0.45    0.657358   0
BC          0.45    0.545432   0

Based on these results Arsenal would have scored 2 goals in this simulation.
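
The rule applied to each shot is simply "goal if the random number is lower than the shot's xG". A few lines of Python replaying the exact values from the Arsenal table (the variable names are made up):

```python
# (shot type, xG value, random number) for each simulated Arsenal shot.
shots = [
    ("DZ", 0.17, 0.610036), ("DZ", 0.17, 0.172277),
    ("WB", 0.06, 0.303131), ("WB", 0.06, 0.267087),
    ("WB", 0.06, 0.068808), ("WB", 0.06, 0.6799),
    ("OB", 0.024, 0.715029), ("OB", 0.024, 0.012071),
    ("OB", 0.024, 0.577135), ("OB", 0.024, 0.657269),
    ("OB", 0.024, 0.936911),
    ("H", 0.08, 0.356094), ("H", 0.08, 0.022968),
    ("BC", 0.45, 0.657358), ("BC", 0.45, 0.545432),
]

# A shot becomes a goal when the random draw falls below its xG value.
results = [1 if draw < xg else 0 for _, xg, draw in shots]
goals = sum(results)
print(goals)
```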

This is done for both teams, the goals scored are compared and the result is recorded. The simulation is then run another 9,999 times, with the shot count decimals compared to a new set of random numbers each time to simulate a bit of the randomness that happens. The odds that I present are the count of each result divided by the number of simulations.

Again to the example of Arsenal vs Bournemouth, Arsenal won 5,327 of the simulations, there was a draw 2,190 times and Bournemouth won 2,483 times. So the odds for the match would be presented as 53.3% for Arsenal, 21.9% Draw, 24.8% Bournemouth.
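
Putting the pieces together, a stripped down version of the match simulation could look like the Python sketch below. It uses the shot estimates and xG values from this post, but the function names and seed are made up, and it is not the actual model, so it will not reproduce the published odds exactly.

```python
import math
import random

XG = {"DZ": 0.17, "WB": 0.06, "OB": 0.024, "H": 0.08, "BC": 0.45}
ARSENAL = {"DZ": 1.58, "WB": 4.37, "OB": 4.97, "H": 2.3, "BC": 1.81}
BOURNEMOUTH = {"DZ": 0.98, "WB": 2.29, "OB": 4.32, "H": 1.78, "BC": 1.27}


def simulate_goals(expected_shots, rng):
    """Simulate one team's goals for a single match."""
    goals = 0
    for shot_type, expected in expected_shots.items():
        # Round the shot count up or down based on its decimal part.
        whole = math.floor(expected)
        shots = whole + (1 if expected - whole > rng.random() else 0)
        # Each shot scores when a random draw falls below its xG.
        goals += sum(1 for _ in range(shots) if rng.random() < XG[shot_type])
    return goals


def match_odds(home, away, runs=10_000, seed=0):
    """Share of simulations ending in a home win, draw or away win."""
    rng = random.Random(seed)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(runs):
        home_goals = simulate_goals(home, rng)
        away_goals = simulate_goals(away, rng)
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals < away_goals:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {result: count / runs for result, count in counts.items()}


odds = match_odds(ARSENAL, BOURNEMOUTH)
print(odds)
```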

 

Simulating the Season:

For each of the remaining matches in the season the odds are determined using the above method and a similar exercise is performed to simulate the season. I use this to give odds for each team winning the league or finishing top 4 and other targets.

To help illustrate I will again use the Arsenal vs Bournemouth example. For this a random number is generated. I got 0.5058 as my random number and that is compared to the odds: home win from 0 to 0.533, draw from 0.533 to 0.752 and away win from 0.752 to 1. With this random number Arsenal have been simulated as the winner.
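
In code, picking a result from the match odds is a simple threshold check. A minimal Python sketch, with made up names and the standard 3/1/0 points for a win, draw and loss:

```python
# (home points, away points) for each result.
POINTS = {"home": (3, 0), "draw": (1, 1), "away": (0, 3)}


def pick_result(random_number, home_win, draw):
    """Map a random number in [0, 1) to a result using cumulative odds."""
    if random_number < home_win:
        return "home"
    if random_number < home_win + draw:
        return "draw"
    return "away"


# Arsenal vs Bournemouth: 53.3% home win, 21.9% draw, 24.8% away win.
result = pick_result(0.5058, home_win=0.533, draw=0.219)
home_points, away_points = POINTS[result]
print(result, home_points, away_points)
```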

This is done for each match and the number of wins, draws and losses are recorded as well as the points and where each team finished in the table. This is done another 9,999 times to simulate the season 10,000 times and then the results are presented as the simulated odds.

The latest simulation makes Manchester City the title favorites winning the title in 32.7% of simulations.

 

Team Strength:

Earlier today on Twitter I posted something new that is related to my simulation work. I called it team strength rankings.

Here is how that is derived.

Each team's overall shot spread is multiplied by the assigned values (basically it is a simplified xG value per game) and then compared to league average. Using Arsenal as an example, they have an estimated 1.8 xG per game overall compared to 1.3 for League Average. I then took the team value divided by league average times 100 to give the value in the tweet where 100 is league average and every point above or below represents 1% above or below the league average.

For the overall ranking it is the average of the offense and defense with that compared to league average to determine overall rank.

This is a new thing for me so this might need tweaking as I continue on.

Please let me know if things need further clarification or if I missed anything.

Monday, September 4, 2017

5 Highest Quality Chances from Premier League Week 3

A little slow getting this out this week but hey it's an international break and we are all kind of off our regular schedule.

Arsenal fans should probably avoid this week with the amount of shambolic defending that will be featured.

5) Harry Kane vs Burnley



Shot from feet in the center of the box, regular assisted shot, classified as a big chance: 0.40 xG.

Harry Kane just doesn't score in August.

4) Daniel Sturridge vs Arsenal




Shot from head from very close range, assisted by a cross, classified as a big chance: 0.44 xG

That's about as wide open a header as you will get. Sturridge did not miss his chance.

3) Mohamed Salah vs Arsenal




Shot from feet from very close range, following a cross, classified as a big chance: 0.54 xG

This is a really pretty movement from Liverpool to carve open Arsenal. Petr Cech makes a great save to deny a goal.

2) Dele Alli vs Burnley



Shot from feet from very close range, following a corner, classified as a big chance: 0.76 xG

1) Mohamed Salah vs Arsenal 



Shot from feet from the center of the box, following a fast break, classified as a big chance: 0.77 xG

Not the best corner from Arsenal, a great one man fast break from Salah.

Thursday, August 31, 2017

Arsenal's Squad Mismanagement

I am doing a bit of a change of pace post today, away from stats, to look more at Arsenal. In particular their poor squad management and the coming squad rebuilding job.

Here is a look at the current Arsenal squad and the contract lengths (according to transfermarkt):

I have shown the length of each player's deal and the age that they will be in that season. I have also highlighted the peak years (24-28) in green and put players over 30 in red.

Arsenal have six senior players out of contract at the end of the season (and it would have been 7 had Alex Oxlade-Chamberlain not been sold yesterday). The running down of contracts for Per Mertesacker and Santi Cazorla makes a certain amount of sense. For the rest it is absolutely stupid to hold on to a player this late with no chance of getting a return for them.

This situation is bad but what is worse is that next season a further ten players will be in their last season.

Arsenal are going to be in an even more desperate spot next summer. Teams will be circling looking to pick off our best player, plus the team will be further weakened with our two major stars, Alexis Sanchez and Mesut Ozil, gone (they aren't extending). Making things even more challenging, Arsenal are likely to be sitting outside the top four again (52% chance of 5th or worse according to my last simulation) and without Champions League football to attract players. Oh, and their manager will be in the last year of his contract, with the long term direction of the club in question.

A couple of those guys with two years to go on their contracts, like Petr Cech and Nacho Monreal are still solid contributors that would not bring a return in the market so it makes financial sense to keep them around but not extend their contract but these are the exceptions. The other guys above with two years left? Yeah decisions should have been made on them last season whether they had a future or not at the club and moved on while they had good market value.

Well run teams make decisions on these players at this point. If you decide they aren't in the long term plans you can still extract value in the transfer market. If you decide to extend, the allure of running down the contract to enter free agency is still three years away, so that is a less appealing option for the player.

The other major issue for Arsenal is that for all the attacking talent in the team, they are seriously short in players with promise who are pre-peak or just entering their peak ages to take over for older players.

Beyond Alex Iwobi, who near the first team looks like they could be a starting player? Sure, Reiss Nelson looks promising, but he is 17 and doesn't have much first team experience. He could get it this year, but beyond that? Not much.

Let's play director of football and go through the players with two years left and take a look at them.

David Ospina: He is clearly the second choice and not the ideal long term solution to Arsenal's goal keeping question. 29 this season and two years on his deal. Ideally he would have been sold last season but definitely should have gone this summer. His wages as a starting goal keeper for an international team are probably fairly high for a guy who will start in low importance cup games.

Matt Macey: I have no real opinion on his quality but as the third keeper he is probably on a low wage. Probably behind Emi Martinez in the pecking order. Maybe sell but not really a concern as the third keeper.

Nacho Monreal: A solid player still but not a guy that I want extended. He gets picked out to be exploited by high quality wingers and is not a convincing center back. He can still do a passable job at both positions against average opponents and wouldn't return much of a transfer fee. Keep for depth.

Mathieu Debuchy: Replaced by Hector Bellerin. Arsenal have been trying to offload him for the last couple of years without success. They really should explore doing an NBA style buy out for him just to get him away from the club.

Aaron Ramsey: A good player who suffers from injury issues and from not really fitting into the system. If he is going to stay you probably need to build around him to maximize his talents. If you don't want to do that, he arguably makes the team weaker when he abandons the midfield to play as a second striker too often. Arsenal should have made the decision on him last year to sell or extend. With the window over he is going to stick around and be in the same position Ozil, Alexis and Chamberlain were in. I probably would have extended him, switched to a 4-3-3, sold Ozil and brought in more midfield depth and let him do his charging runs but with more cover behind him.

Theo Walcott: Fine player, doesn't fit with how Arsenal play. Sell him as he is running down the contract and getting past the peak years. Probably should have left last year to maximize return and allow for money to be re-invested in a younger replacement.

Olivier Giroud: The guy scores goals and has pretty flicks and layoffs but isn't a starter. Teams were offering transfers in the 30 million range for him and Arsenal should have been ruthless and let him go.

Danny Welbeck: Does everything well except finish moves to produce goals. However, that is important as a striker. With this he is clearly a rotation option. I like him and think that he is someone to keep around. The age profile is still good and I would have looked to see if his contract could be extended a couple of years.

Chuba Akpom: Not rated by the manager. Sell.

After that exercise Arsenal would have moved on at least four players and bought one out to clear space on the roster. It also would have freed up £250,000-£300,000 a week in salaries plus some transfer funds to go to replacements and to allow for money to go to extensions.

Doing this exercise helps the team to clear out players who don't have a clear future, and it helps to avoid the issues that Arsenal currently have with the Premier League FFP rules. With Arsenal bumping into the top of the new soft cap in the Premier League it is even more important to maximize the output that you get for the salaries that you pay your squad. In this kind of a situation having a lot of good squad players kills long term roster flexibility. This is also an indictment of Arsenal's commercial department failing to keep up and allow for more salary flexibility.

Young players are rejecting Arsenal. The current players plus management have backed Arsenal into a corner with an aging squad with little resale value and contracts running down. There doesn't seem to be a plan for next year let alone one for three or four years down the road.

Things are likely to get worse and it could take years for Arsenal to get out of this hole. We can all hope that this doesn't permanently knock Arsenal down from the top tier of the Premier League.