Friday, May 12, 2017

Week 37 Premier League Simulation


I have been lacking lately in producing the hot, fresh content that brings this blog its tens of viewers each day, and I am sorry for that.

Here are the simulated odds for this week's Premier League slate.

It is nearly a coin flip for Chelsea to lock up the title this evening. Manchester City have ridiculously high odds of beating Leicester, which might be too high considering their last meeting.

In the race between Liverpool and Arsenal, the model says that Liverpool should be heavy favorites away at a bottom-half side, which is kind of scary: the model doesn't know about their injuries or their problems with a deep block. Arsenal have to go to Stoke, which is never pretty, and they are slight favorites to come out of that with a win.

The 7am slate of games is not exactly exciting, so I will probably be watching the RasenBallsporters of Leipzig take on Bayern Munich. Or I will let the kids control the TV, who knows. #NCT

Thursday, May 11, 2017

Premier League Table Simulation

After Arsenal's win over Southampton yesterday there are still four teams fighting for the last two Champions League spots.


Manchester City and Liverpool are still the clear favorites, but Arsenal have edged past Manchester United as the closest follower.

To have a realistic shot, Arsenal need to get to a minimum of 73 points, and from there it could get pretty interesting with the goal difference between the teams. If Arsenal do win out and get to 75 points, they will beat Liverpool's points total about 70% of the time. If they get to 73 points, they will finish ahead of Liverpool 19% of the time.

The road ahead for Arsenal:

41.7% chance of a win at Stoke.

55.9% chance of a win vs Sunderland.

44% chance of beating Everton at home on the last day.

All told, there is about a 10.3% chance of running the table, which is crazy considering where Arsenal were a few weeks ago.
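The 10.3% figure is what you get by multiplying the three win chances together. Here is a minimal Python sketch of that arithmetic. It assumes the three matches are independent, which is a simplification, and the names `win_probs` and `prob_win_all` are only there for illustration.

```python
# Win probabilities quoted above for Arsenal's remaining fixtures.
win_probs = {
    "at Stoke": 0.417,
    "vs Sunderland": 0.559,
    "vs Everton": 0.44,
}


def prob_win_all(probs):
    """Chance of winning every match, assuming the matches are independent."""
    result = 1.0
    for p in probs:
        result *= p
    return result


run_the_table = prob_win_all(win_probs.values())
print(f"{run_the_table:.1%}")  # prints 10.3%
```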

Monday, May 8, 2017

How I Get My Data

One of the most frustrating things about soccer (football) is that there is no easy way to get statistics. Sure, you can find match results pretty easily, but if you want anything more complicated, most people are out of luck.

I considered going the Opta route, which would give a nice fire hose of data to examine, but that seems expensive. This is a hobby that I was really just getting started in, and I wasn't willing to commit to spending that kind of money at this point. Maybe some day.

So I decided that, seeing as this is a hobby I do for fun, I don't really have a problem spending my leisure time working on it.

The initial plan was something similar to the work that I had done in college for a project in one of my econometrics classes, where I looked at NFL decisions on 4th down. For that I copied and pasted all the play-by-play data into Excel and then cleaned the data very inefficiently. I should have learned to write code to help, but I was young and dumb.

For soccer I decided that the commentary data that ends up on places like the BBC and ESPNFC looked like the best option. It includes a general shot location: middle of the six yard box, wide six yard box, center of the box, wide box, outside the box and long range. The shot location isn't as specific as the Opta data, but the price is right. It also includes who took the shot, which is important, and even better, who set up the shot. The other great thing is that it includes context for the shot: which foot (or head) was used and how it was assisted (cross, set piece, free kick, through ball, headed pass or fast break).
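To give a feel for what pulling those fields out of commentary text involves, here is a rough Python sketch using regular expressions. It is not the Excel approach described in this post. The sentence layout it expects and the example line are assumptions made up for illustration, and real commentary would need more patterns than this.

```python
import re

# Assumed layout of a commentary line (made-up example, not real data):
#   "Attempt saved. Player A (Team X) right footed shot from the centre
#    of the box is saved. Assisted by Player B with a cross."
SHOOTER_RE = re.compile(r"([^.()]+) \(([^)]+)\)")
BODY_RE = re.compile(r"left footed shot|right footed shot|header")
LOCATION_RE = re.compile(r"(?:shot|header) from (.+?) (?:is|misses|to)\b")
ASSIST_RE = re.compile(
    r"Assisted by ([^.]+?)(?: (?:with|following) an? ([^.]+))?\."
)


def parse_shot(line):
    """Return a dict of shot details, or None if the line is not a shot."""
    body = BODY_RE.search(line)
    shooter = SHOOTER_RE.search(line)
    if body is None or shooter is None:
        return None
    location = LOCATION_RE.search(line)
    assist = ASSIST_RE.search(line)
    return {
        "player": shooter.group(1).strip(),
        "team": shooter.group(2),
        "body_part": body.group(0),
        "location": location.group(1) if location else None,
        "assisted_by": assist.group(1) if assist else None,
        "assist_type": assist.group(2) if assist else None,
    }


example = (
    "Attempt saved. Player A (Team X) right footed shot from the centre "
    "of the box is saved. Assisted by Player B with a cross."
)
shot = parse_shot(example)
print(shot)
```

Lines that do not mention a shot or header, such as fouls and substitutions, come back as `None` and can simply be skipped.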

So I started off copying and pasting data by hand and then using Excel formulas to extract the relevant data from the play-by-play information.

The next step was using the awesome web apps that have been developed to help scrape data from the web. I currently use both import.io and ParseHub to collect the data each week, and I have written a number of macros that take these big CSV files, break them up by match and extract the data that I need.
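For anyone who would rather not write macros, the "break the big CSV up by match" step can be sketched in a few lines of Python. This is only an illustration. The column name `match` and the sample rows are made up, and the real exports from import.io or ParseHub will have whatever columns you set up in the scraper.

```python
import csv
import io
from collections import defaultdict


def split_by_match(csv_file, match_column="match"):
    """Group the rows of one big scraped CSV by the match they belong to."""
    matches = defaultdict(list)
    for row in csv.DictReader(csv_file):
        matches[row[match_column]].append(row)
    return dict(matches)


# Made-up sample standing in for a weekly export file.
sample = io.StringIO(
    "match,minute,text\n"
    "A v B,12,shot\n"
    "A v B,40,foul\n"
    "C v D,5,shot\n"
)
by_match = split_by_match(sample)
for name, rows in by_match.items():
    print(name, len(rows))
```

With a real file you would pass `open("export.csv", newline="")` instead of the `StringIO` sample.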

The next thing was that I really wanted big chance data, and unfortunately that isn't in the play-by-play commentary. To get it, I still go into WhoScored, manually search through all the matches and enter the big chance information into the previous data. This is still a big time suck for me and I would love a better method, but I have not figured one out at the moment. If you have suggestions, please let me know.

So that is how I figured out how to get halfway decent shot information, which I know is always one of the biggest questions for people getting started in analytics.

Sunday, May 7, 2017

Premier League Season Simulation

I am not one to write code.

I work with people who do that, and I keep the hell away from it because I will break things. Given that, I decided to write some very bad VBA code that simulates the rest of the Premier League season.

It is based on the other simulations I have run and posted here and, like those, it is untested for accuracy.
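The VBA itself isn't posted here, but the general idea of a season simulation can be sketched in Python. Take the current table, play out the remaining fixtures many times using win/draw/loss probabilities, and count how often each team ends up in the top spots. Everything below is an illustrative assumption and not the actual model: the function name, the inputs and the random tiebreak that stands in for goal difference.

```python
import random
from collections import Counter


def simulate_season(points, fixtures, n_sims=10_000, top_n=4, seed=0):
    """Estimate each team's chance of finishing in the top `top_n`.

    points:   dict of team -> current points
    fixtures: list of (home, away, p_home_win, p_draw) tuples
    """
    rng = random.Random(seed)
    top_counts = Counter()
    for _ in range(n_sims):
        table = dict(points)
        for home, away, p_home, p_draw in fixtures:
            r = rng.random()
            if r < p_home:
                table[home] += 3
            elif r < p_home + p_draw:
                table[home] += 1
                table[away] += 1
            else:
                table[away] += 3
        # Teams level on points are ordered at random (no goal difference here).
        ranked = sorted(table, key=lambda t: (table[t], rng.random()), reverse=True)
        for team in ranked[:top_n]:
            top_counts[team] += 1
    return {team: top_counts[team] / n_sims for team in points}


# Made-up three-team example: "A" is out of reach, "B" and "C" fight for second.
points = {"A": 90, "B": 50, "C": 49}
fixtures = [("B", "C", 0.4, 0.3)]  # B wins 40%, draw 30%, C wins 30%
odds = simulate_season(points, fixtures, top_n=2)
print(odds)
```

In the toy example, "B" stays ahead of "C" unless "C" wins, so its share should land near 70%.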

Here is what I got for the rest of the season:

According to my model, the title is all but decided for Chelsea after Tottenham lost on Friday. Even with Arsenal winning on Sunday, they are still way outside the top four spots.

Man City are in the best spot for top four at 91%. Liverpool are next at 65%, followed by Manchester United at 59%.

Thursday, May 4, 2017

Celta Vigo vs Manchester United First Leg Simulation

I haven't watched any Celta Vigo this year, but from what I have heard on podcasts and read, they have thrown in the towel on the league season, accepted a safe mid-table position and thrown everything into the Europa League.

This simulation is based only on league stats, so it might underrate Celta Vigo's chances if they have been resting players in the league and not giving it their best shot.

Manchester United are pretty heavy favorites for this one and at the very least should expect to come away with an away goal (86% chance of at least one goal and 53% chance of more than 1) if not the win. 

The Projection:




Updated Big 5 League Stats





Wednesday, May 3, 2017

Monaco vs Juventus First Leg Simulation

Monaco have a big task facing the best defense in Europe, but the model favors Juventus in what will likely be a fairly low scoring first leg (35% chance of 2 goals or fewer).




Premier League Top 6 Stats vs the Top 6

One of the things that has become apparent this season is that the top 6 teams (really top 5, but I will be nice and include Arsenal in this discussion) are significantly better than the rest of the league this year, and the results against each other have been driving the places in the table.

Given that, I thought that it would be interesting to pull the stats from the database I have been working on all season and compare how the teams have fared against each other:


Offense vs other Top 6 Teams

                      xG              SoT xG          Big Chances     Danger Zone     Shots
Team                  Per Game  Rank  Per Game  Rank  Per Game  Rank  Per Game  Rank  Per Game  Rank
Chelsea               1.10      3     1.16      3     1.45      3     3.73      7     9.00      7
Tottenham Hotspur     1.02      4     0.90      10    1.22      5     3.11      13    10.78     4
Liverpool             1.26      2     1.48      1     1.60      2     5.40      2     12.10     2
Manchester City       1.52      1     1.24      2     2.18      1     5.45      1     13.36     1
Manchester United     0.86      7     0.82      9     0.86      11    4.14      4     11.57     3
Arsenal               0.83      8     1.12      12    1.00      6     3.78      6     9.33      6

Defense vs other Top 6 Teams

                      xG              SoT xG          Big Chances     Danger Zone     Shots
Team                  Per Game  Rank  Per Game  Rank  Per Game  Rank  Per Game  Rank  Per Game  Rank
Chelsea               0.79      3     0.75      2     0.88      3     3.03      1     8.61      3
Tottenham Hotspur     0.86      4     0.90      4     0.94      5     3.56      5     8.88      4
Liverpool             0.95      5     1.07      8     1.31      9     3.37      4     8.00      1
Manchester City       0.75      2     0.82      3     0.79      2     3.03      1     8.09      2
Manchester United     0.73      1     0.71      1     0.65      1     3.18      3     9.06      5
Arsenal               1.11      8     1.17      9     1.33      10    4.64      8     10.91     6


I was a little surprised, but based on the stats Manchester City have the best record thus far against the other big teams. Chelsea have been second best, followed very closely by Liverpool.

Manchester United are the fourth best team, with a very good defense against the big teams (I would rank them second behind City), but that has come at the expense of producing offense or watchable matches.

A bit surprisingly, Tottenham are the fifth best team. Their defense is good and not far off the other teams, but on offense I still think that they are way overperforming their stats. Maybe they really are an awesome long range shooting team that can turn a high volume of low quality chances into goals and wins on a regular basis. I just think that betting on it to continue next season would be a risky proposition.

Lastly is poor old Arsenal. They don't have a stat where they rank better than sixth against the top teams. They are the worst team on offense and the worst team on defense. There really isn't anything else that can be said about this.