
2018 NBA Draft - June 21

I also wanted to post this... Another draft analyst who largely bases his opinions on the numbers on Twitter is @scricca1

More specifically, he built a small statistical model that serves as an NBA 3P% predictor. Basically, he found a solid correlation between NBA 3P% and the average of NCAA 3P%, 3PA/40, and FT% (once those numbers were normalized to the same scale).

The correlation isn't strong enough to be the be-all and end-all, but I like the tool quite a bit. He uploaded a Google Doc with over 300 searchable players, and you can enter your own figures if a player isn't already on there:

https://docs.google.com/spreadsheet...wx9XQU-EutsFGfLYTLiqLIgd4/edit#gid=1853145954

And here is the actual step-by-step breakdown about how he came up with the model:

https://medium.com/@scricca1/so-can...ict-nba-3-point-shooting-ability-21cee782859e

I've also messed around and tested the model on past players who are now in the NBA, using their final college season stats. It's about as accurate a predictor as I've seen (which isn't saying much lol).

Obviously, if you follow the draft closely, you know how important a player's shooting projection is. It's probably the single most important thing in evaluating players, in my opinion.
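The averaging idea described above can be sketched in a few lines (the z-score normalization and the league-wide numbers below are my assumptions for illustration; his exact scaling isn't shown here):

```python
import numpy as np

def shooting_score(threes_pct, threes_per40, ft_pct, league_means, league_stds):
    """Composite shooting score: z-score each NCAA stat against
    hypothetical league-wide numbers, then average the z-scores."""
    stats = np.array([threes_pct, threes_per40, ft_pct])
    z = (stats - np.array(league_means)) / np.array(league_stds)
    return z.mean()  # higher composite -> better projected NBA 3P%

# Hypothetical league-wide means/stds, for illustration only.
score = shooting_score(0.40, 7.5, 0.85,
                       league_means=[0.35, 5.0, 0.72],
                       league_stds=[0.05, 2.0, 0.08])
print(round(score, 2))
```

The final step in his model maps a composite like this back to a predicted NBA 3P% via the fitted correlation; the z-score average is just the "same number scale" part.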
 
Ntilikina last year - age 18.6, 6'5", per-40 stats in 19.3 minutes per game: 11.2 points (53% true shooting), 4.7 boards, 3.1 assists, 1.5 steals, 0.4 blocks, 1.8 TO, 5.0 PF
Bonga this year - age 18.3, 6'9", per-40 stats in 23.4 minutes per game: 11.0 points (62% true shooting), 6.0 boards, 4.4 assists, 2.0 steals, 0.8 blocks, 4.0 TO, 3.7 PF
Okobo this year - age 20.4, 6'3", per-40 stats in 26.5 minutes per game: 19.9 points (59% true shooting), 4.0 boards, 6.8 assists, 1.6 steals, 0.5 blocks, 4.4 TO, 2.8 PF

Somehow Bonga and Okobo are still buried deep in the second round of most mocks, despite being, at worst, very comparable to a guy who went #8 last year.
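For anyone unfamiliar with per-40 numbers like the ones above, the conversion is just a rate scaling (the totals below are hypothetical, my own example):

```python
def per40(stat_total, minutes_played):
    """Scale a raw season total to a per-40-minute rate."""
    return stat_total / minutes_played * 40

# Hypothetical: 150 points in 540 minutes is about 11.1 points per 40.
print(round(per40(150, 540), 1))
```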

Awesome find...I'd been meaning to get around to this.
 
To save people the time of searching the top prospects in the model, here is each player's current projected NBA 3P% based on what he's come up with:

Jaren Jackson Jr - 37.8%
DeAndre Ayton - 30.1%
Trae Young - 39.1%
Mo Bamba - 27.3%
Mikal Bridges - 40.8%
Wendell Carter - 33.7%
Marvin Bagley - 29.0%
Miles Bridges - 39.1%
Collin Sexton - 33.1%
Lonnie Walker - 36.1%
Chandler Hutchison - 33.5%
Landry Shamet - 41.1%
Keita Bates-Diop - 36.0%
Trevon Duval - 27.6%
Hamidou Diallo - 28.6%
Shake Milton - 40.2%

I'm also relatively surprised by how accurate the predictor is with past players. It's pretty fun to search guys from the last couple of drafts and see how well the model holds up. I think using this as a tool, while also taking into account things such as the types of attempts they take, a player's shooting form, shot diversity, etc., gives you a pretty decent idea of just how good a shooter a player can become.
 
Interesting in particular that 3-point volume at the NCAA level is an important predictor for 3-point accuracy (and presumably volume too) at the NBA level.

When I made my model I was surprised to see that NCAA 3-point volume was a huge predictor of NBA success, far more important than NCAA 3-point accuracy. I guess qualitatively what's happening is the very best shooters shoot a lot of 3's, including high degree of difficulty 3's, and their efficiency numbers suffer as a result. This holds true at the NBA level...what's a better list of the ten best shooters in the NBA right now, the leaders in 3-point volume or the leaders in 3-point percentage?

[Image: NBA leaders in 3-pointers made]

[Image: NBA leaders in 3-point percentage]


Clearly the highest volume guys are the more dangerous shooters, even though only one of them ranks top-10 in percentage.
 
16 of Bagley's 27 games so far this season have been against top-75 defenses, compared to just 3 of Ayton's 31 games. Something to consider when projecting how well their high scoring volume/efficiency is likely to translate.
 
Looked into Okobo quite a bit. If he doesn't go in the first round, I'll be surprised. Here's my full scouting report on him:

[Image: Okobo scouting report]
 
Updated my statistical rankings through yesterday's games:

[Image: updated statistical rankings]


Jontay leaps to the top after a pair of monster games. Sexton has quietly caught up to Young.
 
How long would it take you to give a full-blown breakdown of your model? Not sure how detailed yours is compared to the next guy's, or how well I'd be able to understand it lol, but I'd be interested in learning about it.

How much of your defensive rating is based on DRPM, for example? It's so difficult to quantify defensive ability from one guy to the next; there's no foolproof stat. Not sure there ever will be.
 
Wouldn't take too long; have to go at the moment, but will get to it for sure later tonight.

EDIT:

My model attempts to predict NBA adjusted plus/minus. The major benefit of predicting APM, instead of an NBA box score metric, is that my model doesn't inherit any biases at this stage (e.g. a model that tries to predict PER will necessarily end up with all the same biases/flaws of PER). The major drawback is that adjusted plus/minus is a very noisy stat, and the price I pay for trying to predict a noisy stat is relatively high uncertainties in my model coefficients (e.g. the marginal value of an NCAA assist, or rebound). Ultimately, it's a good tradeoff because in most cases the extra uncertainty in my predictions due to uncertainty in model coefficients is small relative to other sources of uncertainty.
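For anyone unfamiliar with adjusted plus/minus, here's a toy version of the idea (entirely my own illustration, not his code): regress stint-level point margin on player on/off indicators, with ridge regularization to tame the noise he mentions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_players, n_stints = 20, 2000
true_apm = rng.normal(0, 2, n_players)  # each player's "true" impact

# Each stint: 5 random players get +1 (on for the home side),
# 5 others get -1 (on for the opponent).
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    on = rng.choice(n_players, 10, replace=False)
    X[i, on[:5]] = 1
    X[i, on[5:]] = -1

# Observed margin is the players' combined impact plus a LOT of noise,
# which is exactly why APM is such a noisy target.
margin = X @ true_apm + rng.normal(0, 10, n_stints)

# Ridge regression stabilizes the per-player estimates.
lam = 50.0
est = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ margin)
print(np.corrcoef(est, true_apm)[0, 1])
```

Even with heavy noise, the regularized estimates track the true values reasonably well, which is the same tradeoff the post describes: a noisy target, but one free of box-score biases.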



My model assumes that there are no interaction terms between parameters. That means that the value of an NCAA player's assist according to my model does not depend at all on how many rebounds he gets, or how many points he scores.

It turns out that this assumption is absolutely crucial. Without it, the space of possible models is too large relative to the available sample size. In general, if you have an N-dimensional problem (and all the dimensions are important), you need at least 2^N data points for general smooth manifold fitting methods (like machine learning type stuff) to work. Here, N>15 (at a minimum), so we would need a sample size of around 50,000.

One way to visualize this problem is as follows. Picture a thousand points in one dimension, i.e. on a line. Only two points are "on the frontier"; the maximum and the minimum. All other points have neighbors on either side. Now picture them in two dimensions, on a plane. Now dozens of points around the perimeter of this blob are on the frontier. By the time you get to 15 dimensions, virtually all of the points are on the frontier, and very few have neighbors nearby.
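You can see this effect numerically with a quick Pareto-frontier count (my own sketch; I'm counting a point as "on the frontier" if no other point beats it in every coordinate, which is a slightly different convention than the max/min one above but shows the same blow-up):

```python
import numpy as np

def pareto_fraction(n, d, rng):
    """Fraction of n uniform points in d dimensions that no other point
    dominates (i.e. is >= in every coordinate and > in at least one)."""
    pts = rng.random((n, d))
    frontier = 0
    for i in range(n):
        dominated = np.all(pts >= pts[i], axis=1) & np.any(pts > pts[i], axis=1)
        if not dominated.any():
            frontier += 1
    return frontier / n

rng = np.random.default_rng(0)
for d in (1, 2, 5, 10, 15):
    print(d, round(pareto_fraction(1000, d, rng), 3))
```

In one dimension almost nothing is on the frontier; by fifteen dimensions, nearly every point is, which is exactly why a smooth fit over the full 15-dimensional space is hopeless at this sample size.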

The assumption that there are no interaction terms avoids this problem nicely by effectively breaking up the 15-dimensional problem into 15 one-dimensional problems, where a sample size on the order of a thousand is more than enough for a smooth curve fit (which is what I do).
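A minimal sketch of "no interaction terms" as a fitting procedure (toy data and polynomial curves are my assumptions; the actual model presumably uses better smoothers): the prediction is a sum of independent one-dimensional curve fits, found by backfitting each curve against the residual left by all the others.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_stats, degree = 1000, 4, 2

# Toy data: outcome is a sum of per-stat effects (no interactions) plus noise.
X = rng.random((n, n_stats))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(0, 0.1, n)

# Backfitting: refit each 1-D polynomial against the residual of the rest.
coefs = [np.zeros(degree + 1) for _ in range(n_stats)]
for _ in range(20):
    for j in range(n_stats):
        others = sum(np.polyval(coefs[k], X[:, k])
                     for k in range(n_stats) if k != j)
        coefs[j] = np.polyfit(X[:, j], y - others, degree)

pred = sum(np.polyval(coefs[j], X[:, j]) for j in range(n_stats))
print(np.corrcoef(pred, y)[0, 1])
```

Four separate one-dimensional fits recover the additive signal easily with 1,000 points, whereas a full four-way interaction surface would already be data-hungry.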

In reality, it's a big enough sample size to get away with *some* interaction terms, but this is tricky business and often ends badly. The classic example is the assists*rebounds interaction term in basketball-reference's box plus/minus. At the time it seemed clever; the top 6 seasons of all time belonged to LeBron and MJ. Then, last year Russ had his triple double season and shattered the all-time BPM record, and the reputation of BPM as a stat. I chose to keep it simple and safe, and avoid any dabbling in interaction terms.



The last important thing my model does is estimate uncertainties in its predictions, something notably lacking from most such models people have published. This illuminates some of the strengths and weaknesses of my model. For example, the largest contributor to uncertainty is made two-pointers; that is, my model is generally less accurate at predicting players who make a lot of two-pointers. This makes some sense: a player's two-pointers made per game (even taken together with two-point percentage, or equivalently two-pointers missed) falls far short of describing how good a scorer he really is inside the arc. There's just not enough information in this part of the box score to properly evaluate a player.
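One generic way to put error bars on a prediction like this is the bootstrap (my illustration; the post doesn't say which method is actually used): refit the model on resampled data many times and look at the spread of predictions for a single prospect.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3

# Toy training data with known linear structure plus noise.
X = rng.random((n, d))
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(0, 0.5, n)

# One new prospect's stat line (hypothetical).
x_new = np.array([0.4, 0.6, 0.5])

preds = []
for _ in range(200):
    idx = rng.integers(0, n, n)          # resample rows with replacement
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    preds.append(x_new @ b)

# Point estimate and its uncertainty from coefficient noise.
print(np.mean(preds), np.std(preds))
```

The standard deviation across refits is exactly the "uncertainty in model coefficients" contribution the post describes; stats that carry little information (like made twos) would show up as wide spreads here.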



Some more minor things:

-All stats are per possession. I also include height and minutes per game.

-I assume a quadratic aging curve. I found that on offense, better prospects actually follow a steeper aging curve than worse prospects, and I accounted for this as well.

-My sample only goes through the 2012 draft. This hurts my sample size, and also hurts because my model is really tuned to predict how players entering the NBA a decade ago would be expected to perform. Obviously the NBA has changed since then and I have no way of adjusting for that.

-My sample only includes prospects that went on to play significant NBA minutes, so it suffers from "survivor bias" and therefore tends to be slightly too optimistic in its projections. How to correct for this is an interesting question in its own right that I won't get into for now (but could talk more about if you're interested).

-My model has some interesting artifacts because of the relatively large uncertainties in the coefficients I mentioned earlier. For instance, made two pointers have a slight (not statistically significant) negative value. In reality, they probably have (at least) a slight positive value. I could manually correct things like this to make my model slightly better, but that's obviously a slippery slope toward tweaking and tuning my model in retrospect to make it look like I think it "should." So I decided to just let it be, even in cases where the helpful tweak is obvious.

EDIT 2: Forgot one other note

-My model doesn't account for strength of schedule or team strength.
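The quadratic aging curve mentioned in the notes above is just a degree-2 polynomial in age; here's a toy fit (the performance numbers are invented by me) and the peak age it implies:

```python
import numpy as np

# Hypothetical performance by age, rising then declining.
ages = np.array([19, 20, 21, 22, 24, 26, 28, 30, 32])
perf = np.array([-1.0, 0.2, 1.0, 1.8, 2.6, 3.0, 2.9, 2.4, 1.5])

# Quadratic fit: perf ~ a*age^2 + b*age + c.
a, b, c = np.polyfit(ages, perf, 2)
peak_age = -b / (2 * a)  # vertex of the parabola
print(round(peak_age, 1))
```

The steeper-curve-for-better-prospects finding would amount to letting the curvature depend on projected quality, but the basic shape is just this parabola.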
 
That's pretty dope you got all that on your own, but damn, this makes me wonder what kind of next-level shit NBA teams have. We need some AIs to parse through all the SportVU stuff for us and tell us if LBJ was actually better than MJ LMAO
 
ESPN has moved Ayton to #1 over Doncic.

Bagley down to 5.

Jaren Jackson up to 4.

Wendell Carter up to 7.

Trae Young down to 8.

Aaron Holiday up to 23.

Jacob Evans up to 24.

Tony Carr now at 42.

Brandon McCoy down to 47.

Moritz Wagner up to 55.

Shamorie Ponds appears at 58.

Goga Bitadze down to 61.
 
I'm still trying to figure out how people have Ayton over Bagley, let alone Doncic. Bagley's a year younger than him and has looked better against much stronger competition. Ayton...is taller, I guess?
 
What does Bagley do better than Ayton besides getting off the floor quickly?
 
