How long would it take you to give a full-blown breakdown of your model? Not sure how detailed yours is compared to the next guy's, or how well I'd be able to understand it lol, but I'd be interested in learning about it.
How much of your defensive ratings are based on DRPM for example? It's so difficult to quantify defensive ability from one guy to the next, there's no foolproof stat. Not sure there ever will be.
Wouldn't take too long; have to go at the moment, but will get to it for sure later tonight.
EDIT:
My model attempts to predict NBA adjusted plus/minus. The major benefit of predicting APM, instead of an NBA box score metric, is that my model doesn't inherit any biases at this stage (e.g. a model that tries to predict PER will necessarily end up with all the same biases/flaws of PER). The major drawback is that adjusted plus/minus is a very noisy stat, and the price I pay for trying to predict a noisy stat is relatively high uncertainties in my model coefficients (e.g. the marginal value of an NCAA assist or rebound). Ultimately, it's a good tradeoff because in most cases the extra uncertainty in my predictions due to uncertainty in model coefficients is small relative to other sources of uncertainty.
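Here's a quick synthetic illustration of that tradeoff (invented numbers, not my actual data): regress the same box-score signal against a clean target and against a noisy APM-like target, and compare the coefficient standard errors.

```python
import numpy as np

# Synthetic illustration: the same box-score signal regressed against a
# noisy target (like APM) vs. a cleaner one. The noisier target inflates
# the standard error on every coefficient by the same factor.
rng = np.random.default_rng(0)
n, p = 1000, 5                              # prospects, per-possession features
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, 0.5, -0.3, 0.8, 0.2])

def ols_se(X, y):
    """OLS coefficient standard errors: sqrt(s^2 * diag((X'X)^-1))."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(s2 * np.diag(XtX_inv))

y_clean = X @ true_beta + rng.normal(scale=0.5, size=n)   # low-noise metric
y_noisy = X @ true_beta + rng.normal(scale=3.0, size=n)   # APM-like noise

ratio = ols_se(X, y_noisy).mean() / ols_se(X, y_clean).mean()
print(ratio)   # roughly 6x wider coefficient uncertainties
```

Six times the target noise, six times the coefficient uncertainty; the question is just whether that still ends up small next to everything else.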
My model assumes that there are no interaction terms between parameters. That means that the value of an NCAA player's assist according to my model does not depend at all on how many rebounds he gets, or how many points he scores.
It turns out that this assumption is absolutely crucial. Without it, the space of possible models is too large relative to the available sample size. In general, if you have an N-dimensional problem (and all the dimensions are important), you need at least 2^N data points for general smooth manifold fitting methods (like machine learning type stuff) to work. Here, N>15 (at a minimum), so we would need a sample size of around 50,000.
One way to visualize this problem is as follows. Picture a thousand points in one dimension, i.e. on a line. Only two points are "on the frontier"; the maximum and the minimum. All other points have neighbors on either side. Now picture them in two dimensions, on a plane. Now dozens of points around the perimeter of this blob are on the frontier. By the time you get to 15 dimensions, virtually all of the points are on the frontier, and very few have neighbors nearby.
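You can check this picture numerically: for 1000 uniform points in the unit cube, the typical nearest-neighbor distance goes from negligible in 1-D to a sizable fraction of the whole cube in 15-D.

```python
import numpy as np

# Quick numerical check of the frontier picture above: median
# nearest-neighbor distance for 1000 uniform points in the unit cube,
# by dimension. In 1-D neighbors are everywhere; by 15-D the nearest
# neighbor is a sizable fraction of the whole cube away.
rng = np.random.default_rng(1)

def median_nn_dist(n, d):
    pts = rng.uniform(size=(n, d))
    sq = (pts ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * pts @ pts.T   # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                             # guard against round-off
    np.fill_diagonal(d2, np.inf)                         # ignore self-distance
    return float(np.median(np.sqrt(d2.min(axis=1))))

for d in (1, 2, 15):
    print(d, round(median_nn_dist(1000, d), 3))
```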
The assumption that there are no interaction terms avoids this problem nicely by effectively breaking up the 15-dimensional problem into 15 one-dimensional problems, where a sample size on the order of a thousand is more than enough for a smooth curve fit (which is what I do).
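A minimal sketch of what an additive fit like that looks like (synthetic data, and simple quadratics standing in for my actual curve fits):

```python
import numpy as np

# Additive (no-interaction) model: each of the 15 features gets its own
# 1-D curve (a quadratic here), and the curves are summed. Note the basis
# contains no cross terms like x_i * x_j. Data are synthetic stand-ins
# for NCAA per-possession stats.
rng = np.random.default_rng(2)
n, p = 1200, 15
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + X[:, 2] + rng.normal(scale=0.3, size=n)

design = np.column_stack([np.ones(n), X, X ** 2])   # per-feature quadratics only
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(Xnew):
    return np.column_stack([np.ones(len(Xnew)), Xnew, Xnew ** 2]) @ coef

r = np.corrcoef(predict(X), y)[0, 1]
print(round(r, 3))   # strong fit despite never fitting interactions
```

With ~1200 points and 15 features, each 1-D curve is estimated from plenty of data, which is exactly the point of the assumption.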
In reality, it's a big enough sample size to get away with *some* interaction terms, but this is tricky business and often ends badly. The classic example is the assists*rebounds interaction term in basketball-reference's box plus/minus. At the time it seemed clever: the top 6 BPM seasons of all time belonged to LeBron and MJ. Then, last year Russ had his triple-double season and shattered the all-time BPM record, and the reputation of BPM as a stat along with it. I chose to keep it simple and safe, and avoid any dabbling in interaction terms.
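Here's a toy version of the trap (invented numbers, not actual BPM): whatever a product term learns on a normal range of assist/rebound combinations, the extrapolation to a triple-double outlier magnifies it, because the product itself lands far outside anything in the training range.

```python
import numpy as np

# Toy version of the interaction-term trap. The true relationship is
# purely additive, so the fitted ast*reb coefficient only picks up noise.
# But its contribution scales with the product ast*reb, and a
# triple-double outlier pushes that product far beyond the training range.
rng = np.random.default_rng(3)
n = 2000
ast = rng.uniform(0, 6, n)                 # assists per game, training range
reb = rng.uniform(0, 8, n)                 # rebounds per game, training range
y = ast + reb + rng.normal(scale=1.0, size=n)

A = np.column_stack([np.ones(n), ast, reb, ast * reb])
b0, b_ast, b_reb, b_int = np.linalg.lstsq(A, y, rcond=None)[0]

in_sample_max = abs(b_int) * (ast * reb).max()   # worst in-sample contribution
outlier = abs(b_int) * 10 * 11                   # Westbrook-like 10 apg, 11 rpg
magnification = outlier / in_sample_max
print(round(magnification, 2))   # the term's effect more than doubles off-sample
```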
The last important thing my model does is estimate uncertainties in its predictions, something notably lacking from most such models people have published. This illuminates some of the strengths and weaknesses of my model. For example, the largest contributor to uncertainty is made two-pointers; that is, my model is generally less accurate at predicting players who make a lot of two-pointers. This makes some sense; a player's two-pointers made per game (even taken together with two-point percentage, or equivalently two-pointers missed) falls far short of describing how good a scorer the player really is inside the arc. There's just not enough information in this part of the box score to properly evaluate a player.
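For the curious, here's roughly how that kind of error propagation works (the stat names, coefficients, and the prospect's stat line are all made up; it assumes an OLS-style coefficient covariance):

```python
import numpy as np

# Propagating coefficient uncertainty into a single prediction, then
# attributing it stat-by-stat. Feature names and numbers are invented
# for illustration.
rng = np.random.default_rng(4)
stats = ["2PM", "3PM", "AST", "TRB", "STL"]
n, p = 800, len(stats)
X = rng.normal(size=(n, p))
y = X @ np.array([0.2, 0.6, 0.5, 0.4, 0.3]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
s2 = ((y - X @ beta) ** 2).sum() / (n - p)
Sigma = s2 * XtX_inv                          # coefficient covariance matrix

x_new = np.array([2.5, 0.3, 1.0, 1.5, 0.8])  # one prospect's stat line
pred_sd = np.sqrt(x_new @ Sigma @ x_new)     # full propagated uncertainty
per_stat = x_new ** 2 * np.diag(Sigma)       # diagonal-only attribution

top = stats[int(np.argmax(per_stat))]
print(round(float(x_new @ beta), 2), round(float(pred_sd), 3), top)
```

A stat line heavy on made two-pointers drives the attribution, mirroring what I see in my model: the stats a player leans on most contribute the most coefficient uncertainty to his projection.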
Some more minor things:
-All stats are per possession. I also include height and minutes per game.
-I assume a quadratic aging curve. I found that on offense, better prospects actually follow a steeper aging curve than worse prospects, and I accounted for this as well.
-My sample only goes through the 2012 draft. This hurts my sample size, and also hurts because my model is really tuned to predict how players entering the NBA a decade ago would be expected to perform. Obviously the NBA has changed since then and I have no way of adjusting for that.
-My sample only includes prospects that went on to play significant NBA minutes, so it suffers from "survivor bias" and therefore tends to be slightly too optimistic in its projections. How to correct for this is an interesting question in its own right that I won't get into for now (but could talk more about if you're interested).
-My model has some interesting artifacts because of the relatively large uncertainties in the coefficients I mentioned earlier. For instance, made two-pointers have a slight (not statistically significant) negative value. In reality, they probably have (at least) a slight positive value. I could manually correct things like this to make my model slightly better, but that's obviously a slippery slope toward tweaking and tuning my model in retrospect to make it look like I think it "should." So I decided to just let it be, even in cases where the helpful tweak is obvious.
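For what it's worth, the quadratic aging curve from the notes above could be sketched like this (every number here is invented, not my fitted values):

```python
# Sketch of a quadratic aging curve whose steepness depends on prospect
# quality: better prospects climb more steeply toward the peak. All
# parameter values (peak age, curvatures) are invented for illustration.
def projected_value(base, age, peak_age=27.0, base_curv=0.02, qual_curv=0.005):
    """Quadratic aging: value peaks at peak_age, falls off on either side."""
    curvature = base_curv + qual_curv * max(base, 0.0)  # steeper for better prospects
    return base - curvature * (age - peak_age) ** 2

for age in (19, 22, 25, 27):
    # compare a strong prospect (base 2.0) to a weak one (base 0.5)
    print(age, round(projected_value(2.0, age), 2), round(projected_value(0.5, age), 2))
```

Under these toy numbers the strong prospect improves more from 19 to 27 than the weak one, which is the quality-dependent steepness effect.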
EDIT 2: Forgot one other note
-My model doesn't account for strength of schedule or team strength.