The Netflix Prize was awarded to the team with the algorithm that most accurately guessed people’s movie tastes. Accurate, according to some measure: root-mean-squared error, or the L2 norm.
In my opinion, that’s the wrong measure of success. Netflix selected for algorithms that predicted well across all data, penalizing large misses extra.
The best algorithm, I think, should observe my tastes and recommend just one product that I’ve never heard of (or at least never tried), that I absolutely love. It’s OK if I like a movie and you show me another one by the same director — but I could have done that myself. The best algorithm would say:
You like Cowboy Bebop + Out Of Africa + Winged Migration so you will like = Seven Samurai.
That would be true in my case. Cowboy Bebop indicates that I like Asian sh*t; Out Of Africa is an old classic; Winged Migration doesn’t have a lot of talking
In other words,
- only the “most recommended” movie matters
- it should blow me away
- it should be surprising.
RMSE fails #1 because success in the highest recommendation matters
As a result, today’s recommendation engines are conservative in the wrong ways and basically hack together machine learning fads.