Designing A Points/Probability System

3 comments, last by Kylotan 10 years, 11 months ago

[Removed]

A recommendation system shouldn't be based on categories; work at the level of individual SKUs. For example, Amazon is very explicit about what it proposes: "customers who bought/browsed this also bought/browsed ...". Recommend the items which are correlated in the real world, not in your abstract and arbitrary category system.

Omae Wa Mou Shindeiru

[Removed]

Shaquil, on 17 May 2013 - 07:06, said:

LorenzoGatti, on 17 May 2013 - 02:04, said:
A recommendation system shouldn't be based on categories; work at the level of individual SKUs. For example, Amazon is very explicit about what it proposes: "customers who bought/browsed this also bought/browsed ...". Recommend the items which are correlated in the real world, not in your abstract and arbitrary category system.


Relax. Amazon's recommendation engine sucks particularly because it only considers what you and other people bought, but has no idea why you bought them. There are many, many factors that contribute to bulk/related purchases that could have nothing to do with an individual user's tastes. The only correlation that matters is in the customer's mind; the real world doesn't matter. Working at the level of individual SKUs has got to be the craziest idea I've ever heard. You do realize that all 15 different Anniversary Editions of Atlas Shrugged are the same book but different SKUs?

Anyway, I'm looking for advice/a point in the right direction on designing the system in general. I came up with one method that I'm going to go with for now, though I don't like it:

When users sign up for the service, they indicate things about themselves. Favorite genre of music/movie/books, their job, their favorite celebrities, etc, etc. Using that info, points will be attributed to appropriate categories of books. When the user loads up the main page where the recommendations will be, they'll see 10 recommendations. Every single recommendation is calculated individually as a random draw. The system adds up all the points, and any individual category's percentage of being chosen is equal to its percentage of the total points. When a category wins, the system chooses a book from that category based on other rules that are irrelevant for now. On the next recommendation, the category that won the previous one is deducted a certain amount of points (its original amount of points will stay the same in the database, but the actual value we're using in the system will be decreased), and all losing categories will gain the same amount of points. The random drawing is then done again. The point is that the categories with 0 points will never get the first recommendation, and rarely get in the next 4 or 5, but might possibly grab the 10th spot at the end. A category with 0 points should have a chance to win because 0 doesn't mean "dislike," it just means "has expressed no interest." Negative points means dislike.

I recognize that the more categories there are, the more drastically the system will alter percentages for each subsequent "drawing," but that's why I'm looking for input/other ideas, because it's far from ideal.
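For reference, the draw described above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: the function name, the `step` parameter, and the choice to leave disliked (negative) categories out of the draw entirely are all assumptions.

```python
import random

def draw_recommendations(base_points, n_draws=10, step=1.0, rng=None):
    """Pick one category per recommendation slot by weighted random draw.

    base_points: stored points per category (never modified here).
    step: hypothetical amount moved after each draw (an assumption).
    """
    rng = rng or random.Random()
    # Work on a copy so the stored values stay the same.
    # Assumption: negative points mean "dislike", so those categories
    # are excluded from the draw altogether.
    working = {c: float(p) for c, p in base_points.items() if p >= 0}
    picks = []
    for _ in range(n_draws):
        cats = list(working)
        weights = [max(working[c], 0.0) for c in cats]
        if sum(weights) <= 0:
            break  # nothing eligible to draw from
        # Chance of winning equals the category's share of total points.
        winner = rng.choices(cats, weights=weights, k=1)[0]
        picks.append(winner)
        # Winner loses `step`; every losing category gains `step`.
        for c in cats:
            working[c] += -step if c == winner else step
    return picks

if __name__ == "__main__":
    stored = {"sci-fi": 10, "romance": 0, "horror": -5}
    print(draw_recommendations(stored, rng=random.Random(0)))
```

Note that with k eligible categories each draw changes the total by (k - 2) * step, since one winner loses `step` and k - 1 losers each gain it. That is the effect mentioned above: the more categories there are, the faster the percentages flatten out.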

I think you should largely abandon the focus on categories and genres. If users are rating things, then go for some variant of the Netflix model, which looks for correlations between ratings for different products, and uses those correlations to drive recommendations (e.g. people who like X and dislike Y usually like Z; this user likes X and dislikes Y; therefore this user will probably like Z). In short, in your case, users should be rating books, not categories of book.
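To make the "likes X, dislikes Y, so probably likes Z" idea concrete, here is a small sketch of item-to-item correlation on ratings. It is not Netflix's actual algorithm, just a textbook-style item-based approach; the data, function names, and the mean-centered prediction formula are assumptions chosen for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def item_similarity(ratings, a, b):
    """Correlate items a and b over the users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    return pearson([ratings[u][a] for u in common],
                   [ratings[u][b] for u in common])

def item_mean(ratings, item):
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def predict(ratings, user, target):
    """Predict user's rating of target from the items they already rated.

    Negative correlations count too: disliking Y raises the prediction
    for Z when Y and Z are negatively correlated.
    """
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = item_similarity(ratings, item, target)
        num += sim * (rating - item_mean(ratings, item))
        den += abs(sim)
    if den == 0:
        return None  # no usable correlations
    return item_mean(ratings, target) + num / den

if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale.
    ratings = {
        "ann": {"X": 5, "Y": 1, "Z": 5},
        "bob": {"X": 4, "Y": 2, "Z": 4},
        "cat": {"X": 1, "Y": 5, "Z": 1},
        "dan": {"X": 2, "Y": 4, "Z": 2},
        "eve": {"X": 5, "Y": 1},  # likes X, dislikes Y, has not rated Z
    }
    print(predict(ratings, "eve", "Z"))
```

In this toy data, Z tracks X and runs opposite to Y, so a user who likes X and dislikes Y gets a predicted rating for Z well above the average.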

Anything broader than specific titles is going to have a lot of false positives and negatives, significantly more so than even the Amazon model (which has the drawback of being, in all honesty, an afterthought). Not to mention that a genre/category system relies very heavily on your ability, as the operator, to create an effective categorization system; a per-product correlational system just works, without the operator needing to have any familiarity whatsoever with the products in question.

Furthermore, a correlational system deals far better with users like myself - who have tastes that span across multiple genres, but very specific likes and dislikes within those genres - than the system you described (it also generally better serves people with niche tastes). If I were faced with a question like "what is your favorite genre of X" I would just leave, because a question like that is basically meaningless to me (rating my interest in different film genres was the part of signing up for Netflix that I hated - and ultimately the least important part, because the category ratings are mostly used to determine what categories are shown on your home page), though I have absolutely no problem answering whether I liked or disliked a particular product (in fact, continuing with the Netflix example, I have rated so many movies and TV shows that the predicted rating for a movie I haven't watched is generally incredibly close to the rating I wind up giving after watching).

Knowing the user's specific likes and dislikes is a far more effective predictive tool than simply knowing broad genre preferences, which are a very crude instrument at best. To use video game examples, knowing that I like Super Mario Bros 2 and 3 tells you something about what other games I might like; knowing that those are the only Super Mario Bros games I like tells you even more, but in either case, that knowledge is far more useful than simply knowing that I like 2D sidescrolling platformers. Knowing that I like the Street Fighter, Guilty Gear, and Fatal Fury franchises tells you something else about my preferences, and knowing that I also detest Mortal Kombat tells you something else. In fact, knowing that I hate Mortal Kombat, despite liking other fighting games, other M-rated games, and other gory games, should allow you to make much better recommendations than simply knowing that I like fighting games, don't particularly care what a game is rated, and don't care one way or the other whether a game is particularly gory or violent.

Note that this is a preference pattern that only the finest-toothed categorization system will even be able to pick up on - there are two specific, categorizable reasons that I don't care for Mortal Kombat, but you have to be familiar with fighting games as a genre to even know about the categories in question, and extremely pedantic to actually include them in your categorization system.

And if you do find genre/category preferences important, you can derive those from a person's specific preferences, anyway, and more accurately than by asking about their genre preferences; when asked about category preferences, people tend to overstate their interest in certain categories, often depending on the "social status" of those categories (a variant of the stated preference vs. revealed preference issue - an example of this, again going back to Netflix, is that people are far more likely to put certain categories of movie, like Oscar-winners, in their Netflix queue than they are to actually watch them).
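As a rough sketch of deriving category preferences from specific ratings: average each category's ratings relative to a neutral point. The titles, category tags, 1-5 scale, and neutral value of 3 are all assumptions for illustration.

```python
from collections import defaultdict

def category_affinity(user_ratings, item_categories, neutral=3.0):
    """Average (rating - neutral) per category over the items a user rated.

    Positive means the user tends to like the category, negative means
    they tend to dislike it; categories with no rated items are absent.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item, rating in user_ratings.items():
        for cat in item_categories.get(item, ()):
            totals[cat] += rating - neutral
            counts[cat] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

if __name__ == "__main__":
    # Hypothetical ratings and tags.
    my_ratings = {
        "Street Fighter": 5,
        "Guilty Gear": 5,
        "Mortal Kombat": 1,
        "Super Mario Bros 3": 4,
    }
    tags = {
        "Street Fighter": ["fighting"],
        "Guilty Gear": ["fighting"],
        "Mortal Kombat": ["fighting"],
        "Super Mario Bros 3": ["platformer"],
    }
    print(category_affinity(my_ratings, tags))
```

The derived "fighting" score comes out only mildly positive here, which hides the strong per-title signal underneath it, so this works better as a summary of the ratings than as a replacement for them.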

Ultimately, finding new media by category is easy enough as it is, so a category-based recommendation engine provides relatively little added value, the high rate of false positives will kill your users' opinion of your service, and the high rate of false negatives will rob you of opportunities to pleasantly surprise them.

I'm going to say that a genre/category system is fine. Sure, it will have some disadvantages relative to a direct product correlation system, but the fact is that correlations are only useful for objects you have some data on. Genres help you extrapolate that data to new products or products that haven't sold any copies yet, while restricting it to pairs of products a user has rated will limit the useful data to the most popular products - often the exact opposite of what you want.
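One way to combine the two approaches, sketched under assumptions: trust the correlation-based score more as a product collects ratings, and fall back to the genre-based score when it has few or none. The weighting formula and the constant `k` are illustrative choices, not something proposed in this thread.

```python
def blended_score(corr_score, n_ratings, genre_score, k=10):
    """Blend a correlation-based score with a genre-based fallback.

    corr_score: score from product correlations, or None if unavailable.
    n_ratings: how many ratings the product has.
    k: hypothetical constant; at n_ratings == k both scores weigh equally.
    """
    if corr_score is None or n_ratings <= 0:
        return genre_score  # brand-new product: genre is all we have
    w = n_ratings / (n_ratings + k)
    return w * corr_score + (1 - w) * genre_score

if __name__ == "__main__":
    print(blended_score(None, 0, 3.5))     # new product, genre score only
    print(blended_score(5.0, 10, 3.0))     # equal weighting at n == k
    print(blended_score(5.0, 1000, 3.0))   # mostly the correlation score
```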

