
Thread: Context mixing for recommendation engines

  1. #1
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts

    Context mixing for recommendation engines

    Random thought--

    A recommendation engine should be a mixture of at least these contexts:

    1. similarity to previous choices
    2. overall popularity (or some measure of quality): recommends most popular items
    3. novelty (newness): recommends newest items

    The recommendation engines I've encountered seem to be stuck on #1, so they just keep feeding you more of the stuff you've chosen already. They are seemingly unable to learn concepts like "user has eclectic taste with a preference for novelty" or "user has eclectic taste with a preference for quality."

    It could be that this has been done and I just haven't noticed it. I don't buy much online lately.
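
    If it helps to make that concrete, here's a minimal sketch of such a three-way mixture (Python just for illustration; the per-item similarity/popularity/recency scores and the fixed weights are made-up placeholders):

    Code:
    # Mix three contexts into one recommendation score per item.
    # similarity/popularity/recency are assumed precomputed, each in [0, 1].
    def recommend(items, similarity, popularity, recency, weights=(0.5, 0.3, 0.2)):
        w_sim, w_pop, w_new = weights
        scored = []
        for item in items:
            score = (w_sim * similarity[item] +
                     w_pop * popularity[item] +
                     w_new * recency[item])
            scored.append((score, item))
        scored.sort(key=lambda t: t[0], reverse=True)   # best mixed score first
        return [item for _, item in scored]

    The interesting part would be learning the three weights per user instead of fixing them, which is where this starts to look like context mixing.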

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    There's certainly a lot of work in this area: https://en.wikipedia.org/wiki/Recommender_system
    So if recommendations on, say, amazon, don't seem useful to you, it could be one of the following:
    1. There are "political" reasons to fake recommendations, because being useful to the customer
    is not necessarily profitable for the marketplace.
    2. It's actually more profitable to optimize recommendations for resellers, who buy the same things many times.
    3. They're just dumb and lazy.

  3. #3
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I know that there has been a lot of work in that area. I think they generally try to come up with a monolithic solution that learns every rule from the data.* Some rules are more learnable than others, so those kinds of rules dominate the recommendations.

    I was thinking that the context mixing approach would be fairly ideal for adding in rules that are hard to learn, like "user prefers new music" or "user prefers popular mainstream music".

    Again, I'm not that up to date, so this could have already happened.

    * Or give things attributes and try to learn what attributes factor into each user's imaginary preference function, apparently. Novelty and popularity wouldn't really work well as attributes.

  4. #4
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I skimmed the wikipedia entry on recommender systems, and I can't say for certain whether they are already doing this. The hybrid systems seem to use a kind of mixing. But I don't see evidence that they're mixing in simple rules like novelty and popularity. Everyone prefers popular things to some extent and new things to some extent. Some people have niche interests and other people just prefer whatever is popular and new.

    I don't want to belabor this subject too much, because I don't know much about it. I was sort of hoping that someone here, e.g. Matt, might know something.

  5. #5
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Now that I think about it, making recommendations is a very similar problem to compression. You could think of items as symbols, and then the task is to assign each symbol (item) a probability of being next based on various different contexts.
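
    As a toy version of that analogy, a context could simply keep counts and turn them into a probability for the next item. (The smoothing and the alphabet size below are arbitrary placeholders.)

    Code:
    # Items as symbols: each context keeps counts and assigns P(next item | context).
    from collections import defaultdict, Counter

    counts = defaultdict(Counter)        # context -> Counter of next items

    def update(context, item):
        counts[context][item] += 1

    def prob(context, item, alphabet_size=1000):
        c = counts[context]
        total = sum(c.values())
        # Crude add-one smoothing so unseen items still get nonzero probability.
        return (c[item] + 1) / (total + alphabet_size)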

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Yes, it's exactly like compression.
    Also, compression forces you to optimize for codelength, which is called logistic regression in statistics,
    and is usually a superior choice compared to the common least-squares optimization.

    Btw, entropy coding appears in many areas where you need some kind of optimization.
    For example, a perfect string sort can be implemented by compressing the strings, then applying a radix sort.
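
    Roughly, that kind of codelength-driven mixing could look like this (a paq-style logistic mixer over two hypothetical bit-probabilities; the initial weights and learning rate are arbitrary):

    Code:
    import math

    def stretch(p):
        return math.log(p / (1.0 - p))

    def squash(x):
        return 1.0 / (1.0 + math.exp(-x))

    w = [0.3, 0.3]     # mixing weights, updated online
    LR = 0.02          # learning rate (arbitrary)

    def mix(p1, p2, bit):
        """Mix two bit-probabilities, then nudge weights to shorten the codelength."""
        global w
        s = [stretch(p1), stretch(p2)]
        p = squash(w[0] * s[0] + w[1] * s[1])
        err = bit - p    # (bit - p) * s[i] is minus the gradient of the codelength w.r.t. w[i]
        w = [wi + LR * err * si for wi, si in zip(w, s)]
        return p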

  7. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Shelwien View Post
    Yes, it's exactly like compression.
    Also, compression forces you to optimize for codelength, which is called logistic regression in statistics,
    and is usually a superior choice compared to the common least-squares optimization.

    Btw, entropy coding appears in many areas where you need some kind of optimization.
    For example, a perfect string sort can be implemented by compressing the strings, then applying a radix sort.
    I'm not sure I follow everything you've said, but, based on the wikipedia article on logistic regression, I think the recommender case would be an ordinary linear regression, because the dependent variable is a continuous probability (or score).

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    It's not a single probability, but a string probability (aka likelihood), which is equal to the exponential of the negative codelength.
    Use of logistic regression in compression is also not the most obvious choice, but paq shows that it's significantly better.

    https://en.wikipedia.org/wiki/Maximu...ood_estimation
    http://stats.stackexchange.com/quest...ta-compression
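
    In other words (tiny made-up example), the product of the per-symbol probabilities is the string likelihood, and that's exactly 2 to the minus the total codelength in bits:

    Code:
    import math

    probs = [0.5, 0.25, 0.125]                       # per-symbol model probabilities (made up)
    codelength = sum(-math.log2(p) for p in probs)   # 1 + 2 + 3 = 6 bits
    likelihood = 2 ** -codelength                    # 2**-6 = 0.015625
    assert abs(likelihood - 0.5 * 0.25 * 0.125) < 1e-12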

  9. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    nburns (9th January 2017), RamiroCruzo (9th January 2017)

  10. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I took a look at the Netflix Prize. The goal of that contest was to predict how a user would rate a movie.

    I think that's a slightly different problem from the one I'm trying to solve. Being able to predict how a user would rate a movie *after* seeing it lets you recommend movies that a user would *love*. I'm approaching the problem as one of recommending movies that a user *will watch*.

    Recommending movies that a user would *love* is perhaps deeper and more interesting... but finding movies that the user will watch could be more tractable and actually more useful.

  11. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I don't see much difference from a modelling perspective.
    In your case you'd get lists of movies that users watched, and try to predict what else they'd watch.
    Same as if the rating had only a single value.

    If there's no other info, I guess you can start by collecting contextual stats, like which movies are watched by users who have X in their list.
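
    For instance, a sketch of those contextual stats built from nothing but watch lists (the lists below are invented):

    Code:
    # Among users whose list contains movie X, count how often each other movie Y appears.
    from collections import defaultdict, Counter
    from itertools import permutations

    watch_lists = [
        {"Godfather", "Godfather II", "Goodfellas"},
        {"Godfather", "Goodfellas"},
        {"Back to the Future", "Teen Wolf"},
    ]

    co_counts = defaultdict(Counter)        # X -> Counter of co-watched Y
    for movies in watch_lists:
        for x, y in permutations(movies, 2):
            co_counts[x][y] += 1

    # "Users who watched Godfather also watched ..."
    print(co_counts["Godfather"].most_common())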

    Also, you can develop it as a compressor which has to compress these lists.

  12. #11
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Shelwien View Post
    I don't see much difference from a modelling perspective.
    In your case you'd get lists of movies that users watched, and try to predict what else they'd watch.
    Same as if the rating had only a single value.

    If there's no other info, I guess you can start by collecting contextual stats, like which movies are watched by users who have X in their list.
    I was even thinking of looking at the user-movie data as sequences rather than just unordered sets. People might tend to watch certain movies in sequence, like Godfather -> Godfather II, or Back to the Future -> something else with Michael J. Fox. So that's even more like compressing text.

    Also, you can develop it as a compressor which has to compress these lists.
    Right, you could literally develop it as a compressor for user movie logs.
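
    A small sketch of what that compressor-style view could look like, assuming an adaptive order-1 model over the ordered log (the smoothing, alphabet size, and titles are just example placeholders):

    Code:
    # "Compress" a user's ordered movie log with an adaptive order-1 model;
    # the total codelength in bits measures how predictable the sequence is.
    import math
    from collections import defaultdict, Counter

    def codelength_bits(log, alphabet_size=1000):
        counts = defaultdict(Counter)    # previous movie -> Counter of next movies
        bits = 0.0
        prev = None
        for movie in log:
            c = counts[prev]
            total = sum(c.values())
            p = (c[movie] + 1) / (total + alphabet_size)   # crude add-one smoothing
            bits += -math.log2(p)
            c[movie] += 1
            prev = movie
        return bits

    print(codelength_bits(["Godfather", "Godfather II", "Back to the Future"]))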

Similar Threads

  1. EMMA - Context Mixing Compressor
    By mpais in forum Data Compression
    Replies: 360
    Last Post: 3rd February 2019, 07:56
  2. Context Mixing
    By Cyan in forum Data Compression
    Replies: 9
    Last Post: 23rd December 2010, 21:45
  3. Simple bytewise context mixing demo
    By Shelwien in forum Data Compression
    Replies: 11
    Last Post: 27th January 2010, 04:12
  4. Context mixing
    By Cyan in forum Data Compression
    Replies: 7
    Last Post: 4th December 2009, 19:12
  5. CMM fast context mixing compressor
    By toffer in forum Forum Archive
    Replies: 171
    Last Post: 24th April 2008, 13:57
