Now for the fun part
Most likely the normalization will happen after a song finishes playing or is skipped, once any rating modifications have been applied. At the moment I'm trying to get some data from a smaller library to decide on how to model the normalization. My goal is that eventually the ratings, modified by normalization and play/skip events etc., can be a good indicator of which songs are currently most appreciated. I'm also looking at tracking related information, like highest rating achieved, rating decay rate, and probably a few others for individual songs, plus attention spread over [period] for libraries. I'd be using these for various custom Weighted Random systems.
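For illustration, here's a minimal Python sketch of what a custom Weighted Random pick could look like, assuming each song carries a hypothetical rating_score value; the field names are assumptions for this sketch, not GMB's actual data model:

    import random

    def weighted_random_pick(songs):
        # Pick one song with probability proportional to its rating score.
        # 'rating_score' is an assumed field name used only for this sketch.
        weights = [song['rating_score'] for song in songs]
        return random.choices(songs, weights=weights, k=1)[0]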
Ok, so I've been working on my datasets, and I'm quite happy with the data model. The current theory is that after EpicRating changes a rating (normalization should not run after a manual rating change), calculations will be done to adjust ALL ratings. This will very likely use some fairly simple statistics (mean, standard deviation, z-score, percentile re-conversion).
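To make the idea concrete, here's a rough Python sketch of that kind of normalization pass (mean, standard deviation, z-score, percentile re-conversion). The 0-100 scale and the function name are assumptions used only for illustration:

    import statistics
    from statistics import NormalDist

    def normalize_ratings(ratings):
        # ratings: list of numbers on an assumed 0-100 scale
        mean = statistics.mean(ratings)
        stdev = statistics.pstdev(ratings) or 1.0      # guard against zero spread
        normalized = []
        for r in ratings:
            z = (r - mean) / stdev                     # z-score
            pct = NormalDist().cdf(z)                  # percentile, 0..1
            normalized.append(round(pct * 100, 1))     # re-convert to 0-100
        return normalized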
The issue is that songs that hold the same rating have nothing to distinguish them from each other, so the results aren't as spread out as they ideally would be for a sustainable rating system. The conclusion I've reached is that statistical diversity needs to be introduced, and in such a way that the differences stay meaningful without becoming large. (Call it RatingScore?)
At the moment that would likely mean using a fairly meaningless measure (rule 2 below) to resolve rating-space conflicts. My hope is that once GMB starts keeping a log of activities, the rating normalization can pull information from the logs to resolve the issue.
Example:
Let's say 5 songs have a rating of 75. In order to place them properly, ideally we want each of them to have a different number. If GMB does not have a log of events, then any statistical spread would need to be introduced through the Most Recently Played data (say, adding some small decimal number based on how recently each song was played). This will not resolve ALL conflicts, but it should resolve most of them.
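Something like this Python sketch is what the log-less version could do ('last_played' and 'rating_score' are assumed field names, and the step sizes are placeholders):

    def spread_by_recency(songs, step=0.01):
        # Most recently played first; each song gets a slightly smaller bump.
        ordered = sorted(songs, key=lambda s: s.get('last_played', 0), reverse=True)
        for i, song in enumerate(ordered):
            song['rating_score'] = song['rating'] + max(step - i * 0.001, 0.0)
        return songs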
The most meaningful resolution would be if GMB does keep a log. Current theoretical model (a rough code sketch follows the list):
1) If multiple songs are rated "75" -> check whether they have a "trend" (a previous rating change). If a song has a positive trend history (it used to be rated lower and is now rated higher), add 0.1 to its Rating Score. If it has a negative trend history, subtract 0.1.
When deciding how to change a rating, previous rating changes are the most relevant modifier available.
2) If there are still multiple songs with the same rating score (which means they share the same trend, or lack of one, with the other "conflicts"), decide the matter by adding 0.09 to the most recently played, 0.089 to the next most recent, etc., until the conflicts are resolved.
3) If there are STILL songs with conflicts, it means they have no trend and have not been played (i.e., the rating was set manually and the song was never played/skipped); in that case add 0.005 to the song that has been modified (in any way) most recently, 0.0049 to the next, etc.
4) If any conflict remains, it means the songs in question were set manually as a group (modified at the same time) and have never been played/skipped. In this case, add 0.0001, 0.0002, etc., based on some arbitrary value (hash of the song? file name?).
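Here's a rough Python sketch of that four-rule cascade, applied to songs sharing a base rating. The field names ('trend', 'last_played', 'last_modified', 'file_name') and the exact increments are assumptions used only to illustrate the ordering logic:

    from collections import defaultdict

    def tied_groups(songs):
        # Group songs that currently share the same rating score.
        buckets = defaultdict(list)
        for s in songs:
            buckets[round(s['rating_score'], 4)].append(s)
        return [g for g in buckets.values() if len(g) > 1]

    def resolve_conflicts(songs):
        for s in songs:
            s['rating_score'] = float(s['rating'])

        # Rule 1: trend from the log (+0.1 positive, -0.1 negative)
        for s in songs:
            trend = s.get('trend', 0)
            if trend > 0:
                s['rating_score'] += 0.1
            elif trend < 0:
                s['rating_score'] -= 0.1

        # Rule 2: within any remaining tie, the most recently played song
        # gets 0.09, the next 0.089, and so on
        for group in tied_groups(songs):
            played = sorted((s for s in group if s.get('last_played')),
                            key=lambda s: s['last_played'], reverse=True)
            for i, s in enumerate(played):
                s['rating_score'] += 0.09 - i * 0.001

        # Rule 3: still tied and never played -> order by last modification time
        for group in tied_groups(songs):
            ordered = sorted(group, key=lambda s: s.get('last_modified', 0),
                             reverse=True)
            for i, s in enumerate(ordered):
                s['rating_score'] += 0.005 - i * 0.0001

        # Rule 4: anything still tied is ordered by an arbitrary stable value
        for group in tied_groups(songs):
            by_name = sorted(group, key=lambda s: s.get('file_name', ''))
            for i, s in enumerate(by_name):
                s['rating_score'] += (i + 1) * 0.0001

        return songs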
What this will do is create a set of rating scores that are not shared by any 2 songs, allowing us to take full advantage of the z-score normalization. Small (decimal) differences will have a minor impact, and the more and longer this system is used, the less often the later rules will be needed to resolve conflicts.
Hopefully we'll be getting the non-log version of this up and running soon, and get in some actual tests to make sure it follows the data models. The model has been tested against a library of 10k songs in the following play configurations: random, weighted random, filtered to a specific rating range, filtered to groups of artists, filtered to specific album(s)/artist(s), and a few others. I'm keeping my fingers crossed. Hopefully some good news soon!
Dan