Thursday, January 8, 2015

PAPER: Research Paper Recommender Systems: A Literature Survey

Abstract. In the past sixteen years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and techniques. One thing we found was that more than half of the recommendation approaches applied content-based filtering (55%). Collaborative filtering was applied only by 18% of the reviewed approaches, and graph-based recommendations by 16%. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the weighting scheme most applied. In addition to simple terms, n-grams, topics, and citations were utilized to model the users’ information needs. Our review revealed some serious limitations of the current research. First, it remains unclear which recommendation concepts and techniques are most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it was the opposite. We identified three potential reasons for the ambiguity of the results.

A) Many of the evaluations were inadequate. They were based on strongly pruned datasets, few participants in user studies, or did not use appropriate baselines. 

B) Most authors provided sparse information on their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendation approaches, which might lead to variations in the results.

C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches.

Hence, finding the most promising approaches becomes nearly impossible. As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction and satisfaction of developers. In addition, most approaches (81%) neglected the user modeling process and did not infer information automatically but let users provide some keywords, text snippets, or a single paper as input. Only for 10% of the approaches was information on runtime provided. Finally, barely any of the research had an impact on research-paper recommender systems in practice, which mostly use simple recommendation approaches, ignoring the research results of the past decade. We also identified a lack of authorities and persistence: 73% of the authors wrote only a single paper on research-paper recommender systems, and there was barely any cooperation among different co-author groups. We conclude that several actions need to be taken to improve the situation: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
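
To make the abstract's most common combination concrete (content-based filtering with TF-IDF weighting), here is a minimal sketch in plain Python. It is not taken from the survey or from any of the reviewed systems: the toy paper titles, the whitespace tokenizer, the raw tf * log(N/df) weighting variant, and all function names are assumptions chosen for illustration. The user is modeled by the text of one paper they authored, and candidate papers are ranked by cosine similarity to that profile.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical candidate papers (titles only, for brevity).
papers = [
    "collaborative filtering for movie recommendation",
    "citation analysis for research paper recommendation",
    "protein folding simulation methods",
]
# Hypothetical user model: the text of a paper the user authored.
user_profile_text = "research paper recommendation using citation data"

vectors = tfidf_vectors(papers + [user_profile_text])
profile = vectors[-1]
# Indices of the candidate papers, most similar to the profile first.
ranked = sorted(
    range(len(papers)),
    key=lambda i: cosine(profile, vectors[i]),
    reverse=True,
)
for i in ranked:
    print(f"{cosine(profile, vectors[i]):.3f}  {papers[i]}")
```

Even in this toy setting, the ranking depends on tokenization, the exact TF-IDF variant, and the corpus used for the document frequencies, which is one way the implementation differences described under B) can creep in.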

Please remember:
Similarity is a word that means different things to different people and companies; its meaning depends entirely on how it is defined mathematically. In case you had not noticed, recommender systems are morphing into compatibility matching engines, like the ones the Online Dating Industry has used for years, so far with low success rates because they mostly use the BIG 5 to assess personality and the Pearson correlation coefficient to calculate similarity.
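
For readers who have not seen it, here is a minimal sketch of the Pearson correlation coefficient applied to two personality profiles. The five-score profiles and the names are made up for illustration, and this is the textbook formula, not the code of any particular dating site.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0  # a flat profile has no defined correlation; report 0
    return cov / (dev_x * dev_y)


# Hypothetical five-trait profiles (one score per trait).
alice = [3, 4, 2, 5, 1]
bob = [4, 5, 3, 6, 2]    # same shape as alice, every score one point higher
carol = [3, 2, 4, 1, 5]  # mirror image of alice around her mean

print(pearson(alice, bob))    # same shape, different level
print(pearson(alice, carol))  # opposite shape
```

Note that alice and bob come out as perfectly correlated even though every one of bob's scores is higher: Pearson compares the shape of two profiles, not their level. That is one concrete sense in which the result depends on how similarity is defined mathematically.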
The BIG 5 (Big Five) normative personality test is obsolete. The HEXACO (a.k.a. Big Six) is another oversimplification. Online Dating sites have very big databases, in the range of 20,000,000 (twenty million) profiles, so neither the BIG 5 model nor the HEXACO model is enough for predictive purposes. That is why I suggest the 16PF5 test instead, together with another method to calculate similarity. I calculate similarity in personality patterns with a (proprietary) pattern recognition by correlation method. It takes into account both the score and the trend to score of any pattern. It also takes into account women under hormonal treatment, because several studies have shown that contraceptive pill users make different mate choices, on average, than non-users. "Only short-term but not long-term partner preferences tend to vary with the menstrual cycle".
If you want to be first in the "personalization arena" == Personality Based Recommender Systems, you should first of all understand the Online Dating Industry!

Please see: "How to calculate personality similarity between users"
Short answer: the key is the ENSEMBLE!
(the whole set of different valid possibilities)

Worldwide there are over 5,000 online dating sites, but none uses the 16PF5, none is scientifically proven yet, and none can show you compatibility distribution curves, i.e. if you are a man seeking women, show how compatible you are with each woman in a database of 20,000,000 women, and then select a group of 100 women from that database.
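
As a rough sketch of what a compatibility distribution curve and a top-100 selection could look like in code: the example below scores one seeker against a database, bins the scores into a histogram, and keeps the 100 best matches. Everything specific here is an assumption for illustration. The profiles are random, the database has 10,000 entries rather than 20,000,000, and placeholder_similarity is a simple stand-in, not the proprietary method mentioned above. The only facts borrowed from the 16PF5 are its 16 primary factors and its 1 to 10 sten scale.

```python
import heapq
import random

N_FACTORS = 16               # the 16PF5 reports 16 primary factors
STEN_MIN, STEN_MAX = 1, 10   # sten scores run from 1 to 10


def placeholder_similarity(a, b):
    """Stand-in similarity in [0, 1]: 1 minus the scaled mean absolute gap.

    This is NOT the proprietary method; swap in any similarity function.
    """
    mean_gap = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_gap / (STEN_MAX - STEN_MIN)


def random_profile(rng):
    """Hypothetical profile: one random sten score per factor."""
    return [rng.randint(STEN_MIN, STEN_MAX) for _ in range(N_FACTORS)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
seeker = random_profile(rng)
database = [random_profile(rng) for _ in range(10_000)]

# One compatibility score per candidate in the database.
scores = [placeholder_similarity(seeker, candidate) for candidate in database]

# Compatibility distribution curve as a 20-bin histogram over [0, 1].
n_bins = 20
histogram = [0] * n_bins
for s in scores:
    histogram[min(int(s * n_bins), n_bins - 1)] += 1

# The 100 most compatible candidates as (database index, score) pairs.
top_100 = heapq.nlargest(100, enumerate(scores), key=lambda pair: pair[1])

for b, count in enumerate(histogram):
    print(f"{b / n_bins:.2f}-{(b + 1) / n_bins:.2f}: {count}")
```

heapq.nlargest avoids sorting the whole score list, which matters more as the database grows toward the sizes mentioned above.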

Please also read:
An exercise of similarity.
How LIFEPROJECT METHOD calculates similarity.
Personality Distribution Curves using the NORMATIVE 16PF5.
Innovations: to take the 16PF5 test 3 times.
Why your brain distorts!
