Wednesday, May 19, 2010

Paper: Personality Similarity Calculation

If you have seen this paper,

"A Novel K-Means Based Clustering Algorithm for High Dimensional Data Sets"

"We conduct our experiments on a data set which data is gathered from PEIVAND web site. This web site is for finding suitable partners who are very similar from point of personality's view for a person. Based on 8 pages of psychiatric questions personality of people for different aspects is extracted. Each group of questions is related to one dimension of personality. To trust of user some questions is considered and caused reliability of answers are increased. Data are organized in a table with 90 columns for attributes of people and 704 rows which are for samples. There are missing values in this table because some questions have not been answered, so we replaced them with 0. On the other hand we need to calculate length of each vector base on its dimensions for further process. All attributes value in this table is ordinal and we arranged them with value from 1 to 5, therefore normalizing has not been done. There is not any correlation among attributes and it concretes an orthogonal space for using Euclidean distance. All samples are included same number of attributes."

do not use the formula proposed in that paper. With it, your online dating site will narrow the field only to "as low as" 3 to 4 highly compatible persons per 1,000 persons screened. In a database of 1,000,000 women, any man would therefore see as many as 3,000 to 4,000 women to contact (nearly at the same time). That is overall LESS precision than anyone could achieve by searching on one's own!
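
To make the arithmetic above concrete, here is a minimal sketch in Python. The function name `candidates_shown` and its parameters are only illustrative and do not come from the paper; the figures are the ones quoted in this post (3 to 4 highly compatible persons per 1,000 screened, and a database of 1,000,000 women).

```python
def candidates_shown(database_size: int, compatible_per_thousand: float) -> int:
    """Profiles that pass a filter keeping `compatible_per_thousand`
    out of every 1,000 persons screened (hypothetical helper)."""
    return round(database_size * compatible_per_thousand / 1000)


# Figures taken from the post: 1,000,000 women, 3 to 4 matches per 1,000.
low = candidates_shown(1_000_000, 3)
high = candidates_shown(1_000_000, 4)
print(low, high)  # 3000 4000
```

The result grows linearly with the size of the database, so the larger the site, the longer the list of "highly compatible" persons each user gets.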
