Friday, October 14, 2016

Very easy to copycat eHarmony, but very difficult to innovate: a 100 times better algorithm than eHarmony.







eHarmony is a 16+ year old, obsolete site and a HOAX, based on a big scientific fraud, sustained only by a big marketing budget.
The Big Five normative test has been shown to be an incomplete and incorrect model for assessing/measuring the personality of persons.


From time to time, I receive emails asking me why it is so difficult to innovate: to offer an algorithm 100 times better than eHarmony's.

The answer is quite easy.
In compatibility matching methods there are 2 steps:
1) to objectively measure personality traits (with the 16PF5 test or similar) without distortion.
2) to calculate compatibility (personality similarity) between prospective mates


1) to objectively measure personality traits (with the 16PF5 test or similar) without distortion.
The license for the 16PF 5th version normative personality test costs over USD100,000 per language and norm. IPAT is the owner and has over 60 years of experience in personality testing.
 
https://www.16pf.com/en_US/

Very few people in the world can copycat the 16PF5 test, that is, construct a normative test similar to the 16PF5 but improved using Item Response Theory (IRT).
To calibrate such a normative test, a sample of over 1,000 test takers (males and females) is needed per language and norm.
The cost of calibrating that normative test is also in the range of USD100,000 per language and norm.
Normative tests cannot simply be translated, because they need a norm for that test, and that norm is updated each and every time Census Figures are released.

To clarify better that situation (a comparison with drivers):
Suppose you are a citizen of country "A", where the speed limit is 70 mph and the population drives at an average of 65 mph with a standard deviation of 2 mph (some persons always surpass the speed limit) in a normal distribution,
and you are tested driving at 67 mph, so you are 1 standard deviation OVER the average: you drive faster than about 84.13% of the whole population.
You will score 7 or 8 on a sten scale, depending on how it was constructed.
You will be seen as HIGH on that scale.

If you drive in country "B", where the speed limit is 75 mph and the population drives at an average of 71 mph with a standard deviation of 2 mph (some persons always surpass the speed limit) in a normal distribution,
and you are tested again driving at 67 mph, you are 2 standard deviations UNDER the average: only about 2.28% of the whole population drives slower than you.
You will score 1 or 2 on a sten scale, depending on how it was constructed.
You will be seen as VERY LOW on that scale.
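
The driver comparison can be checked with a few lines of Python. This is only a sketch: the `NORMS` table, the function names, and the sten convention used (mean 5.5, standard deviation 2, halves rounded up, clipped to 1..10) are assumptions made for illustration, not something fixed by the 16PF5 or by this post. A raw score that falls exactly on a band boundary, as both 67 mph readings do here, can land in either neighbouring sten depending on the rounding rule chosen.

```python
import math

# Hypothetical norms taken from the driver comparison: (mean mph, SD mph).
NORMS = {"A": (65.0, 2.0), "B": (71.0, 2.0)}


def z_score(raw, mean, sd):
    """Distance of a raw score from the norm mean, in standard deviations."""
    return (raw - mean) / sd


def percent_below(z):
    """Percentage of a normally distributed population scoring below z."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sten(z):
    """Sten under an assumed convention: mean 5.5, SD 2, halves round up, clipped to 1..10."""
    return max(1, min(10, math.floor(2.0 * z + 5.5 + 0.5)))


# The same raw score (67 mph) is interpreted against each country's norm.
for country, (mean, sd) in NORMS.items():
    z = z_score(67.0, mean, sd)
    print(f"Country {country}: z = {z:+.1f}, "
          f"slower drivers: {percent_below(z):.2f}%, sten = {sten(z)}")
```

The same raw score of 67 mph gives a z-score of +1.0 against norm "A" and -2.0 against norm "B", which is why a test scored against the wrong norm gives a misleading result.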


All normative tests like Big Five, HEXACO, 16PF5 or similar like 15FQ+ to assess/measure personality should be used with the correct norm.
You should have the norm for each country where you operate
and
You should use the norm for each country where you operate.
Example:
* English for the United Kingdom and the Norm for the United Kingdom (sample of individuals with the same demographic characteristics of the United Kingdom).
* English for Ireland and the Norm for Ireland (sample of individuals with the same demographic characteristics of Ireland).
* English for Australia and the Norm for Australia (sample of individuals with the same demographic characteristics of Australia).
* English for New Zealand and the Norm for New Zealand (sample of individuals with the same demographic characteristics of New Zealand).
* English for Canada and the Norm for Canada (sample of individuals with the same demographic characteristics of Canada).
* English for South Africa and the Norm for South Africa (sample of individuals with the same demographic characteristics of South Africa).
* English for the United States and the Norm for the United States (sample of individuals with the same demographic characteristics of the United States).
* French for France and the Norm for France (sample of individuals with the same demographic characteristics of France).
* French for Canada and the Norm for Canada (sample of individuals with the same demographic characteristics of Canada).
* German for Germany and the Norm for Germany (sample of individuals with the same demographic characteristics of Germany).
* Spanish for Spain and the Norm for Spain (sample of individuals with the same demographic characteristics of Spain).
* Italian for Italy and the Norm for Italy (sample of individuals with the same demographic characteristics of Italy).
* Portuguese for Portugal and the Norm for Portugal (sample of individuals with the same demographic characteristics of Portugal).
* Portuguese for Brazil and the Norm for Brazil (sample of individuals with the same demographic characteristics of Brazil).
etc, etc


Please remember:
Online Dating sites OFFERING COMPATIBILITY MATCHING METHODS BASED ON PERSONALITY SIMILARITY have very big databases, in the range of 20,000,000 (twenty million) profiles, so the Big Five model or the HEXACO model are good for guidance purposes but not enough for predictive purposes.
That is why I suggest the 16PF5 normative personality test instead.
The same applies for Personality Based Recommender Systems.

2) to calculate compatibility (personality similarity) between prospective mates
The output of the 16PF5 test is a set of 16 independent variables, STens (Standard Tens), taking integer values from 1 to 10. STens divide the score scale into ten units.
STens have the advantage that they enable results to be thought of in terms of bands of scores, rather than absolute raw scores. These bands are narrow enough to distinguish statistically significant differences between candidates, but wide enough not to over emphasize minor differences between candidates.

Similarity is a word that has different meanings for different persons or companies; it depends exactly on how it is defined mathematically.

I calculate similarity in personality patterns with (a proprietary) pattern recognition by correlation method. It takes into account the score and the trend to score of any pattern.
E.g.
The pattern 6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4 (output of one person's 16PF5 test, 16 independent variables STens taking integer values from 1 to 10) is 74.79865772% +/- 0.00000001% similar to the pattern 5.7.4.8.7.4.5.6.4.6.8.9.6.8.4.4 (output of another person's 16PF5 test, 16 independent variables STens taking integer values from 1 to 10).
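
To illustrate why the mathematical definition matters, here is a minimal Python sketch that compares the two example patterns under two generic textbook definitions of similarity: Pearson correlation and a distance-based percentage. Neither is the proprietary method mentioned above, and neither is expected to reproduce its figure; the function names and the scaling of the distance measure are assumptions chosen only for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length patterns (range -1..1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def distance_similarity(a, b, lo=1, hi=10):
    """100% for identical patterns, 0% for patterns at opposite extremes on every factor."""
    max_dist = math.sqrt(len(a) * (hi - lo) ** 2)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 100.0 * (1.0 - dist / max_dist)


# The two 16-sten patterns quoted in the example.
person_1 = [6, 7, 6, 8, 9, 6, 7, 7, 8, 7, 2, 5, 8, 7, 3, 4]
person_2 = [5, 7, 4, 8, 7, 4, 5, 6, 4, 6, 8, 9, 6, 8, 4, 4]

print(f"Pearson correlation:       {pearson(person_1, person_2):+.4f}")
print(f"Distance-based similarity: {distance_similarity(person_1, person_2):.2f}%")
```

The two definitions answer different questions: the correlation looks at whether the two profiles rise and fall together, while the distance measure looks at how far apart the scores are factor by factor, so the same pair of persons gets two different numbers.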


Please notice, please remember:

The ENSEMBLE (the whole set of different valid possibilities) of the 16PF5 is 10^16, a huge number compared with the World Population, which is in the range of 7.5 * 10^9 (estimated October 2016).
(7.5 * 10^9) / 10^16 == 7.5 * 10^(-7) or 0.75 * 10^(-6) or 0.75 micro part!
The whole World Population is 0.75 micro part of the ensemble of the 16PF5 normative test.
E.g. for the 16PF5 Brazilian version, in Portuguese for Brazil and the Norm for Brazil (sample of individuals with the same demographic characteristics of Brazil).
Brazil's population is in the range of 200 Million persons, 200 * 10^6 == 2.0 * 10^8.

Demographic characteristics of Brazil: 47.73% White, 43.13% Brown (Multiracial), 7.61% Black, 1.09% Asian, 0.43% Amerindian.
The Brazilian version of the test, with the Norm in Portuguese for Brazil, should be constructed by testing a sample of over 1,000 persons (males and females) with 47.73% White, 43.13% Brown (Multiracial), 7.61% Black, 1.09% Asian, 0.43% Amerindian persons, with age and race representative of the Brazilian population.
To calibrate that normative test, the cost is in the range of USD100,000.
https://en.wikipedia.org/wiki/Demographics_of_Brazil
https://en.wikipedia.org/wiki/Race_and_ethnicity_in_Brazil

(2.0 * 10^8) / 10^16 == 2.0 * 10^(-8) or 0.020 * 10^(-6) or 0.020 micro part!
The Brazilian population is 0.020 micro part of the 16PF5's ensemble.
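
The ensemble arithmetic (16 factors with 10 sten values each) can be verified with a short Python sketch. The population figures are the rounded estimates quoted in the text, and the names `ensemble_size` and `micro_parts` are assumptions introduced only for this check.

```python
FACTORS = 16   # personality factors measured by the test
LEVELS = 10    # sten values available per factor (1 to 10)


def ensemble_size(factors=FACTORS, levels=LEVELS):
    """Number of distinct sten patterns: levels multiplied together once per factor."""
    return levels ** factors


def micro_parts(population, ensemble):
    """Population as a share of the ensemble, expressed in millionths."""
    return population / ensemble * 1e6


ensemble = ensemble_size()
populations = {"World (Oct 2016 estimate)": 7.5e9, "Brazil": 2.0e8}

for name, population in populations.items():
    print(f"{name}: {micro_parts(population, ensemble):.3f} micro part of the ensemble")
```

Even if every person in the world had a different pattern, fewer than one pattern in a million would be occupied.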

You cannot use simple regression equations to calculate similarity between quantized patterns, because if you use simple regression equations:
* women will "see" men as all the same.
* men will "see" women as all the same.

You will need to use:
a quantized pattern comparison method, part of pattern recognition by cross-correlation, to calculate similarity between prospective mates

Please see:
How LIFEPROJECT METHOD calculates similarity. 
http://onlinedatingsoundbarrier.blogspot.com.ar/2013/03/how-to-calculate-personality-similarity.html
For online dating applications and personality based recommenders persons/daters are "distinguishable" and not "exchangeable".
http://onlinedatingsoundbarrier.blogspot.com.ar/2013/03/paper-on-similarity-between.html
http://onlinedatingsoundbarrier.blogspot.com.ar/2016/08/paper-enhanced-user-modeling-based-on.html
 
The 3 milestone discoveries of the 2001 - 2010 decade for Theories of Romantic Relationships Development are:
I) Several studies showing contraceptive pills users make different mate choices, on average, compared to non-users. "Only short-term but not long-term partner preferences tend to vary with the menstrual cycle"
II) People often report partner preferences that are not compatible with their choices in real life. (Behavioural recommender systems or other systems that learn your preferences are useless)
III) What is important in attracting people to one another may not be important in making couples happy. Compatibility is all about a high level of personality similarity between prospective mates for long term mating with commitment.
 
The key to long-lasting romance, COMPATIBILITY, is STRICT PERSONALITY SIMILARITY and not "meet other people with similar interests".
 
Worldwide, there are over 5,000 (five thousand) online dating sites
but not one is using the 16PF5 (or similar) to assess the personality of its members!
but not one calculates similarity with a quantized pattern comparison method!
but not one can show Compatibility Distribution Curves to each and every one of its members!
but not one is scientifically proven!
No actual online dating site is "scientifically proven", because not one can prove, in a peer reviewed Scientific Paper, that its matching algorithm matches prospective partners who will have more stable and satisfying relationships (and very low divorce rates) than couples in a control group matched by chance, astrological destiny, personal preferences, searching on one's own, or another technique, for the majority (over 90%) of its members.
but not one can show you a list of compatible persons like this:
(for a prospective male customer; a sample, but calculated with real values)
“From a database of over 1,000,000 women, here is the list of the 12 most compatible with you. Notice that woman#1 is the most compatible with you but she could be more compatible with other men right now.
woman#01 is 95.58476277% compatible
woman#02 is 95.56224356% compatible
woman#03 is 95.52998273% compatible
woman#04 is 94.18354278% compatible
woman#05 is 93.00453871% compatible
woman#06 is 93.00007524% compatible
woman#07 is 92.99738452% compatible
woman#08 is 92.37945551% compatible
woman#09 is 92.29779173% compatible
woman#10 is 92.27114287% compatible
woman#11 is 92.19515551% compatible
woman#12 is 92.12249558% compatible”


The only way to revolutionize the Online Dating Industry is to use the 16PF5 normative personality test, available in different languages, to assess the personality of members, or a proprietary test with exactly the same traits as the 16PF5, and to express compatibility with eight decimals (this needs a quantized pattern comparison method, part of pattern recognition by cross-correlation, to calculate similarity between prospective mates).

High precision in matching algorithms is precisely the key to open the door and leave the infancy of compatibility testing.

Without offering the NORMATIVE 16PF5 (or a similar test measuring exactly the 16 personality factors) for serious dating, it will be impossible to innovate and revolutionize the Online Dating Industry.

The Online Dating Industry does not need a 10% improvement, a 50% improvement or a 100% improvement. It needs "a 100 times better" improvement.

 
All other proposals are NOISE and perform as placebo.  
