## What’s Wrong with OKCupid’s Matching Algorithm

OKCupid is using the wrong mathematics to match potential dates together. But before I critique them, let me compliment them on what they’re doing right:

• “Our” mutual score is the geometric average of your score of me, and my score of you.
• They low-ball the match % until they have enough statistical confidence in the number of questions we’ve both answered.
• Questions come from users as well as staff. So they avoid some potential blind spots. (crowdsourcing)
• OKCupid prompts you with questions that have the greatest chance of distinguishing you as quickly as possible. (maximally separating hyperplanes) If OKC already knows you want your date to shower at least once a day, keep a clean room, and that picking food from the trashcan is unacceptable, it won’t ask if you prefer crustpunks or gutterpunks.
• You don’t have to be the same as me for us to match. I get to specify what answers I want from you.
• They use a logarithmic scale of importance. Logs are the natural way we perceive levels or categories of importance. (For example “categories” of how big a war was, emerge naturally when you take the log of number of deaths.)
• It’s simple. At least they’re not using a non-linear Bayesian splitting tree didactogram or some other hunky machine-learning jiu jitsu.
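To make the first bullet concrete, here is a minimal sketch of a geometric-mean mutual score. The function name and the 0-to-1 scale are my own assumptions for illustration, not OKCupid's code.

```python
import math

def mutual_match(your_score_of_me: float, my_score_of_you: float) -> float:
    """Geometric mean of two one-directional scores, each between 0 and 1."""
    return math.sqrt(your_score_of_me * my_score_of_you)

# A lopsided pair lands below its arithmetic mean of 0.65.
print(round(mutual_match(0.90, 0.40), 2))  # 0.6
```

The geometric mean drags a lopsided pair down: if either of us scores the other at zero, the mutual score is zero.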

But, there’s still room for improvement. Particularly the following critique, originally made by Becky Russoniello. Currently, OKCupid is set up to award high scores just for being not-a-terrible match. That’s bad.

HOW MATCH PERCENTAGES WORK NOW

To show why, I first need to detail how your score of me is calculated:

1. You answer questions like, “Is homosexuality a sin?” Your answer consists of: (a) what you think, (b) what answer/s are acceptable for me to give, and (c) how important it is for me to get this question “right” per your definition.
2. The question’s importance draws from {Mandatory, Very Important, Somewhat Important, A Little Important, Irrelevant} which biject to the numbers {250, 50, 10, 1, 0}.
3. If I get a Very Important question “right”, I get 50/50 points, and if I get a Very Important question “wrong”, I get 0/50 points. If I haven’t answered the Very Important question, I get 0/0 points — neither penalised nor rewarded.
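The three steps above can be sketched roughly as follows. The weights are the ones listed in step 2. The function name, the data layout, and the sample answers are hypothetical, chosen only to illustrate the bookkeeping.

```python
# Importance weights from step 2.
WEIGHTS = {
    "Mandatory": 250,
    "Very Important": 50,
    "Somewhat Important": 10,
    "A Little Important": 1,
    "Irrelevant": 0,
}

def one_way_score(my_questions, their_answers):
    """Return (earned, possible) points for my score of someone else.

    my_questions:  question -> (set of answers I accept, importance label)
    their_answers: question -> their answer; unanswered questions are absent
    """
    earned = possible = 0
    for question, (acceptable, importance) in my_questions.items():
        if question not in their_answers:
            continue  # unanswered: 0/0, neither penalised nor rewarded
        weight = WEIGHTS[importance]
        possible += weight
        if their_answers[question] in acceptable:
            earned += weight
    return earned, possible

mine = {
    "Do you like dogs?": ({"Yes"}, "Very Important"),
    "Do you like abstract art?": ({"Yes"}, "Somewhat Important"),
    "Do you spend more money on clothes, or food?": ({"Food"}, "A Little Important"),
}
theirs = {"Do you like dogs?": "Yes", "Do you like abstract art?": "No"}
print(one_way_score(mine, theirs))  # (50, 60)
```

Here the dog answer earns 50/50, the art answer earns 0/10, and the unanswered question adds nothing to either side of the fraction.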

For more details, see their FAAAQ.

THE PROBLEM

Here’s the important flaw: the denominator (and, for every “right” answer, the numerator) grows with each question we’ve both answered. In practice, the Mandatory questions both

1. crowd out more interesting differentiators, and
2. inflate the scores of people who merely have tolerable political views.

To demonstrate this, I’ll share some of the Mandatory questions from my own OKCupid profile.

• Do you think homosexuality is a sin?
• How often are you open with your feelings? (can’t be Rarely or Never)
• Would it bother you if your boss was minority, female, or gay?
• Would you write your child’s college entry essay?
• What volume level do you prefer when listening to music? (can’t be “I prefer not to listen to music”)
• Would you try to control your mate with threats of suicide?
• Gay marriage — should it be legal?
• Are you married, engaged to be married, or in a relationship that you believe will lead to marriage?
• How important to you is a match’s sense of humor? (can’t be Not Important)
• Would the world be a better place if people with low IQ’s were not allowed to reproduce?

Some other doozies which I might wrongly make Mandatory include:

• Which is bigger? The Earth, or the Sun?
• How many continents are there?
• Do you consider astrology to be a legitimate science?

The problem with all of these filters is that I mean them to act only in a negative direction. (Could I call them “quasi-filters”?)

NON-TERRIBLE ≠ GOOD

In other words, someone doesn’t become a great potential match simply because they’re not

• a bigot,
• a cheat,
• a eugenicist,
• or a depressive manipulative.

You need to receive those check-marks just to get to zero with me. You also need to be not-married-to-someone-else. That doesn’t win you plus points, it’s just a requirement. But under the current OKCupid schema, you do win 250/250 from me for simply being available. Oops.

Likewise, knowing basic facts from grade-school seems, like, uh, necessary. But, even if somebody thinks there are 6 or 8 continents, do you really think you won’t be able to tell once they message you?

Few people will be culled by the Continents question, and if you make 10 such easy questions Mandatory, then everybody else will start with 2500/2500 points, so the rest of your match questions will barely distinguish one from the other. Even the Very Important questions (50 points apiece) will only budge the score a little below a default of 100%. And the Somewhat Important questions, which tend to be the more discriminative ones, are mowed down by the juggernaut of Easy Questions.
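A quick back-of-the-envelope check of that dilution, using a made-up candidate who gets one Very Important question wrong and three Somewhat Important questions right (the candidate and variable names are assumptions for illustration):

```python
MANDATORY, VERY_IMPORTANT, SOMEWHAT_IMPORTANT = 250, 50, 10

def percent(earned, possible):
    """Match percentage from earned and possible points."""
    return 100 * earned / possible

# The questions that actually tell candidates apart:
earned = 3 * SOMEWHAT_IMPORTANT                     # 30
possible = 1 * VERY_IMPORTANT + 3 * SOMEWHAT_IMPORTANT  # 80

# Ten easy Mandatory questions, all answered "right":
padding = 10 * MANDATORY                            # 2500

padded = percent(earned + padding, possible + padding)
unpadded = percent(earned, possible)

print(round(padded, 1))    # 98.1
print(round(unpadded, 1))  # 37.5
```

The same answers read as a 37.5% match on their own and as a 98.1% match once the 2500 free points are added.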

EDIT (23 NOV): According to the comments, the number of continents is not a universal fact, but rather varies from culture to culture (and within cultures). So that’s a really terrible question to make Mandatory! I should have said above: “Few people will be culled by asking whether the Earth is bigger than the Sun, and if you make 10 such easy questions Mandatory, then everybody else will start with 2500/2500 points.”

OKCupid asks other, more useful questions, like:

• Are you annoyed by people who are super logical?
• Do you like abstract art?
• Do you spend more money on clothes, or food?
• Could you tolerate a ___________________ [my political / religious views] ?
• Do you like dogs?

which would actually distinguish among potential dates for me. Let’s face it: I write a blog about mathematics, so someone who is annoyed by super logical people is probably going to dislike me. And, I like abstract art. Maybe we could go to a gala for our first date.

Although everyone knows the Sun is bigger than the Earth, not everyone is bothered by “logical” personalities. So those questions better sort the available dates.

SPAMMABLE

The worst side effect of the current scoring system is that a spammer could easily answer only the questions with obvious answers (basic facts and displays of non-bigotry) and get a decently high match percentage with a lot of people. At which point, the spammer uploads a picture of an attractive guy/girl, writes some generic profile text, and scams away.

THEY CAN MAKE THE SYSTEM BETTER

I think a better model of how people evaluate potential dates can be found within economics. Specifically, Kahneman & Tversky’s Prospect Theory.

The main lessons I draw from prospect theory, as a theory of psychology, are:

1. We evaluate things based on a reference point (“zero”).
2. Small perceived negatives feel about twice as bad as equally small perceived positives feel good (“local kink at zero”).
3. When something is really bad or really good, we lose our ability to coherently measure how far from zero it is (“log-like at great distances”).
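As a toy illustration of those three lessons, here is one curve that has all three properties. To be clear, this is my own sketch built on a logarithm and a loss factor of 2 taken from lesson 2. It is not the functional form Kahneman & Tversky actually fit, and the names are hypothetical.

```python
import math

LOSS_AVERSION = 2.0  # assumption: "twice as bad", taken from lesson 2

def perceived_value(x: float) -> float:
    """Toy value curve: zero reference point, kink at zero, log-like far out."""
    if x >= 0:
        return math.log1p(x)               # gains flatten out logarithmically
    return -LOSS_AVERSION * math.log1p(-x)  # losses are scaled up, then flatten

# Near zero, a small loss hurts about twice as much as the same gain pleases.
print(perceived_value(0.01), perceived_value(-0.01))

# Far from zero, an extra 100 units barely registers compared with the first 100.
print(perceived_value(100) - perceived_value(0))
print(perceived_value(200) - perceived_value(100))
```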

How does P.T. apply to dating and OKCupid?

Bigots, cheats, eugenicists, and depressive manipulatives are way off in negative land. I’m not even interested in meeting them. I don’t care whether OKC gives them a 0% or a 10%, because those are effectively the same to me: ignore. I only need OKCupid to accurately score people who are somewhere north of my reference point.

• What if the scoring system simply binned everyone below 50%? They could all be labelled “non-match” and then twice as many numbers would be available to grade the remaining candidates.

That’s a mathematically good idea, but doesn’t address the issue of dilution. And, it seems to ignore an aspect of “numbers psychology”: people like using only the upper half of the scale. Think about how people use the hotness scale: they would never be comfortable dating a 4.

• What if OKCupid revamped their whole framework along the lines of Prospect Theory? Try to establish a reference point, do some research into psychology papers that bear on the topic, and so on.

Well, it might be cool. But that’s a lot of work, and OKC is already successful. Big changes alienate users.

Here’s the simplest solution I can think of — which requires no UI changes and no research. In fact an OKC developer should only need to amend one line of code.

• Mandatory questions can only give out negative points for answering wrong. No plus points for right answers to Mandatory.
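Here is a minimal sketch of what that rule could look like. I am assuming a wrong Mandatory answer subtracts its 250 points from the numerator and that Mandatory questions never touch the denominator. That is one reading of the proposal, not OKCupid's code, and the function name and input layout are hypothetical.

```python
WEIGHTS = {
    "Mandatory": 250,
    "Very Important": 50,
    "Somewhat Important": 10,
    "A Little Important": 1,
    "Irrelevant": 0,
}

def score_penalty_only_mandatory(graded):
    """graded: list of (importance label, answered_right) for shared questions.

    Returns earned/possible as a fraction, or None if no non-Mandatory
    question carries any weight (nothing to divide by).
    """
    earned = possible = 0
    for importance, right in graded:
        weight = WEIGHTS[importance]
        if importance == "Mandatory":
            if not right:
                earned -= weight  # assumed penalty; a right answer adds nothing
            continue
        possible += weight
        if right:
            earned += weight
    return earned / possible if possible else None

passes_filters = [("Mandatory", True)] * 10 + [
    ("Very Important", False),
    ("Somewhat Important", True),
    ("Somewhat Important", True),
    ("Somewhat Important", True),
]
fails_one = [("Mandatory", False)] + passes_filters[1:]

print(score_penalty_only_mandatory(passes_filters))  # 0.375
print(score_penalty_only_mandatory(fails_one))       # -2.75
```

Passing all ten filters no longer inflates the score, and failing a single one sinks it well below zero.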

Mathematically this is ugly because you introduce a discontinuity — but, so what? I think this is what the broad majority of people mean when they say something is mandatory. If you have a mandatory employee meeting, do people get a bonus for showing up? Does HM Revenue pat you on the back for paying tax?

In the eloquent phrasing of Chris Rock:

If OKC ends up giving some negative (or I guess imaginary, under the square root from the geometric average) scores, so what? I was ignoring everybody under 60% anyway.

YOU CAN MAKE YOUR SCORES BETTER

If you use OKCupid, there is a way to improve your matches even if they never change their matching algorithm:

• Lower the importance of questions with obvious answers. I bet you won’t start matching with people who believe the Earth is larger than the Sun. And you will pick up extra precision in matches with other people.
• Even if something is mandatory for you to date someone, don’t use the Mandatory category like that. Maybe you can have a few mandatory questions, but overall it just dilutes the scoring.