Thursday, October 24, 2013

Predictive analysis and OK Cupid

Recently, I was chatting with a friend of mine about how dumb the idea of a matrimonial site is and how depressing it can be. While we were chatting, she offered an interesting technical take on the whole system. Here is her post on the subject. She is also the first guest writer on my blog: Madhuri Haraway.

Madhuri was my junior at MSRIT and completed her master's from USC. She writes more regularly here.

Recently I was going through TIME magazine's list of the '100 most influential people in the world'. The list featured Sam Yagan, one of the co-founders of the dating website OK Cupid. A note below his name stated: 'What do you get when you combine big data, the quest for love and complete irrelevance? The hippest spot on the internet: OKC!'

Though I had heard of dating websites using predictive/big data analytics, I had never thought about how they collect data and come up with statistical models to predict the compatibility between two people. I started looking online to see how OKC collects data. (OK, you may think I am a nerd, but predictive analytics fascinates me! This is what I do for a living, and I derive immense joy from it. I love to think of ways to collect data, preprocess it, come up with the best statistical models and optimize them. Well, I won't bore you any more with my tech talk, but it was amusing to learn that the love of my life is being used to help people find love.) I read that they have an extensive questionnaire system which people take, and based on their responses they are matched up with various profiles. To put it in layman's terms, the questions become the predictor variables or inputs, each person's answers supply the values of those inputs, and the compatibility score is the response variable or output. This input-output form is then mapped on to a mathematical equation. Basically, when people give answers, the server takes in these answers, converts them into a suitable form, plugs them into an equation and outputs a value. This output value for each person is used to determine the compatibility between two people.
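As a rough illustration of the "answers in, value out" idea, here is a minimal Python sketch. The questions, the importance weights and the scoring formula are all hypothetical and chosen only for illustration; this is not OKC's actual data or equation. It encodes yes/no answers as numbers and returns the weighted share of questions on which two people agree.

```python
# Hypothetical questions and importance weights (assumptions for illustration,
# not OKC's real questionnaire or formula).
QUESTION_WEIGHTS = {
    "wants_casual_fling": 3.0,
    "looks_over_smarts": 1.0,
    "likes_travel": 2.0,
}


def encode(answers):
    """Convert yes/no answers into numbers: 1.0 for yes, 0.0 for no."""
    return {q: 1.0 if answers[q] else 0.0 for q in QUESTION_WEIGHTS}


def compatibility(answers_a, answers_b):
    """Return the weighted share of questions two people agree on (0 to 1)."""
    a, b = encode(answers_a), encode(answers_b)
    total = sum(QUESTION_WEIGHTS.values())
    agreed = sum(w for q, w in QUESTION_WEIGHTS.items() if a[q] == b[q])
    return agreed / total


# Two made-up profiles that disagree only on the lowest-weighted question.
alice = {"wants_casual_fling": True, "looks_over_smarts": True, "likes_travel": False}
bob = {"wants_casual_fling": True, "looks_over_smarts": False, "likes_travel": False}

score = compatibility(alice, bob)
print(f"compatibility: {score:.2f}")
```

Because the score depends only on the answers as given, a profile filled in untruthfully produces a number that looks just as confident as an honest one.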

The data modeler in me was curious to know about the questions asked. So I registered on OKC in order to take their thorough questionnaire, to better understand the data that is being collected and how I would go about designing models for such data. (I admit I am a data junkie. I get my fix by studying various complex algorithms to process data.) As I filled out the questionnaire, I realized I had not been very truthful with my answers. I mentioned in my profile that I was looking for a casual fling (which is a blatant lie because I don't have the guts to go through with it). I answered that looks are more important than a man's smartness. Before you go ahead and judge me: if a casual fling is what I am looking for, I sure can judge a book by its cover. In the middle of answering the 158th question, an odd thought occurred to me: sometimes people may twist their answers in order to make themselves look appealing. The algorithms can't account for this human ambiguity and take the answers as is. This introduces errors and bias into the data and could lead to wrong predictions. That is when I decided to stop answering the questions and delete my profile, because I could not take the predictions the system made for me at face value. It wasn't fun anymore, as I did not trust OKC's suggestions.

However, this approach to dating reminded me of the middle-aged 'aunties' in India who tried to fix me up with the so-called 'perfect guy'. I am sure their approach to matching two people is not as sophisticated as OKC's, but the essence remains the same. They collect data about a guy and his family (his qualifications, job, salary, properties owned, how many times he goes to the temple, his family status, etc.) by exchanging details (a.k.a. gossip) at social gatherings and compare these details with those of the girls they know. Next, they rank the girls based on what they judge to be compatibility and approach the highest-ranked girl on the list. If it doesn't work out, they try the second highest, and the process continues. So I concluded that OKC is a high-tech, 'immoral' (Indian aunties don't believe in setting people up for casual sex because they consider it immoral, and would take offense if I compared the dating site to the 'noble work' they do) version of middle-aged Indian 'marriage broker' aunties.
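The aunties' procedure (rank by compatibility, approach the top of the list, move down if it doesn't work out) can be sketched in a few lines of Python. The names, the scores and the `accepts` callback are made up for illustration; how the aunties actually arrive at their scores is, of course, not captured here.

```python
def rank_candidates(scores):
    """Return candidate names ordered from highest to lowest compatibility."""
    return sorted(scores, key=scores.get, reverse=True)


def first_match(scores, accepts):
    """Approach candidates in ranked order and return the first who accepts.

    `accepts` is a hypothetical callback that says whether a proposal to the
    named candidate works out. Returns None if nobody on the list accepts.
    """
    for name in rank_candidates(scores):
        if accepts(name):
            return name
    return None


# Made-up compatibility scores as judged by the aunties.
scores = {"Asha": 0.60, "Bhavna": 0.90, "Chitra": 0.75}

# Suppose the top-ranked candidate declines; the process moves to the next one.
match = first_match(scores, accepts=lambda name: name != "Bhavna")
print(rank_candidates(scores), match)
```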

P.S. Sam Yagan went to Harvard and Stanford and turned his talents (he was a math major at Harvard) to helping people not as lucky as him in finding love (he married his high school sweetheart), made money and was featured in the TIME list. I applaud his efforts, as they have helped bring many people closer.

P.P.S. I am extremely curious to know how my friends will react if I add OKC as my current employer on Facebook. Ah well! I suppose blatant lying on FB is not a good idea.
