Friday, December 30, 2016

2016 is Finally Ending

2016 was, to say the least, a tumultuous year, marked by numerous conflicts across the world, the British exit from the E.U., an exhausting U.S. political campaign culminating in the election of The Donald, a sharp ascent of Putin's menacing role in the world, and too many other things to even list.  I mention this to note that I haven't missed any of these events, or their importance, but I'll skip most of them in this year's summary in favor of some more personal, or at least scientific, happenings.  (I do occasionally tweet some political opinions, and if you want to see those, you should follow me there.)

So, in no particular order, here are some things I do want to note as 2016 comes to a close.
  • It's reassuring to remember that while it may not feel like it, the world continues to become a better place as a whole.  If you don't believe me, take a look at the data.  And even if you do believe me, read this book by Steven Pinker.
  • I have been developing my amateur interest in architectural photography.  Below is a recent photo of mine of UIC's University Hall, looming over its surroundings.  While not particularly pleasing to look at, I think it captures the "socialist utopian" ideology of the brutalist architecture on display all over campus.
    University Hall, photo by me
  • It has been 25 years since Pretty Good Privacy (PGP) was developed.  Given the recent political happenings, I'd say it's a pretty good time to start using it for sensitive emails.  I installed Mailvelope, and here is my public key; its fingerprint is AC5E DCA0 76A1 F55A 4819 94A9 2FAC ADDD C766 7CB9.
  • Reports of the Russian government influencing our election highlight once again the importance of good security practices, which are horribly lacking throughout most of our companies and the government.  Until we fix this, we will continue to be at the mercy of foreign adversaries, hackers, and (mis)fortune.
    photo from glitch news
  • This year, I graduated my first Ph.D. student, Jeremy Kun.  Jeremy finished in 5 years, though he only started working with me at the end of his second year.  Before graduating, he had the option to do a postdoc in academia, work at Google, or join a startup, and he decided to go the startup route and is now at 21 Inc.  He wrote an interesting blog post about his journey through grad school that I recommend to everyone considering math or CS theory grad school. (Full disclosure, I think he makes UIC MCS seem a rather nice place, which I agree with, but it's also in my interest to promote it as such to prospective students.)  I also expect to graduate some more students in 2017.
    Jeremy Kun defending his thesis
  • The Man Who Knew Infinity, a movie about Ramanujan, was released in the U.S. this year.  Even though the movie got some biographical details wrong, and even though I found some parts a bit annoying, the mathematical parts were quite accurate.  In particular, I think this movie, more than any other that I've seen, does a pretty decent job of showing to a general audience what it is that mathematicians do all day.
  • Two years ago I predicted that a computer program would be able to beat the best human Go players by the year 2020.  AlphaGo reached this milestone this year, and while this technically met my prediction, the speed at which it arrived hasn't helped allay my fears of A.I. posing an existential risk to humanity.  Those of you who haven't given this issue much thought should watch Sam Harris's TED talk on this topic.  Also, I recommend watching Westworld, which I liked both as a show and for some of the nontrivial philosophy that it presents on this topic.
    computers become better than humans at one more thing, image from quartz
  • This year Elon Musk declared that the odds are a billion to 1 that we are living in a simulation. The argument goes like this: eventually we will become advanced enough to simulate worlds ourselves, and the simulated beings won't know they're being simulated (and perhaps eventually make their own simulations), and the number of simulated worlds will vastly outnumber real ones.  My prior doesn't allow for such odds, and I think there are quite a few hidden and probably false assumptions in his argument, but if he's right, it would reveal that we're fundamentally mathematical beings, and that should at the very least make Max Tegmark happy.

Tuesday, December 20, 2016

Counting Our Losses After the Election

After Trump's victory this election, I've seen a number of posts criticizing the "data scientists," all of whom predicted a Clinton victory.  If they all got it wrong, how can they claim to be engaging in science if they won't now change their methods?  And should they change their methods?

the electoral vote outcome as of 12/20/16, image from Wikipedia

I'm not a fan of the hordes of "data scientists" running regressions pretending they're doing "science."  In fact, I'm skeptical that any field with science in its name is doing actual science, computer science included. But I want to defend some of the forecasts themselves and suggest a way of going forward.

(Also, while I wouldn't blame the pollsters, who have an increasingly hard job these days, one group I have no problem blaming is the political "scientists," who have all these theories about what candidates should and shouldn't do, and where advertising helps and where it doesn't.  Trump did none of the things he was "supposed" to do and still won.)

Blame the forecasters?

I don't think there was an honest way to look at the polling, or really, most other publicly available data, and claim that Trump was actually more likely to win than Clinton. The truth is simply that unlikely events occasionally occur, and this was one of them.

While the forecasts all agreed Clinton was the favorite, they assigned her different win probabilities.  Sam Wang (whose forecast I repeatedly dismissed before the election) assigned something like a 99% chance to a Clinton victory.  Nate Silver assigned something like a 2/3 chance to a Clinton victory.  Does that mean that Nate is a better predictor?

Well, still not necessarily.  Unless someone assigned a 100% probability to a Clinton win, we can't know for sure.  Sam Wang could have been closer to the truth, but simply gotten unlucky.  Moreover, people should be rewarded for predicting close to 0% or 100% because those predictions are much more informative.  Nate Silver's prediction might have been well calibrated, but still quite useless.

Consider the following prediction.  I can predict that for the next 10 elections, the candidates of the two major parties have roughly a 50-50 chance of winning.  Since the Democrats and the Republicans each win roughly half the time, I'll probably be well calibrated, but my prediction will remain useless.

Count your log-loss

So, ought we throw our hands up in the air and trust everyone equally next time?  No!  Statistics and machine learning have ways of evaluating precisely these things.  We can use something called a loss function (for reasons I won't go into here, I will use the "log-loss" function, but you can use others), where we assign penalties, or losses, for inaccurate predictions.  Whoever accumulates the least loss over time can be thought of as the better predictor.

The binary version of the log-loss function works as follows:
L(y,p) = -(y log(p) + (1-y)log(1-p))

So let y=1 in the event where Trump wins and p be the probability assigned to that event.  Taking logarithms base 10, someone assigning this event a probability of .01 will suffer loss = -(1*log(.01)+(1-1)log(1-.01)) = 2, whereas someone assigning this event a probability of .33 will suffer a loss of approximately 0.5.  Note that had Trump lost, the losses would have been approximately .004 and .2, respectively, rewarding the confident prediction.
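For anyone who wants to check the arithmetic, here is a minimal Python sketch of the binary log-loss above.  The function name log_loss is my own choice for illustration, and the base-10 logarithm is an assumption made so that the results match the worked numbers; natural log is also common and only rescales the losses.

```python
import math


def log_loss(y: int, p: float) -> float:
    """Binary log-loss L(y, p) = -(y log(p) + (1 - y) log(1 - p)).

    y is 1 if the event happened and 0 otherwise; p is the probability
    the forecaster assigned to the event. Base-10 logs are an assumption
    chosen to match the worked numbers in the text.
    """
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 < p < 1.0:
        # p = 0 or p = 1 would require log(0); this sketch leaves them out.
        raise ValueError("this sketch only handles 0 < p < 1")
    return -(y * math.log10(p) + (1 - y) * math.log10(1 - p))


# Probabilities assigned to a Trump win, roughly as quoted in the text.
for label, p in [("99% Clinton forecast", 0.01), ("2/3 Clinton forecast", 0.33)]:
    print(label)
    print("  loss if Trump wins: ", round(log_loss(1, p), 3))
    print("  loss if Trump loses:", round(log_loss(0, p), 3))
```

The first number printed for each forecast is the loss actually suffered; the second is the loss that forecaster would have suffered had the election gone the other way.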

So, according to this metric, Sam Wang gets penalized a lot more than Nate Silver for predicting an event that didn't occur.  If he keeps doing this over time, he will be discovered to be a bad predictor.  Note that this function indeed assigns a loss of 0 for predicting a 100% probability to an event that occurs and infinite loss to assigning 0% to an event that occurs.  Big risks yield big rewards.  Also note that my scheme of assigning a 50-50 chance to each future election will simply yield a loss of about .3 each time, which shouldn't be too hard to beat.

So, I suggest we start keeping track of the cumulative log-losses of the various people in this game to keep them honest.
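As a rough sketch of what such bookkeeping might look like, the Python snippet below sums base-10 log-losses per forecaster over several events.  Everything in it is hypothetical: the forecaster labels, the race names, the probabilities, and the outcomes are made up for illustration (none of it is real polling or election data), and the function names log_loss and cumulative_log_loss are my own.

```python
import math


def log_loss(y: int, p: float) -> float:
    """Base-10 log-loss for one binary event (the base is an assumption)."""
    if y not in (0, 1) or not 0.0 <= p <= 1.0:
        raise ValueError("need y in {0, 1} and 0 <= p <= 1")
    # Probability the forecaster gave to what actually happened.
    q = p if y == 1 else 1.0 - p
    if q == 0.0:
        return math.inf  # said "impossible" and it happened anyway
    return 0.0 - math.log10(q)  # 0.0 - x avoids printing -0.0 when q == 1


def cumulative_log_loss(forecasts, outcomes):
    """Sum each forecaster's losses over the events that have been decided.

    forecasts: {forecaster: {event: probability that the event happens}}
    outcomes:  {event: 1 if it happened, 0 if it did not}
    """
    return {
        name: sum(log_loss(outcomes[e], p) for e, p in probs.items() if e in outcomes)
        for name, probs in forecasts.items()
    }


# Made-up example: three hypothetical races and three hypothetical forecasters.
OUTCOMES = {"race_a": 1, "race_b": 0, "race_c": 1}
FORECASTS = {
    "confident": {"race_a": 0.01, "race_b": 0.01, "race_c": 0.99},
    "cautious": {"race_a": 0.33, "race_b": 0.33, "race_c": 0.67},
    "coin_flip": {"race_a": 0.50, "race_b": 0.50, "race_c": 0.50},
}

totals = cumulative_log_loss(FORECASTS, OUTCOMES)
for name in sorted(totals, key=totals.get):  # least cumulative loss first
    print(f"{name:10s} {totals[name]:.3f}")
```

In this toy example a single badly missed race costs the confident forecaster more than it gains back on the two races it calls correctly, which is the kind of thing a running tally would surface over many elections.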