Sharpe Ratio

The main problem in empirical analysis is comparing “apples to apples”. The Sharpe Ratio is a statistical method for converting oranges to apples. Check out my example code on GitHub here. The higher the Sharpe Ratio, the greater the returns per unit of risk. Notes 1.) I briefly mention in the Jupyter Notebook Read more about Sharpe Ratio[…]

Lessons from OpenAI Five

I was fortunate to attend OpenAI Five today and play on stage in San Francisco, streamed live on Twitch. I was floored by the performance of the OpenAI bots. The following is my dig into the bots and Dota.

The bots seemed to approach the game differently than the humans. Both sides know that the goal of the game is to destroy the Ancient, but we pursue that goal differently. For example:

def Human(x):
    try:
        last_hit_creeps()
    except:
        group_as_five()

As opposed to

def OpenAI(x):
    try:
        group_as_five()
    except:
        last_hit_creeps()

Is the above a fair simplification of the process? I checked OpenAI’s website and found a page detailing the weights used in the bots’ reward functions. It appears the numbers are randomly generated at first, with the optimal values found over large empirical datasets. One important thing to notice is the priority given to buildings. In contrast, the weight on last hitting is small. However, let’s take a look at the total weight given to buildings versus last hitting once gold and xp earned are accounted for.

score for last hitting creep wave = number of creeps * last hit weight + xp of creeps * xp weight + gold of creeps * gold weight

melee creeps = 3 * 0.16 + 57 * 0.002 + 150 * 0.006 = 1.494
ranged creep = 1 * 0.16 + 69 * 0.002 + 55 * 0.006 = 0.628
creep score = 1.494 + 0.628 = 2.122

This is the basic calculation for one creep wave before the 5-minute mark, with no denies.

score for last hitting tower = tower bounty * gold weight + t1 tower weight
score for last hitting tower = 150 * 0.006 + 0.75 = 1.65

This is the basic formula for 1 T1 tower.
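The napkin math above can be reproduced with a short Python sketch. The weights and the creep and tower values are the ones quoted in this post; the constant and function names are hypothetical, picked for illustration, and none of this is OpenAI's actual code.

```python
# Reward weights as quoted in this post.
LAST_HIT_WEIGHT = 0.16
XP_WEIGHT = 0.002
GOLD_WEIGHT = 0.006
T1_TOWER_WEIGHT = 0.75


def last_hit_score(count, xp, gold):
    """Score for last hitting `count` creeps, given the xp and gold figures used in the post."""
    return count * LAST_HIT_WEIGHT + xp * XP_WEIGHT + gold * GOLD_WEIGHT


def tower_score(bounty):
    """Score for last hitting one T1 tower with the given gold bounty."""
    return bounty * GOLD_WEIGHT + T1_TOWER_WEIGHT


# One creep wave before 5 minutes, no denies: 3 melee creeps and 1 ranged creep.
melee = last_hit_score(count=3, xp=57, gold=150)
ranged = last_hit_score(count=1, xp=69, gold=55)
wave = melee + ranged
tower = tower_score(bounty=150)

print(f"melee creeps: {melee:.3f}")
print(f"ranged creep: {ranged:.3f}")
print(f"creep wave:   {wave:.3f}")
print(f"T1 tower:     {tower:.3f}")
```

Keeping the weights as named constants makes it easy to swap in different values and see how the wave-versus-tower comparison shifts.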

From the bot match I played, we lost our first T1 around 3 minutes in. The bots were efficient, trading only about 1 wave for 1 tower. They moved quickly from the objective back to lanes or on to the next objective. By napkin math, it appears a perfectly CS-ed wave is worth somewhat more than a T1 tower. One element I did not weigh, however, is what OpenAI calls “team spirit”. Bots value their teammates’ scores very highly. How often have I played a pub where my safelane carry refuses to TP because “it wouldn’t be worth it for me to miss this wave”? In contrast, bots are happy to rotate early (their mid laner left the lane at level 4 to secure T1 top). It is not in their code to push as 5; rather, they learned that cooperation supersedes individual success. This also showed in the cores letting supports last hit in lane.
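To make “team spirit” concrete, here is a minimal sketch of how such a parameter could blend each hero's own reward with the team average. It assumes a simple linear blend, which is one reading of OpenAI's public description of team spirit as a value between 0 and 1. The function name and the example rewards are hypothetical.

```python
def apply_team_spirit(individual_rewards, team_spirit):
    """Blend each hero's reward with the team mean.

    team_spirit = 0.0 means every hero only cares about its own reward;
    team_spirit = 1.0 means every hero only cares about the team average.
    The linear blend is an assumption made for illustration.
    """
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]


# Hypothetical example: one hero earns a reward of 2.0, the other four earn nothing.
rewards = [2.0, 0.0, 0.0, 0.0, 0.0]

selfish = apply_team_spirit(rewards, team_spirit=0.0)
balanced = apply_team_spirit(rewards, team_spirit=0.5)
selfless = apply_team_spirit(rewards, team_spirit=1.0)

print(selfish)
print(balanced)
print(selfless)
```

Under this blend, a high team spirit means a core gains almost as much from a support's last hit as from its own, which fits the rotations and shared farm described above.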

So what about game 3 in casters vs bots? Why did the bots lose so badly? One of the problems Purge noted is that the bots were not prioritizing farming other lanes. They would cut creeps and hit towers, but they weren’t TP-ing away as soon as they saw an enemy. Slark was close to his Shadow Blade for so long, yet didn’t clear a few camps to buy it. My brother saw this as a fatal flaw in generality: the bots were not able to adapt to poor game situations and therefore are not as good as he thought. I am here to defend our future overlords (may the bots remember my piety).

Not letting the bots draft their own heroes renders their empirical dataset useless. The reward functions are specifically created to maximize their probability of winning a game given certain assumptions, such as drafting. Purge asks why they don’t finish their items and split the map better. It’s because they can’t. It is not in their learned behavior of how to win games. In a situation where they have their optimal lineup, it would be a waste to farm a lane and finish an item. One of my friends argues that changing the draft conditions is as severe as adding a new hero. It changes the immensely complex landscape that is Dota 2, and the bots have never seen it before. The early OpenAI blog posts tell us the first few agents wandered around the map aimlessly before even learning to last hit. Similarly, the bots in game 3 stayed in trees or die-backed because they didn’t know what to do.

In summary, under the conditions the bots were trained on, they will always win. Change the fundamental assumptions of that model and, sure, humans will win.

 

[…]

HackerWrecked

A piece of advice I used to give followed the lines of “If I managed to do it… so can you!”. In retrospect, that was terrible advice because I perpetuated survivorship bias. A younger student at my college recently confessed to me their admiration for my successes and their hopes to manifest as something akin Read more about HackerWrecked[…]

Federal Government Spending at Trump Properties

Today’s post is a “flex” on trying out Tableau and working again with ProPublica’s Data Store. Below is a graphic on federal government spending at Trump Properties since 2015, created from this dataset. The legend is displayed as Amount, Property, Number of records. **ProPublica puts a disclaimer that government agencies have fought to not disclose Read more about Federal Government Spending at Trump Properties[…]