The Geography of Basketball, Part II: Watching the Game in ArcGIS

A guest post by Gregory Brunner

I wasn’t planning on writing another blog post on this topic so soon, but a few days ago I was looking into how to turn the gameid field in the NBA data into the actual game date, teams involved, etc., and I stumbled upon this:

For about a day, I thought I had found something that I wasn’t supposed to find. Then I Googled NBA Movement API and found this amazing post by Savvas Tjortjoglou on How to Track NBA Player Movements in Python. That same day, I found NBA Player Tracking. All this amazing player movement data is out there for consumption. I had to explore it!

All I really wanted to do was get this data into ArcMap, see if I could replay the data using the time slider, and then make some webmaps. I wanted to go from watching this:

(That’s Russell Westbrook hitting a 3 point shot. You can access that data as json here.)

to exploring ways to replay the data in ArcMap:

(That’s the first 90+ seconds of the first quarter of the game in ArcMap played back in 10 seconds!)

and also maybe play around with different ways to render players and moments on the court.



That’s Russell Westbrook hitting a 3 point shot.

So how do you go from the NBA player tracking video to working with the data in ArcMap and ArcGIS Online?

I decided to stay with Russell Westbrook and look at the first game he played in the 2014-2015 season. That game was on October 29, 2014 against the Portland Trail Blazers, and all of the data we’ll look at here is from that game. Note that the game was played in Portland, not Oklahoma City, as you might infer from my court image.

Picking up on Savvas Tjortjoglou’s post, I read in event data from the stats API by passing a specific gameid and eventid.

import requests

event_url = 'http://stats.nba.com/stats/locations_getmoments/?eventid=%s&gameid=%s' % (eventid, gameid)
response = requests.get(event_url)
# Parse the JSON body once and reuse the result
data = response.json()
home = data["home"]
visitor = data["visitor"]
moments = data["moments"]
gamedate = data["gamedate"]

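The variables home and visitor are dictionaries containing information on the players involved in the game. The moments variable is a list of snapshots, each holding the quarter, the game clock, the shot clock, and the positions of the ball and the players on the court. The gamedate is the date of the contest.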
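
To make the layout of a moment concrete, here is a small sketch that unpacks one. The sample values are made up for illustration, and unpack_moment is a hypothetical helper name. The index positions (0 for the quarter, 2 for the game clock, 3 for the shot clock, 5 for the list of per-entity records) are the ones the parsing code in this post relies on; positions 1 and 4 are not used here, so the sample just fills them with placeholders.

```python
def unpack_moment(moment):
    """Split one moment into its clock values and per-entity records."""
    quarter = moment[0]
    game_clock = moment[2]   # seconds remaining in the quarter
    shot_clock = moment[3]   # seconds remaining on the shot clock
    entities = []
    for team_id, player_id, x, y, radius in moment[5]:
        entities.append({
            "team_id": team_id,      # -1 marks the basketball
            "player_id": player_id,  # -1 marks the basketball
            "x": x,
            "y": y,
            "radius": radius,
        })
    return quarter, game_clock, shot_clock, entities


# Made-up sample moment: the ball plus one player (IDs are placeholders)
sample_moment = [
    1, 0, 715.32, 19.5, None,
    [[-1, -1, 46.9, 25.1, 4.2],
     [100, 2001, 40.0, 30.0, 0.0]],
]

quarter, game_clock, shot_clock, entities = unpack_moment(sample_moment)
print(quarter, game_clock, shot_clock, len(entities))
```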

I used the home dictionary and the visitor dictionary to create a team dictionary and a player dictionary so that later I could apply those to my feature class in the form of a player domain and a team subtype.

#Create team dictionary
team_dict = {}
team_dict[home['teamid']] = home['name']
team_dict[visitor['teamid']] = visitor['name']
team_dict[-1] = 'Basketball'
#Create player dictionary
d = {}
d[-1] = 'Basketball'
for h in home['players']:
    d[h['playerid']] = h['firstname'] + ' ' + h['lastname']
for v in visitor['players']:
    d[v['playerid']] = v['firstname'] + ' ' + v['lastname']

Then, for every moment in the data, I parsed the data into a tuple that I used to populate my game event feature class.

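    coords = []
    # enumerate supplies the moment index without rescanning the list for every player
    for moment_index, moment in enumerate(moments):
        quarter = moment[0]
        for player in moment[5]:
            # Append the moment index, game clock, and shot clock to the player record
            player.extend((moment_index, moment[2], moment[3]))
            # date must be a datetime.date, since create_timestamp passes it to datetime.combine
            clock_time = create_timestamp(quarter, date, player[6])
            # Keep one decimal place of seconds
            ct = clock_time.strftime('%Y/%m/%d %H:%M:%S.%f')[:-5]
            player_data = (player[0], player[1], player[2], player[3], player[4], player[5], player[6], player[7], ct)
            # Swap the axes, shift the origin to the hoop, and scale to feet x 10
            coord = [10*(player[3]-25), 10*(player[2]-5.25)]
            coords.append((coord,)+player_data)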

Notice that I scaled the coordinates by a factor of 10 and also shifted the x and y positions slightly. I did this to put the movement data in the same coordinates as the shot chart data, which was in units of feet x 10, with the (0,0) point being the center of the hoop. Now that I think about it, I should probably be putting the shot chart data in the coordinate frame of the player movement data as units of feet will make more sense.
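
The shift and scale can also be written as a small standalone function. This is a sketch of the same arithmetic as in the loop above; to_shot_chart_coords is a hypothetical name, and the constants 25 and 5.25 are the offsets from that loop, chosen so that the hoop lands on (0,0).

```python
def to_shot_chart_coords(loc_x, loc_y):
    """Convert tracking coordinates (feet) to shot chart units (feet x 10).

    The axes are swapped, as in the parsing loop: the tracking y value
    becomes the output x, and the tracking x value becomes the output y.
    """
    out_x = 10 * (loc_y - 25)
    out_y = 10 * (loc_x - 5.25)
    return [out_x, out_y]


# The point (5.25, 25.0) in the tracking data maps to the origin
print(to_shot_chart_coords(5.25, 25.0))   # [0.0, 0.0]
```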

Next, I created a feature class.

def create_feature_class(output_gdb, output_feature_class):
    feature_class = os.path.basename(output_feature_class)
    if not arcpy.Exists(output_gdb):
        arcpy.CreateFileGDB_management(os.path.dirname(output_gdb),os.path.basename(output_gdb))
    if not arcpy.Exists(output_feature_class):
        arcpy.CreateFeatureclass_management(output_gdb,feature_class,"POINT","#","DISABLED","DISABLED", "PROJCS['WGS_1984_Web_Mercator_Auxiliary_Sphere',GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Mercator_Auxiliary_Sphere'],PARAMETER['False_Easting',0.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',0.0],PARAMETER['Standard_Parallel_1',0.0],PARAMETER['Auxiliary_Sphere_Type',0.0],UNIT['Meter',1.0]]","#","0","0","0")
        arcpy.AddField_management(output_feature_class,"TEAM_ID","LONG", "", "", "")
        arcpy.AddField_management(output_feature_class,"PLAYER_ID","LONG", "", "", "")
        arcpy.AddField_management(output_feature_class,"LOC_X","DOUBLE", "", "", "")
        arcpy.AddField_management(output_feature_class,"LOC_Y","DOUBLE", "", "", "")
        arcpy.AddField_management(output_feature_class,"RADIUS","DOUBLE", "", "", "")
        arcpy.AddField_management(output_feature_class,"MOMENT","LONG", "", "", "")
        arcpy.AddField_management(output_feature_class,"GAME_CLOCK","DOUBLE", "", "", "")
        arcpy.AddField_management(output_feature_class,"SHOT_CLOCK","DOUBLE", "", "", "")
        arcpy.AddField_management(output_feature_class,"TIME", "TEXT", "", "", 30)

Then, I pushed the game data into the feature class.

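# Field order must match each row tuple: geometry first, then the attributes
# (inferred from the tuples built above and the fields added to the feature class)
fields = ["SHAPE@XY", "TEAM_ID", "PLAYER_ID", "LOC_X", "LOC_Y", "RADIUS", "MOMENT", "GAME_CLOCK", "SHOT_CLOCK", "TIME"]

def populate_feature_class(rowValues, output_feature_class):
    c = arcpy.da.InsertCursor(output_feature_class, fields)
    for row in rowValues:
        c.insertRow(row)
    del c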

I used the game date for the YYYY-MM-DD part of a clock time string. In ArcGIS, for the time to make sense, time needs to move forward. So instead of counting down from 720.0 seconds at the beginning of a quarter to 0.0 seconds at the end, I needed to define a scheme where time moved forward. The time scheme I defined is QQ:MM:SS, where QQ is the quarter (stored in the hours position), MM is the minutes into the quarter, and SS is the seconds. For example, a time of 01:02:30 indicates that we are 2 minutes and 30 seconds into the first quarter. Here’s how I created the timestamp.

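import datetime

def create_timestamp(quarter, gamedate, seconds):
    # The game clock counts down from 720 seconds, so convert it to elapsed time
    m, s = divmod(720 - seconds, 60)
    whole_seconds = int(s)
    # Hundredths of a second, capped at 99 so the microsecond argument stays below 1,000,000
    hundredths = min(int(round((s - whole_seconds) * 100)), 99)
    t = datetime.time(int(quarter), int(m), whole_seconds, hundredths * 10000)
    return datetime.datetime.combine(gamedate, t)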
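
As a quick check on the scheme, the sketch below converts a game clock reading into a QQ:MM:SS label with plain arithmetic. It is an illustration rather than the function above: clock_label is a hypothetical helper, and it assumes 720-second quarters.

```python
def clock_label(quarter, seconds_remaining, quarter_length=720):
    """Return a QQ:MM:SS label for the time elapsed in the quarter."""
    elapsed = quarter_length - seconds_remaining
    minutes, seconds = divmod(int(elapsed), 60)
    return "%02d:%02d:%02d" % (quarter, minutes, seconds)


# 570 seconds left in the first quarter is 2 minutes 30 seconds in
print(clock_label(1, 570))   # 01:02:30
```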

The NBA data is interesting because there are a lot of coded values. Players and teams have numerical IDs. Above, I created a dictionary of players and teams. I used the players to create a “Players” domain on my feature class.

def create_player_domain(gdb, fc, player_dict):
    domName = "Players"
    inField = "PLAYER_ID"
    # The domain's field type must match PLAYER_ID, which is a LONG field
    arcpy.CreateDomain_management(gdb, domName, "Player Names", "LONG", "CODED")
    for code in player_dict:
        # The coded value is the player ID; the description is the player's name
        arcpy.AddCodedValueToDomain_management(gdb, domName, code, player_dict[code])

    arcpy.AssignDomainToField_management(fc, inField, domName)

And I used "TEAM_ID" as the subtype on my feature class:

def create_team_subtype(gdb, fc, subtype_dict):
    fc_path = os.path.join(gdb, fc)
    arcpy.SetSubtypeField_management(fc_path, "TEAM_ID")
    for code in subtype_dict:
        # The subtype code is the team ID; the description is the team name
        arcpy.AddSubtype_management(fc_path, code, subtype_dict[code])
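
A coded-value domain or subtype is essentially a lookup from a code to a description. As a rough illustration in plain Python, decoding a row with the two dictionaries looks like the sketch below. The IDs and names are placeholders (only the -1 to 'Basketball' entry comes from the dictionaries built earlier), and describe_row is a hypothetical helper.

```python
def describe_row(team_id, player_id, team_dict, player_dict):
    """Return readable (team, player) names, falling back to the raw codes."""
    team = team_dict.get(team_id, str(team_id))
    player = player_dict.get(player_id, str(player_id))
    return team, player


# Placeholder codes and names for illustration
team_dict = {-1: "Basketball", 100: "Home Team"}
player_dict = {-1: "Basketball", 2001: "Sample Player"}

print(describe_row(-1, -1, team_dict, player_dict))   # ('Basketball', 'Basketball')
```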

It's pretty neat how we can apply these GIS data management fundamentals to the NBA data!

I also added a line to remove duplicate moments from the feature class because when I started concatenating multiple events, I noticed that there were some duplicates.

arcpy.DeleteIdentical_management("Game_0021400015_Event_1_10", "Shape;TEAM_ID;PLAYER_ID;LOC_X;LOC_Y;RADIUS;MOMENT;GAME_CLOCK;SHOT_CLOCK;TIME", "", "0")
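
The same idea can be applied before the rows ever reach the feature class. The sketch below drops exact repeats from a list of row tuples while keeping the first occurrence. It is an alternative to the geoprocessing call above, not the approach used here; drop_duplicate_rows and the sample rows are hypothetical.

```python
def drop_duplicate_rows(rows):
    """Return rows with exact repeats removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        # The geometry is a list, so convert it to a tuple to get a hashable key
        key = (tuple(row[0]),) + tuple(row[1:])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    ([10.0, 20.0], 100, 2001, "2014/10/29 01:00:04.6"),
    ([10.0, 20.0], 100, 2001, "2014/10/29 01:00:04.6"),  # exact repeat
    ([11.0, 20.5], 100, 2001, "2014/10/29 01:00:04.7"),
]
print(len(drop_duplicate_rows(rows)))   # 2
```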

So what does this data look like in ArcGIS?

At about 1 minute and 6 seconds into the game, Wesley Matthews took a 3 point shot (You can watch the footage here at on.nba.com. The shot is the first video in the sequence).

Wesley Matthews Footage

What does this look like in ArcMap?

Wesley Matthews Shooting

This is a roughly 1 second interval of data right after Wesley Matthews released the shot.

He missed.

He missed and the players moved in for the rebound. I used this project to make the movie above. I have also shared this map document (MXD) as a map package (MPK) on my GitHub site. Take a look at the map package in ArcGIS if you're interested.

What does a single event look like as a webmap?



This is the webmap for Event 346 (Russell Westbrook hitting a 3). Click on a feature to see its attributes. There are over 7,000 features in this single event and this event spanned less than 10 seconds on the game clock!

In my webmap for Event 346, I can isolate Russell Westbrook and the ball using a filter in ArcGIS Online. It looks like this.



That's where Russell Westbrook hits a 3. I've applied the heat map effect to the basketball.

I got ambitious when I wanted to make a video and time-enabled webmap, so I concatenated Events 1 - 10 from the same game. Those are the events I used for the video at the beginning. I published the data as a time-enabled webmap. The time-slider won't appear until you click View larger map.


View larger map

The map looks like a mess until the slider appears. There are over 30,000 features in this feature service. I'm sharing it as a hosted feature service, and I did not realize until Gavin noticed that the hosted feature service stripped out the sub-seconds on the TIME field. I don't know why this happened, because for the Event 346 webmap I can see the sub-seconds. This affects playback of the game moments, as does the fact that the default time-slider does not display data at intervals shorter than 1 second. We would have to create a custom slider or playback capability to play back data at this frequency, which we're thinking about doing. We'll write about it after we do it. Still, I'd encourage you to take a look and play around with the time-slider and data yourself. You can identify, query, and render the data in many different ways!
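
One way to think about custom playback is to group the features into frames by timestamp and then step through the frames at whatever rate you like. The sketch below does only the grouping, in plain Python. It assumes the TIME strings use the format written earlier in this post (with one decimal place of seconds); group_into_frames and the sample rows are hypothetical.

```python
import datetime
from collections import defaultdict

TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def group_into_frames(rows):
    """Group (time_string, payload) rows by timestamp, in time order."""
    frames = defaultdict(list)
    for time_string, payload in rows:
        stamp = datetime.datetime.strptime(time_string, TIME_FORMAT)
        frames[stamp].append(payload)
    # Timestamps are unique keys, so sorting orders the frames by time
    return sorted(frames.items())


rows = [
    ("2014/10/29 01:00:04.7", "ball"),
    ("2014/10/29 01:00:04.6", "ball"),
    ("2014/10/29 01:00:04.6", "player"),
]
for stamp, payloads in group_into_frames(rows):
    print(stamp.time(), payloads)
```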

All that being said, this has all been really fun for me! I can't believe all this data is out there. We're really just scratching the surface. If anyone would like to see the code, you can find it on my GitHub page. I have also shared a map package there if you want to see the data in ArcGIS. Definitely let me know if you have any problems, questions, or insights! In addition to exploring how to improve the playback capability, I'm thinking about doing a post that explores some ArcGIS spatial analysis or space-time analysis and visualization on the data. Let me know if you would be interested in reading about that.