These past couple of days have been so crazy in terms of thinking on my feet… The updates the Hackathon Chair has sent every other day this past week have led me to throw a lot of wild ideas out the door, but they have also helped me condense my idea.
Last Friday, the chair emailed us to say that we would be given data from 5 NBA games this Friday (today) to use for our hack. So I went from my infrared idea, which I thought was my final idea, to a model of the best team based on various execution factors, to a model measuring sloppiness.
From there, I tried really hard to get data, randomly generating 30 games as sample data (one for each team in the NBA) to figure out how to weight each factor.
Here are the factors I considered adding to my algorithm and how they were laid out in my spreadsheet:
The yellow columns were supposed to be negatively correlated with winning, and the red columns positively correlated, in a logistic regression of wins and losses.
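For readers curious what that yellow/red check looks like, here is a minimal sketch in Python (standard library only) rather than SAS: fit a logistic regression of wins on the factors and look at the sign of each coefficient. Everything in it is hypothetical: the two factors (turnovers as a stand-in "yellow" factor, assists as a stand-in "red" one), the 30 randomly generated games, and the made-up rule that decides which fake games count as wins. It fits the model with plain batch gradient descent instead of a statistics package, so treat it as an illustration, not the spreadsheet's actual columns or method.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(col):
    # Rescale a column to mean 0 and standard deviation 1
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
    return [(v - mean) / sd for v in col]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Return [intercept, w1, w2, ...] fitted by batch gradient descent on log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def log_loss(w, X, y):
    # Average negative log-likelihood of the win/loss labels
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

# Hypothetical sample: 30 randomly generated games with two made-up factors
random.seed(0)
turnovers, assists, wins = [], [], []
for _ in range(30):
    t = random.gauss(14, 3)  # stand-in "yellow" factor: expected to hurt
    a = random.gauss(24, 4)  # stand-in "red" factor: expected to help
    # Made-up rule that labels each fake game a win (1) or a loss (0)
    score = 0.5 * (a - 24) - 0.6 * (t - 14) + random.gauss(0, 1)
    turnovers.append(t)
    assists.append(a)
    wins.append(1 if score > 0 else 0)

X = list(zip(standardize(turnovers), standardize(assists)))
weights = fit_logistic(X, wins)
print("intercept, turnovers, assists:", [round(v, 2) for v in weights])
print("turnovers negative?", weights[1] < 0, "| assists positive?", weights[2] > 0)
```

Standardizing the columns first keeps the coefficients on a comparable scale, so their relative sizes hint at how much each factor should be weighted, which was the point of the sample data.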
I did have some trouble running SAS on this dataset, and only the green columns contained data I had actually found (which took me 3 hours on 2 computers)…
I met with Professor Mario to ask whether there was a more efficient way to arrive at a working equation for execution that I could use by the time I had scraped the data from the 5 games we were given…
As usual, Professor Mario left me with more and better ideas. After discussing the difficulties of my plan with him, I realized that under this time constraint it was not very feasible to build the model I was envisioning.
Professor Mario brought up an interesting point about creating a visualization that “normal people” could make decisions from without knowing the model and assumptions behind it. Another important constraint was that I only had 90 seconds to present this idea to the panel of judges.
He recommended that I focus on just two factors, one for each axis, and use them to show something about individuals: for example, which player on each team is the sloppiest.
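A minimal sketch of that two-factor idea, under assumptions that are not from the original plan: two hypothetical per-player factors (turnovers and fouls per game, with made-up numbers for made-up players), and one arbitrary definition of “sloppiest”. Each axis is standardized across all players, and on each team the player with the largest combined z-score, the one furthest toward the top-right of the scatter plot, gets flagged.

```python
import statistics

# Hypothetical per-player numbers: (team, player, turnovers, fouls) per game
players = [
    ("Team A", "A1", 3.1, 2.0),
    ("Team A", "A2", 1.2, 3.5),
    ("Team A", "A3", 0.8, 1.1),
    ("Team B", "B1", 2.4, 4.0),
    ("Team B", "B2", 1.9, 1.5),
    ("Team B", "B3", 0.5, 2.2),
]

def z_scores(values):
    # Standardize so both axes are on the same scale
    mean = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0
    return [(v - mean) / sd for v in values]

def sloppiest_per_team(rows):
    """Pick, for each team, the player with the largest x + y z-score,
    i.e. the point furthest toward the top-right of the scatter plot."""
    xs = z_scores([r[2] for r in rows])
    ys = z_scores([r[3] for r in rows])
    best = {}
    for (team, player, _, _), x, y in zip(rows, xs, ys):
        if team not in best or x + y > best[team][1]:
            best[team] = (player, x + y)
    return {team: player for team, (player, _) in best.items()}

# The raw (turnovers, fouls) pairs are what would go on the two axes
for team, player in sorted(sloppiest_per_team(players).items()):
    print(team, "->", player)
```

Summing z-scores weights the two axes equally, which sidesteps the weighting problem; a different rule, such as flagging only players above average on both axes, would be just as easy to swap in.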
We then brainstormed ways to quickly find 2 standout factors to plot, one for each axis. I threw out the idea of weighting the variables again, because collecting the data to build the best model would take so long, and it would also be hard to decide what sample size of games to use before I could even begin crunching. Mario did suggest using pairwise functions, or just linear regression, to find and weight the variables, but with (82*30)+(60*30) = 4,260 data points per factor, there was no way I could gather and process all of that in time, even if the code itself was only a couple of lines.
That’s where, Mario told me, intuition comes in: while statistics can give us a more numerically accurate view, intuition can also be strong in fans who have followed a sport for a long time. Mario discussed how shooting “granny-style” (underhanded) has proven to be significantly more accurate, because humans have better depth perception when working with their hands below their shoulders. However, even though the statistics reveal that truth, hardly anyone in the NBA actually shoots “granny shots,” because it looks goofy. In the same way, the statistics may point me to the 2 most significant factors, but those might look “goofy” too, and I really want to choose two variables that build my credibility not just as a statistician but also as a sports fan.
I’ve decided to focus on finding the relative defensive sloppiness of the starting players in an NBA game. One axis will measure passive sloppiness and the other active sloppiness. I won’t detail exactly which variables I’m picking (yet), but this is the final plan! 😉
5 more days!