Two weeks ago I met with Professor Mario to discuss the details of my new direction. I thought I had the best idea with The Toolbox Concept. He played devil’s advocate and pointed out that too many other factors were involved, so by the time I reached a conclusion I would have made so many assumptions that the conclusion would be basically meaningless.

Thankfully, he found my idea of defensive sloppiness from the Hackathon intriguing, so we decided to write the paper on that instead.

We discussed the format of the paper:

Intro/Abstract: Relevance of measuring defensive sloppiness

Methods: How to show that the factors, which started out as a sports fan’s hunches, actually matter, and what concrete numbers back that up

Conclusion: Reiterate the variables and discuss further studies and situations where the index could be used

I was stuck in a bit of circular logic at first: how do I prove (based on wins/losses) that my index is correct when I have yet to even define what it’s measuring?

Professor Mario suggested over email that I answer questions along the lines of … factor X was marginally higher/lower in winning games than in losing games.

Then, after I have gone through each factor and stated how it makes a significant difference to wins and losses, I can weight the factors and scale them into a reasonable index.
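Here is a rough sketch of that two-step idea, written in Python only to make the logic concrete (the same steps should port to R or SAS). The game rows are made up for illustration, the factor names are placeholders, and the weights are hypothetical, not values anyone has estimated. It also only shows the direction of each gap; whether a gap is statistically significant would still need a proper test.

```python
from statistics import mean

# Made-up per-game team rows, purely for illustration (not real NBA data).
games = [
    {"won": True,  "def_fouls": 18, "steals": 9, "blocks": 6},
    {"won": True,  "def_fouls": 20, "steals": 8, "blocks": 5},
    {"won": False, "def_fouls": 25, "steals": 5, "blocks": 3},
    {"won": False, "def_fouls": 23, "steals": 6, "blocks": 4},
]

def win_loss_gap(games, factor):
    """Mean of `factor` in wins minus its mean in losses."""
    wins = [g[factor] for g in games if g["won"]]
    losses = [g[factor] for g in games if not g["won"]]
    return mean(wins) - mean(losses)

# Hypothetical weights: a positive weight adds to sloppiness,
# a negative weight subtracts from it.
WEIGHTS = {"def_fouls": 1.0, "steals": -1.0, "blocks": -1.0}

def sloppiness_index(game, weights=WEIGHTS):
    """Weighted sum of the factors for a single game."""
    return sum(w * game[f] for f, w in weights.items())

# Step 1: how does each factor differ between wins and losses?
gaps = {f: win_loss_gap(games, f) for f in WEIGHTS}

# Step 2: combine the factors into one number per game.
indexes = [sloppiness_index(g) for g in games]
```

With real data, the gaps from step 1 could guide the choice of weights in step 2 instead of the placeholder values of 1 and -1 used here.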


 

Here are the factors to look at and their sources:

Positive (for defensive sloppiness)

-defensive fouls (reaching in particular)

https://www.teamrankings.com/nba/matchup/mavericks-pacers-2016-10-26 (by team/game)

-uncontested shots

-defensive three seconds

http://www.nbaminer.com/player-foul-details/ (average by season or playoffs)

Negative (against defensive sloppiness)

-charges drawn

http://stats.nba.com/players/hustle/ (by player/season)

http://www.nbaminer.com/player-foul-details/ (average by season or playoffs)

-defensive rebounds

https://www.teamrankings.com/nba/matchup/mavericks-pacers-2016-10-26 (by team/game)

-blocks

https://www.teamrankings.com/nba/matchup/mavericks-pacers-2016-10-26 (by team/game)

-steals

https://www.teamrankings.com/nba/matchup/mavericks-pacers-2016-10-26 (by team/game)


I did find a trove of stats from the NBA detailing “hustle”: http://stats.nba.com/hustle/#!/

However, it is just a list and is not broken down by game. Maybe it can be used to benchmark which variables to use, or, if I got all the factors from this one place, I could run them in SAS to weight them and find correlations.


This is another one from the NBA, detailing “defense”. I can view a particular game by filtering on “last game”:

http://stats.nba.com/players/defense/#!?sort=DEF_WS&dir=-1&Season=2016-17&SeasonType=Playoffs&LastNGames=1

I don’t have it broken out game by game yet, but they must have it somewhere….

(Screenshot 2017-04-16 at 8.02.00 PM.png)

I guess since I found out where to put the date, I can stick with stats.nba.com to find stats game by game. I still have to figure out what game ranges I want to work with, and whether I have to change the scraper for each game, since each game has to be extracted by its date.
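If the date really can go in the URL, the scraper might not need to change per game; it could loop over dates and build one URL per day. Below is a minimal sketch in Python of just the URL-building part (no requests are made). The parameter names `DateFrom` and `DateTo` and the MM/DD/YYYY format are my assumptions and would need to be checked against what the site actually shows in the address bar; the dates are just example values.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Base of the defense page URL from above; everything after "#!?" is the query.
BASE = "http://stats.nba.com/players/defense/#!?"

def daily_urls(start, end, season="2016-17", season_type="Playoffs"):
    """Yield (day, url) for each calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        stamp = day.strftime("%m/%d/%Y")  # assumed date format
        params = {
            "Season": season,
            "SeasonType": season_type,
            "DateFrom": stamp,  # assumed parameter name
            "DateTo": stamp,    # assumed parameter name
        }
        yield day, BASE + urlencode(params)
        day += timedelta(days=1)

# Example date range, chosen only for illustration.
urls = list(daily_urls(date(2017, 4, 15), date(2017, 4, 17)))
for day, url in urls:
    print(day, url)
```

Days with no games would simply come back empty, so the game range could be set loosely and the empty days dropped afterwards.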


Here’s a useful scraper for R that I’ll try to use:

https://github.com/vojesse/scraping/blob/master/scraper.R


Here are a couple questions I have:

-Would the XML package work for this data? Are there better packages?

https://www.r-bloggers.com/rselenium-a-wonderful-tool-for-web-scraping/

-Do I need to identify what kind of site it is (such as HTML) to figure out which scraper to use?

-Where does the data frame go once the data is scraped?

-Would I just copy over the data frame to use in R?

-Once the data frame is produced, is this something that I can copy into SAS to (finally) use the variables for the methods?

-How do I know how much I want to scrape (30 is the stat magic number but I don’t think that’s enough)?
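On the “where does the data frame go” questions, one common route is to write the scraped table to a CSV file, since CSV can be read by both R and SAS. This is only a sketch in Python with made-up rows and placeholder column and team names, not a confirmed answer for the scraper linked above.

```python
import csv
import io

# Made-up rows standing in for scraped per-game stats (not real data).
rows = [
    {"game_date": "2017-04-15", "team": "AAA", "def_fouls": 21, "steals": 7},
    {"game_date": "2017-04-15", "team": "BBB", "def_fouls": 18, "steals": 9},
]

def rows_to_csv(rows, handle):
    """Write a list of dicts to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Written to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To save to disk instead (file name is a placeholder):
# with open("defense_by_game.csv", "w", newline="") as f:
#     rows_to_csv(rows, f)
```

The resulting file could then be loaded with R’s `read.csv` or SAS’s `PROC IMPORT`, so the data would not need to be copied by hand.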

 

 

 

 
