Auckland Marathon 2015 runners (population data)

The data for each runner who entered the Auckland Marathon 2015 was obtained from https://www.aucklandmarathon.co.nz/. This data is owned by the organisers of the Auckland Marathon and cannot be used for commercial purposes without prior written permission from the organisers.

For each runner, the following was recorded:

  • bib number
  • name
  • time in hours (this is blank if the runner did not compete in the race)
  • place (this is blank if the runner did not compete in the race)
  • gender
  • division
  • age division
  • distance in km (this is blank if the runner did not compete in the race)
  • mean pace in km per hour (this is blank if the runner did not compete in the race)
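
Here is a minimal Python sketch of reading the CSV so that blank cells (runners who did not compete) become missing values rather than empty strings. The column names in NUMERIC_COLUMNS are hypothetical, because the CSV headers are not listed in this post, so check them against the downloaded file before using this.

```python
import csv
from typing import Optional

# Hypothetical column names: the CSV headers are not listed in this post,
# so change these to match the file you download.
NUMERIC_COLUMNS = ("time_hrs", "place", "distance_km", "pace_kmph")


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a cell to a float, returning None for a blank cell
    (blank means the runner did not compete in the race)."""
    if value is None or value.strip() == "":
        return None
    return float(value)


def load_runners(path: str) -> list:
    """Read the runners CSV into a list of dicts, converting each numeric
    column listed above (when present in the file) with to_float."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for col in NUMERIC_COLUMNS:
            if col in row:
                row[col] = to_float(row[col])
    return rows


def competed(row: dict) -> bool:
    """True if the runner has a recorded time, i.e. took part in the race."""
    return row.get("time_hrs") is not None
```

For example, `[r for r in load_runners("all_races_auckland_marathon_2015_final.csv") if competed(r)]` would keep only the runners who took part (the file name is assumed from the link text below).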

NB: This data set contains information about all five races that make up the Auckland Marathon 2015. For a meaningful investigation it may be necessary to focus on just one of these races, for example when comparing running times for male and female runners (whether as part of a sample-to-population inference or as part of exploring the population data).
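
To focus on one race, a sketch like the one below keeps only the runners from a chosen race who have a recorded time, then gives the median time for each gender. Which column identifies the race is not stated here, so race_col and race are assumptions you supply, and time_hrs is a hypothetical column name; the median is just one reasonable summary to compare.

```python
from collections import defaultdict
from statistics import median


def times_by_group(rows, race_col, race, group_col="gender", time_col="time_hrs"):
    """Median time per group (e.g. gender) for a single race.

    race_col/race pick out one of the five races (assumed column and value).
    Rows with a blank time are skipped: that runner did not compete.
    """
    groups = defaultdict(list)
    for row in rows:
        time = row.get(time_col)
        if row.get(race_col) != race or time is None or str(time).strip() == "":
            continue  # a different race, or no recorded time
        groups[row[group_col]].append(float(time))
    return {group: median(times) for group, times in groups.items()}
```

Each row can be a dictionary straight from `csv.DictReader`, since the times are converted with `float()` inside the function.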

Here is the population data set as a CSV file: all_races_auckland_marathon_2015_final

Rugby World Cup 2015 players (population data)

The data for each player in the Rugby World Cup 2015 was obtained from http://www.rugbyworldcup.com/. This data is owned by Rugby World Cup Ltd (RWC) and cannot be used for commercial purposes without prior written permission from RWC.

Thanks to @cushlat for the idea 🙂

For each player, the following was recorded:

  • team played for (team)
  • name (name)
  • number of international matches played (caps)
  • position (position)
  • number of years since debuted (years_since_debut)
  • date of debut (debut)
  • age at Rugby World Cup 2015 (age)
  • age minus years_since_debut (approx_age_debuted)
  • height in cm (height_cm)
  • weight in kg (weight_kg)
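
The derived variable approx_age_debuted is simple arithmetic on two other columns. As a sketch, assuming the names in brackets above match the CSV headers, this recomputes it and lists any players whose recorded value disagrees, which can be a handy data-cleaning check. If age and years_since_debut are recorded in whole years, the result can be out by up to about a year, hence "approx".

```python
def approx_age_debuted(age: float, years_since_debut: float) -> float:
    """Approximate age at debut: age minus years_since_debut."""
    return age - years_since_debut


def mismatched_players(rows: list, tol: float = 1e-9) -> list:
    """Names of players whose recorded approx_age_debuted differs from
    age - years_since_debut. Cells are converted with float() because
    csv.DictReader returns strings; rows with a blank cell are skipped."""
    bad = []
    for row in rows:
        cells = (row.get("age"), row.get("years_since_debut"),
                 row.get("approx_age_debuted"))
        if any(c is None or str(c).strip() == "" for c in cells):
            continue
        age, years, recorded = (float(c) for c in cells)
        if abs(approx_age_debuted(age, years) - recorded) > tol:
            bad.append(row.get("name", "?"))
    return bad
```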

NB: This data set should be used with care for sample-to-population inference involving comparison, as both categorical variables (team and position) have a large number of outcomes (16 teams and 11 positions). This means a random sample of, for example, 80 players from the population of Rugby World Cup 2015 players is unlikely to contain enough players in any two groups to compare them, e.g. England vs New Zealand OR forwards vs backs. If instead you use all the data for NZ and all the data for England to compare, say, the ages of players, you will have used all of the data for this population, so there is no need to “make a call” about what is going on “back in the population” 🙂
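
The arithmetic behind this: if the 16 teams are roughly the same size, a random sample of 80 players contains about 80 / 16 = 5 players per team on average, and about 80 / 11 ≈ 7 per position. The sketch below, assuming each player is a dictionary with a team entry, draws one such sample so the group sizes can be seen directly; rerunning it with different seeds shows how much they vary.

```python
import random
from collections import Counter
from typing import Optional


def sample_group_counts(players: list, n: int = 80, column: str = "team",
                        seed: Optional[int] = None) -> Counter:
    """Draw a simple random sample of n players (without replacement) and
    count how many sampled players fall in each category of `column`."""
    rng = random.Random(seed)
    sample = rng.sample(players, n)  # raises ValueError if n > len(players)
    return Counter(player[column] for player in sample)
```

For example, `sample_group_counts(players, seed=1).most_common()` lists the teams from most to least represented in that sample; passing `column="position"` does the same for positions.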

My advice would be to use this data set for either single-variable sampling investigations OR exploratory data analysis of the entire population. Using the time variable (debut) to explore other variables could also be interesting 🙂
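
One way to use debut as a time variable is to group players by the year they debuted and summarise another variable, such as weight_kg, for each year. This sketch assumes debut dates are written as YYYY-MM-DD, which is a guess, so change date_format to match how dates appear in the CSV.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_debut_year(players, value_col="weight_kg", date_col="debut",
                       date_format="%Y-%m-%d"):
    """Mean of value_col for players grouped by the year of their debut.

    date_format is an assumption about the CSV; rows with a blank date
    or value are skipped. Returns {year: mean} in year order."""
    groups = defaultdict(list)
    for player in players:
        date_text, value_text = player.get(date_col), player.get(value_col)
        if date_text in (None, "") or value_text in (None, ""):
            continue
        year = datetime.strptime(date_text.strip(), date_format).year
        groups[year].append(float(value_text))
    return {year: mean(values) for year, values in sorted(groups.items())}
```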

Here is the population data set as a CSV file: rugby_world_cup_2015