Statistics and probability

Data

Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable.

 

Types of Data

Qualitative data: non-numerical data, e.g. "it was fun", "blue".

Quantitative data: numerical data. Quantitative data can be discrete or continuous.

Discrete data: data that can take only specific (discrete) values, e.g. "number of accidents", "points in the IB diploma".

Continuous data: data that can take any value within a range, e.g. "height", "speed".


Population

Image source: "Data Samples and error visualization techniques" by Anthony Figueroa, Towards Data Science.

Population: all members of a defined group.
Sample: a subset of the population, a selection of individuals from the population.

Biased sampling occurs when the sampling method systematically over- or under-represents some members of the population, which can lead you to draw misleading conclusions about the population.

 

Survivorship bias: an example of biased sampling, where only the members that "survived" some selection process are available to be sampled.
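The short Python sketch below illustrates survivorship bias with made-up numbers. The fund returns and the rule that loss-making funds are closed are hypothetical, chosen only for illustration: if only the funds that did not lose money are still around to be sampled, the average of the survivors overstates the average of all funds.

```python
from statistics import mean

# Hypothetical annual returns (%) of every fund that existed at the start of a period.
all_funds = [-12.0, -5.0, -1.5, 2.0, 4.0, 6.5, 9.0, 15.0]

# Hypothetical survival rule: funds that lost money were closed, so only
# funds with a non-negative return are still around to be sampled.
survivors = [r for r in all_funds if r >= 0]

population_mean = mean(all_funds)   # what we actually want to know
survivor_mean = mean(survivors)     # what a sample of survivors suggests

print(f"Mean return, all funds:       {population_mean:.2f}%")
print(f"Mean return, surviving funds: {survivor_mean:.2f}%")
```

With these numbers all funds average 2.25% while the survivors average 7.30%, so studying only the survivors overstates typical performance.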

 

Types of sampling

Simple random sampling: every member of the population is equally likely to be chosen. For example, assign each member of the population a number, then use random numbers to choose the sample.
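A minimal Python sketch of the numbering approach, assuming the population is already held in a list. The function name `simple_random_sample` and its `seed` parameter are illustrative, not from the original. `random.Random.sample` draws without replacement, so no member is chosen twice and every member has the same chance \(\frac{n}{N}\) of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members chosen without replacement, each member equally likely."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population of 100 members, each allocated a number from 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Passing `seed=None` gives a different sample on each run; fixing the seed is only useful when you want to reproduce a particular draw.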

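Systematic sampling: to find a sample of size \(n\) from a population of size \(N\), select every \(k\)th member, where \(k = \frac{N}{n}\) rounded to the nearest whole number, starting from a randomly chosen member among the first \(k\). For example, with \(N = 500\) and \(n = 50\), \(k = 10\).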
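A minimal Python sketch of systematic sampling. Two details are assumptions rather than part of every definition: the starting point is chosen at random from the first \(k\) members (a common convention), and \(\frac{N}{n}\) is rounded half-up. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every kth member, with k = N/n rounded to the nearest whole number."""
    N = len(population)
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    # Round N/n half-up using integer arithmetic: floor(N/n + 1/2).
    k = (2 * N + n) // (2 * n)
    # Assumption: start from a randomly chosen member among the first k.
    start = random.Random(seed).randrange(k)
    # Every kth member from the start, capped at n. If rounding made k larger
    # than N/n, this can return slightly fewer than n members.
    return population[start::k][:n]

population = list(range(1, 101))   # hypothetical population numbered 1 to 100
sample = systematic_sample(population, 10, seed=3)
print(sample)
```

Because \(k\) is rounded, \(k \times n\) need not equal \(N\); when \(k\) is rounded up the slice can run out before \(n\) members, which this sketch accepts rather than wrapping around to the start of the list.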

Stratified sampling: the population is divided into categories (strata), and a random sample is taken from each category so that the number chosen from each category is proportional to its size in the population, e.g. in opinion polls.
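A minimal Python sketch of proportional stratified sampling. The age groups, group sizes and the function name `stratified_sample` are hypothetical, and rounding each group's share half-up is an assumption: with some population sizes the rounded counts may not add up to exactly \(n\).

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample.

    strata maps each category (stratum) to the list of its members;
    n is the total sample size wanted.
    """
    N = sum(len(members) for members in strata.values())
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Share of the sample = share of the population, rounded half-up:
        # floor(n * size / N + 1/2).
        size = (2 * n * len(members) + N) // (2 * N)
        # Simple random sample (without replacement) within this stratum.
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical poll: 1000 voters split into three age groups.
strata = {
    "18-34": [f"voter-a{i}" for i in range(300)],
    "35-59": [f"voter-b{i}" for i in range(450)],
    "60+": [f"voter-c{i}" for i in range(250)],
}
sample = stratified_sample(strata, 40, seed=7)
for group, chosen in sample.items():
    print(group, len(chosen))
```

Here the groups make up 30%, 45% and 25% of the population, so a sample of 40 takes 12, 18 and 10 voters from them respectively.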

 

Stratified Polling in New Zealand

 

Mean, Median, Mode

 

Standard Deviation (TBC)
