Basic Statistics : Measure

Saturday, 2 November 2013

Measures of dispersion

Introduction:

Measures of dispersion are descriptive statistics that describe how similar the scores in a set are to each other. Statisticians also refer to dispersion as variability, scatter, or spread. Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions. Using a measure of dispersion, a person can easily judge how stretched or squeezed a distribution is. The most common measures of statistical dispersion are the variance, the standard deviation and the interquartile range.

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse. Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in meters or seconds, so is the measure of dispersion (the variance, whose units are squared, is an exception). The range in particular is very sensitive to outliers and does not use all the observations in a data set, so it is more informative to report the minimum and the maximum values than to report the range alone.

Standard Deviation:

Standard deviation (SD) is the most commonly used measure of dispersion. It measures the spread of the data about the mean, and it is the square root of the sum of the squared deviations from the mean divided by the number of observations. In statistics there are two formulas for the SD, one for a sample and one for a population.

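1. For sample SD:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

In the sample SD formula we use n - 1 instead of n in the denominator, because dividing by n - 1 makes the sample variance an unbiased estimate of the population variance, and so gives a better estimate of the population SD than dividing by n would.

2. For population SD:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$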
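The two SD formulas can be sketched in a few lines of Python. This is only an illustration: the function names `population_sd` and `sample_sd` and the data values are assumptions made for the example, not something from the original post.

```python
import math

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sample_sd(values):
    """Same calculation, but divides by n - 1 instead of n."""
    n = len(values)
    if n < 2:
        raise ValueError("sample SD needs at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the article

print(population_sd(data))  # 2.0
print(sample_sd(data))      # slightly larger, because of the n - 1 divisor
```

For the same data the sample SD is always a little larger than the population SD, and the gap shrinks as the number of observations grows.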

Range:

This measure of spread, which is sometimes used, is defined as the difference between the highest and lowest values.

Interquartile range:

This measure is defined as the difference between the 3rd and 1st quartiles (Q3 - Q1), so it covers the middle half of the data.
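As a rough sketch, both the range and the interquartile range can be computed with Python's standard library. The data are the scores from the median example elsewhere on this blog. Note that several conventions exist for computing quartiles; `statistics.quantiles` (Python 3.8+) uses its default "exclusive" method here, and other tools may give slightly different quartiles for the same data.

```python
import statistics

data = [15, 16, 15, 7, 21, 18, 19, 20, 11]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartiles: with n=4, statistics.quantiles returns [Q1, Q2, Q3].
# The default "exclusive" method is one of several quartile conventions.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range)  # 14
print(iqr)
```

Because the interquartile range ignores the lowest and highest quarters of the data, a single extreme value changes it far less than it changes the range.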

Variance:

Variance is the measure obtained by adding together the squared deviations of the values from their mean and dividing the result by the number of values (for a population) or by one less than the number of values (for a sample). It is the square of the standard deviation.

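We calculate the variance as:

1. For sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

2. For population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$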
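For a quick check of the two variance formulas, Python's standard `statistics` module provides `variance` (sample, divides by n - 1) and `pvariance` (population, divides by N). The data below are illustrative values chosen for this sketch, not taken from the post.

```python
import math
import statistics

# Illustrative values: mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # population variance: 32 / 8
samp_var = statistics.variance(data)   # sample variance: 32 / 7

print(pop_var)
print(samp_var)

# The standard deviation is the square root of the variance.
print(math.sqrt(pop_var))  # 2.0
```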

Saturday, 26 October 2013

Measures of Central Tendency


Introduction:

A measure of central tendency is a single value that attempts to describe a set of data by identifying the middle or central position within that set. Such measures are sometimes called averages or measures of central location, and they are classed as summary statistics. The mean, often called the average, is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode. The mean, median and mode are all valid measures of central tendency, but under different conditions some of them become more appropriate to use than others.

Mean:

The mean is what most people commonly refer to as an average. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of a given set of numbers divided by how many numbers are in the set. It is referred to more correctly as the arithmetic mean. In statistics there are two kinds of mean: the population mean and the sample mean. A population mean is the true mean of the entire population, while a sample mean is the mean of a sample drawn from that population.

So, if a data set has n values x₁, x₂, ..., x_n, the sample mean, denoted x̄ and pronounced "x bar", is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter ∑, pronounced "sigma", which means "sum":

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The population mean is represented by the Greek letter μ, pronounced "mu". The total number of elements in a population is represented by N:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

The sample mean is commonly used to estimate the population mean when the population mean is unknown. This works because the sample mean is an unbiased estimator: its expected value equals the population mean.
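A small Python sketch can make the difference between the two means concrete. Everything here is assumed for illustration: the population is simply the integers 1 to 100, the sample size is 10, and the helper name `mean` is hypothetical.

```python
import random

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

population = list(range(1, 101))      # hypothetical population: 1, 2, ..., 100
mu = mean(population)                 # population mean

rng = random.Random(0)                # fixed seed so the run is repeatable
sample = rng.sample(population, 10)   # a sample of n = 10 values
x_bar = mean(sample)                  # sample mean, an estimate of mu

print(mu)     # 50.5
print(x_bar)
```

Any single sample mean will usually miss the population mean by some amount; it is the average over many such samples that matches it.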

Median:

The median is defined as the number in the middle of a given set of numbers arranged in order of increasing magnitude. In other words, it is the number positioned in the exact middle of the list when you arrange the numbers from lowest to highest. The median is useful because it splits the data in half: half of the values lie at or below it and half lie at or above it. For example, consider the following set of numbers:

15, 16, 15, 7, 21, 18, 19, 20, 11

From the definition of the median, the first step is to rearrange the given set of numbers in order of increasing magnitude:

7, 11, 15, 15, 16, 18, 19, 20, 21

Then we inspect the set to find the number that lies in the exact middle. With 9 values, that is the 5th one.

Median = 16

Now consider a case that often occurs when solving for the median: a data set with an even number of values, which has no single middle value.

We again rearrange the data in order of magnitude (smallest first).

Only now, with 10 scores, we have to take the 5th and 6th scores in the ordered data set and average them to get a median of 55.5.
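The procedure for both cases can be sketched as a short Python function. The function name `median` and the even-length data set are assumptions made for this illustration; the odd-length data set is the one used in the first example above.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

odd_scores = [15, 16, 15, 7, 21, 18, 19, 20, 11]
print(median(odd_scores))   # 16

even_scores = [1, 3, 5, 7]  # hypothetical data with an even number of values
print(median(even_scores))  # 4.0
```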

Mode:

The mode is defined as the element that appears most frequently in a given set of elements. On a bar chart or histogram it is represented by the highest bar. For this reason, the mode is sometimes thought of as the most popular option.

For example, in the set 3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48, the mode is 16, since it appears more times than any other number in the set.

A set of numbers can have more than one mode if several numbers occur with equal frequency and more often than the others in the set. A set with two modes is called bimodal:

3, 3, 3, 9, 16, 16, 16, 27, 37, 48

In this example, both 3 and 16 are modes.

If no number in a set occurs more than once, the set has no mode:

3, 6, 9, 16, 27, 37, 48

In this list there is no mode.
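The three cases above (one mode, two modes, no mode) can be checked with a short Python sketch. The function name `modes` is hypothetical, and returning an empty list when nothing repeats is a choice made here to match the convention used in this post; some libraries instead treat every value as a mode in that situation.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # nothing repeats: treated as "no mode", as in the text above
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # [16]
print(modes([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # [3, 16]
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # []
```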