
January 31, 2023

# What are Statistical Tests? How to choose the right test

Statistical tests are an essential tool for analyzing data and making informed decisions based on the results. These tests provide a systematic and objective way to evaluate the presence or absence of relationships between variables, or to compare the means or proportions of different groups. They play a crucial role in many fields, including business, finance, health care, psychology, and social sciences, to name a few.

## What are statistical tests?

Statistical tests are procedures for drawing inferences about a population from a sample of data. Each test starts from a hypothesis about the population, summarizes the sample in a test statistic, and asks how likely such a statistic would be if the hypothesis were true. In practice, they are used to determine if there is a relationship between variables, or to compare the means or proportions of different groups, so that research questions can be answered with evidence rather than intuition.

The results of statistical tests are often reported as p-values, confidence intervals, and effect sizes, and they must be interpreted carefully in the context of the research question and the limitations of the test. The appropriate test to use depends on the type of data and research question.

## Types of statistical tests: Parametric and non-parametric

There are two broad categories of statistical tests: parametric tests and non-parametric tests. Parametric tests assume that the data follow a particular distribution, most often the normal distribution, and are typically used for interval or ratio level data, where the differences between values are meaningful (ratio data additionally have a true zero point). Examples of parametric tests include t-tests, ANOVA (Analysis of Variance), and regression analysis.

Non-parametric tests, on the other hand, make far fewer assumptions about the distribution of the data. They are typically used for ordinal data, where values can be ordered but the distances between them are not meaningful, and for interval or ratio data that violate the assumptions of parametric tests. Examples of non-parametric tests include the Wilcoxon signed-rank test, the Mann-Whitney U test, and the Kruskal-Wallis test.

Regardless of the type of test used, it is important to remember that statistical tests provide only a tool to make decisions, and they must be used in conjunction with other information and critical thinking to make informed decisions.

### What are parametric tests? Common types of parametric tests

Parametric tests are statistical tests that make assumptions about the population distribution, typically normality and, when groups are compared, equal variances. Some common examples of parametric tests include:

1. t-test: This test is used to compare the means of two groups and determine if there is a significant difference between them.
2. ANOVA (Analysis of Variance): This test is used to compare the means of more than two groups and determine if there is a significant difference among them.
3. Regression analysis: This test is used to model the relationship between a dependent variable and one or more independent variables.
4. Pearson’s correlation coefficient: This test is used to measure the strength and direction of the linear relationship between two continuous variables.
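5. Chi-square test: This test is used to compare the observed frequencies in a categorical data set to the expected frequencies. Note that it is usually classified as a non-parametric test, because it works on category counts rather than assuming normally distributed data.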

It is important to keep in mind that parametric tests make assumptions about the population distribution, and it is crucial to assess the assumptions before conducting these tests. When the assumptions are not met, alternative non-parametric tests should be used instead.

#### Introduction to T-test

A t-test is a type of parametric test used to compare the means of two groups and determine if there is a significant difference between them. The classic (Student's) version assumes that the data in each group are approximately normally distributed and that the groups have equal variances; Welch's variant relaxes the equal-variance assumption. The t-test calculates a t-statistic and a p-value, which are used to make inferences about the population means. The t-statistic is the difference between the two sample means divided by the standard error of that difference, and the p-value is the probability of observing a t-statistic at least as extreme as the one calculated from the sample data, given that the null hypothesis is true.

The null hypothesis in a t-test is usually that there is no difference between the means of the two groups. If the p-value is less than a predetermined level of significance (e.g. 0.05), the null hypothesis is rejected and it is concluded that there is a significant difference between the means. If the p-value is greater than the level of significance, the null hypothesis is not rejected and it is concluded that there is not enough evidence to support a difference between the means.

There are two types of t-tests: a dependent t-test (also known as a paired t-test) and an independent t-test (also known as a two-sample t-test). A dependent t-test is used when the same individuals are being compared before and after some intervention, while an independent t-test is used when comparing the means of two distinct groups.

Examples of when a t-test might be used include:

1. Comparing the mean height of two groups of individuals, one group who received a new growth hormone treatment and one group who received a placebo.
2. Comparing the mean test scores of two classes, one class taught by a traditional teacher and one class taught by an online teacher.
3. Comparing the mean weight loss of two diet programs, one low-carbohydrate diet and one low-fat diet.

In each of these examples, a t-test would be used to determine if there is a significant difference between the means of the two groups, and to make inferences about the population mean.
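As a rough illustration of the calculation, the sketch below computes the equal-variance (Student's) t-statistic for two independent groups using only the Python standard library. The function name `pooled_t_statistic` and the weight-loss numbers are made up for this example; they do not come from a real study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t-statistic, assuming equal variances."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / standard_error
    degrees_of_freedom = n_a + n_b - 2
    return t, degrees_of_freedom


# Hypothetical weight loss in kg for two diet programs.
low_carb = [4.0, 5.0, 6.0, 5.0, 5.0]
low_fat = [3.0, 4.0, 3.0, 4.0, 1.0]

t, df = pooled_t_statistic(low_carb, low_fat)
print(round(t, 3), df)  # 3.162 8
```

With 8 degrees of freedom, the two-sided critical value at the 0.05 level is about 2.306 in standard t tables, so a t-statistic of 3.162 would lead to rejecting the null hypothesis for this made-up data. An exact p-value requires the t-distribution, which is not in the standard library; a statistics package would normally supply it.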

#### Introduction to ANOVA test

ANOVA (Analysis of Variance) is a statistical test used to determine if there is a significant difference among the means of three or more groups. It is an extension of the t-test: with exactly two groups, a one-way ANOVA gives the same conclusion as the equal-variance t-test. ANOVA tests the null hypothesis that the means of all groups are equal.

In ANOVA, the total variance in the data is partitioned into two components: between-group variability and within-group variability. The between-group variability measures the variation between the means of the different groups, while the within-group variability measures the variation within each group. If the between-group variability is large relative to the within-group variability, it suggests that the means of the groups are different.
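The partition described above can be sketched in a few lines of Python. This is a minimal one-way ANOVA for illustration only; the function name `one_way_anova_f` and the test scores are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic and its two degrees of freedom for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each value is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical standardized test scores for three teaching methods.
lecture = [70, 72, 74]
discussion = [75, 77, 79]
projects = [80, 82, 84]

f_stat, df_between, df_within = one_way_anova_f([lecture, discussion, projects])
print(f_stat, df_between, df_within)  # 18.75 2 6
```

Here the between-group mean square (75) is much larger than the within-group mean square (4), giving F = 18.75. Standard F tables list a critical value of roughly 5.14 for (2, 6) degrees of freedom at the 0.05 level, so the null hypothesis of equal means would be rejected for this made-up data.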

There are several types of ANOVA tests, including one-way ANOVA, two-way ANOVA, and repeated measures ANOVA. The one-way ANOVA compares group means across the levels of a single independent variable. The two-way ANOVA examines two independent variables at once, including whether their effects interact. The repeated measures ANOVA is used when the same subjects are measured under every condition, so that the independent variable is a within-subjects factor.

Examples of when ANOVA might be used:

1. Comparing the mean scores of three different teaching methods (lecture, group discussion, and individual projects) on a standardized test.
2. Comparing the mean yields of four different fertilizer treatments (A, B, C, and D) on crop growth.
3. Comparing the mean weight loss of three different diets (low-fat, low-carb, and balanced) in a weight loss program.

In each of these examples, ANOVA would be used to determine if there is a significant difference among the means of the groups, and to make inferences about the population means.

### Overview of non-parametric statistical tests

Non-parametric statistical tests are a type of statistical test that do not assume a specific distribution (such as the normal distribution) of the data. They are used when the data is clearly not normally distributed, or when the sample is too small to check normality reliably. Non-parametric tests are also known as distribution-free tests.

Some common non-parametric tests include the Mann-Whitney U test (also called the Wilcoxon rank-sum test), the Wilcoxon signed-rank test, the Kruskal-Wallis test, and the Friedman test. These tests are used to compare two or more groups in terms of their central tendency or location.

For example, the Mann-Whitney U test is used to compare the medians of two independent groups, and the Wilcoxon signed-rank test is used to compare two paired sets of measurements. The Kruskal-Wallis test is used to compare the medians of more than two independent groups, and the Friedman test is used to compare the medians of more than two related groups, that is, a single dependent variable measured across a within-subjects independent variable.

Non-parametric tests are typically more robust than parametric tests and are less sensitive to outliers or skewness in the data. However, they may have lower power (the ability to detect a difference when it exists) compared to parametric tests.

Examples of when a non-parametric test might be used:

1. Comparing the median salary of employees in two different departments (marketing and finance).
2. Comparing the median number of sales made by three different salespeople (A, B, and C).
3. Comparing the median weight loss of two different diets (low-fat and low-carb).

In each of these examples, a non-parametric test would be used to determine if there is a significant difference between the medians of the groups and to make inferences about the population medians.

#### Wilcoxon Signed-Rank Test: a comprehensive overview

The Wilcoxon Signed-Rank Test is a non-parametric test used to compare two related (paired) sets of measurements, such as ratings from the same patients under two conditions, and determine if there is a significant shift in central tendency (e.g. median) between them. It is the non-parametric counterpart of the dependent (paired) t-test and is used when the differences are not normally distributed or the sample size is small.

The test works by computing the difference within each pair, ranking the absolute values of the non-zero differences, and then summing the ranks of the positive and the negative differences separately. If one of the two sums is much larger than the other, the measurements under one condition tend to be higher than under the other.

Suppose we want to compare the effectiveness of two different types of headache medication. We conduct a study where 20 patients each try both medication A and medication B, in random order, and the severity of their headache after each is rated on a scale of 0 to 10, with 0 being no headache and 10 being the worst headache. The data collected is as follows, with the values listed in the same patient order:

• Medication A: 6, 5, 4, 7, 8, 9, 5, 7, 6, 8, 4, 6, 7, 5, 6, 7, 8, 9, 5, 7
• Medication B: 5, 8, 9, 6, 7, 4, 8, 6, 5, 7, 8, 9, 5, 7, 6, 5, 4, 7, 8, 9

To perform the Wilcoxon Signed-Rank Test, we first compute the difference in headache severity (A minus B) for each patient and discard any differences of zero. We then rank the absolute differences, giving tied values the average of their ranks, and attach the sign of each difference to its rank. Finally, we sum the positive ranks and the negative ranks separately and compare the smaller of the two sums to a critical value for the chosen significance level. If the smaller sum is at or below the critical value, we reject the null hypothesis of no difference. Because lower ratings mean less pain, a much larger sum of negative ranks would indicate that medication A is more effective in reducing headache severity.
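The signed-rank procedure can be sketched on the headache data above, under the assumption that the two lists are paired ratings given in the same patient order. The helper names `midrank` and `signed_rank_sums` are made up for this sketch, and it stops at the rank sums rather than computing a p-value.

```python
def midrank(x, values):
    """Rank of x among values (1-based), averaging the ranks of tied values."""
    less = sum(v < x for v in values)
    equal = sum(v == x for v in values)
    return less + (equal + 1) / 2


def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks) for paired data."""
    # Differences of zero carry no information about direction and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    abs_diffs = [abs(d) for d in diffs]
    ranks = [midrank(d, abs_diffs) for d in abs_diffs]
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


medication_a = [6, 5, 4, 7, 8, 9, 5, 7, 6, 8, 4, 6, 7, 5, 6, 7, 8, 9, 5, 7]
medication_b = [5, 8, 9, 6, 7, 4, 8, 6, 5, 7, 8, 9, 5, 7, 6, 5, 4, 7, 8, 9]

w_plus, w_minus = signed_rank_sums(medication_a, medication_b)
print(w_plus, w_minus)  # 83.0 107.0
```

One patient has identical ratings, leaving 19 non-zero differences whose ranks total 190. The two sums (83 and 107) are fairly close to each other, so this particular data set gives little sign of a difference between the medications. For a p-value, a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used instead of a hand-rolled function.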

This is just one example of a non-parametric test. Non-parametric tests are widely used in a variety of research areas when the data does not meet the assumptions of parametric tests, such as normality. Other common non-parametric tests include the Kruskal-Wallis Test, the Friedman Test, and the Wilcoxon Rank-Sum Test.

#### What is the Wilcoxon Rank-Sum Test?

The Wilcoxon Rank-Sum Test, also known as the Mann-Whitney U Test, is a non-parametric test used to compare two independent groups and determine if there is a significant difference in the central tendency (e.g. median) between the two groups. The test is used when the data is not normally distributed or the sample sizes are small.

The Wilcoxon Rank-Sum Test works by ranking all the observations from the two groups together and then comparing the sum of ranks for each group. If the sum of ranks for one group is significantly larger than the other group, we can conclude that the central tendency for that group is higher.

For example, suppose we want to compare the height of a sample of men and women. We measure the height of 20 men and 20 women and the data is as follows:

• Men: 68, 70, 71, 72, 72, 72, 73, 73, 74, 74, 75, 75, 75, 76, 76, 77, 77, 78, 78, 79
• Women: 63, 64, 65, 66, 66, 67, 67, 68, 68, 69, 69, 69, 70, 70, 71, 71, 72, 73, 74, 75

To perform the Wilcoxon Rank-Sum Test, we first rank all the observations from both groups and compare the sum of ranks for each group. If the sum of ranks for men is significantly larger than the sum of ranks for women, we can conclude that men have a higher central tendency for height.
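As an illustration, the rank sums for the height data above can be computed directly. This is a minimal sketch using only the standard library; `midrank` and `rank_sums` are hypothetical helper names, and the Mann-Whitney U statistic is derived from the rank sum using the usual formula U = R - n(n + 1)/2.

```python
def midrank(x, values):
    """Rank of x among values (1-based), averaging the ranks of tied values."""
    less = sum(v < x for v in values)
    equal = sum(v == x for v in values)
    return less + (equal + 1) / 2


def rank_sums(group_a, group_b):
    """Rank the pooled observations and return the rank sum of each group."""
    pooled = group_a + group_b
    r_a = sum(midrank(x, pooled) for x in group_a)
    r_b = sum(midrank(x, pooled) for x in group_b)
    return r_a, r_b


men = [68, 70, 71, 72, 72, 72, 73, 73, 74, 74,
       75, 75, 75, 76, 76, 77, 77, 78, 78, 79]
women = [63, 64, 65, 66, 66, 67, 67, 68, 68, 69,
         69, 69, 70, 70, 71, 71, 72, 73, 74, 75]

r_men, r_women = rank_sums(men, women)
# Mann-Whitney U for the men: rank sum minus the smallest value it could take.
u_men = r_men - len(men) * (len(men) + 1) / 2
print(r_men, r_women, u_men)  # 566.0 254.0 356.0
```

The rank sum for men (566) is far larger than the rank sum for women (254), and U = 356 out of a possible 400 means that a man is taller than a woman in the large majority of man-woman pairs. Turning this into a p-value requires tables or a library routine such as SciPy's `scipy.stats.mannwhitneyu`.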

This is a simple example of the Wilcoxon Rank-Sum Test. It is widely used in fields such as medical research, social sciences, and engineering when the data does not meet the assumptions of parametric tests.

## How to perform statistical tests correctly

When conducting a statistical test, it is important to specify the null hypothesis and the alternative hypothesis. The null hypothesis represents the default assumption that there is no relationship between the variables, while the alternative hypothesis represents the opposite assumption. The goal of the statistical test is to determine whether the sample data provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis.

The significance level of a statistical test is used to control the probability of making a Type I error, which occurs when the null hypothesis is rejected when it is actually true. The most common significance level used is 0.05, which means that, when the null hypothesis is true, there is a 5% chance of rejecting it by mistake. This level can be adjusted based on the consequences of making a Type I error in a particular situation.
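A small simulation can make the meaning of the significance level concrete. The sketch below repeatedly draws two samples from the same normal population, so the null hypothesis is true by construction, and counts how often a test at the 0.05 level rejects it anyway. To stay within the standard library it uses a z-test with a known standard deviation of 1 rather than a t-test; the function name, sample size, and number of trials are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Share of trials in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population: no real difference exists.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # With a known standard deviation of 1, the standard error of the
        # difference between the two means is sqrt(2 / n).
        z = (fmean(a) - fmean(b)) / sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(rate)  # expected to land close to 0.05
```

Lowering `alpha` to 0.01 should bring the rejection rate down to roughly 1%, at the cost of making real differences harder to detect.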

Choosing the appropriate statistical test is a critical step in the data analysis process. The type of data, research question, and number of variables involved all play a role in determining the appropriate statistical test. It is important to understand the assumptions and limitations of each test and to interpret the results with caution, in the context of the research question and the limitations of the test.
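The choice among the tests covered in this article can be summarized as a small lookup. The function below is a simplified sketch, not a complete decision procedure: `suggest_test` and its three parameters are hypothetical, it only covers comparisons of group location, and it treats "roughly normal with similar variances" as a single yes/no judgment that in practice needs checking.

```python
def suggest_test(n_groups, paired, roughly_normal):
    """Suggest one of the tests discussed in this article.

    n_groups       -- number of groups or conditions being compared (2 or more)
    paired         -- True if the same subjects are measured in every group
    roughly_normal -- True if the parametric assumptions look reasonable
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if roughly_normal:
            return "dependent (paired) t-test" if paired else "independent t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    if roughly_normal:
        return "repeated measures ANOVA" if paired else "one-way ANOVA"
    return "Friedman test" if paired else "Kruskal-Wallis test"


print(suggest_test(2, paired=False, roughly_normal=True))   # independent t-test
print(suggest_test(3, paired=False, roughly_normal=False))  # Kruskal-Wallis test
```

Questions about relationships between variables, such as correlation or regression, fall outside this sketch.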

One of the key challenges in using statistical tests is understanding the results and interpreting them in a meaningful way. It is essential to understand the meaning of p-values, confidence intervals, and effect sizes, and how they can be used to draw conclusions about the data. It is also important to be aware of potential confounding variables that could affect the results and to take steps to control for these variables.

In conclusion, statistical tests are an important tool for analysing and interpreting data. However, they must be used appropriately and with caution in order to obtain meaningful results. A thorough understanding of the assumptions, limitations, and interpretations of each test is essential for accurate and meaningful results. It is important to use statistical tests in conjunction with other analytical tools and to consider the results in the context of the research question and the limitations of the test.
