Student's t-tests

One of the most frequent statistical problems is testing hypotheses about the mean of the samples considered.

One-sample t-test

This test checks the hypothesis that the mean of a random variable X equals a given value μ. The sample being tested should be drawn from a normally distributed random variable. The test calculates the t-statistic:
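In its standard textbook form, assumed here to match what the test computes, the statistic is built from the sample mean and the sample standard deviation s (taken with an N-1 denominator). The symbols x_i, N and s are notation introduced for this illustration:

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{N}}, \qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
```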

If X has a normal distribution, the t-statistic follows Student's distribution with N-1 degrees of freedom, where N is the sample size. Student's distribution can therefore be used to find the significance level (p-value) that corresponds to the observed value of the t-statistic.

Note #1
If X is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, the distribution of t tends to the normal distribution as the sample size increases. Therefore, if the sample is large, we can use the t-test even if X is not normal. There is no general rule for how large is large enough: it depends on how strongly the distribution of X deviates from normal. Some sources claim that N should be greater than 30, but sometimes even this size is not enough. Alternatively, we can use a non-parametric test: the sign test or the Wilcoxon signed-rank test.

Subroutine StudentTTest1 returns three p-values: one for the two-tailed test, one for the left-tailed test, and one for the right-tailed test.
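As a rough illustration of how a t-statistic is turned into two-tailed, left-tailed and right-tailed p-values, here is a minimal pure-Python sketch. It is not the ALGLIB implementation: the function names `student_t_cdf` and `one_sample_t_test` are hypothetical, the CDF is approximated by Simpson integration of the Student density rather than by a library routine, and the tail convention (left tail = P(T <= t)) is an assumption.

```python
import math
import statistics


def student_t_cdf(t, df, steps=2000):
    """Approximate the Student's t CDF by Simpson integration of the density.

    Illustrative helper only; `steps` must be even for Simpson's rule.
    """
    if t == 0.0:
        return 0.5
    # Log of the density's normalising constant, via log-gamma for stability.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density over [0, |t|]
    # The density is symmetric, so the CDF is 0.5 plus or minus that area.
    return 0.5 + math.copysign(area, t)


def one_sample_t_test(x, mu):
    """Return the t-statistic and three p-values for H0: mean(X) == mu."""
    n = len(x)
    mean = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation (N-1 denominator)
    t = (mean - mu) / (s / math.sqrt(n))
    cdf = student_t_cdf(t, n - 1)
    # Assumed tail convention: left = P(T <= t), right = P(T >= t).
    p_values = {
        "both": 2 * min(cdf, 1 - cdf),
        "left": cdf,
        "right": 1 - cdf,
    }
    return t, p_values
```

For example, `one_sample_t_test([1, 2, 3, 4, 5], 2)` gives t ≈ 1.414 with 4 degrees of freedom and a two-tailed p-value of about 0.23, so the hypothesis μ = 2 would not be rejected at the usual 5% level. The sketch fails with a division by zero if all observations are identical.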

Two-sample pooled test

This test checks the hypothesis that the means of two random variables X and Y, represented by samples x and y, are equal. The test works correctly under the following conditions: both X and Y are normally distributed, their dispersions (variances) are equal, and the samples are independent.

The test calculates the t-statistic:
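In the usual textbook form of the pooled test, assumed here to be what is computed, the two samples share a single variance estimate S². The symbols N_X and N_Y (sizes of samples x and y) and S are notation introduced for this illustration:

```latex
t = \frac{\bar{x} - \bar{y}}{S \sqrt{\frac{1}{N_X} + \frac{1}{N_Y}}}, \qquad
S^2 = \frac{\sum_{i=1}^{N_X} (x_i - \bar{x})^2 + \sum_{j=1}^{N_Y} (y_j - \bar{y})^2}{N_X + N_Y - 2}
```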

If X and Y have a normal distribution, the t-statistic follows Student's distribution with N_X+N_Y-2 degrees of freedom, where N_X and N_Y are the sizes of samples x and y. Student's distribution can therefore be used to find the significance level (p-value) that corresponds to the observed value of the t-statistic.
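The pooled statistic and its degrees of freedom can be sketched in a few lines of Python. This is an illustration of the standard textbook computation, not ALGLIB code; the function name `pooled_t_statistic` is hypothetical.

```python
import math
import statistics


def pooled_t_statistic(x, y):
    """Return (t, df) for the two-sample pooled-variance t-test."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Sums of squared deviations from each sample's own mean.
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    df = nx + ny - 2
    # Single variance estimate shared by both samples (equal variances assumed).
    pooled_var = (ss_x + ss_y) / df
    t = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 3, 4, 5, 6]` the pooled variance is 2.5, so the function returns t = -1 with 8 degrees of freedom.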

Note #2
If X or Y is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, the distribution of t tends to the normal distribution as the sample sizes increase. Therefore, if the samples are large enough, we can use the t-test even if X or Y is not normal. There is no general rule for what values of N_X and N_Y are large enough: they depend on how strongly X and Y deviate from the normal distribution. Some sources claim that N_X+N_Y should be greater than 40, but sometimes even these sizes are not enough. If you are not confident that the distributions are normal, it is better to use a non-parametric test: the Mann-Whitney U-test.

Subroutine StudentTTest2 returns three p-values: one for the two-tailed test, one for the left-tailed test, and one for the right-tailed test.

Two-sample unpooled test

This test checks the hypothesis that the means of two random variables X and Y, represented by samples x and y, are equal. The test works correctly under the following conditions: both X and Y are normally distributed, and the samples are independent.

Dispersion (variance) equality is not required.

The test calculates the t-statistic:
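In the usual textbook form of the unequal-variance (Welch) test, assumed here to be what is computed, each sample contributes its own variance estimate. The symbols s_X², s_Y² (sample variances with N-1 denominators) and N_X, N_Y (sample sizes) are notation introduced for this illustration:

```latex
t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{N_X} + \frac{s_Y^2}{N_Y}}}
```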

If X and Y have a normal distribution, the t-statistic will have Student's distribution with DF degrees of freedom:
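A common choice for DF, and an assumption here, is the Welch-Satterthwaite approximation, written with s_X², s_Y² as the sample variances and N_X, N_Y as the sample sizes. The result is generally not an integer:

```latex
DF = \frac{\left( \frac{s_X^2}{N_X} + \frac{s_Y^2}{N_Y} \right)^2}
          {\frac{\left( s_X^2 / N_X \right)^2}{N_X - 1} + \frac{\left( s_Y^2 / N_Y \right)^2}{N_Y - 1}}
```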

Student's distribution can therefore be used to find the significance level (p-value) that corresponds to the observed value of the t-statistic.
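The unequal-variance computation can be sketched as follows. This is an illustration under the assumption that DF is given by the Welch-Satterthwaite approximation; it is not ALGLIB code, and the function name `welch_t_statistic` is hypothetical.

```python
import math
import statistics


def welch_t_statistic(x, y):
    """Return (t, df) for the two-sample unequal-variance t-test."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Per-sample variance of the mean: s^2 / N, with s^2 using an N-1 denominator.
    vx = statistics.variance(x) / nx
    vy = statistics.variance(y) / ny
    t = (mean_x - mean_y) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 4, 6, 8, 10]` the sample variances are 2.5 and 10, which gives t ≈ -1.897 and DF = 100/17 ≈ 5.88. When both samples have the same size and the same variance, DF reduces to N_X+N_Y-2, the value used by the pooled test.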

Note #3
If X or Y is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, the distribution of t tends to the normal distribution as the sample sizes increase. Therefore, if the samples are large enough, we can use the t-test even if X or Y is not normal. There is no general rule for what values of N_X and N_Y are large enough: they depend on how strongly X and Y deviate from the normal distribution. Some sources claim that N_X+N_Y should be greater than 40, but sometimes even these sizes are not enough. If you are not confident that the distributions are normal, it is better to use a non-parametric test: the Mann-Whitney U-test.

Subroutine UnequalVarianceTTest returns three p-values: one for the two-tailed test, one for the left-tailed test, and one for the right-tailed test.

This article is licensed for personal use only.
