[NEW ALGORITHM] Kruskal-Wallis Test Algorithm #1362 #1408

62 changes: 62 additions & 0 deletions Miscellaneous Algorithms/Kruskal-Wallis Algorithm/Kruskal-Wallis.c
@@ -0,0 +1,62 @@
// Kruskal-Wallis Algorithm

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double value;
    int group;
} DataPoint;

// qsort comparator: orders DataPoints by value and returns 0 for equal values
int compare(const void *a, const void *b) {
    double x = ((const DataPoint *)a)->value;
    double y = ((const DataPoint *)b)->value;
    return (x > y) - (x < y);
}

// groups[i] is the 1-based group label (1..k) of observation data[i]
void kruskal_wallis(int *groups, double *data, int n, int k) {
    DataPoint *points = (DataPoint *)malloc(n * sizeof(DataPoint));
    double *rank = (double *)malloc(n * sizeof(double));
    double *rank_sum = (double *)calloc(k, sizeof(double));
    int *group_count = (int *)calloc(k, sizeof(int));
    if (!points || !rank || !rank_sum || !group_count) {
        fprintf(stderr, "Memory allocation failed\n");
        free(points);
        free(rank);
        free(rank_sum);
        free(group_count);
        return;
    }

    for (int i = 0; i < n; i++) {
        points[i].value = data[i];
        points[i].group = groups[i];
    }

    // Rank data: sort, then give each run of tied values its average rank
    qsort(points, n, sizeof(DataPoint), compare);
    for (int i = 0; i < n;) {
        int j = i;
        while (j + 1 < n && points[j + 1].value == points[i].value) {
            j++;
        }
        double avg = (i + j) / 2.0 + 1.0; // Ranks start at 1
        for (int t = i; t <= j; t++) {
            rank[t] = avg;
        }
        i = j + 1;
    }

    // Sum of ranks for each group
    for (int i = 0; i < n; i++) {
        rank_sum[points[i].group - 1] += rank[i];
        group_count[points[i].group - 1]++;
    }

    // Calculate H (empty groups are skipped to avoid dividing by zero)
    double H = 0.0;
    for (int j = 0; j < k; j++) {
        if (group_count[j] > 0) {
            H += (rank_sum[j] * rank_sum[j]) / group_count[j];
        }
    }
    H = (12.0 / ((double)n * (n + 1))) * H - 3.0 * (n + 1);

    // Free allocated memory
    free(points);
    free(rank);
    free(rank_sum);
    free(group_count);

    printf("Kruskal-Wallis H statistic: %lf\n", H);
}

int main(void) {
    int groups[] = {1, 1, 2, 2, 3, 3}; // Group identifiers
    double data[] = {10, 12, 20, 22, 30, 32}; // Observations
    int n = sizeof(data) / sizeof(data[0]);
    int k = 3; // Number of groups

    kruskal_wallis(groups, data, n, k);
    return 0;
}
34 changes: 34 additions & 0 deletions Miscellaneous Algorithms/Kruskal-Wallis Algorithm/readme.md
@@ -0,0 +1,34 @@
### Kruskal-Wallis Test Algorithm in C

The Kruskal-Wallis test is a non-parametric, rank-based statistical test used to determine whether three or more independent groups differ in distribution. When the groups have similarly shaped distributions, a significant result is commonly interpreted as a difference in medians. It extends the Mann-Whitney U test to more than two groups and is particularly useful when the assumptions of one-way ANOVA (normality and homogeneity of variance) are not met.

#### Steps of the Kruskal-Wallis Test

1. **Rank all the data:** Combine all the groups' data into one dataset and rank the values. If there are ties, assign the average rank to the tied values.

2. **Calculate the rank sums:** For each group, sum the ranks of the observations.

3. **Compute the test statistic (H):** The Kruskal-Wallis test statistic is calculated using the formula:

\[ H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1) \]

where:
- \( N \) = total number of observations across all groups
- \( R_j \) = sum of ranks for group \( j \)
- \( n_j \) = number of observations in group \( j \)

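4. **Determine the p-value:** Under the null hypothesis, \( H \) approximately follows a chi-squared distribution with \( k - 1 \) degrees of freedom (where \( k \) is the number of groups), so the p-value is the upper-tail probability of that distribution at the observed \( H \). A commonly cited rule of thumb is that the approximation is adequate when each group has at least about five observations.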

5. **Decision:** If the p-value is less than the significance level (e.g., 0.05), reject the null hypothesis, indicating that at least one group median is different from the others.

### Importance of the Kruskal-Wallis Test

1. **Non-Parametric Nature:** It does not assume a normal distribution of the data, making it applicable in various real-world scenarios where data may not meet ANOVA assumptions.

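2. **Robustness:** Because it operates on ranks rather than raw values, the Kruskal-Wallis test is far less sensitive to outliers than ANOVA. It does not require equal variances to detect a difference in distributions, but when group spreads or shapes differ markedly, a significant result cannot be attributed to the medians alone.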

3. **Multiple Groups Comparison:** It compares three or more groups in a single test, avoiding the inflated Type I error rate that comes from running many pairwise tests.

4. **Broad Applications:** It is widely used in various fields, including biology, psychology, and social sciences, for analyzing experimental and observational data.

5. **Foundation for Further Analysis:** If the Kruskal-Wallis test indicates significant differences, post-hoc tests (like Dunn’s test) can be conducted to identify specific group differences.