Is the scipy.stats.ks_2samp function supposed to take in raw data arrays or the ECDF of the data?

The `scipy.stats.ks_2samp` function in SciPy performs the two-sample Kolmogorov-Smirnov test. It tests the null hypothesis that two independent samples were drawn from the same continuous distribution. The function takes the raw observations of each sample as input, not their ECDF (Empirical Cumulative Distribution Function). It builds both ECDFs internally from the raw arrays.


Here's how you typically use the `ks_2samp` function:


```python
from scipy import stats

# Two sets of data
data1 = [1, 2, 3, 4, 5]
data2 = [2, 3, 4, 5, 6]

# Perform the KS two-sample test
statistic, p_value = stats.ks_2samp(data1, data2)

# Print the results
print("KS Statistic:", statistic)
print("P-Value:", p_value)
```


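In the example above, `data1` and `data2` are raw data arrays, and `ks_2samp` tests whether the two datasets come from the same distribution. A small p-value (for example, below 0.05) is evidence that the samples come from different distributions. A large p-value means only that the test found no evidence of a difference, not that the distributions are equal.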
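The KS statistic D is the largest vertical gap between the two ECDFs, max |F1(x) - F2(x)|. This is why the function wants the raw data: it needs the observations to build those ECDFs. The sketch below reproduces D by hand and checks it against `ks_2samp`. The helpers `ecdf_at` and `ks_statistic_from_raw` are hypothetical illustrations, not part of SciPy.

```python
import numpy as np
from scipy import stats


def ecdf_at(sample, points):
    """Right-continuous ECDF of `sample` evaluated at each value in `points`."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    # Fraction of observations <= x for each x in `points`
    return np.searchsorted(sorted_sample, points, side="right") / sorted_sample.size


def ks_statistic_from_raw(sample1, sample2):
    """Hypothetical helper: two-sided KS statistic D = max |F1(x) - F2(x)|."""
    pooled = np.concatenate([sample1, sample2])
    # Both ECDFs only jump at observed values, so the pooled points cover the supremum
    return np.max(np.abs(ecdf_at(sample1, pooled) - ecdf_at(sample2, pooled)))


data1 = [1, 2, 3, 4, 5]
data2 = [2, 3, 4, 5, 6]

manual_d = ks_statistic_from_raw(data1, data2)
scipy_d = stats.ks_2samp(data1, data2).statistic

# The two values agree up to floating-point rounding
print("Manual D:", manual_d, "SciPy D:", scipy_d)
```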


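If what you have is already the ECDF (the cumulative proportions), don't pass it to `ks_2samp`. The function would treat those proportions as if they were observations and build ECDFs of them, which gives a meaningless result. Pass the original observations instead. If the raw data are no longer available but you have both ECDFs evaluated at every observed value, you can compute the statistic yourself as the largest absolute difference between them. A p-value additionally requires both sample sizes. Either way, `ks_2samp` itself expects raw data arrays.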
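The sketch below shows why feeding ECDF values to `ks_2samp` goes wrong. Without ties, a sample's ECDF evaluated at its own sorted points is always 1/n, 2/n, ..., 1, whatever the data values are. The second sample `shifted` is a hypothetical dataset chosen so that it does not overlap `data1` at all. The helper `ecdf_at_own_points` is also illustrative, not a SciPy function.

```python
import numpy as np
from scipy import stats

data1 = np.array([1, 2, 3, 4, 5])
shifted = data1 + 100  # hypothetical sample with no overlap with data1


def ecdf_at_own_points(sample):
    """Hypothetical helper: ECDF of `sample` evaluated at its own sorted values."""
    s = np.sort(sample)
    return np.searchsorted(s, s, side="right") / s.size


# Without ties, both are [0.2, 0.4, 0.6, 0.8, 1.0]; the actual values are lost
ecdf1 = ecdf_at_own_points(data1)
ecdf2 = ecdf_at_own_points(shifted)

wrong = stats.ks_2samp(ecdf1, ecdf2)    # ECDF heights misused as observations
right = stats.ks_2samp(data1, shifted)  # raw observations, as intended

print("On ECDF values:", wrong.statistic, wrong.pvalue)
print("On raw data:   ", right.statistic, right.pvalue)
```

With these inputs, the ECDF-based call reports a statistic of 0 even though the two samples do not overlap. The raw-data call reports the maximal statistic of 1.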
