statistics - t-test in Python on a frequency table
If I have two lists of numbers, x and y, I can run a t-test on them using scipy.stats.ttest_ind(x, y). So far, so good. If, instead of x and y, I only have frequency counts, is there a Pythonic way to run an efficient t-test, or do I have to "manually" reconstruct the original vectors?
Edit (frequency count): if x = [1,0,3,0,1,3,2], the corresponding frequency count is:
+---+---+
| 0 | 2 |
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |
+---+---+
where the first column is the value and the second is the corresponding count/frequency.
You can use rv_discrete from scipy.stats to generate data according to the distribution described by the frequencies.
Using the example frequency counts you provide in the edit, you can define the random variable like this:
    import scipy.stats as stats
    x = [0, 1, 2, 3]                # distinct values
    freq = [2, 2, 1, 2]             # count of each value
    total = sum(freq)
    p = [i / total for i in freq]   # relative frequencies
    custm = stats.rv_discrete(name='custm', values=(x, p))
Note that the vector of probabilities p must sum to 1, which is why each count is divided by the total.
You can then easily draw random samples from this distribution:
    In [7]: custm.rvs(size=7)
    Out[7]: array([2, 0, 3, 1, 3, 2, 0])
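Keep in mind that rvs draws random samples, so a t-test run on them will vary from run to run. As a possible alternative (not part of the approach above), here is a minimal sketch that computes the t-test straight from the frequency tables: it derives the mean, sample standard deviation, and sample size from the counts and passes them to scipy.stats.ttest_ind_from_stats. The helper name summary_from_counts and the second table (y_values, y_counts) are made up for illustration; only the x table comes from the question. The np.repeat lines rebuild the original vectors purely as a cross-check.

```python
import numpy as np
from scipy import stats


def summary_from_counts(values, counts):
    """Return (mean, sample std, n) of the data described by a frequency table.

    Hypothetical helper for illustration; uses ddof=1, the sample standard
    deviation that ttest_ind_from_stats expects.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    mean = (values * counts).sum() / n
    var = (counts * (values - mean) ** 2).sum() / (n - 1)
    return mean, np.sqrt(var), n


# Frequency table for x, taken from the question.
x_values, x_counts = [0, 1, 2, 3], [2, 2, 1, 2]
# Frequency table for y: invented here, the question does not give one.
y_values, y_counts = [0, 1, 2, 4], [1, 3, 2, 1]

mx, sx, nx = summary_from_counts(x_values, x_counts)
my, sy, ny = summary_from_counts(y_values, y_counts)

# t-test from summary statistics only (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind_from_stats(mx, sx, nx, my, sy, ny)

# Cross-check: expand the tables back into full vectors and test those.
x = np.repeat(x_values, x_counts)
y = np.repeat(y_values, y_counts)
t_check, p_check = stats.ttest_ind(x, y)

print(t_stat, p_value)
print(t_check, p_check)
```

Both calls should agree up to floating-point error. If the counts are small, np.repeat followed by ttest_ind is probably the simplest route; the summary-statistics version avoids materialising the vectors when the counts are large.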
Hope this helps.