Safe Haskell | None |
---|---|

Language | Haskell2010 |

Online statistics for ordered data (such as time-series data), modelled as mealy machines

## Synopsis

- newtype Mealy a b = Mealy {}
- pattern M :: (a -> c) -> (c -> a -> c) -> (c -> b) -> Mealy a b
- scan :: Mealy a b -> [a] -> [b]
- fold :: Mealy a b -> [a] -> b
- newtype Averager a b = Averager {
- sumCount :: (a, b)

- pattern A :: a -> b -> Averager a b
- av :: Divisive a => Averager a a -> a
- av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a
- online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b
- ma :: (Divisive a, Additive a) => a -> Mealy a a
- absma :: (Divisive a, Signed a) => a -> Mealy a a
- sqma :: (Divisive a, Additive a) => a -> Mealy a a
- std :: (Divisive a, ExpField a) => a -> Mealy a a
- cov :: Field a => Mealy a a -> Mealy (a, a) a
- corrGauss :: ExpField a => a -> Mealy (a, a) a
- corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a
- beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a)
- beta :: (Field a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a)
- alpha :: (ExpField a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) a
- reg :: (ExpField a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a)
- asum :: Additive a => Mealy a a
- aconst :: b -> Mealy a b
- delay1 :: a -> Mealy a a
- delay :: [a] -> Mealy a a
- depState :: (a -> b -> a) -> Mealy a b -> Mealy a a
- data Model1 = Model1 {}
- zeroModel1 :: Model1
- depModel1 :: Double -> Model1 -> Mealy Double Double
- data Medianer a b = Medianer {}
- onlineL1 :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Mealy a b
- onlineL1' :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Mealy a (b, b)
- maL1 :: (Ord a, Field a, Signed a) => a -> a -> a -> Mealy a a
- absmaL1 :: (Ord a, Field a, Signed a) => a -> a -> a -> Mealy a a

# Types

A `Mealy`

is a triple of functions

- (a -> b)
**inject**Convert an input into the state type. - (b -> a -> b)
**step**Update state given prior state and (new) input. - (c -> b)
**extract**Convert state to the output type.

By adopting this order, a Mealy sum looks like:

M id (+) id

where the first id is the initial injection to a contravariant position, and the second id is the covriant extraction.

**inject** kicks off state on the initial element of the Foldable, but is otherwise be independent of **step**.

scan (M e s i) (x : xs) = e <$> scanl' s (i x) xs

pattern M :: (a -> c) -> (c -> a -> c) -> (c -> b) -> Mealy a b Source #

Pattern for a `Mealy`

.

M extract step inject

scan :: Mealy a b -> [a] -> [b] Source #

Run a list through a `Mealy`

and return a list of values for every step

length (scan _ xs) == length xs

Most common statistics are averages, which are some sort of aggregation of values (sum) and some sort of sample size (count).

av :: Divisive a => Averager a a -> a Source #

extract the average from an `Averager`

av gives NaN on zero divide

av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a Source #

substitute a default value on zero-divide

av_ (Averager (0,0)) x == x

online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b Source #

`online f g`

is a `Mealy`

where f is a transformation of the data and
g is a decay function (usually convergent to zero) applied at each step.

online id id == av

`online`

is best understood by examining usage
to produce a moving average and standard deviation:

An exponentially-weighted moving average with a decay rate of 0.9

ma r == online id (*r)

An exponentially-weighted moving average of the square.

sqma r = online (\x -> x * x) (* r)

Applicative-style exponentially-weighted standard deviation computation:

std r = (\s ss -> sqrt (ss - s ** 2)) <$> ma r <*> sqma r

# Statistics

The doctest examples are composed from some random series generated with Data.Mealy.Simulate.

- xs0, xs1 & xs2 are samples from N(0,1)
- xsp is a pair of N(0,1)s with a correlation of 0.8

`>>>`

`:set -XDataKinds`

`>>>`

`import Data.Mealy.Simulate`

`>>>`

`g <- create`

`>>>`

`xs0 <- rvs g 10000`

`>>>`

`xs1 <- rvs g 10000`

`>>>`

`xs2 <- rvs g 10000`

`>>>`

`xsp <- rvsp g 10000 0.8`

ma :: (Divisive a, Additive a) => a -> Mealy a a Source #

A moving average using a decay rate of r. r=1 represents the simple average, and r=0 represents the latest value.

`>>>`

100.0`fold (ma 0) ([1..100])`

`>>>`

50.5`fold (ma 1) ([1..100])`

`>>>`

9.713356299018187e-2`fold (ma 0.99) xs0`

absma :: (Divisive a, Signed a) => a -> Mealy a a Source #

absolute average

`>>>`

0.8075705557429647`fold (absma 1) xs0`

sqma :: (Divisive a, Additive a) => a -> Mealy a a Source #

average square

fold (ma r) . fmap (**2) == fold (sqma r)

std :: (Divisive a, ExpField a) => a -> Mealy a a Source #

standard deviation

The construction of standard deviation, using the Applicative instance of a `Mealy`

:

(\s ss -> sqrt (ss - s ** (one+one))) <$> ma r <*> sqma r

The average deviation of the numbers 1..1000 is about 1 / sqrt 12 * 1000 https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)#Standard_uniform

`>>>`

288.9636655359978`fold (std 1) [0..1000]`

The average deviation with a decay of 0.99

`>>>`

99.28328803163829`fold (std 0.99) [0..1000]`

`>>>`

1.0126438036262801`fold (std 1) xs0`

cov :: Field a => Mealy a a -> Mealy (a, a) a Source #

The covariance of a tuple given an underlying central tendency fold.

`>>>`

0.7818936662586868`fold (cov (ma 1)) xsp`

corrGauss :: ExpField a => a -> Mealy (a, a) a Source #

correlation of a tuple, specialised to Guassian

`>>>`

0.7978347126677433`fold (corrGauss 1) xsp`

corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a Source #

a generalised version of correlation of a tuple

`>>>`

0.7978347126677433`fold (corr (ma 1) (std 1)) xsp`

corr (ma r) (std r) == corrGauss r

beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #

The beta in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

This is a generalisation of the classical regression formula, where averages are replaced by `Mealy`

statistics.

\[ \begin{align} \beta & = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \overline{xy} - n^2 \bar{x} \bar{y}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\overline{xy} - \bar{x} \bar{y}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]

`>>>`

0.999747321294513`fold (beta1 (ma 1)) $ zipWith (\x y -> (y, x + y)) xs0 xs1`

alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #

The alpha in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

\[ \begin{align} \alpha & = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \bar{y} \overline{x^2} - n^2 \bar{x} \overline{xy}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\bar{y} \overline{x^2} - \bar{x} \overline{xy}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]

`>>>`

1.3680496627365146e-2`fold (alpha1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1`

reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a) Source #

The (alpha, beta) tuple in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

`>>>`

(1.3680496627365146e-2,0.4997473212944953)`fold (reg1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1`

beta :: (Field a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a) Source #

multiple regression

\[ \begin{align} {\hat {{\mathbf {B}}}}=({\mathbf {X}}^{{{\rm {T}}}}{\mathbf {X}})^{{ -1}}{\mathbf {X}}^{{{\rm {T}}}}{\mathbf {Y}} \end{align} \]

\[ \begin{align} {\mathbf {X}}={\begin{bmatrix}{\mathbf {x}}_{1}^{{{\rm {T}}}}\\{\mathbf {x}}_{2}^{{{\rm {T}}}}\\\vdots \\{\mathbf {x}}_{n}^{{{\rm {T}}}}\end{bmatrix}}={\begin{bmatrix}x_{{1,1}}&\cdots &x_{{1,k}}\\x_{{2,1}}&\cdots &x_{{2,k}}\\\vdots &\ddots &\vdots \\x_{{n,1}}&\cdots &x_{{n,k}}\end{bmatrix}} \end{align} \]

let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (beta 0.99) zs

- 0.4982692361226971, 1.038192474255091

alpha :: (ExpField a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) a Source #

alpha in a multiple regression

reg :: (ExpField a, KnownNat n, Fractional a, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a) Source #

multiple regression

let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (reg 0.99) zs

([0.4982692361226971, 1.038192474255091],2.087160803386695e-3)

:: [a] | initial statistical values, delay equals length |

-> Mealy a a |

delays values by n steps

delay [0] == delay1 0

delay [] == id

delay [1,2] = delay1 2 . delay1 1

`>>>`

[-2,-1,0,1]`scan (delay [-2,-1]) [0..3]`

Autocorrelation example:

scan (((,) <$> id <*> delay [0]) >>> beta (ma 0.99)) xs0

depState :: (a -> b -> a) -> Mealy a b -> Mealy a a Source #

Add a state dependency to a series.

Typical regression analytics tend to assume that moments of a distributional assumption are unconditional with respect to prior instantiations of the stochastics being studied.

For time series analytics, a major preoccupation is estimation of the current moments given what has happened in the past.

IID:

\[ \begin{align} x_{t+1} & = alpha_t^x + s_{t+1}\\ s_{t+1} & = alpha_t^s * N(0,1) \end{align} \]

Example: including a linear dependency on moving average history:

\[ \begin{align} x_{t+1} & = (alpha_t^x + beta_t^{x->x} * ma_t^x) + s_{t+1}\\ s_{t+1} & = alpha_t^s * N(0,1) \end{align} \]

`>>>`

`let xs' = scan (depState (\a m -> a + 0.1 * m) (ma 0.99)) xs0`

`>>>`

`let ma' = scan ((ma (1 - 0.01)) >>> delay [0]) xs'`

`>>>`

`let xsb = fold (beta1 (ma (1 - 0.001))) $ drop 1 $ zip ma' xs'`

`>>>`

`-- beta measurement if beta of ma was, in reality, zero.`

`>>>`

`let xsb0 = fold (beta1 (ma (1 - 0.001))) $ drop 1 $ zip ma' xs0`

`>>>`

0.10000000000000009`xsb - xsb0`

a linear model of state dependencies for the first two moments

\[ \begin{align} x_{t+1} & = (alpha_t^x + beta_t^{x->x} * ma_t^x + beta_t^{s->x} * std_t^x) + s_{t+1}\\ s_{t+1} & = (alpha_t^s + beta_t^{x->s} * ma_t^x + beta_t^{s->s} * std_t^x) * N(0,1) \end{align} \]

#### Instances

Eq Model1 Source # | |

Show Model1 Source # | |

Generic Model1 Source # | |

type Rep Model1 Source # | |

Defined in Data.Mealy type Rep Model1 = D1 ('MetaData "Model1" "Data.Mealy" "mealy-0.1.0-ELj6S8h9dj8DSoDt2dhzTT" 'False) (C1 ('MetaCons "Model1" 'PrefixI 'True) ((S1 ('MetaSel ('Just "alphaX") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "alphaS") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "betaMa2X") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double))) :*: (S1 ('MetaSel ('Just "betaMa2S") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "betaStd2X") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "betaStd2S") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Double))))) |

zeroModel1 :: Model1 Source #

zeroised Model1

depModel1 :: Double -> Model1 -> Mealy Double Double Source #

Apply a model1 relationship using a single decay factor.

`>>>`

`:set -XOverloadedLabels`

`>>>`

`import Control.Lens`

`>>>`

-0.4591515493154126`fold (depModel1 0.01 (zeroModel1 & #betaMa2X .~ 0.1)) xs0`

# median

A rough Median. The average absolute value of the stat is used to callibrate estimate drift towards the median

onlineL1 :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Mealy a b Source #

onlineL1 takes a function and turns it into a `Fold`

where the step is an incremental update of an (isomorphic) median statistic.

onlineL1' :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Mealy a (b, b) Source #

onlineL1' takes a function and turns it into a `Mealy`

where the step is an incremental update of an (isomorphic) median statistic.