Data

Data

Data

Apr 4, 2014

Calculating the Minimum Variance Portfolio in R, Pandas and IAP

Calculating the Minimum Variance Portfolio in R, Pandas and IAP

Calculating the Minimum Variance Portfolio in R, Pandas and IAP

As part of producing a demo for FP Complete's new IAP product, I

wound up implementing the Minimum Variance Portfolio calculation

for a stock portfolio in R, then in Haskell for the IAP, and

finally in Python using the NumPy and SciPy extension libraries. I

want to look at the process of writing each of these, and the

resulting code in this article. I am not an expert in these things,

so if you know a better way to do them, please let me know. In

particular, the R example was borrowed from someone else.


The function

First, we find a version of the formula for MVP that can be converted into those systems. I like the one used by Yue Kuen KWOK:

ω = Ω⁻¹ 1/ 1 ⋅ Ω⁻¹ 1

where Ω is the covariant matrix of the returns for the stocks in question.

R version

The R version is fairly straightforward - we just need to put the pieces together:

minvar <- function (px){
    Rt <- px[-1,]/px[-nrow(px),]-1
    cov.Rt <- cov(Rt)
    one.vec <- rep(1,nrow(cov.Rt))
    num <- solve(cov.Rt) %*% one.vec
    den <- as.numeric(t(one.vec) %*% num)
    return(num/den)
}

Calculating returns can be done with standard matrix operations

and slicing. The covariant function is built in, as is inverting it

(solve). Since the numerator Ω⁻¹ 1 appears in the denominator, I reuse it's value there.


All the array operations were documented in the same place. That

I only needed one unit vector was a bit of a surprise, but R sized

it dynamically to work. That I had to transpose the unit vector and

use the cross product operator (%*%) to get a dot

product was a a less pleasant surprise, but is apparently a

standard R idiom.


Python version

The Python version is almost a direct copy of the R version.

def minvar(prices):
    cov = np.cov((prices[1:] / prices[:-1] - 1).transpose())
    vu = np.array(cov.shape[1] * [1], float)
    num = np.dot(np.linalg.inv(cov), vu)
    den = np.dot(vu, num)
    return num / den

In this case, I passed the returns matrix to the covariant function directly. The NumPy dot function performs

both cross products and dot products, and again the unit vector

adopts it's length dynamically.


Documentation was a bit more scattered. Being a mix of

traditional imperative and object-oriented, some functions are

functions in the module, and others are object methods. The biggest

surprise was that the returns matrix needed to be transposed before

the covariant was calculated.


Haskell version

Haskell is not quite as obvious a translation of the R and

Python versions, but is a more straightforward translation of the

original formula - once you notice that Ω⁻¹ 1 has been factored into tv. It reads from top to bottom like the

original, with the main formula at the top and the various terms

defined below.


Returns again use the standard Haskell idiom for slicing the

array. This is a bit more verbose than either R or Python, as they

are functions rather than special syntax.


The documentation was again straightforward, with everything

being a function in the hmatrix package. In this case, both unit

vectors were needed, as Haskell does not scale them automatically.

It was the least surprising in at least one way - it used a

distinct dot product operator for the two vectors rather than

transposing - whether implicitly like Python or explicitly like R -

the unit vector in a cross product.


Performance

While performance comparisons with IAP aren't very useful, as it

runs in the cloud so doing comparisons on identical systems may be

impossible, Haskell does have one interesting advantage.


All three of these systems have the same basic architecture - a

high-level language running in an interactive environment, with

bindings to low-level, fast implementations of the matrix

manipulations. Haskell adds the ability to compile your program

into native x86_64 code. Doing so reduced the wall clock time of

this short demo by roughly two orders of magnitude.


Summary

I found the IAP version a little easier to deal with. Having

custom operators and functions for everything - and this will only

get better with time - along with being able to use the

mathematical layout of the where statement just made things

a little easier to deal with. While not having unit vectors that

automatically size themselves - or transpose into matrices - is a

little inconvenient, this also exposed a problem in the original R

version, in that the unit vector's length was initially set wrong.

I'm not sure that will make any real difference, but the thought

that the language can catch such errors for me is comforting.