Mixed Precision Training of Gaussian Processes
Gaussian processes are among the most popular tools in scientific computing. Depending on the application, different kernel functions and their gradients may be used. Computing the gradient of the kernel matrix can be very expensive and may exceed available memory. Because of the underlying matrix and vector operations, the gradient computation can dominate the cost of the whole algorithm as the dataset grows. Various techniques are currently used in Gaussian process training to mitigate memory and cost problems, such as low-rank approximation and the matrix determinant lemma. Another option is to use multiple precisions within the training algorithm. When the dataset is large, high accuracy may not be needed for training, so low precision is commonly used in data science applications. However, lower precision can introduce stability issues in the direct or iterative solvers used during training. Higher precision must therefore be used in those parts of the algorithm without sacrificing performance. In this talk, we introduce a new mixed-precision approach to training Gaussian processes that combines several techniques. Motivated by the recent emergence of commercially available low-precision hardware, we propose to combine multiple precisions during training with low-rank approximation and a stable direct solver to obtain a cheap, fast, and stable algorithm.
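To illustrate the general idea (not the speakers' specific algorithm, which also incorporates low-rank approximation), here is a minimal NumPy sketch of mixed-precision GP training: the kernel matrix and its hyperparameter gradient are assembled in low precision (float32), while the stability-critical factorization and solves are promoted to double precision. All function and variable names here are illustrative assumptions.

```python
# Hypothetical sketch: low-precision kernel assembly, double-precision Cholesky solve.
import numpy as np

def rbf_kernel(X, lengthscale, dtype=np.float32):
    """Squared-exponential kernel evaluated in the requested (low) precision."""
    Xl = (X / lengthscale).astype(dtype)
    sq = np.sum(Xl**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xl @ Xl.T)   # scaled squared distances
    return np.exp(-0.5 * np.maximum(d2, 0.0), dtype=dtype)

def log_marginal_likelihood_and_grad(X, y, lengthscale, noise=1e-2):
    n = X.shape[0]
    # Low-precision kernel assembly (cheap and memory-friendly).
    K = rbf_kernel(X, lengthscale, dtype=np.float32)
    # Promote only the solver to double precision for numerical stability.
    K64 = K.astype(np.float64) + noise * np.eye(n)
    L = np.linalg.cholesky(K64)                           # stable direct solver
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

    # Kernel gradient w.r.t. the lengthscale, again assembled in float32:
    # dK/dl = K * (||xi - xj||^2 / l^3), with scaled distances recovered from log K.
    d2 = -2.0 * np.log(np.maximum(K, np.float32(1e-30)))
    dK = K * d2 / np.float32(lengthscale)
    Kinv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK.astype(np.float64))
    return lml, grad

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(log_marginal_likelihood_and_grad(X, y, lengthscale=1.0))
```

In this sketch only the O(n^3) factorization and triangular solves run in double precision; the O(n^2 d) kernel and gradient assembly, which dominates memory traffic, stays in float32. This mirrors the trade-off described in the abstract between exploiting low-precision hardware and keeping the solver stable.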