PaCoBi: Scaling Parallelism and Convexity Hurdles in Bi-Level Machine Learning

The accuracy of artificial neural network (ANN) models depends on the ability to incorporate adequately large data sets into model training effectively and efficiently. Training these models may involve not only tuning network weights and biases to minimise inference error, but also tuning network topologies and other hyperparameters that affect generalisation quality, robustness, and inference efficiency. Such features include the sparse utilisation of network edges and neurons, and binarised and/or quantised restrictions on parameter and activation values. We therefore pose the research objective of improving the ability to incorporate additional features into the optimisation modelling of ANN training while retaining the scale at which training data may be incorporated. This objective requires improvements both in the optimisation methodologies and in the parallelisation of the methods that underlie the training of machine learning (ML) models under these multiple objectives.
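As a schematic illustration only, and not a formulation claimed by the project, this combined tuning can be viewed as a bilevel problem: an outer problem selects hyperparameters and structural features (denoted λ here, ranging over topology, sparsity, and quantisation choices) to optimise generalisation-oriented criteria, evaluated at weights w*(λ) produced by the inner training problem. All symbols below are illustrative:

    \begin{align*}
    \min_{\lambda \in \Lambda} \;& F\bigl(w^{*}(\lambda), \lambda\bigr)
      && \text{(outer: generalisation, robustness, efficiency)}\\
    \text{s.t.}\;& w^{*}(\lambda) \in \operatorname*{arg\,min}_{w \in W(\lambda)} L(w; \lambda)
      && \text{(inner: training loss over the data set)}
    \end{align*}

The difficulty stems from this nesting: the inner problem is itself a large-scale nonconvex training problem, so the outer problem cannot rely on the inner solution map being well behaved.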

We explore various ML model formulations, including variants of stochastic gradient descent (SGD) approaches, as well as alternative optimisation models based on convexified and discretised formulations, including mixed-integer linear reformulations of binarised/quantised models. For these models, we consider decompositions that are likely to admit the effective application of parallelisation approaches. We address the challenges posed by the nonconvex, combinatorial, and large-scale character of these problems as they arise in parallel algorithmic paradigms such as the alternating direction method of multipliers (ADMM), for which recent results in the literature provide at least limited guarantees under various nonconvexity structures. Importantly, we address the mitigation of nonconvexity as it 1) impedes the coordination of loosely coupled subproblems in approaches such as ADMM and 2) leads to suboptimal locally optimal solutions. To this end, we apply a variety of approaches based on combinatorial reformulations, convexifications, and embeddings in combinatorial frameworks that tighten the convexification. The relationship between the different ANN training objectives intersects with the challenging areas of mixed-integer/combinatorial optimisation, multiobjective optimisation, and bilevel optimisation. Developments in these areas would improve the capacity to solve increasingly complicated ML models for greater versatility and robustness.
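Two of the ingredients above can be made concrete with small sketches. First, the mixed-integer linear reformulation of a binarised model: the following is a standard big-M encoding of a single sign-activated neuron, shown purely as an illustration, where M is an assumed a priori bound on the pre-activation and all symbols are illustrative rather than taken from the project:

    \begin{align*}
    a &= w^{\top} x, \qquad s \in \{0,1\}, \qquad y = 2s - 1 \in \{-1, +1\},\\
    a &\le M s, \qquad a \ge -M(1 - s).
    \end{align*}

Here s = 1 exactly when a ≥ 0 (the tie at a = 0 is left unresolved, as is usual for the sign function); composing such constraints layer by layer yields mixed-integer linear reformulations of the kind mentioned above. Second, a minimal consensus-ADMM sketch, assuming a synthetic setup rather than the project's code: the training loss is split over data blocks, each block keeps a local parameter copy, and a consensus variable coordinates them. Convex least-squares blocks are used so that the local solves have closed form; in nonconvex ANN training, the local step would typically be a few (stochastic) gradient steps, which is precisely the regime where only limited guarantees are available.

    import numpy as np

    rng = np.random.default_rng(0)
    n_blocks, dim, rho = 4, 10, 1.0

    # Synthetic per-block losses f_i(x) = 0.5 * ||A_i x - b_i||^2 (illustrative).
    A = [rng.standard_normal((20, dim)) for _ in range(n_blocks)]
    b = [rng.standard_normal(20) for _ in range(n_blocks)]

    x = [np.zeros(dim) for _ in range(n_blocks)]  # local parameter copies
    u = [np.zeros(dim) for _ in range(n_blocks)]  # scaled dual variables
    z = np.zeros(dim)                             # consensus parameters

    for _ in range(50):
        # Local solves, independent and hence parallelisable across workers:
        # argmin_x f_i(x) + (rho/2) * ||x - z + u_i||^2
        for i in range(n_blocks):
            lhs = A[i].T @ A[i] + rho * np.eye(dim)
            rhs = A[i].T @ b[i] + rho * (z - u[i])
            x[i] = np.linalg.solve(lhs, rhs)
        # Consensus and dual updates: the coordination step that
        # nonconvexity makes delicate in the loosely coupled setting.
        z = np.mean([x[i] + u[i] for i in range(n_blocks)], axis=0)
        for i in range(n_blocks):
            u[i] += x[i] - z

    print("max consensus residual:",
          max(np.linalg.norm(x[i] - z) for i in range(n_blocks)))

The local solves communicate only through z and the dual variables, which is what makes the scheme attractive for parallelisation, and what the nonconvexity-mitigation strategies above aim to protect.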

PaCoBi is Brian Dandurand’s individual Marie Skłodowska-Curie Fellowship (Horizon Europe project number 101153359), supported by UKRI EPSRC project grant EP/Z001110/1.