{"id":3461,"date":"2025-03-24T10:15:21","date_gmt":"2025-03-24T10:15:21","guid":{"rendered":"https:\/\/blogs.qub.ac.uk\/dipsa\/?p=3461"},"modified":"2025-03-24T15:05:45","modified_gmt":"2025-03-24T15:05:45","slug":"pacobi-scaling-parallelism-and-convexity-hurdles-in-bi-level-machine-learning","status":"publish","type":"post","link":"https:\/\/blogs.qub.ac.uk\/dipsa\/pacobi-scaling-parallelism-and-convexity-hurdles-in-bi-level-machine-learning\/","title":{"rendered":"PaCoBi: Scaling Parallelism and Convexity Hurdles in Bi-Level Machine Learning"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>The accuracy of artificial neural network (ANN) models depends on the ability to incorporate adequately large data sets into model training effectively and efficiently. The training of these models may include not only the tuning of network weights and biases to minimise inference error, but also the tuning of network topologies and other hyperparameters that affect generalisation quality, robustness, and inference efficiency. These include features such as the sparse utilisation of network edges and neurons, and binarised and\/or quantised restrictions on parameter and activation values. Thus, we pose the research objective of improving the ability to incorporate additional features into the optimisation modelling of ANN training while retaining the scale at which training data may be incorporated. These objectives require improvement both in the optimisation methodologies and in the parallelisation of the methods that underlie the training of machine learning (ML) models under these multiple objectives.
<\/em><\/p>\n\n\n\n<p><em>We explore various ML model formulations, including variants of stochastic gradient descent (SGD) approaches, as well as alternative optimisation models based on convexified and discretised formulations, including mixed-integer linear reformulations of binarised\/quantised models. For these models, we consider the decompositions that are likely to admit effective parallelisation. We address the challenges presented by the non-convex, combinatorial, and large-scale qualities of these problems as they arise in parallel algorithmic paradigms such as the alternating direction method of multipliers (ADMM), for which recent results in the literature provide at least limited guarantees under various nonconvexity structures. Importantly, we address the mitigation of nonconvexity as it 1) impedes the coordination of loosely coupled subproblems in approaches such as ADMM and 2) leads to suboptimal locally optimal solutions. To this end, we apply a variety of approaches based on combinatorial reformulations, convexifications, and embeddings in combinatorial frameworks that tighten the convexification. The relationship between the different ANN training objectives intersects with the challenging areas of mixed-integer\/combinatorial optimisation, multiobjective optimisation, and bilevel optimisation.
Developments in these areas would improve the capacity to solve increasingly complicated ML models, yielding greater versatility and robustness.<\/em><\/p>\n\n\n\n<p><em>PaCoBi is Brian Dandurand&#8217;s individual Marie Sk\u0142odowska-Curie Fellowship (Horizon Europe project number 101153359), supported by <a href=\"https:\/\/gtr.ukri.org\/projects?ref=EP%2FZ001110%2F1\">UKRI EPSRC project grant EP\/Z001110\/1<\/a>.<\/em><\/p>\n<\/blockquote>\n<\/blockquote>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1316,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[139],"tags":[],"class_list":["post-3461","post","type-post","status-publish","format-standard","hentry","category-pacobi"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/3461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/users\/1316"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/comments?post=3461"}],"version-history":[{"count":7,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/3461\/revisions"}],"predecessor-version":[{"id":3469,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/3461\/revisions\/3469"}],"wp:attachment":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media?parent=3461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/categories?post=3461"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/tags?post=3461"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}