The accuracy of artificial neural network (ANN) models depends on the ability to incorporate adequately large data sets into model training effectively and efficiently. Training these models may involve not only tuning network weights and biases to minimise inference error, but also tuning network topologies and other hyperparameters that affect generalisation quality, robustness, and inference efficiency. These include features such as the sparse utilisation of network edges and neurons, and binarised and/or quantised restrictions on parameter and activation values. We therefore pose the research objective of improving the ability to incorporate additional features into optimisation models of ANN training while retaining the scale at which training data may be incorporated. This requires improvement both in the optimisation methodologies and in the parallelisation of the methods that underlie the training of machine learning (ML) models under these multiple objectives.
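To make the binarised-weight restriction concrete, here is a minimal PyTorch sketch (our illustration, not the project's formulation) of a linear layer whose weights are constrained to {-1, +1} and trained with the well-known straight-through estimator; the layer sizes and initialisation are arbitrary.

```python
import torch
import torch.nn as nn

class BinariseSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Restrict parameter values to {-1, +1}.
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients through where |w| <= 1.
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

class BinarisedLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(d_out, d_in))

    def forward(self, x):
        # Forward pass uses the binarised weights; gradients flow to the
        # underlying real-valued parameters via the STE.
        return x @ BinariseSTE.apply(self.weight).t()
```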
We explore various ML model formulations, including variants of stochastic gradient descent (SGD) approaches as well as alternative optimisation models based on convexified and discretised formulations, including mixed-integer linear reformulations of binarised/quantised models. For these models, we consider decompositions that are likely to support effective parallelisation. We address the challenges posed by the non-convex, combinatorial, and large-scale qualities of these problems as they arise in parallel algorithmic paradigms such as the alternating direction method of multipliers (ADMM), for which recent results in the literature provide at least limited guarantees under various nonconvexity structures. Importantly, we address the mitigation of nonconvexity as it 1) impedes the coordination of loosely coupled subproblems in approaches such as ADMM and 2) leads to suboptimal locally optimal solutions. To this end, we apply a variety of approaches based on combinatorial reformulations, applications of convexification, and embeddings in combinatorial frameworks for tightening the convexification. The relationships between the different ANN training objectives intersect with the challenging areas of mixed-integer/combinatorial optimisation, multiobjective optimisation, and bilevel optimisation. Developments in these areas would improve the capacity to solve increasingly complicated ML models for greater versatility and robustness.
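To illustrate the loosely coupled subproblem structure that ADMM exploits, the following is a minimal consensus-ADMM sketch on a convex stand-in (block-separable least squares); the data split, penalty parameter rho, and iteration count are illustrative assumptions, and the non-convex ANN setting is precisely where the guarantees referenced above become delicate.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# Four data blocks (A_i, b_i); in ANN training these would be loss terms.
blocks = [(rng.standard_normal((50, d)), rng.standard_normal(50)) for _ in range(4)]
rho = 1.0
z = np.zeros(d)                      # consensus variable
u = [np.zeros(d) for _ in blocks]    # scaled dual variables

for _ in range(100):
    # Local subproblems (solvable in parallel):
    # w_i = argmin 0.5*||A_i w - b_i||^2 + (rho/2)*||w - z + u_i||^2
    w = [np.linalg.solve(A.T @ A + rho * np.eye(d), A.T @ b + rho * (z - ui))
         for (A, b), ui in zip(blocks, u)]
    # Coordination step: consensus variable is the average of the local copies.
    z = np.mean([wi + ui for wi, ui in zip(w, u)], axis=0)
    # Dual updates penalise disagreement with the consensus.
    u = [ui + wi - z for wi, ui in zip(w, u)]
```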
PaCoBi is Brian Dandurand’s individual Marie Skłodowska-Curie Fellowship (Horizon Europe project number 101153359), supported by UKRI EPSRC project grant EP/Z001110/1.
Sweeping AAAI’25 success
We have been fortunate to have 3 papers accepted at AAAI’25.
Hung and colleagues will present their work on the explainability of time series classification. InteDisUX aims to create explanations that are accessible and meaningful to users (real people) by identifying subsequences of the time series that exert a positive or negative influence on a prediction. It uses segment-level integrated gradients to merge successive segments into variable-length segments with high faithfulness and robustness. Follow the paper here: https://pure.qub.ac.uk/en/publications/intedisux-intepretation-guided-discriminative-user-centric-explan or come visit Hung at poster #8580. This work is funded by the MSCA-DN network RELAX.
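For intuition, here is a rough sketch (not the authors' code) of the segment-level integrated-gradients idea: attribute the prediction to fixed segments, then greedily merge successive segments with same-sign influence into variable-length segments. The `model`, segment count, and step count are placeholder assumptions.

```python
import torch

def segment_ig(model, x, target, n_seg=10, steps=50):
    """Attribute model(x)[0, target] to n_seg fixed segments of x (shape (1, T),
    with T divisible by n_seg), then merge same-sign neighbours. `model` is
    assumed to be a differentiable classifier returning (1, n_classes) logits."""
    baseline = torch.zeros_like(x)                    # all-zero baseline
    grads = torch.zeros_like(x)
    for a in torch.linspace(0.0, 1.0, steps):         # Riemann approximation
        xi = (baseline + a * (x - baseline)).requires_grad_(True)
        model(xi)[0, target].backward()
        grads += xi.grad / steps
    ig = (x - baseline) * grads                       # integrated gradients
    seg = ig.view(n_seg, -1).sum(dim=1)               # influence per segment
    # Greedily merge successive segments whose influence has the same sign,
    # yielding variable-length segments.
    merged = [[0]]
    for i in range(1, n_seg):
        if seg[i].sign() == seg[merged[-1][-1]].sign():
            merged[-1].append(i)
        else:
            merged.append([i])
    return seg, merged
```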
Zichi and colleagues will present their work on WaveletMixer, a new time series forecasting method that leverages wavelets to create latent representations at multiple levels of resolution and phase. It creates a distinct forecasting model for each resolution, and the relationships between the different frequency domains are exploited to update each of the models. Zichi also introduces a new MLP model for time series forecasting that works well in this setting. Follow the paper here: https://pure.qub.ac.uk/en/publications/waveletmixer-a-multi-resolution-wavelets-based-mlp-mixer-for-mult or come visit Zichi at poster #10198. Zichi is supported by a scholarship from the China Scholarship Council.
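A simplified sketch of the multi-resolution idea (our reading, not the paper's code): decompose the series into wavelet bands with PyWavelets, fit one forecasting model per resolution, and sum the per-band forecasts. A naive least-squares AR model stands in for the paper's MLP-Mixer, and the cross-resolution model updates described above are omitted.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 20.0, 256)) + 0.1 * rng.standard_normal(256)

# Split the series into per-resolution bands: reconstruct each wavelet level
# on its own by zeroing out every other level's coefficients.
coeffs = pywt.wavedec(series, "db4", level=3)
bands = [pywt.waverec([c if j == i else np.zeros_like(c)
                       for j, c in enumerate(coeffs)], "db4")[: len(series)]
         for i in range(len(coeffs))]

def ar_forecast(x, p=8):
    # One-step least-squares AR(p) forecast for a single band (a naive
    # stand-in for the per-resolution mixer models).
    X = np.stack([x[i:i + p] for i in range(len(x) - p)])
    w = np.linalg.lstsq(X, x[p:], rcond=None)[0]
    return x[-p:] @ w

# One model per resolution; the final forecast sums the per-band forecasts.
forecast = sum(ar_forecast(b) for b in bands)
```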
Kazi Hasan Ibn Arif is a PhD student at Virginia Tech with whom we collaborate through the US-Ireland project ‘SWEET’ (USI-226). Kazi has developed a new technique to improve the computational efficiency of high-resolution Vision-Language Models (VLMs). A VLM combines two models: one that generates tokens from the image, followed by a large language model. The technique uses attention in the token-generation model to selectively drop tokens according to predefined budgets. The paper is on arXiv: https://arxiv.org/abs/2408.10945. Come visit Kazi at poster #7547.
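Conceptually, the pruning step can be sketched as follows (a hedged reconstruction, not Kazi's implementation): rank visual tokens by the attention they receive, keep a fixed budget of the highest-scoring ones, and preserve their original order. The shapes and the budget below are illustrative.

```python
import torch

def prune_tokens(tokens, attn, budget):
    """tokens: (B, N, D) visual tokens; attn: (B, H, N) attention received by
    each token, already reduced over queries; budget: number of tokens to keep."""
    scores = attn.mean(dim=1)                     # average over heads -> (B, N)
    keep = scores.topk(budget, dim=1).indices     # indices of top-budget tokens
    keep, _ = keep.sort(dim=1)                    # preserve spatial order
    idx = keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return torch.gather(tokens, 1, idx)

tokens = torch.randn(2, 576, 1024)                # e.g. 24x24 patch tokens
attn = torch.rand(2, 16, 576)                     # 16 heads (illustrative)
pruned = prune_tokens(tokens, attn, budget=144)   # keep 25% of the tokens
```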
Accelerating Scientific Discovery Using Domain Adaptive Language Modeling
Scientific corpora, such as papers and patents, are a rich source of information. Incorporating this information into scientific discovery pipelines is a major challenge whose solution could reduce discovery costs and speed up the process. Motivated by this, and leveraging recent advances in Natural Language Processing (NLP), we provide domain-adaptive NLP methods that understand the scientific domain and its specific characteristics and facilitate the tasks necessary for the discovery process.
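As one concrete instance of such a method, here is a minimal sketch (illustrative, not our production pipeline) of domain-adaptive pretraining: continuing masked-language-model training of a general-purpose encoder on a scientific text corpus with Hugging Face `transformers`. The base model and corpus file name are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")    # placeholder base model
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "scientific_corpus.txt" is a placeholder for a corpus of papers/patents.
corpus = load_dataset("text", data_files={"train": "scientific_corpus.txt"})
tokenised = corpus.map(lambda ex: tok(ex["text"], truncation=True, max_length=256),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-model", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenised["train"],
    # The collator randomly masks 15% of tokens for the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()
```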