Invited Talk – Efficient Computation through Tuned Approximation by David Keyes

21 February 2024

Abstract

Numerical software is being reinvented to provide opportunities to tune the accuracy of computation dynamically to the requirements of the application, resulting in savings of memory, time, and energy. Floating point computation in science and engineering has a history of “oversolving” relative to expectations for many models. So often are real datatypes defaulted to double precision that GPUs did not gain wide acceptance until they provided in hardware the double-precision operations not required in their original domain of graphics. Computational science is now reverting to lower precision arithmetic where possible. Many matrix operations can be carried out in lower precision at a blockwise level without loss of accuracy, adapting the precision to the magnitude of the norm of the block. Furthermore, many blocks can be approximated by low-rank near equivalents to a prescribed accuracy, adapting to the smoothness of the coefficients of the block. This leads to a smaller memory footprint, which implies higher residency in the memory hierarchy, leading in turn to less time and energy spent on data copying, which may even dwarf the savings from fewer and cheaper flops. We provide examples from several application domains, including Gordon Bell Prize-nominated research in 2022 in environmental statistics and in 2023 in seismic processing.
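To make the two adaptations concrete, here is a minimal NumPy sketch: a block is replaced by a truncated low-rank factorisation to a prescribed accuracy, and its storage precision is chosen from its norm. The tolerance, the demotion threshold, and the smooth test kernel are illustrative assumptions, not the tuned heuristics used in the cited work.

```python
import numpy as np

def compress_block(block, tol=1e-6, demote_norm=1e-3):
    """Illustrative block compression: truncated SVD to a prescribed accuracy,
    with the storage precision chosen from the norm of the block."""
    # Low-rank truncation: keep only singular values above tol * (largest one).
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]

    # Precision adaptation: blocks of small norm are demoted to single precision.
    dtype = np.float32 if np.linalg.norm(block) < demote_norm else np.float64
    return U.astype(dtype), s.astype(dtype), Vt.astype(dtype)

def restore_block(U, s, Vt):
    return (U * s) @ Vt

# A smooth off-diagonal kernel block: its singular values decay rapidly.
x = np.linspace(1.0, 2.0, 64)
block = 1.0 / (x[:, None] + x[None, :] + 3.0)
U, s, Vt = compress_block(block)
err = np.linalg.norm(block - restore_block(U, s, Vt)) / np.linalg.norm(block)
print(f"rank kept: {len(s)} of 64, relative error: {err:.1e}")
```

Because the kernel is smooth, only a handful of singular values survive the tolerance, which is exactly the data sparsity the talk exploits; the footprint shrinks before any flop is saved.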

Bio

David Keyes directs the Extreme Computing Research Center at the King Abdullah University of Science and Technology (KAUST), where he was a founding Dean in 2009 and currently serves in the Office of the President. He is a professor in the programs of Applied Mathematics, Computer Science, and Mechanical Engineering. He is also an Adjunct Professor of Applied Mathematics and Applied Physics at Columbia University, where he formerly held the Fu Foundation Chair. He works at the interface between parallel computing and PDEs and statistics, with a focus on scalable algorithms that exploit data sparsity. Before joining KAUST, Keyes led multi-institutional scalable solver software projects in the SciDAC and ASCI programs of the US Department of Energy (DoE), ran university collaboration programs at US DoE and NASA institutes, and taught at Columbia, Old Dominion, and Yale Universities. He is a Fellow of SIAM, the AMS, and the AAAS. He has been awarded the Gordon Bell Prize from the ACM, the Sidney Fernbach Award from the IEEE Computer Society, and the SIAM Prize for Distinguished Service to the Profession. He earned a B.S.E. in Aerospace and Mechanical Sciences from Princeton in 1978 and a Ph.D. in Applied Mathematics from Harvard in 1984.

Invited Talk – Fine-Grained and Phase-Aware Frequency Scaling for Energy-efficient Computing on Heterogeneous Multi-GPU Systems by Lorenzo Carpentieri

9 May 2025

Abstract

As computing power demands continue to grow, achieving energy efficiency in high-performance systems has become a key challenge. One of the most promising software techniques for energy efficiency is Dynamic Voltage and Frequency Scaling (DVFS), which optimizes the energy-performance trade-off by adjusting hardware frequencies.

This presentation introduces two complementary approaches that advance the state-of-the-art in energy-efficient heterogeneous computing through fine-grained and phase-aware frequency tuning.

The first approach, SYnergy, leverages a novel compiler- and runtime-integrated methodology built upon the SYCL programming model to enable fine-grained frequency scaling on heterogeneous hardware. SYnergy allows developers to specify energy goals for each individual kernel, such as minimizing the Energy-Delay Product (EDP) or achieving predefined energy-performance trade-offs. Through compiler integration and a machine learning model, the frequency of each kernel is statically optimized for the specified energy goal. To extend this fine-grained control to large-scale systems, SYnergy includes a custom SLURM plugin that enables execution across all available devices in a cluster, ensuring scalable energy savings.
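SYnergy's compiler and runtime integration is not reproduced here, but the underlying per-kernel decision can be sketched: given a model that predicts runtime and power for a kernel at each supported frequency, pick the frequency that optimises the chosen energy goal. The predictor, frequency table, and power model below are hypothetical stand-ins.

```python
def pick_frequency(frequencies_mhz, predict, goal="min_edp", max_slowdown=None):
    """Choose a per-kernel frequency from predicted (runtime, power) pairs.

    `predict(f)` stands in for a trained model returning (runtime_s, power_w)
    for the kernel at frequency f; it is a hypothetical placeholder here.
    """
    t_fastest, _ = predict(max(frequencies_mhz))
    candidates = []
    for f in frequencies_mhz:
        t, p = predict(f)
        energy = p * t
        if goal == "min_edp":
            score = energy * t                      # energy-delay product
        elif goal == "min_energy":
            if max_slowdown and t > max_slowdown * t_fastest:
                continue                            # respect a performance bound
            score = energy
        else:
            raise ValueError(goal)
        candidates.append((score, f))
    return min(candidates)[1]

# Toy predictor: runtime falls with frequency, power rises super-linearly.
toy = lambda f: (1000.0 / f, 30.0 + (f / 100.0) ** 2)
print(pick_frequency([800, 1000, 1200, 1400, 1600], toy, goal="min_edp"))
```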

While fine-grained frequency scaling at the kernel level can significantly improve energy efficiency, it also introduces overhead due to frequent frequency changes, which can in some cases outweigh the potential benefits. To address this, we propose a novel phase-aware method that detects distinct phases through application profiling and DAG analysis and sets an optimal frequency for each phase. Our methodology also considers MPI programs, where the overhead can be hidden by overlapping frequency changes with communication.
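The overlap idea can be illustrated schematically with mpi4py: the frequency change for the next phase is requested while a non-blocking exchange is in flight, so its latency hides behind communication rather than stalling compute. `set_gpu_frequency` is a placeholder for a vendor interface (for example NVML's locked-clocks calls), not the API of the presented tools.

```python
from mpi4py import MPI
import numpy as np

def set_gpu_frequency(mhz):
    """Placeholder for a vendor call (e.g. locking GPU clocks via NVML)."""
    pass

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

send_buf = np.full(1 << 20, rank, dtype=np.float64)
recv_buf = np.empty_like(send_buf)

# Start a non-blocking exchange for the current phase ...
requests = [comm.Isend(send_buf, dest=(rank + 1) % size),
            comm.Irecv(recv_buf, source=(rank - 1) % size)]

# ... and request the next phase's frequency while messages are in flight,
# so the transition latency overlaps with communication instead of compute.
set_gpu_frequency(1200)

MPI.Request.Waitall(requests)
# Kernels of the next phase now run at the already-applied frequency.
```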

Bio

Lorenzo Carpentieri received his master’s degree from the University of Salerno, Italy, in 2022. He is now a PhD student in the Department of Computer Science at the University of Salerno, under the supervision of Prof. Biagio Cosenza. His research interests include high-performance computing, compiler technology, and programming models, with a particular interest in energy-efficient and approximate computing.

Invited Talk – A Constraint Programming Solver You Can Trust (But Don’t Have To) by Ciaran McCreesh

28 August 2025

Abstract

Constraint programming is a declarative way of solving hard combinatorial, scheduling, resource allocation, and logistics problems. We specify a problem in a high-level language, give it to a solver, and the solver thinks for a while and then gives us the optimal answer. Unfortunately, even the best commercial and academic solvers contain bugs, and will occasionally give a wrong answer, potentially with devastating effects. One way of avoiding this situation is through proof logging, where solvers are modified to output a mathematical proof of correctness alongside their solution. This proof can then be independently audited by a very simple (and potentially even formally verified) proof checking tool, giving us complete confidence in the correctness of solutions (although not the solvers themselves). I’ll explain how proof logging works in general, and give an overview of the challenges and fun involved in bringing it to constraint programming. Ultimately, the aim here is to make algorithms something people can trust with their lives and livelihoods, just as engineers have already done with bridges, planes, and lifts.
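To give a flavour of what a proof checker does, the toy sketch below verifies a single reverse-unit-propagation (RUP) step for clausal reasoning: the checker trusts only the original constraints and mechanically confirms that asserting the negation of a claimed derived clause propagates to a conflict. Real constraint programming proofs use richer formats (for example the cutting-planes-style derivations checked by tools such as VeriPB), so treat this purely as an illustration.

```python
def unit_propagate(clauses, assignment):
    """Apply unit propagation to exhaustion; return 'CONFLICT' or the assignment."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                value = assignment.get(abs(lit))
                if value is None:
                    unassigned.append(lit)
                elif (lit > 0) == value:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return "CONFLICT"
            if len(unassigned) == 1:              # unit clause: forced assignment
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

def check_rup_step(clauses, derived):
    """Accept a derived clause iff asserting its negation propagates to a conflict."""
    negation = {abs(lit): lit < 0 for lit in derived}
    return unit_propagate(clauses, negation) == "CONFLICT"

# Formula: (x1 or x2) and (not x1 or x2); the clause (x2) follows, and the
# checker can confirm this without trusting the solver that claimed it.
formula = [[1, 2], [-1, 2]]
print(check_rup_step(formula, [2]))   # True
```

The checker never searches; it only replays cheap, mechanical steps, which is why it can be kept simple enough to audit or even formally verify.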

Bio

Ciaran McCreesh is a Royal Academy of Engineering Research Fellow working in the Formal Analysis, Theory and Algorithms group in the School of Computing Science at the University of Glasgow. His research looks at practical parallel algorithms, particularly in relation to hard subgraph problems. His publications cover combinatorial search, parallel algorithms, and constraint programming.

FERNS: Holistic integration of eco-Friendly dEsign tools, mateRials, fabrication technologies for the responsible co-creation of future Sustainable integrated electronic systems

The FERNS project is an MSCA Doctoral Network aiming to design eco-friendly electronics and accelerate their uptake. This multi-disciplinary project spans the fields of materials science, engineering, social science, and business to acquire a holistic perspective on sustainable electronics. In DIPSA, we will investigate the system software stack for disposable electronic devices, focussing on sensors.

The assumed context is that sensor devices are powered by minimal batteries, charged through energy-harvesting devices that capture energy from ambient environmental sources such as light, heat, and mechanical motion. For the sensors to complete their tasks of capturing and processing sensor data, temporarily storing data, and transmitting it over radio signals, they need to manage their energy budgets carefully. Several degrees of freedom can be leveraged to maximise the utility of the available energy, among them scheduling tasks based on predicted energy availability and adapting the precision of tasks to improve the trade-off between energy efficiency, timeliness, and quality of service.
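As a toy illustration of those two degrees of freedom, the sketch below admits tasks against a predicted energy budget and drops individual tasks to a cheaper, lower-precision variant when running everything at full precision would not fit. The task list, energy costs, and harvest forecast are made-up numbers.

```python
# Each task offers a full-precision variant and a cheaper approximate one (made-up costs).
TASKS = [
    {"name": "sample_sensor", "energy_mj": {"full": 2.0, "reduced": 2.0}},
    {"name": "filter_signal", "energy_mj": {"full": 6.0, "reduced": 2.5}},
    {"name": "transmit",      "energy_mj": {"full": 9.0, "reduced": 4.0}},
]

def schedule(tasks, stored_mj, forecast_harvest_mj):
    """Pick a precision level per task so the whole plan fits the predicted budget."""
    budget = stored_mj + forecast_harvest_mj
    plan = []
    for index, task in enumerate(tasks):
        full = task["energy_mj"]["full"]
        reduced = task["energy_mj"]["reduced"]
        # Cheapest possible cost of everything still to come after this task.
        rest = sum(t["energy_mj"]["reduced"] for t in tasks[index + 1:])
        if budget - full >= rest:
            choice, cost = "full", full          # full precision still fits
        elif budget >= reduced:
            choice, cost = "reduced", reduced    # degrade gracefully
        else:
            break                                # defer until more energy is harvested
        budget -= cost
        plan.append((task["name"], choice))
    return plan

print(schedule(TASKS, stored_mj=8.0, forecast_harvest_mj=5.0))
# [('sample_sensor', 'full'), ('filter_signal', 'full'), ('transmit', 'reduced')]
```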

This research project will combine insights from the burgeoning fields of intermittent computing and transprecision computing. Intermittent computing investigates the consequences of intermittent power supplies for computing systems design. Transprecision computing studies the trade-off between accuracy, performance, and energy consumption when the precision of the computation is varied in a controlled manner. The focus of the project will be on the design of the system software, which includes runtime systems and operating systems.

One PhD position will be available to investigate these issues. The position is salaried and will include a secondment at one of the partner institutions in the project. The project will provide broad training in transferable skills and sustainable practices. The intended start date is 1 April 2026.

Best Poster Award at SIMULTECH 2025

We are pleased to announce that Zohreh Moradinia has won the Best Poster Award at SIMULTECH 2025 for her work on “Machine Learning-Driven Framework for Identifying Parameter-Driven Anomalies in Multiphysics Simulations”. This work investigates whether errors in scientific simulations can be detected using machine learning. Zohreh considers errors resulting from incorrect configuration of the simulation, such as time steps that are too large, and has trained several models that can identify when and where the simulations have gone wrong. This is useful as a means to check the validity of simulation results, especially when the simulation is configured with liberal parameter settings aimed at high simulation speed.
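Zohreh's actual features and models are her own; purely as an illustration of the workflow, the sketch below trains a classifier on synthetic, made-up per-run features (think time step, mesh spacing, residual norm, energy drift) and uses it to flag a new run as anomalous.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: per-run features with labels from runs already
# known to be valid (0) or anomalous (1). Real features come from simulations.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0.8).astype(int)      # stand-in rule: "time step too large"

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

new_run = np.array([[1.2, 0.1, 0.3, -0.2]])
print("flagged as anomalous:", bool(model.predict(new_run)[0]))
```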

Zohreh performed this work during her PhD in DIPSA. She is currently a Research Fellow at Imperial College London.

DIPSA at IPDPS’25

Two of our papers were accepted at IPDPS’25.

Brian will present his work on improving the scalability of parallel molecular dynamics simulation. He has developed a novel way to reduce the scalability bottleneck in the communication between the processes computing short-range forces and those computing long-range forces. His technique discards data dependences when the long-range processes are “too slow” and interpolates the (slowly varying) long-range forces to progress the computation. Stay tuned for the camera-ready copy of the paper! This work was supported by the EPSRC New Horizons project ASCCED (EP/X01794X/1).
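The camera-ready paper will have the details; as a rough sketch of the idea, the code below serves long-range forces to the short-range integrator and, when the long-range ranks are lagging, extrapolates from the two most recent completed evaluations instead of waiting on the dependence. The linear extrapolation and the class interface are illustrative assumptions, not Brian's actual scheme.

```python
import numpy as np

class LongRangeForceProxy:
    """Serve long-range forces to the short-range integrator, extrapolating
    from past evaluations when the latest long-range result is late."""

    def __init__(self):
        self.history = []                      # [(step, forces), ...], newest last

    def update(self, step, forces):
        self.history = (self.history + [(step, np.asarray(forces, dtype=float))])[-2:]

    def get(self, step):
        if len(self.history) == 2 and self.history[-1][0] < step:
            # Long-range ranks are lagging: extrapolate the slowly varying
            # forces linearly rather than stalling on the data dependence.
            (s0, f0), (s1, f1) = self.history
            return f1 + (f1 - f0) / (s1 - s0) * (step - s1)
        return self.history[-1][1]             # most recent result is current

proxy = LongRangeForceProxy()
proxy.update(0, [0.00, 1.00])
proxy.update(10, [0.02, 0.98])
print(proxy.get(14))                           # extrapolated forces for step 14
```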

Hans will present a parallel algorithm for the maximum clique problem. The key ideas revolve around reducing the amount of work wherever possible: delaying or avoiding the construction of fast representations of neighbour lists, early-exiting set intersection operations, and choosing algorithmically between maximum clique search and the complementary minimum vertex cover problem.
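One of those ideas, early-exiting set intersection, is easy to sketch: in a branch-and-bound clique search we often only need to know whether an intersection can still be large enough to beat the incumbent, so the scan can stop as soon as that becomes impossible. Sorted integer lists stand in for the paper's actual data structures.

```python
def bounded_intersection_size(a, b, needed):
    """Size of the intersection of sorted lists a and b, or -1 as soon as it
    becomes impossible to reach `needed` common elements (early exit)."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        # Even if every remaining element matched, could we still reach `needed`?
        if count + min(len(a) - i, len(b) - j) < needed:
            return -1
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count if count >= needed else -1

candidates = [2, 3, 5, 8, 9, 12]     # vertices still eligible for the clique
neighbours = [1, 3, 4, 8, 10]        # neighbourhood of the vertex being added
print(bounded_intersection_size(candidates, neighbours, needed=4))   # -1: prune
```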

Additionally, Marco will attend IPDPS’25 by virtue of a travel grant from the TCHPC/TCPP HPC student cohort programme.

Sweeping AAAI’25 success

We have been fortunate to have 3 papers accepted at AAAI’25.

Hung and colleagues will present their work on the explainability of time series classification. InteDisUX aims to create explanations that are accessible and meaningful to users (real people) by identifying subsequences of the time series that have a positive or negative influence on a prediction. It uses segment-level integrated gradients to merge successive segments into variable-length segments with high faithfulness and robustness. Follow the paper here: https://pure.qub.ac.uk/en/publications/intedisux-intepretation-guided-discriminative-user-centric-explan or come visit Hung at poster #8580. This work is funded by the MSCA-DN network RELAX.
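For intuition, here is a generic sketch of segment-level attribution in the spirit described above: per-step integrated gradients are aggregated into fixed segments, and neighbouring segments whose influence has the same sign are merged into variable-length segments. The gradient function and the merging rule are made-up stand-ins, not InteDisUX's faithfulness- and robustness-driven criteria.

```python
import numpy as np

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Standard integrated gradients along a straight path from the baseline.
    `grad_fn` is assumed to return d(prediction)/d(input); it is made up here."""
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

def merge_segments(attributions, seg_len=10):
    """Aggregate per-step attributions into fixed segments, then merge
    neighbouring segments whose influence has the same sign."""
    n_seg = len(attributions) // seg_len
    scores = [attributions[i * seg_len:(i + 1) * seg_len].sum() for i in range(n_seg)]
    merged, start = [], 0
    for i in range(1, n_seg + 1):
        if i == n_seg or np.sign(scores[i]) != np.sign(scores[start]):
            merged.append((start * seg_len, i * seg_len, float(sum(scores[start:i]))))
            start = i
    return merged            # (begin, end, influence) variable-length segments

x = np.sin(np.linspace(0, 6, 100))                                  # toy series
grad_fn = lambda z: np.where(np.arange(len(z)) < 30, 1.0, -1.0)     # made-up gradients
print(merge_segments(integrated_gradients(x, np.zeros_like(x), grad_fn)))
```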

Zichi and colleagues will present their work on WaveletMixer, a new time series forecasting method that leverages wavelets to create a latent representation at multiple levels of resolution and phases. It creates a distinct forecasting model for each resolution and exploits the relationships between the different frequency bands to update each of the models. Zichi also introduces a new MLP model for time series forecasting that works well in this setting. Follow the paper here: https://pure.qub.ac.uk/en/publications/waveletmixer-a-multi-resolution-wavelets-based-mlp-mixer-for-mult or come visit Zichi at poster #10198. Zichi is supported by a scholarship from the China Scholarship Council.
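WaveletMixer's MLP-Mixer architecture and its cross-resolution updates are in the paper; the sketch below only illustrates the first ingredient, building a multi-resolution wavelet representation (with the PyWavelets package) and attaching a separate, deliberately simple least-squares forecaster to each resolution level.

```python
import numpy as np
import pywt   # PyWavelets

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 256)) + 0.1 * rng.normal(size=256)

# Multi-resolution view: one approximation band plus a detail band per level.
bands = pywt.wavedec(series, "db4", level=3)

def fit_band_forecaster(band, window=8):
    """Stand-in per-resolution model: least squares on lagged windows."""
    X = np.stack([band[i:i + window] for i in range(len(band) - window)])
    y = band[window:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda history: history[-window:] @ w

# A separate forecaster per resolution level, each predicting its own band.
forecasters = [fit_band_forecaster(band) for band in bands]
next_coefficients = [f(band) for f, band in zip(forecasters, bands)]
print([round(float(c), 3) for c in next_coefficients])
```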

Kazi Hasan Ibn Arif is a PhD student at Virginia Tech with whom we collaborate through the US-Ireland project ‘SWEET’ (USI-226). Kazi has developed a new technique to improve the computational efficiency of high-resolution Vision-Language Models (VLMs). A VLM combines two models: one that encodes the image into tokens, followed by a large language model that consumes them. The technique uses attention in the token-generating model to selectively drop tokens according to predefined budgets. The paper is on arXiv: https://arxiv.org/abs/2408.10945. Come visit Kazi at poster #7547.
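The exact scoring and budget allocation follow the paper; generically, attention-guided token dropping looks like the sketch below: image tokens are ranked by the attention they receive and only the top ones within the budget are passed on to the language model. The shapes and data here are hypothetical.

```python
import numpy as np

def prune_visual_tokens(tokens, attention, budget):
    """Keep only the `budget` tokens that receive the most attention.

    tokens:    (n_tokens, dim) image-token embeddings
    attention: (n_heads, n_tokens) attention paid to each token, e.g. taken
               from the final layer of the image encoder (hypothetical here)
    """
    scores = attention.mean(axis=0)                 # average over heads
    keep = np.sort(np.argsort(scores)[-budget:])    # top-k, original order kept
    return tokens[keep], keep

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 64))       # 16 image tokens of dimension 64
attention = rng.random(size=(8, 16))     # 8 heads attending to 16 tokens
kept, kept_idx = prune_visual_tokens(tokens, attention, budget=4)
print(kept_idx, kept.shape)              # only 4 tokens go on to the LLM
```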

SIMD Bit Twiddling Hacks

The Bit Twiddling Hacks website collects an array of useful code fragments that implement some very specific computations very efficiently. Here we collect references to some handy code fragments for SIMD-based computation.
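As a taster, here is one classic from the original collection, the branch-free 32-bit population count. Because it uses only shifts, masks, adds, and a multiply, the same expression applies elementwise to whole arrays (shown here with NumPy on uint32 values), which is also why compilers can map it straight onto SIMD registers.

```python
import numpy as np

def popcount32(v):
    """Branch-free 32-bit population count (a classic bit-twiddling hack).
    Applied to a NumPy uint32 array, every step is elementwise, so the whole
    routine is data-parallel with no per-element branching."""
    v = v.astype(np.uint32)
    v = v - ((v >> np.uint32(1)) & np.uint32(0x55555555))
    v = (v & np.uint32(0x33333333)) + ((v >> np.uint32(2)) & np.uint32(0x33333333))
    v = (v + (v >> np.uint32(4))) & np.uint32(0x0F0F0F0F)
    return (v * np.uint32(0x01010101)) >> np.uint32(24)   # byte sums gather in the top byte

x = np.array([0x00000000, 0xFFFFFFFF, 0x0F0F0F0F, 0x12345678], dtype=np.uint32)
print(popcount32(x))    # [ 0 32 16 13]
```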


Open Position for Post-Doctoral Researcher on transprecise scheduling of machine learning tasks in edge and IoT environments

We are currently seeking to appoint an exceptional candidate to the post of Research Fellow.

The post holder will perform research on deployment of machine-learned models for health analytics on distributed IoT/edge/cloud systems using transprecise computing and contribute to the research project “Sustainable Wearable Edge InTelligence (SWEET)”.

The successful candidate must meet, and your application should clearly demonstrate that you meet, the following criteria:

  • Normally have, or be about to obtain, a relevant PhD. Relevant areas include high-performance computing, middleware and computing systems.
  • Recent relevant research experience to include:
    • Undertaking research in the area of high-performance / distributed / parallel computing or middleware
    • A proven track record of using experimental models to carry out analyses, critical evaluations, and interpretations of experimental data as relevant to the research project
    • Working effectively as part of a research team in the development and promotion of the research theme.
    • Strong publication record commensurate with stage of career.

Please note the above is not an exhaustive list. For further information about the role, including the essential and desirable criteria, please check the recruitment web page.

This post is available on a fixed term contract for 33 months.