{"id":4013,"date":"2026-06-15T14:32:20","date_gmt":"2026-06-15T13:32:20","guid":{"rendered":"https:\/\/blogs.qub.ac.uk\/dipsa\/?p=4013"},"modified":"2026-06-15T14:32:21","modified_gmt":"2026-06-15T13:32:21","slug":"configspec-configuration-selection-framework-for-edgecloud-llm-inference","status":"publish","type":"post","link":"https:\/\/blogs.qub.ac.uk\/dipsa\/configspec-configuration-selection-framework-for-edgecloud-llm-inference\/","title":{"rendered":"ConfigSpec at TDIS &#8217;26: Configuration Selection Framework for Edge\u2013Cloud LLM Inference"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-1024x768.jpeg\" alt=\"\" class=\"wp-image-4030\" srcset=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-1024x768.jpeg 1024w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-300x225.jpeg 300w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-768x576.jpeg 768w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-1536x1152.jpeg 1536w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/IMG_3349-2048x1536.jpeg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"768\" data-id=\"4028\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-1024x768.jpg\" alt=\"\" class=\"wp-image-4028\" srcset=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-1024x768.jpg 1024w, 
https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-300x225.jpg 300w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-768x576.jpg 768w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-1536x1152.jpg 1536w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/2026-04-27-14.56.27-1-2048x1536.jpg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"768\" data-id=\"4029\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-1024x768.jpg\" alt=\"\" class=\"wp-image-4029\" srcset=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-1024x768.jpg 1024w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-300x225.jpg 300w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-768x576.jpg 768w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-1536x1152.jpg 1536w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/tdis2026-2048x1536.jpg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" data-id=\"4024\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys-1024x768.jpeg\" alt=\"\" class=\"wp-image-4024\" style=\"width:785px;height:auto\" srcset=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys-1024x768.jpeg 1024w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys-300x225.jpeg 300w, 
https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys-768x576.jpeg 768w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys-1536x1152.jpeg 1536w, https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/eurosys.jpeg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/figure>\n\n\n\n<div style=\"height:33px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">At the <strong>4th International Workshop on Testing Distributed Internet of Things Systems (TDIS &#8217;26),<\/strong> Babar Ali presented <strong>ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge\u2013Cloud Speculative LLM Serving<\/strong>. <strong>TDIS<\/strong> was held in Edinburgh, Scotland, as part of the broader <strong>EuroSys<\/strong> conference (April 27\u201330, 2026).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">TDIS focuses on tools and frameworks for testing and evaluating distributed IoT systems, an increasingly critical area as emerging applications spread computation across the edge&#8211;cloud continuum. This year&#8217;s edition brought together researchers working on distributed systems, edge computing, and AI inference, making it an ideal venue for our work on collaborative LLM deployment across edge and cloud.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is ConfigSpec?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Speculative decoding has emerged as a promising way to run LLM inference collaboratively across edge and cloud. In this approach, a lightweight &#8220;draft&#8221; model on the edge device quickly proposes a sequence of candidate tokens, which are then sent to a powerful &#8220;target&#8221; model in the cloud for verification. The accepted tokens become part of the final response, and the rejected ones are discarded. 
Speculative decoding thus splits the work between edge and cloud while preserving the response quality of the &#8220;target&#8221; model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Used carelessly, however, speculative decoding can delay responses and drive up cost, energy, and memory consumption across heterogeneous hardware. The reason is the sheer number of configuration choices: which draft model to deploy, which quantization level to use, which model family to prefer, how many tokens to speculate at a time, and on which hardware. Finding the best configuration can be time-consuming and costly, which motivated this work.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>ConfigSpec<\/strong> is our framework for navigating this space systematically. It profiles each edge device and draft model, measuring drafting speed, power draw, and how well the draft model aligns with the target model. From these measurements, it analytically evaluates three key deployment objectives:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Goodput:<\/strong> How many verified tokens are produced per second?<\/li>\n\n\n\n<li><strong>Cost efficiency:<\/strong> How many accepted tokens are produced per dollar of cloud API spend?<\/li>\n\n\n\n<li><strong>Energy efficiency:<\/strong> How many joules does the edge device consume per verified token, given the heterogeneous hardware?<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We evaluated ConfigSpec across three edge platforms (Raspberry Pi 4B, Raspberry Pi 5, and NVIDIA Jetson AGX Orin) and two LLM families (Llama 3 and Qwen3) using the Databricks Dolly dataset, and the results reveal a clear pattern for each objective. First, goodput favours the smallest, fastest draft model with a device-dependent speculation length. 
Second, cost efficiency always favours the largest draft model at a speculation length of 2, with the Qwen family demonstrating better cost efficiency. Finally, energy efficiency agrees with goodput on model size but also converges to a speculation length of 2.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Consequently, no single configuration wins on all three fronts, which confirms that profiling-based selection is not just helpful but necessary. ConfigSpec offers practical, industry-led insights for such diverse configuration spaces.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Read the Paper<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The full paper is available via the ACM Digital Library:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/doi.org\/10.1145\/3802513.3803483\"><strong>ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge\u2013Cloud Speculative LLM Serving<\/strong><\/a><\/p>\n\n\n\n<div style=\"height:78px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/ConfigSpec.pdf\" type=\"application\/pdf\" style=\"width:100%;height:600px\" aria-label=\"Embed of ConfigSpec.\"><\/object><a id=\"wp-block-file--media-246fad7d-0f32-4581-a9dd-c4de63dd7d83\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/ConfigSpec.pdf\">ConfigSpec<\/a><a href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2026\/06\/ConfigSpec.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-246fad7d-0f32-4581-a9dd-c4de63dd7d83\">Download<\/a><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" 
class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>At the 4th International Workshop on Testing Distributed Internet of Things Systems (TDIS &#8217;26), Babar Ali presented ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge\u2013Cloud Speculative LLM Serving. TDIS was held in Edinburgh, Scotland, as part of the broader EuroSys conference (April 27\u201330, 2026). TDIS focuses on the tools and frameworks for testing and evaluating distributed IoT [&hellip;]<\/p>\n","protected":false},"author":2700,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[115,31],"tags":[15,156,158,157],"class_list":["post-4013","post","type-post","status-publish","format-standard","category-sweet","category-transprecision","tag-distributed-comptuing","tag-large-language-models","tag-llm-inference","tag-speculative-decoding","czr-hentry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/4013","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/users\/2700"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/comments?post=4013"}],"version-history":[{"count":21,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/4013\/revisions"}],"predecessor-version":[{"id":4042,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/4013\/revisions\/4042"}],"wp:attachment":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media?parent=4013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"htt
ps:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/categories?post=4013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/tags?post=4013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}