{"id":640,"date":"2021-05-28T17:34:31","date_gmt":"2021-05-28T16:34:31","guid":{"rendered":"https:\/\/blogs.qub.ac.uk\/dipsa\/invited-talk-adaptiveness-and-lock-free-synchronization-in-parallel-stochastic-gradient-descent-by-karl-backstrom\/"},"modified":"2021-05-28T17:34:31","modified_gmt":"2021-05-28T16:34:31","slug":"invited-talk-adaptiveness-and-lock-free-synchronization-in-parallel-stochastic-gradient-descent-by-karl-backstrom","status":"publish","type":"post","link":"https:\/\/blogs.qub.ac.uk\/dipsa\/invited-talk-adaptiveness-and-lock-free-synchronization-in-parallel-stochastic-gradient-descent-by-karl-backstrom\/","title":{"rendered":"Invited Talk: Adaptiveness and Lock-free Synchronization in Parallel Stochastic Gradient Descent by Karl B\u00e4ckstr\u00f6m"},"content":{"rendered":"\n<p>3 June 2021<br><br><strong>Abstract<\/strong>:&nbsp;<br>The emergence of big data in recent years due to the vast societal digitalization and large-scale sensor deployment&nbsp;has entailed significant interest in machine learning methods to enable automatic data analytics. In a majority of the&nbsp;learning algorithms used in industrial as well as academic settings, the first-order iterative optimization procedure&nbsp;Stochastic gradient descent (SGD), is the backbone. However, SGD is often time-consuming, as it typically requires&nbsp;several passes through the entire dataset to converge to a solution of sufficient quality. In order to cope with increasing&nbsp;data volumes, and to facilitate accelerated processing utilizing contemporary hardware, various parallel SGD variants&nbsp;have been proposed. In addition to traditional synchronous parallelization schemes, asynchronous ones have received&nbsp;particular interest in recent literature due to their improved ability to scale due to less coordination, and subsequently&nbsp;waiting time. However, asynchrony implies inherent challenges in understanding the execution of the algorithm and its&nbsp;convergence properties, due the presence of both stale and inconsistent views of the shared state. In this work, we&nbsp;aim to increase the understanding of the convergence properties of SGD for practical applications under asynchronous&nbsp;parallelism and develop tools and frameworks that facilitate improved convergence properties as well as further&nbsp;research and development. First, we focus on understanding the impact of staleness, and introduce models for&nbsp;capturing the dynamics of parallel execution of SGD. This enables (i) quantifying the statistical penalty on the&nbsp;convergence due to staleness and (ii) deriving an adaptation scheme, introducing a staleness-adaptive SGD variant&nbsp;MindTheStep-AsyncSGD, which provably reduces this penalty. Second, we aim at exploring the impact of&nbsp;synchronization mechanisms, in particular consistency-preserving ones, and the overall effect on the convergence&nbsp;properties. To this end, we propose Leashed-SGD, an extensible algorithmic framework supporting various&nbsp;synchronization mechanisms for different degrees of consistency, enabling in particular a lock-free and consistency-preserving implementation. In addition, the algorithmic construction of Leashed-SGD enables dynamic memory&nbsp;allocation, claiming memory only when necessary, which reduces the overall memory footprint. 
Bio:
Karl Bäckström is a Ph.D. student in the Distributed Computing and Systems group at Chalmers University of Technology in Sweden. Karl has an academic background in Mathematics, Computer Science, and Engineering Physics, with an overarching interest in distributed and parallel computation, optimization, and machine learning. Karl's research directions include adaptiveness, synchronization, and consistency in parallel algorithms for iterative optimization. At the 35th IEEE International Parallel and Distributed Processing Symposium, Karl and his co-authors were awarded a Best Paper Honorable Mention for the paper "Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence". Karl lives in Gothenburg, a coastal city in western Sweden, together with his Swiss Shepherd Valdi, often enjoying their free time together in nature and wilderness, or at home playing the piano.