BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20241120T082410Z
LOCATION:HG E 1.2
DTSTART;TZID=Europe/Stockholm:20240603T173000
DTEND;TZID=Europe/Stockholm:20240603T180000
UID:submissions.pasc-conference.org_PASC24_sess176_pap117@linklings.com
SUMMARY:Lockstep-Parallel Dualization of Surface Triangulations
DESCRIPTION:Paper\n\nJonas Dornonville de la Cour (Aarhus University); Car
 l-Johannes Johnsen (University of Copenhagen); and James Emil Avery (Aarhu
 s University, University of Copenhagen)\n\nWe present a massively parallel
  lockstep algorithm for dualizing large numbers of surface triangulation g
 raphs, and an effective implementation for CPU, GPU and multi-GPU. The alg
 orithm is fully combinatorial, i.e., it does not require or use a planar o
 r spatial embedding, only the graph.\n\nThis work is motivated by a wish t
 o perform computational chemistry experiments on entire isomerspaces of po
 lyhedral molecules, comprising billions of distinct molecules, each repres
 ented by a cubic graph. However, the algorithm applies not only to triangu
 lations of the sphere, but also to triangulations of oriented surfaces of
  any genus, for example toroidal topologies.\n\nOur multi-vendor
  implementat
 ion in SYCL outperforms the previous sequential state of the art by four
  orders of magnitude on our consumer NVIDIA RTX 3080 Graphics Processing
  Unit (GPU), with an average throughput of 37 ps (+/- 0.1 ps) per vertex
  (varying from 50 ps to 31 ps for C72-C200). Thus, dualizing all
  214,127,742 C200 fullerene molecules, for example, adds a mere 1.49 s
  (+/- 0.01 s) to the total processing time, negligible compared to the
  two hours required to generate the graphs. We subs
 equently perform extreme multi-node, multi-GPU scaling experiments on the
  LUMI-G supercomputer, achieving near-perfect scaling up to 1024 MI250X
  Graphics Compute Dies (GCDs), 14.5 million cores in total. Calculations
  show that dualization has gone from being a bottleneck to being ready to
  contribute to our planned large-scale chemical experiments for all 2.7 x
  10^12 fullere
 ne molecules from C20 through C400.\n\nDomain: Computational Methods and A
 pplied Mathematics\n\nSession Chair: Jamil Gafur (The University of Iowa, 
 National Renewable Energy Laboratory)
END:VEVENT
END:VCALENDAR
