BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20241120T082410Z
LOCATION:HG E 3
DTSTART;TZID=Europe/Stockholm:20240603T173000
DTEND;TZID=Europe/Stockholm:20240603T180000
UID:submissions.pasc-conference.org_PASC24_sess177_pap106@linklings.com
SUMMARY:SoftCache: A Software Cache for PCIe-Attached Hardware Accelerator
 s
DESCRIPTION:Paper\n\nSteven Wijnja and Nikolaos Alachiotis (University of
  Twente)\n\nHardware accelerators are used to speed up computationally
  expensive applications. Offloading tasks to accelerator cards requires
  data to be transferred between the memory of the host and the external
  memory of the accelerator card\; this data movement becomes the
  bottleneck for increasing accelerator performance. Here\, we explore
  the use of a software cache to optimize communication and alleviate
  the data-movement bottleneck by transparently exploiting locality and
  data reuse. We present a generic\, application-agnostic framework\,
  dubbed SoftCache\, that can be used with GPU and FPGA accelerator
  cards. SoftCache exploits locality to optimize data movement in a
  non-intrusive manner (i.e.\, no algorithmic changes are necessary) and
  allows the programmer to tune the cache size\, organization\, and
  replacement policy toward the application needs. Each cache line can
  store data of any size\, thereby eliminating the need for separate
  caches for different data types. We used a phylogenetic application
  to showcase SoftCache. Phylogenetics studies the evolutionary history
  and relationships among different species or groups of organisms. The
  phylogenetic application implements a tree-search algorithm to create
  and evaluate phylogenetic trees\, while hardware accelerators are used
  to reduce the computation time of probability vectors at every tree
  node. Using SoftCache\, we observed that the total number of bytes
  transferred during a complete run of the application was reduced by
  as much as 89%\, resulting in up to 1.7x (81% of the theoretical peak)
  and 3.5x (75% of the theoretical peak) higher accelerator performance
  (as seen by the application) for a GPU and an FPGA accelerator\,
  respectively.\n\nDomain: Engineering\n\nSession Chair: Carla Judith
  López Zurita (ETH Zurich)
END:VEVENT
END:VCALENDAR
