We are delighted to announce the posters that have been accepted to ACM SYSTOR’23.
A Smart Inhaler for Medication Adherence (Best Poster)
Noemi Bitterman, Itai Dabran and Tom Sofer, Technion
Asthma is a common inflammatory condition affecting more than 7 million children in the US alone, and tens of millions more globally. Despite effective preventive medications, medication adherence among children and adolescents is often below 50% [1]. In this paper we present a novel personalized IoT-based system, integrated into children’s daily lives, for improving their adherence to inhaler use.
Near-Memory Processing Offload to Remote (Persistent) Memory
Roei Kisous, Amit Golander, Yigal Korman, Tim Gubner, Rune Humborstad and Manyi Lu, Huawei Cloud
Traditional Von Neumann computing architectures are struggling to keep up with the rapidly growing demand for scale, performance, power-efficiency and memory capacity. One promising approach to this challenge is Remote Memory, in which memory is accessed over an RDMA fabric.
We enhance the remote memory architecture with Near Memory Processing (NMP), a capability that offloads particular compute tasks from the client to the server side.
A similar motivation drove IBM to offload object processing to its remote KV storage.
NMP offload adds latency and consumes server resources; it should therefore be used only when the value of offloading is substantial, specifically, when it saves:
network bandwidth (e.g., Filter/Aggregate), round-trip time (e.g., tree Lookup), and/or distributed locks (e.g., Append to a shared journal).
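A minimal sketch of such an offload decision, covering the three savings listed above; the field names and the 10x bandwidth threshold are hypothetical, not the authors' actual policy:

```python
def should_offload(op):
    """Decide whether to push an operation to the remote-memory server.
    Offload pays off only when it saves network bandwidth, round trips,
    or a distributed lock (all fields here are hypothetical)."""
    saves_bandwidth = op.get("result_bytes", 0) * 10 < op.get("input_bytes", 0)
    saves_round_trips = op.get("round_trips", 1) > 1       # e.g., pointer-chasing tree lookups
    saves_locks = op.get("needs_distributed_lock", False)  # e.g., append to a shared journal
    return saves_bandwidth or saves_round_trips or saves_locks
```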
Adi Yehoshua and Ilya Kolchinsky, Red Hat; Assaf Schuster, Technion
Cloud computing has transformed the way organizations consume computing resources, offering greater flexibility, scalability, and accessibility. However, managing cloud costs has emerged as a significant challenge. Public clouds are popular for their scalability and cost-effectiveness, yet selecting, managing, and monitoring cloud services requires careful consideration, as they can be costly, especially for large and complex workloads. Furthermore, the magnitude and combinatorial complexity of public cloud offerings make it increasingly difficult for organizations to make informed decisions about cloud deployment. With thousands of possible virtual machines across instance types, regions, and operating systems, finding the most cost-effective configuration quickly becomes impractical. A “brute force” approach that examines all possible alternatives and selects the cheapest option is therefore infeasible, especially for applications requiring a large number of VMs.
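The combinatorial blow-up can be illustrated with a toy catalog; all counts below are hypothetical, chosen only to show the growth:

```python
from itertools import product

# Hypothetical catalog dimensions: even modest counts of instance types,
# regions, and OS choices multiply into thousands of per-VM options.
instance_types = [f"type-{i}" for i in range(40)]
regions = [f"region-{r}" for r in range(20)]
operating_systems = ["linux", "windows", "rhel"]

catalog = list(product(instance_types, regions, operating_systems))
print(len(catalog))  # 40 * 20 * 3 = 2400 single-VM choices

# For an application needing n independently placed VMs, brute force must
# consider |catalog| ** n assignments, which is infeasible even for small n.
n_vms = 5
print(len(catalog) ** n_vms)
```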
ClusterLink: A Multi-Cluster Application Interconnect
Kfir Toledo, Pravein Govindan Kannan, Michal Malka, Etai Lev-Ran, Katherine Barabash and Vita Bortnikov, IBM Research
Enterprises often deploy their business applications in multiple clouds as well as in multiple traditional environments. This work focuses on the connectivity aspects of this new way of operating and consuming digital services. We define the related requirements, analyze the challenges, and present ClusterLink, our solution for interconnecting today’s and future multi-cloud applications.
Efficient Hashing of Sparse Virtual Disks
Nir Soffer, IBM; Erez Waisbard, CyberArk
Verifying the integrity of a file is a fundamental operation in file transfer. Common tools compute a short hash value that is sent along with the file, but computing this value requires going over the entire file; if the file is huge, this process is slow. We introduce `blkhash` – a novel hash algorithm optimized for disk images that is *up to 4 orders of magnitude faster* than commonly used tools. We implemented a new command line tool and library that can be used in the virtualization space for verifying storage management operations. Our approach can significantly contribute to use cases such as: (1) very fast computation of virtual disk hash values in software-defined storage, (2) verifying an entire disk image’s content as part of supply chain integrity verification or in the context of confidential computing.
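The core idea behind a sparse-image-friendly hash, processing the image block by block so that zero (unallocated) blocks reuse one precomputed digest, can be sketched as follows. This is an illustration only, not the actual `blkhash` construction or its parameters:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; blkhash's real parameters differ

# A zero block's digest is computed once and reused, so long runs of
# unallocated (zero) data in a sparse image cost almost nothing to "hash".
ZERO_BLOCK_DIGEST = hashlib.sha256(b"\0" * BLOCK_SIZE).digest()

def sparse_hash(data: bytes) -> str:
    """Hash an image block by block, skipping the work for zero blocks."""
    outer = hashlib.sha256()
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        if len(block) == BLOCK_SIZE and not block.strip(b"\0"):
            outer.update(ZERO_BLOCK_DIGEST)   # reuse the precomputed digest
        else:
            outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()
```

In a real implementation the zero-detection step can also be skipped entirely when the image format (e.g., qcow2) already reports which blocks are unallocated.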
Towards Less Operating System Noise: An Approach with Data Processing Unit
Jun Kato, Koki Kusunoki and Mitsuru Sato, Fujitsu Limited
High performance computing (HPC) cloud, which is a fusion of HPC and cloud, is receiving much attention due to the growing demand for large-scale computing. The HPC cloud is targeted at running HPC-derived applications, but it requires solving an HPC-specific challenge, namely operating system (OS) noise. Herein, we focus on the OS noise caused by asynchronous I/O processing and propose offloading it to a data processing unit (DPU).
Analyzing large-scale genomic data with cloud data lakes
Grisha Weintraub, Ben Gurion University and IBM; Noam Hadar, Ehud Gudes, Shlomi Dolev and Ohad Birk, Ben Gurion University
In recent years there has been a huge influx of genomic data and a growing need for its analysis, yet existing genomic databases do not allow easy access. We developed a pipeline that continuously pre-processes raw human genetic data. The data is then stored in a cloud data lake and can be accessed via a simple and intuitive web service and API.
Smart Network Observability – Connection Tracking
Ronen Schaffer, Eran Raichstein and Kalman Meth, IBM Research; Joel Takvorian and Julien Pinsonneau, Red Hat
Flow Logs Pipeline (a.k.a. FLP) is an observability tool that consumes flow logs from various inputs, transforms them, and exports logs to Loki and/or time-series metrics to Prometheus. While flow logs encompass a lot of valuable data, observing the network at the granularity of individual flow logs is often too low-level. In many cases, we are interested in a higher-level view, at the granularity of connections. In this work, we introduce a new processing stage in FLP, connection tracking, that aggregates flow logs belonging to the same connection.
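A minimal sketch of this kind of aggregation, folding bidirectional flow logs under one connection key; the field names are hypothetical and do not reflect FLP's actual stage configuration:

```python
from collections import defaultdict

def conn_key(flow):
    """Normalize a flow's 5-tuple so both directions map to one connection."""
    a = (flow["src_ip"], flow["src_port"])
    b = (flow["dst_ip"], flow["dst_port"])
    return (flow["proto"],) + (a + b if a <= b else b + a)

def aggregate(flows):
    """Fold per-flow-log records into per-connection byte/packet totals."""
    conns = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for f in flows:
        c = conns[conn_key(f)]
        c["bytes"] += f["bytes"]
        c["packets"] += f["packets"]
    return dict(conns)
```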
On Latency Awareness with Delayed Hits
Gil Einziger, Nadav Keren and Gabriel Scalosub, Ben Gurion University
We consider a new locality pattern in the form of burstiness to improve cache effectiveness in workflows where items are requested in possibly infrequent yet costly batches. Adding a cache that handles only bursty items to existing State-Of-The-Art algorithms shows a significant improvement in overall average time per query.
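The idea of a small front cache that admits only bursty items can be sketched as follows; this is a toy admission rule for illustration, not the algorithm evaluated in the poster:

```python
from collections import OrderedDict

class BurstFrontCache:
    """Toy front cache: an item is admitted only when its requests arrive
    in a burst (two requests within `window` time units of each other).
    It sits in front of an unmodified main cache/policy."""
    def __init__(self, capacity, window):
        self.capacity, self.window = capacity, window
        self.cache = OrderedDict()   # key -> value, in LRU order
        self.last_seen = {}          # key -> time of the previous request

    def request(self, key, value, now):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]           # hit in the burst cache
        bursty = key in self.last_seen and now - self.last_seen[key] <= self.window
        self.last_seen[key] = now
        if bursty:                           # admit only items showing burstiness
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
        return None                          # miss; served by the main cache
```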
Daniel Cohen, Sarel Cohen and Dalit Naor, The Academic College of Tel Aviv-Yaffo; Daniel Waddington and Moshik Hershcovitch, IBM Research
Synchronization of replicated data and program state is an essential aspect of application fault-tolerance. Current solutions use virtual memory mapping to identify page writes and replicate them at the destination. This approach has limitations because the granularity is restricted to a minimum of 4KiB per page, which may result in more data being replicated. Motivated by the emerging CXL hardware, we expand on the work Waddington, et al. [SoCC 22] by evaluating popular compression algorithms on VM snapshot data at cache line granularity. We measure the compression ratio vs. the compression time and present our conclusions.
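The measurement itself can be sketched with zlib as a stand-in for the compression algorithms evaluated; the 64-byte cache line granularity matches the study, everything else here is illustrative:

```python
import os
import time
import zlib

LINE = 64  # cache line granularity

def measure(data, level=6):
    """Compress data one 64-byte cache line at a time;
    return (compression ratio, elapsed seconds)."""
    t0 = time.perf_counter()
    compressed = 0
    for off in range(0, len(data), LINE):
        compressed += len(zlib.compress(data[off:off + LINE], level))
    elapsed = time.perf_counter() - t0
    return len(data) / compressed, elapsed

# Highly redundant "snapshot" data compresses well even at line granularity;
# random data does not (per-line framing overhead dominates).
redundant = b"\x00" * 4096
random_ish = os.urandom(4096)
```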
Self-Adjusting Cache Advertisement and Selection
Itamar Cohen, Ariel University of the Samaria, Israel
Neural Networks for Computer Systems
Alon Rashelbach, Ori Rottenstreich and Mark Silberstein, Technion
SwitchVM: Multi-Tenancy for In-Network Computing
Sajy Khashab and Mark Silberstein, Technion
Benefits of Encryption at the Storage Client
Or Ozeri, Danny Harnik and Effi Ofer, IBM Research
Client side encryption is a setting in which storage I/O is encrypted at the client machine before being sent out to a storage system. This is typically done by adding an encryption layer before the storage client or driver. We identify that in cases where some of the storage functions are performed at the client, it is beneficial to also integrate the encryption into the storage client. We implemented such an encryption layer in Ceph RBD — a popular open source distributed storage system. We explain some of the main benefits of this approach: the ability to do layered encryption with a different encryption key per layer, the ability to support more complex storage encryption, and a notable performance boost achieved by integrating the encryption with the storage client.
Next-Generation Security Entity Linkage Harnessing the Power of Knowledge Graphs and Large Language Models
Daniel Alfasi and Tal Shapira, Reichman University; Anat Bremler-Barr, Tel-Aviv University
With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE [4] knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI [3]. Additionally, we extend this approach by also initializing the embeddings of the relations.
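RotatE scores a triple (h, r, t) by rotating the head embedding in the complex plane (each relation dimension is a unit-modulus rotation) and measuring the distance to the tail. A minimal sketch with Python complex numbers, independent of any particular embedding library:

```python
import math

def rotate_score(head, relation_phases, tail):
    """RotatE plausibility score: t ≈ h ∘ r, where r_i = e^{iθ_i} has |r_i| = 1.
    Higher (less negative) scores mean a more plausible triple."""
    rotated = [h * complex(math.cos(p), math.sin(p))
               for h, p in zip(head, relation_phases)]
    return -sum(abs(rh - t) for rh, t in zip(rotated, tail))
```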
Fuzzing LibraryOses for Iago vulnerabilities
Leonid Dyachkov and Meni Orenbach, NVIDIA; Mark Silberstein, Technion
Speeding up reconstruction of declustered RAID with special mapping
Svetlana Lazareva and Grigory Petrunin, Xinnor
It is known that ZFS dRAID [2] uses a random permutation of data blocks, and that reconstruction speed benefits from this initial condition. The question we tried to answer is whether there is some special permutation that optimizes reconstruction speed to its theoretical maximum. We introduce a solution that uses cyclic matrices for the data layout, currently the best way we have found to extract maximum benefit from the initial declustered RAID configuration.
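A simple cyclic-shift layout, a hypothetical stand-in for the authors' cyclic matrices, shows how such a permutation spreads stripe chunks evenly across all disks, so reconstruction reads are balanced:

```python
def cyclic_layout(n_disks, stripe_width, n_stripes):
    """Map each stripe's chunks to disks via a per-stripe cyclic shift.
    Requires stripe_width <= n_disks so chunks of one stripe land on
    distinct disks."""
    layout = []
    for s in range(n_stripes):
        shift = s % n_disks
        layout.append([(shift + c) % n_disks for c in range(stripe_width)])
    return layout
```

After a full cycle of `n_disks` stripes, every disk holds exactly `stripe_width` chunks, which is the even spread that lets all surviving disks participate equally in a rebuild.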
RAM buffering for performance improvement of sequential write workload
Svetlana Lazareva and Grigory Petrunin, Xinnor
This paper presents an online algorithm that determines the datapath for incoming requests: should they stay temporarily in RAM buffers for a future merge operation, or should they be written to disk immediately? By analyzing the workload in real time, the delay time spent in the RAM buffer becomes a self-tuned parameter. This approach may increase the latency of individual sequential write requests, but it significantly raises the overall performance of sequential write workloads without the use of an expensive non-volatile cache.
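A minimal sketch of the datapath decision (the self-tuned delay parameter is omitted; all names here are illustrative, not the authors' implementation):

```python
class WriteRouter:
    """Sketch of the online datapath decision: a write that continues the
    current sequential run stays in the RAM buffer for merging; a write
    that breaks the run flushes the buffered run to disk as one I/O."""
    def __init__(self, max_buffer):
        self.max_buffer = max_buffer
        self.run_start = None
        self.run_len = 0
        self.flushed = []          # (offset, length) writes issued to disk

    def submit(self, offset, length):
        contiguous = (self.run_start is not None
                      and offset == self.run_start + self.run_len
                      and self.run_len + length <= self.max_buffer)
        if contiguous:
            self.run_len += length          # merge into the buffered run
            return "buffered"
        self.flush()                        # run broken: issue it to disk
        self.run_start, self.run_len = offset, length
        return "buffered"

    def flush(self):
        if self.run_start is not None:
            self.flushed.append((self.run_start, self.run_len))
        self.run_start, self.run_len = None, 0
```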
Development of hybrid storage system based on Open CAS technology, optimized for HPC workload
Svetlana Lazareva and Ivan Petorv, Xinnor
HPC runs on a distributed structure with a single shared pool of data. In our case the distributed structure is the Lustre file system, and the single shared pool of data is our declustered HDD RAID. To increase performance, we suggest using Open CAS technology as a cache on RAM/NVDIMM with special parameters, optimized for heavy data-intensive sequential HPC workloads, together with an online algorithm that reduces the number of RMW operations by merging sequential requests into one full stripe.
When SkyPilot meets Kubernetes
Gil Vernik, Ronen Kat and Omer Joshua Cohen, IBM Research; Zongheng Yang, UC Berkeley
The Sky vision aims to open a new era in cloud computing. Sky abstracts away individual clouds and dynamically uses multiple clouds to optimize workload execution. This enables users to focus on their business logic rather than interacting with multiple clouds and manually optimizing performance and costs. SkyPilot is a novel framework for Sky computing that easily and cost-effectively runs ML workloads on any cloud. We discuss the importance of the integration between SkyPilot and Kubernetes.
Reducing The Virtual Memory Overhead in Nested Virtualization
Ori Ben Zur, Shai Bergman, Mark Silberstein (Technion – Israel Institute of Technology)
Virtualization has become a critical aspect of modern computing, and with the advent of virtualization-based containers, fast nested virtualization has become increasingly important.
Nested virtualization is implemented by emulating virtualization capabilities to the guest hypervisor, which can result in significant overhead.
Another source of overhead in virtualization stems from the address translation mechanisms used to implement it, which typically cause a mix of slower address translation, frequent guest traps, and loss of granularity in page tables.
Our research focuses on using guest-managed physical memory with per-VM memory tags for checking each VM’s access permissions.