ISW Accepted Posters

We are delighted to announce the posters that have been accepted to the Israeli Systems Workshop 2024:

Coverage-based Caching in Cloud Data Lakes

Grisha Weintraub, Ehud Gudes and Shlomi Dolev, Ben-Gurion University of the Negev

Cloud data lakes are a modern approach to handling large volumes of data. They separate the compute and storage layers, making them highly scalable and cost-effective. However, query performance in cloud data lakes still leaves much to be desired, and various efforts have been made to enhance it in recent years. We introduce our approach to this problem, based on a novel caching technique: instead of caching the actual data, we cache metadata called a coverage set.
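
The abstract does not detail how a coverage set is represented. As a loose illustration of the idea of caching metadata rather than data (the names and the range-based representation below are hypothetical, not the authors' design), a minimal Python sketch might track which predicate ranges earlier scans have already covered:

```python
# Loose illustration: a cache that stores, per table column, the predicate ranges
# already covered by earlier scans ("coverage sets"), instead of caching the scanned
# data itself. A new query checks whether its range is already covered.
from dataclasses import dataclass

@dataclass(frozen=True)
class Coverage:
    low: float
    high: float  # inclusive range covered by an earlier scan

    def contains(self, low: float, high: float) -> bool:
        return self.low <= low and high <= self.high

class CoverageCache:
    def __init__(self) -> None:
        # (table, column) -> list of covered ranges; metadata only, no data rows
        self._coverage: dict[tuple[str, str], list[Coverage]] = {}

    def record_scan(self, table: str, column: str, low: float, high: float) -> None:
        self._coverage.setdefault((table, column), []).append(Coverage(low, high))

    def is_covered(self, table: str, column: str, low: float, high: float) -> bool:
        return any(c.contains(low, high)
                   for c in self._coverage.get((table, column), []))

cache = CoverageCache()
cache.record_scan("sales", "price", 0.0, 100.0)          # an earlier query scanned this range
print(cache.is_covered("sales", "price", 10.0, 50.0))    # True: can be answered locally
print(cache.is_covered("sales", "price", 50.0, 150.0))   # False: must hit remote storage
```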

Integrity Verification in Cloud Data Lakes

Grisha Weintraub, Leonid Rise, Eli Shemesh and Avraham Illouz, IBM

Cloud data lakes support storage and querying at scale. However, traditional data integrity methods do not apply to them due to a different system model. We propose a novel completeness verification protocol based on a data lake partitioning scheme.

Cheap & Fast File-aaS for AI by Combining Scale-out Virtiofs, Block layouts and Delegations

Sagi Manole and Amit Golander, Huawei

File-service supply and demand forces are changing. On the demand side, AI has significantly increased single-tenant File-aaS performance requirements. On the supply side, economic forces are moving clients from traditional servers to rack-scale computers. Rack-scale computers encourage heterogeneous compute and are equipped with DPUs (Data Processing Units, also known as SmartNICs). DPUs offer higher efficiency but may be over-utilized at times. For this reason, we would like our software to be flexible and to run on general-purpose compute when DPU utilization is high.

Composite DNA Reconstruction

Nitai Kluger and Aviv Maayan, Technion

Composite DNA letters were introduced in a recent paper. That research presented synthesis methods that exploit the built-in information redundancy of DNA data archiving by using an enlarged logical alphabet rather than the pure DNA bases alone. In this project, we propose a new approach to DNA reconstruction over composite DNA alphabets. Building on another recent paper, “Hedges”, we implement a Hash-based Error-Correcting tree Code (HECC) for encoding and decoding binary messages to and from composite alphabets. Our HECC is a generalization of the original Hedges, tailored to handle multiple DNA copies and an alphabet of any size. It handles all three basic types of DNA errors (substitutions, insertions, and deletions) up to an error rate of 2% while synthesizing only 40 copies of the original strand. When handling substitutions only, our HECC succeeds in decoding up to an error rate of 10% while synthesizing only 20 copies of the original strand. Our HECC benefits from several novel optimizations to the encoding-decoding process, e.g., “two-sided encoding-decoding”; with this technique, the algorithm corrects strands of up to twice the length it could handle before, at the same error rate.
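
As a rough, toy-scale sketch of the general idea behind a hash-based tree code over an enlarged alphabet (in the spirit of Hedges; this is not the authors' HECC, the alphabet size is an assumption, and only the error-free case is handled), each output symbol below mixes a hash of the already-encoded prefix with the next message bit, and decoding tests which bit reproduces each received symbol:

```python
# Toy hash-based tree code over a composite alphabet (illustrative only).
import hashlib

ALPHABET_SIZE = 6          # assumed composite-alphabet size, not from the poster

def _hash(prefix: tuple, index: int) -> int:
    # Pseudo-random value derived from the message prefix and position.
    digest = hashlib.sha256(f"{prefix}|{index}".encode()).digest()
    return digest[0] % ALPHABET_SIZE

def encode(bits: list[int]) -> list[int]:
    symbols = []
    for i, b in enumerate(bits):
        symbols.append((_hash(tuple(bits[:i]), i) + b) % ALPHABET_SIZE)
    return symbols

def decode(symbols: list[int], n_bits: int) -> list[int]:
    # Error-free greedy decode: test which bit value reproduces each symbol.
    bits: list[int] = []
    for i in range(n_bits):
        for b in (0, 1):
            if (_hash(tuple(bits), i) + b) % ALPHABET_SIZE == symbols[i]:
                bits.append(b)
                break
    return bits

msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert decode(encode(msg), len(msg)) == msg
```

A real tree code additionally searches over candidate prefixes so that substitutions, insertions, and deletions can be corrected; the sketch above only shows how the hash ties each symbol to the message prefix.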

ARISE: AI Right Sizing Engine for AI workload configurations

Rachel Tzoref-Brill, Bruno Wassermann, Eran Raichstein, Dean Lorenz and Praibha Moogi, IBM Research – Israel

Data scientists and platform engineers who maintain AI stacks must continuously run AI workloads as part of their roles. When executing any part of the AI pipeline, whether data preprocessing, training, fine-tuning or inference, a frequent question is how to configure the environment so as to meet Service Level Objectives such as desired throughput and runtime deadlines while avoiding memory and CPU exhaustion. ARISE is a tool that enables data-driven decisions about such AI workload configuration questions. ARISE trains machine-learning regression models for performance prediction on historical workloads and performance-benchmark metadata, and then predicts the performance of future workloads from their input metadata. The user can constrain the input metadata space to the relevant options only (e.g., to specific large models and accelerator types). ARISE then performs multiple performance predictions to cover the input space defined by the user. The top configuration options that optimize the user’s objective are presented as configuration alternatives to choose from. ARISE can also be plugged into automated tools that can benefit from such predictions, e.g., for auto-scaling and scheduling optimization. Initial evaluation of ARISE shows high prediction accuracy (8% Mean Absolute Percentage Error on average) and interesting configuration trade-offs for real-world fine-tuning and inference workloads.
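
As an illustration of the general workflow described above (not the actual ARISE tool; the schema, model choice, and numbers below are made up), one could fit a regression model on historical benchmark metadata and then rank a user-constrained configuration space by predicted throughput:

```python
# Illustrative sketch: train a performance-prediction regressor on historical workload
# metadata, sweep a constrained configuration space, and present the top candidates.
import itertools
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Assumed benchmark history; columns and values are illustrative, not ARISE's schema.
history = pd.DataFrame({
    "model_size_b": [7, 7, 13, 13, 70, 70],
    "gpus":         [1, 2, 2, 4, 4, 8],
    "batch_size":   [8, 16, 8, 16, 8, 16],
    "tokens_per_s": [950, 1800, 700, 1350, 420, 800],
})

reg = GradientBoostingRegressor().fit(
    history[["model_size_b", "gpus", "batch_size"]], history["tokens_per_s"])

# User-constrained input space (e.g., only the 13B model, 2 to 8 GPUs).
candidates = pd.DataFrame(
    [(13, g, b) for g, b in itertools.product([2, 4, 8], [8, 16, 32])],
    columns=["model_size_b", "gpus", "batch_size"])
candidates["predicted_tokens_per_s"] = reg.predict(candidates)

# Top configurations that optimize the user's objective (here: throughput).
print(candidates.sort_values("predicted_tokens_per_s", ascending=False).head(3))
```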

Observability Volume Management

Eran Raichstein, Kalman Meth, Seep Goel, Priyanka Naik and Kavya Govindarajan, IBM Research

Observability Volume Management (OVM) is a lightweight, automated processing system that helps manage the large amounts of observability data. The focus is on automating, analyzing, and making recommendations for volume management in multi-cloud, edge, and distributed systems.

Affordable Privacy and Ransomware Detection as part of Cloud Storage Services

Amit Golander, Muhammad Barham and David Segal, Huawei

Privacy regulations and ransomware attacks have created a need to cheaply scan the content of all stored data on a regular basis. In this work, we claim that to achieve the desired economics, the scan should be a native storage feature that piggybacks on backup/DR to the cloud, where it can use temporary compute resources. We propose a hybrid approach in which the cheaper RegX method is used to traverse all data, while only a tiny subset is also scanned by the higher-quality (and higher-cost) NLP models. We propose 4 heuristics that further reduce the RegX scan cost by more than 5.5 times. Overall, across these three proposals, privacy and anti-ransomware RegX detection becomes 2-3 orders of magnitude cheaper and thus affordable for all data.
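
The hybrid routing idea can be sketched as follows (hypothetical patterns and a stand-in classifier; this is not the poster's RegX implementation): a cheap regex pass traverses everything, and only flagged objects are escalated to the expensive NLP model.

```python
# Illustrative hybrid scan: a cheap regex pass over all data, with only the tiny
# flagged subset escalated to a (stand-in) NLP-style classifier.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def regex_scan(blob: str) -> list[str]:
    return [name for name, pat in PII_PATTERNS.items() if pat.search(blob)]

def nlp_scan(blob: str) -> float:
    # Stand-in for the expensive NLP model; a real system would invoke a classifier here.
    return 0.9 if "@" in blob else 0.1

def scan_object(blob: str) -> dict:
    hits = regex_scan(blob)                      # cheap pass over *all* data
    score = nlp_scan(blob) if hits else None     # NLP only for the flagged subset
    return {"regex_hits": hits, "nlp_score": score}

print(scan_object("contact me at alice@example.com"))
print(scan_object("nothing sensitive here"))
```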

Random Access In DNA Storage

Saar Cohen, Shahar Trabelsi, Sarel Cohen and Dalit Naor

Data production is rapidly increasing, outpacing traditional storage methods, which also struggle with longevity. DNA storage, which encodes digital information into DNA sequences, offers a promising solution but is currently constrained by high costs, slow read/write times, and limited random-access methods.

Our research focuses on developing software that creates a library of unique DNA primers: short sequences (about 20 nucleotides) that identify files within large DNA pools. These primers must be diverse and distinct, and they must adhere to biological constraints known as PCR primer constraints.

Our research is inspired by the article “Random Access in Large-Scale DNA Data Storage” (Organick et al., Nature Biotechnology 2018). Our goal is to build upon its algorithms for generating primers and to improve them. We design efficient software to generate these primers and investigate its computational efficiency. We aim to maximize the number of primers that can be stored in a single test tube while adhering to the necessary biological constraints. Additionally, this software will be used to investigate how primer length impacts the number of primers that can be accommodated in a single test tube.
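
As a loose sketch of what such primer-generation software might look like (the constraints and thresholds below are illustrative assumptions, not the project's actual parameters), candidates can be drawn at random and kept only if they satisfy GC-content, homopolymer, and pairwise-distance constraints:

```python
# Illustrative primer-library generator: random 20-nt candidates filtered by simple
# PCR-style constraints and a minimum Hamming distance from every accepted primer.
import random

BASES = "ACGT"
PRIMER_LEN = 20
MIN_HAMMING = 6          # assumed distinctness threshold, not from the poster

def gc_content(p: str) -> float:
    return (p.count("G") + p.count("C")) / len(p)

def has_homopolymer(p: str, run: int = 4) -> bool:
    return any(b * run in p for b in BASES)

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def build_library(size: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    library: list[str] = []
    while len(library) < size:
        cand = "".join(rng.choice(BASES) for _ in range(PRIMER_LEN))
        if not 0.45 <= gc_content(cand) <= 0.55:
            continue
        if has_homopolymer(cand):
            continue
        if any(hamming(cand, p) < MIN_HAMMING for p in library):
            continue
        library.append(cand)
    return library

print(build_library(5))
```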

Migration of Pinned Pages

Omer Daube and Nizan Kafman Raz, Technion

Memory migration is essential for improving memory utilization and performance in modern computing systems, particularly for memory compaction, optimizing Non-Uniform Memory Access (NUMA) architectures, and remapping memory using huge pages. However, OS subsystems and device drivers often hold direct references to memory buffers, making their migration impossible. This limitation arises because graceful handling of page faults is challenging: device accesses are non-restartable, and some kernel accesses occur in code that cannot be interrupted. Consequently, migration of large memory regions is often prevented, and crucial memory optimizations are not possible.

To address this challenge, we introduce a novel approach to facilitate the migration of pinned pages using mutable kernel mappings and an atomic remapping scheme. This solution allows for seamless remapping of kernel and device references to memory, overcoming the current limitations that prevent memory migration in the vast majority of cases. To avoid data loss during migration, we introduce a new synchronization mechanism that ensures consistency using existing hardware. This method enables the migration of pinned pages without interrupting ongoing operations, introducing high latency, or requiring intrusive OS kernel changes. We implement our solution on Linux, demonstrating its feasibility and applicability to real-world systems.

Deduplication study for RAG

Danny Harnik, Effi Ofer, Michael Factor and Paula Ta-Shma, IBM Research – Israel

Retrieval Augmented Generation (RAG) systems complement Large Language Models (LLMs) with a database of documents containing information relevant to queries. The LLM receives queries together with their related documents and extracts answers on topics it might not have been trained on. This study evaluates the impact of data chunk duplication on answer correctness and storage usage. We identify how and when duplication hurts the quality of answers and build a probabilistic model of answer correctness based on the distribution of duplication in the data and the likelihood of the correct (golden) document being selected. We apply our model to several real-world RAG datasets to see how much deduplication would benefit them. Finally, we suggest an incremental deduplication mechanism for integration with a RAG system.
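
As a toy illustration of how duplication can crowd the golden chunk out of the retrieved top-k (this Monte Carlo is not the authors' probabilistic model, and the crowding assumption is ours), consider:

```python
# Toy Monte Carlo: estimate the chance that the golden chunk survives top-k retrieval
# when near-relevant distractor chunks are stored dup_factor times each and similarity
# ties are broken at random.
import random

def p_golden_in_top_k(n_relevant: int, dup_factor: int, k: int,
                      trials: int = 20_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pool = ["golden"] + ["distractor"] * (n_relevant - 1) * dup_factor
        rng.shuffle(pool)                    # ties in similarity -> random order
        if "golden" in pool[:k]:
            hits += 1
    return hits / trials

for dup in (1, 2, 4, 8):
    print(dup, round(p_golden_in_top_k(n_relevant=10, dup_factor=dup, k=5), 3))
```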

ZipNN – A Lossless Compression library tailored for AI Models (Best poster)

Moshik Hershcovitch, IBM Research & Tel Aviv University; Andrew Wood, Boston University; Leshem Choshen, IBM Research & MIT; Ilias Ennmouri, IBM; Peter Chin, Dartmouth College; Guy Girmonsky, Swaminathan Sundararaman and Danny Harnik, IBM Research

With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast literature on reducing model sizes with quantization or other techniques that fundamentally change the model at hand, we introduce a high-performance compression library that is optimized for AI models and is lossless, namely, it returns the original model after decompression.
Our compression method is called ZipNN, and it adapts the compression of a model to its parameter type. More precisely, it observes the floating-point structure of the parameters in the model, rearranges the data into different byte streams, and deploys the best compression technique for each specific byte stream. For BF16 models (e.g., Llama 3.1, Mistral, Granite) it achieves 33% compression savings, while for FP32 models it reaches more modest savings of 17%. Compression can reach a throughput above 1 GB/s, while decompression reaches 2 GB/s on a single thread. ZipNN outperforms standard compression methods on models both in compression ratio and in the speed of compression and decompression.

As ZipNN can store or transfer 50% more models with the same resources, it is suitable for a variety of use cases, including reducing the storage and traffic burden of AI hubs, reducing the storage required for checkpointing, and loading models more quickly.
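
The byte-grouping idea can be sketched as follows (a simplified illustration, not the ZipNN implementation; zlib stands in for whatever compressor is chosen per stream): split each BF16 parameter into an exponent-side byte and a mantissa-side byte, then compress the two streams separately.

```python
# Illustrative byte grouping for BF16 weights: the exponent-side bytes are far more
# compressible than the near-random mantissa-side bytes, so compressing them as
# separate streams improves the overall ratio.
import zlib
import numpy as np

def compress_bf16(weights_bf16_bytes: bytes) -> tuple[bytes, bytes]:
    raw = np.frombuffer(weights_bf16_bytes, dtype=np.uint8)
    hi, lo = raw[1::2], raw[0::2]            # assuming little-endian BF16 layout
    return zlib.compress(hi.tobytes()), zlib.compress(lo.tobytes())

# Toy demo: fake "weights" made by truncating float32 noise to BF16 (upper 16 bits).
rng = np.random.default_rng(0)
f32 = rng.standard_normal(1 << 16).astype(np.float32)
bf16 = (f32.view(np.uint32) >> 16).astype(np.uint16)   # sign + exponent + 7 mantissa bits
bf16_bytes = bf16.tobytes()

c_hi, c_lo = compress_bf16(bf16_bytes)
print("compressed/original:", round((len(c_hi) + len(c_lo)) / len(bf16_bytes), 3))
```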

Dictionary Based Cache Line Compression

Daniel Cohen, Sarel Cohen and Dalit Naor; Daniel Waddington and Moshik Hershcovitch, IBM Research

Active-standby mechanisms for VM high availability demand frequent synchronization of memory and CPU state, involving the identification and transfer of “dirty” memory pages to a standby target. Building upon the granularity offered by CXL-enabled memory devices, as discussed by Waddington et al. [SOCC’22], this work proposes a dictionary-based compression method operating on 64-byte cache lines to minimize snapshot volume and synchronization latency. The method aims to transmit only the information required to reconstruct the memory state at the standby machine, augmented by byte-grouping and cache-line-partitioning techniques. We assess the compression benefits on memory access patterns across 20 benchmark snapshots and compare our approach to standard off-the-shelf compression methods. Our findings reveal significant improvements across nearly all benchmarks, with some experiencing over a twofold improvement compared to standard compression, while others show more moderate gains. We conduct an in-depth experimental analysis of the contribution of each method and examine the nature of the benchmarks. We ascertain that the repeating nature of cache lines across snapshots (caused by transient memory changes) and their concise representation contribute most to the size reduction, accounting for 92% of the gains. Our work paves the way for further reduction in the data transferred to standby machines, thereby enhancing VM high availability and reducing synchronization latency.
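
As a toy sketch of dictionary-based cache-line compression (not the poster's exact scheme; in this simplification the sender and receiver share one in-memory dictionary), repeated 64-byte lines are replaced by short references:

```python
# Illustrative dictionary-based cache-line compression: lines already seen in earlier
# snapshots are sent as small reference tokens instead of 64-byte literals.
LINE = 64

class LineDict:
    def __init__(self) -> None:
        self.ids: dict[bytes, int] = {}
        self.lines: list[bytes] = []

    def lookup_or_add(self, line: bytes) -> tuple[bool, int]:
        if line in self.ids:
            return True, self.ids[line]
        self.ids[line] = len(self.lines)
        self.lines.append(line)
        return False, self.ids[line]

def compress_snapshot(snapshot: bytes, d: LineDict) -> list:
    tokens = []
    for off in range(0, len(snapshot), LINE):
        line = snapshot[off:off + LINE]
        hit, idx = d.lookup_or_add(line)
        tokens.append(("ref", idx) if hit else ("lit", line))
    return tokens

def decompress(tokens: list, d: LineDict) -> bytes:
    out = bytearray()
    for kind, payload in tokens:
        out += d.lines[payload] if kind == "ref" else payload
    return bytes(out)

d = LineDict()
snap1 = bytes(range(64)) * 4                   # four identical lines
snap2 = bytes(range(64)) * 3 + b"\x00" * 64    # mostly repeats across snapshots
compress_snapshot(snap1, d)
t2 = compress_snapshot(snap2, d)
assert decompress(t2, d) == snap2
print([kind for kind, _ in t2])                # ['ref', 'ref', 'ref', 'lit']
```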

MTASet: A Tree-based Set for Efficient Range Queries in Update-heavy Workloads

Daniel Manor, Mor Perry and Moshe Sulamy, Academic College of Tel Aviv-Yaffo

In concurrent data structures, the efficiency of set operations can vary significantly depending on the workload characteristics.

Numerous concurrent set implementations are optimized and fine-tuned to excel in scenarios characterized by predominant read operations. However, they often perform poorly when confronted with workloads that heavily prioritize updates. Additionally, current leading-edge concurrent sets optimized for update-heavy tasks typically lack efficiency in handling atomic range queries.

This study introduces MTASet, which leverages a concurrent (a,b)-tree implementation. Engineered to accommodate update-heavy workloads and to support atomic range queries, MTASet outperforms existing counterparts optimized for such workloads by up to 2x in range-query operations. Notably, MTASet ensures linearizability.

Open Forms Framework

Adir Nissan, Afek Nahum, Ido Tausi, Noam Bitton, Regev Avraham, Tomer Ben-Yamin and Yonatan Azarzar, Technion

Open Forms is a cutting-edge framework designed to represent and manage dynamic forms through an extensible Abstract Form Representation (AFR). Whether you’re building a simple survey or a complex multi-step process, the framework adapts to your needs.
