We are delighted to announce the posters that have been accepted to ACM SYSTOR’24.
Coverage-based Caching in Cloud Data Lakes
Grisha Weintraub, Ehud Gudes and Shlomi Dolev, Ben-Gurion University of the Negev
Cloud data lakes are a modern approach to handling large volumes of data. They separate the compute and storage layers, making them highly scalable and cost-effective. However, query performance in cloud data lakes leaves much room for improvement, and various efforts have been made to enhance it in recent years. We introduce a novel caching technique for this problem: instead of caching actual data, we cache metadata called a coverage set.
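The abstract does not include code, but a minimal sketch of the idea, under our own assumptions about what a coverage set might record (here, per-file min/max column statistics; all class and method names are hypothetical), could look like this:

```python
# Hypothetical sketch of coverage-based caching: instead of caching file
# contents, cache a "coverage set" -- lightweight per-file metadata (here,
# min/max statistics per column) that lets a query planner prune files.

from dataclasses import dataclass


@dataclass
class FileStats:
    """Cached metadata for one data-lake file (names are illustrative)."""
    path: str
    col_min: float
    col_max: float


class CoverageCache:
    """Maps a column name to the coverage set of files seen so far."""

    def __init__(self):
        self._coverage: dict[str, list[FileStats]] = {}

    def add(self, column: str, stats: FileStats) -> None:
        self._coverage.setdefault(column, []).append(stats)

    def files_for_range(self, column: str, lo: float, hi: float) -> list[str]:
        """Return only the files whose [min, max] range can match the
        predicate lo <= column <= hi; every other file is skipped without
        touching object storage."""
        return [
            s.path
            for s in self._coverage.get(column, [])
            if s.col_max >= lo and s.col_min <= hi
        ]


cache = CoverageCache()
cache.add("price", FileStats("s3://lake/part-000.parquet", 0.0, 49.9))
cache.add("price", FileStats("s3://lake/part-001.parquet", 50.0, 99.9))
print(cache.files_for_range("price", 10.0, 20.0))  # only part-000 can match
```

The appeal of caching metadata rather than data is that a coverage set is tiny relative to the files it describes, so a small cache can prune reads across the whole lake.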
Integrity Verification in Cloud Data Lakes
Grisha Weintraub, Leonid Rise, Eli Shemesh and Avraham Illouz, IBM
Cheap & Fast File-aaS for AI by Combining Scale-out Virtiofs, Block layouts and Delegations
Sagi Manole and Amit Golander, Huawei
File-service supply and demand forces are changing. On the demand side, AI has significantly increased single-tenant File-aaS performance requirements. On the supply side, economic forces are moving clients from traditional servers to rack-scale computers, which encourage heterogeneous compute and are equipped with DPUs (Data Processing Units, also known as SmartNICs). DPUs offer higher efficiency but may be over-utilized at times, so we would like our software to be flexible and run on general-purpose compute when DPU utilization is high.
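As a rough illustration of that flexibility (the threshold, telemetry probe, and handler names below are all our own assumptions, not the poster's design), a dispatcher might look like:

```python
# Hypothetical dispatcher sketch: route file-service work to the DPU when it
# has headroom, and fall back to general-purpose host CPUs when the DPU is
# over-utilized. The threshold and the probe function are illustrative.

DPU_UTILIZATION_THRESHOLD = 0.85  # assumed cutoff, not from the poster


def read_dpu_utilization() -> float:
    """Placeholder for a real telemetry probe (e.g., a DPU load counter)."""
    return 0.4


def handle_on_dpu(request: bytes) -> bytes:
    return b"handled on DPU offload path: " + request


def handle_on_host(request: bytes) -> bytes:
    return b"handled on general-purpose host CPU: " + request


def dispatch(request: bytes) -> bytes:
    """Prefer the efficient DPU path, but stay flexible under load."""
    if read_dpu_utilization() < DPU_UTILIZATION_THRESHOLD:
        return handle_on_dpu(request)
    return handle_on_host(request)


print(dispatch(b"READ /data/model.bin"))
```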
Hybrid Cloud Connector: Offloading Integration Complexities
Ronen Kat, Doron Chen, Michael Factor, Chris Giblin, Avi Ziv and Aleksander Slominski, IBM Research – Israel
Regulated enterprises often seek to extend their workloads into the cloud, but are impeded by integration concerns relating to security, governance and compliance. Further, enterprises running mission-critical applications face throughput and latency challenges due to cloud integration overheads. We present Hybrid Cloud Connector (HCC) to accelerate this integration. HCC offloads non-functional aspects from the application, reducing complexity for the developer, and centralizes administration via a policy-driven control point, simplifying the operator’s job.
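One way to picture a policy-driven control point (the policy fields, request shape, and function names here are purely illustrative and not HCC's actual interface) is a connector that evaluates operator policy before forwarding an application's call to the cloud:

```python
# Illustrative sketch of a policy-driven control point: non-functional
# concerns (here, region restrictions and required encryption) are checked
# centrally by the connector instead of inside each application.

from dataclasses import dataclass


@dataclass
class Policy:
    allowed_regions: set[str]
    require_encryption: bool


@dataclass
class CloudRequest:
    region: str
    encrypted: bool
    payload: bytes


def connect(request: CloudRequest, policy: Policy) -> str:
    """Admit or reject a workload's cloud call based on operator policy."""
    if request.region not in policy.allowed_regions:
        return "rejected: region not permitted by policy"
    if policy.require_encryption and not request.encrypted:
        return "rejected: payload must be encrypted in transit"
    return "forwarded to cloud service"


policy = Policy(allowed_regions={"eu-de", "eu-gb"}, require_encryption=True)
print(connect(CloudRequest("eu-de", True, b"..."), policy))    # forwarded
print(connect(CloudRequest("us-east", True, b"..."), policy))  # rejected
```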
ARISE: AI Right Sizing Engine for AI workload configurations
Rachel Tzoref-Brill, Bruno Wassermann, Eran Raichstein and Dean Lorentz, IBM Research – Israel
Data scientists and platform engineers who maintain AI stacks are required to continuously run AI workloads as part of their roles. When executing any part of the AI pipeline, whether data preprocessing, training, fine-tuning, or inference, a frequent question is how to configure the environment to meet Service Level Objectives such as desired throughput and runtime deadlines while avoiding memory and CPU exhaustion. ARISE is a tool that enables data-driven decisions about AI workload configuration. ARISE trains machine-learning regression models on historical workloads and performance benchmark metadata, and then predicts the performance of future workloads from their input metadata. The user can constrain the input metadata space to relevant options only (e.g., to specific large models and accelerator types); ARISE then performs multiple performance predictions to cover the user-defined input space and presents the top configuration options that optimize the user's objective as alternatives to choose from. ARISE can also be plugged into automated tools that can benefit from such predictions, e.g., for auto-scaling and scheduling optimization. Initial evaluation of ARISE shows high prediction accuracy (8% Mean Absolute Percentage Error on average) and interesting configuration trade-offs for real-world fine-tuning and inference workloads.
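As a sketch of the general pattern the abstract describes, one can train a regression model on historical configuration metadata, predict performance over a user-constrained configuration grid, and rank the options. The features, data, and model choice below are invented for illustration and are not ARISE's actual implementation:

```python
# Minimal sketch of the ARISE-style pattern: fit a regression model on
# historical workload metadata, predict performance across a user-constrained
# configuration grid, and surface the top options. All values are invented.

from itertools import product

from sklearn.ensemble import RandomForestRegressor

# Historical benchmarks: (gpu_count, batch_size, model_size_b_params) -> throughput.
X_hist = [[1, 8, 7], [2, 8, 7], [2, 16, 13], [4, 16, 13], [4, 32, 70], [8, 32, 70]]
y_hist = [120.0, 210.0, 150.0, 260.0, 90.0, 170.0]  # e.g., tokens/sec

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)

# User-constrained input space: only these GPU counts and batch sizes,
# and only a 13B-parameter model.
candidates = [list(c) for c in product([2, 4, 8], [8, 16, 32], [13])]
predictions = model.predict(candidates)

# Rank configurations by predicted throughput and show the top alternatives.
ranked = sorted(zip(predictions, candidates), reverse=True)
for throughput, config in ranked[:3]:
    print(f"config={config} predicted_throughput={throughput:.1f}")
```

The same ranked predictions could equally feed an auto-scaler or scheduler instead of being shown to a user, which is the automated use the abstract mentions.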
Observability Volume Management
Eran Raichstein, Kalman Meth, Seep Goel, Priyanka Naik and Kavya Govindarajan, IBM Research