We are delighted to announce the posters that have been accepted to ACM SYSTOR’24.
Coverage-based Caching in Cloud Data Lakes
Grisha Weintraub, Ehud Gudes and Shlomi Dolev, Ben-Gurion University of the Negev
Cloud data lakes are a modern approach to handling large volumes of data. They separate the compute and storage layers, making them highly scalable and cost-effective. However, query performance in cloud data lakes leaves much room for improvement, and various efforts have been made to enhance it in recent years. We introduce a novel caching technique for this problem: instead of caching actual data, we cache metadata called a coverage set.
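The abstract does not include code, but a minimal sketch of the idea, under our own assumptions about what a coverage set might record (here, per-file min/max column statistics; all class and method names are hypothetical), could look like this:

```python
# Hypothetical sketch of coverage-based caching: instead of caching file
# contents, cache a "coverage set" -- lightweight per-file metadata (here,
# min/max statistics per column) that lets a query planner prune files.

from dataclasses import dataclass


@dataclass
class FileStats:
    """Cached metadata for one data-lake file (names are illustrative)."""
    path: str
    col_min: float
    col_max: float


class CoverageCache:
    """Maps a column name to the coverage set of files seen so far."""

    def __init__(self):
        self._coverage: dict[str, list[FileStats]] = {}

    def add(self, column: str, stats: FileStats) -> None:
        self._coverage.setdefault(column, []).append(stats)

    def files_for_range(self, column: str, lo: float, hi: float) -> list[str]:
        """Return only the files whose [min, max] range can match the
        predicate lo <= column <= hi; every other file is skipped without
        touching object storage."""
        return [
            s.path
            for s in self._coverage.get(column, [])
            if s.col_max >= lo and s.col_min <= hi
        ]


cache = CoverageCache()
cache.add("price", FileStats("s3://lake/part-000.parquet", 0.0, 49.9))
cache.add("price", FileStats("s3://lake/part-001.parquet", 50.0, 99.9))
print(cache.files_for_range("price", 10.0, 20.0))  # only part-000 can match
```

The appeal of caching metadata rather than data is that a coverage set is tiny relative to the files it describes, so a small cache can prune reads across the whole lake.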
Integrity Verification in Cloud Data Lakes
Grisha Weintraub, Leonid Rise, Eli Shemesh and Avraham Illouz, IBM
Cheap & Fast File-aaS for AI by Combining Scale-out Virtiofs, Block layouts and Delegations
Sagi Manole and Amit Golander, Huawei
File-service supply and demand forces are changing. On the demand side, AI has significantly increased single-tenant File-aaS performance requirements. On the supply side, economic forces are moving clients from traditional servers to rack-scale computers, which encourage heterogeneous compute and are equipped with DPUs (Data Processing Units, also known as SmartNICs). DPUs offer higher efficiency but may be over-utilized at times, so we would like our software to be flexible and run on general-purpose compute when DPU utilization is high.
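As a rough illustration of that flexibility (the threshold, telemetry probe, and handler names below are all our own assumptions, not the poster's design), a dispatcher might look like:

```python
# Hypothetical dispatcher sketch: route file-service work to the DPU when it
# has headroom, and fall back to general-purpose host CPUs when the DPU is
# over-utilized. The threshold and the probe function are illustrative.

DPU_UTILIZATION_THRESHOLD = 0.85  # assumed cutoff, not from the poster


def read_dpu_utilization() -> float:
    """Placeholder for a real telemetry probe (e.g., a DPU load counter)."""
    return 0.4


def handle_on_dpu(request: bytes) -> bytes:
    return b"handled on DPU offload path: " + request


def handle_on_host(request: bytes) -> bytes:
    return b"handled on general-purpose host CPU: " + request


def dispatch(request: bytes) -> bytes:
    """Prefer the efficient DPU path, but stay flexible under load."""
    if read_dpu_utilization() < DPU_UTILIZATION_THRESHOLD:
        return handle_on_dpu(request)
    return handle_on_host(request)


print(dispatch(b"READ /data/model.bin"))
```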
Hybrid Cloud Connector: Offloading Integration Complexities
Ronen Kat, Doron Chen, Michael Factor, Chris Giblin, Avi Ziv and Aleksander Slominski, IBM Research – Israel
Regulated enterprises often seek to extend their workloads into the cloud, but are impeded by integration concerns relating to security, governance and compliance. Further, enterprises running mission-critical applications face throughput and latency challenges due to cloud integration overheads. We present Hybrid Cloud Connector (HCC) to accelerate this integration. HCC offloads non-functional aspects from the application, reducing complexity for the developer, and centralizes administration via a policy-driven control point, simplifying the operator’s job.
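One way to picture a policy-driven control point (the policy fields, request shape, and function names here are purely illustrative and not HCC's actual interface) is a connector that evaluates operator policy before forwarding an application's call to the cloud:

```python
# Illustrative sketch of a policy-driven control point: non-functional
# concerns (here, region restrictions and required encryption) are checked
# centrally by the connector instead of inside each application.

from dataclasses import dataclass


@dataclass
class Policy:
    allowed_regions: set[str]
    require_encryption: bool


@dataclass
class CloudRequest:
    region: str
    encrypted: bool
    payload: bytes


def connect(request: CloudRequest, policy: Policy) -> str:
    """Admit or reject a workload's cloud call based on operator policy."""
    if request.region not in policy.allowed_regions:
        return "rejected: region not permitted by policy"
    if policy.require_encryption and not request.encrypted:
        return "rejected: payload must be encrypted in transit"
    return "forwarded to cloud service"


policy = Policy(allowed_regions={"eu-de", "eu-gb"}, require_encryption=True)
print(connect(CloudRequest("eu-de", True, b"..."), policy))    # forwarded
print(connect(CloudRequest("us-east", True, b"..."), policy))  # rejected
```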
ARISE: AI Right Sizing Engine for AI workload configurations
Rachel Tzoref-Brill, Bruno Wassermann, Eran Raichstein and Dean Lorentz, IBM Research – Israel
Data scientists and platform engineers who maintain AI stacks are required to continuously run AI workloads as part of their roles. When executing any part of the AI pipeline, whether data preprocessing, training, fine-tuning, or inference, a frequent question is how to configure the environment to meet Service Level Objectives such as desired throughput and runtime deadlines while avoiding memory and CPU exhaustion. ARISE is a tool that enables data-driven decisions about AI workload configuration. ARISE trains machine-learning regression models on historical workloads and performance benchmark metadata, and then predicts the performance of future workloads from their input metadata. The user can constrain the input metadata space to relevant options only (e.g., to specific large models and accelerator types); ARISE then performs multiple performance predictions to cover the user-defined input space and presents the top configuration options that optimize the user's objective as alternatives to choose from. ARISE can also be plugged into automated tools that can benefit from such predictions, e.g., for auto-scaling and scheduling optimization. Initial evaluation of ARISE shows high prediction accuracy (8% Mean Absolute Percentage Error on average) and interesting configuration trade-offs for real-world fine-tuning and inference workloads.
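As a sketch of the general pattern the abstract describes, one can train a regression model on historical configuration metadata, predict performance over a user-constrained configuration grid, and rank the options. The features, data, and model choice below are invented for illustration and are not ARISE's actual implementation:

```python
# Minimal sketch of the ARISE-style pattern: fit a regression model on
# historical workload metadata, predict performance across a user-constrained
# configuration grid, and surface the top options. All values are invented.

from itertools import product

from sklearn.ensemble import RandomForestRegressor

# Historical benchmarks: (gpu_count, batch_size, model_size_b_params) -> throughput.
X_hist = [[1, 8, 7], [2, 8, 7], [2, 16, 13], [4, 16, 13], [4, 32, 70], [8, 32, 70]]
y_hist = [120.0, 210.0, 150.0, 260.0, 90.0, 170.0]  # e.g., tokens/sec

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)

# User-constrained input space: only these GPU counts and batch sizes,
# and only a 13B-parameter model.
candidates = [list(c) for c in product([2, 4, 8], [8, 16, 32], [13])]
predictions = model.predict(candidates)

# Rank configurations by predicted throughput and show the top alternatives.
ranked = sorted(zip(predictions, candidates), reverse=True)
for throughput, config in ranked[:3]:
    print(f"config={config} predicted_throughput={throughput:.1f}")
```

The same ranked predictions could equally feed an auto-scaler or scheduler instead of being shown to a user, which is the automated use the abstract mentions.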
Observability Volume Management
Eran Raichstein, Kalman Meth, Seep Goel, Priyanka Naik and Kavya Govindarajan, IBM Research