Estimators reduce the memory footprint of maintaining network
statistics, while keeping the estimation error of each flow proportional to its size. This is unlike sketches and other approximate
algorithms that only guarantee an error proportional to the entire stream size. In this work we present the CELL algorithm that
combines estimators with efficient flow representation to obtain
superior memory reduction compared to the state of the art.
We also extend CELL to the sliding window model, which prioritizes recent data over older data, by presenting two variants named RAND-CELL and SHIFT-CELL.
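To make the estimator concept concrete, below is a minimal C sketch of a Morris-style probabilistic counter, a classic building block for per-flow estimators; it is illustrative only and is not the CELL algorithm itself. Only a small exponent is stored per flow, and the estimation error scales with the flow's own count rather than with the entire stream.

/* Minimal sketch (not CELL): a Morris-style probabilistic counter.
 * The stored exponent needs only O(log log n) bits, and the estimate's
 * error is proportional to the flow's own count. */
#include <stdio.h>
#include <stdlib.h>

struct estimator { unsigned c; };        /* one small exponent per flow */

/* Increment with probability 2^-c, so c grows only logarithmically. */
static void est_increment(struct estimator *e)
{
    double p = 1.0 / (double)(1ULL << e->c);
    if ((double)rand() / RAND_MAX < p)
        e->c++;
}

/* Unbiased estimate of the number of increments observed so far. */
static double est_query(const struct estimator *e)
{
    return (double)(1ULL << e->c) - 1.0;
}

int main(void)
{
    struct estimator e = { 0 };
    for (long i = 0; i < 1000000; i++)
        est_increment(&e);
    printf("true: 1000000, estimate: %.0f\n", est_query(&e));
    return 0;
}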
The management of large data centre (DC) network infrastructure confronts Network Reliability Engineers (NREs) with challenges. A single DC at a modern cloud services provider can host thousands of network devices. The syslog messages generated by these devices are an important type of monitoring data for detecting and diagnosing failures. Devices in a single DC produce millions of syslog messages per day in a variety of formats.
We present an alternative approach developed over the
last few years with the NREs working on IBM Cloud’s networks. DeCorus-NSA assists NREs in three ways. First, it
detects incidents without the need to specify rules manually. Second, it groups large numbers of individual alerts
into a smaller number of higher-level incidents. And finally,
DeCorus-NSA supports NREs with root cause analysis by
extracting additional context.
Cloud data lakes are a modern approach for storing large
amounts of data in a convenient and inexpensive way. The
main idea is the separation of compute and storage layers.
However, to perform analytics on the data in this architecture, the data must be moved from the storage layer to the compute layer over the network for each calculation. This hurts calculation performance and requires substantial network bandwidth. We are exploring different approaches for adding indexing to cloud data lakes, with the goal of reducing the amount of data read from storage and, as a result, improving query execution time.
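As a rough illustration of the idea (a hypothetical min/max data-skipping index, not necessarily the indexing scheme we are building), the following C sketch records per-object value ranges so the compute layer can skip objects that cannot match a query predicate and thus avoid transferring them over the network.

/* Hypothetical data-skipping sketch: one min/max entry per data object. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct object_index {
    const char *object_name;   /* e.g., a Parquet object key in the lake */
    long        min_value;     /* column minimum recorded at write time  */
    long        max_value;     /* column maximum recorded at write time  */
};

/* An object may contain matching rows only if target falls in its range. */
static bool may_contain(const struct object_index *idx, long target)
{
    return target >= idx->min_value && target <= idx->max_value;
}

int main(void)
{
    struct object_index index[] = {
        { "part-0000.parquet",   1,  99 },
        { "part-0001.parquet", 100, 199 },
        { "part-0002.parquet", 200, 299 },
    };
    long target = 150;                        /* predicate: value == 150 */

    for (size_t i = 0; i < sizeof index / sizeof index[0]; i++) {
        if (may_contain(&index[i], target))
            printf("read %s\n", index[i].object_name);  /* fetch over the network */
        else
            printf("skip %s\n", index[i].object_name);  /* no transfer needed     */
    }
    return 0;
}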
HPC Application Optimisation in SODALITE
Kalman Meth (IBM Research - Haifa); Alfio Lazzaro and Nina Mujkanovic (HPE HPC/AI EMEA Research Lab); Maria Carbonell (ATOS); Dragan Radolovic, Daniel Vladusic, and Joao Pita Costa (XLAB); Elisabetta Di Nitto (Politecnico di Milano)
We propose to tackle the complexity of deploying and operating modern applications on heterogeneous HPC and cloud-based systems by providing application developers and infrastructure operators with tools to abstract their application and infrastructure requirements.
We propose enabling continuous performance optimisation
of distributed hybrid applications in heterogeneous cloud,
Edge, and HPC environments by employing an intelligent
re-deployment feedback loop.
Persistent Memory (PM) is a new class of device that provides faster access than conventional storage devices such as SSDs. Among the several methods available for accessing files on PM, a combination of filesystem direct access (DAX) and mmap() is used to take full advantage of its native capabilities: it avoids the buffer cache and allows PM to be accessed with byte granularity.
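As a minimal C sketch of this access path (the mount point /mnt/pmem is a hypothetical example, and error handling is kept short), the program below maps a file on a DAX-mounted filesystem and stores into it directly with byte granularity; a production program would typically flush with libpmem (CLWB plus a fence) rather than msync().

/* Minimal DAX + mmap() sketch; compile with: cc -D_GNU_SOURCE dax_mmap.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE               /* newer kernels/glibc expose these */
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

int main(void)
{
    const size_t len = 4096;
    int fd = open("/mnt/pmem/example", O_CREAT | O_RDWR, 0644); /* hypothetical DAX mount */
    if (fd < 0 || ftruncate(fd, len) < 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* MAP_SYNC guarantees a true DAX mapping: stores bypass the buffer
     * cache and go to the persistent media. Fall back to MAP_SHARED if
     * the filesystem does not support it. */
    char *pmem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (pmem == MAP_FAILED)
        pmem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(pmem, "hello, PM");            /* byte-granular store, no buffer cache */
    msync(pmem, len, MS_SYNC);            /* portable flush for the sketch */

    munmap(pmem, len);
    close(fd);
    return 0;
}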
Manycore systems enable massively parallel I/O in a single server thanks to their large number of cores. Among file I/O operations in a file system, C. Lee et al. [1] applied a range lock in F2FS for parallel data I/O and showed scalable performance. However, little research has been done on metadata I/O scalability.
To investigate this, we analyzed unlink() and its related data structures in F2FS. File metadata in F2FS (the inode) is called a Node and is identified by a nid. Nodes are stored in an on-disk structure called the Node Address Table (NAT), which is cached in memory together with a pool of free nids. F2FS keeps a certain number of free nids for fast nid allocation during create(). Every time unlink() is called, the number of free nids in the pool is checked; if it is insufficient, a Free nid Scan is performed to secure sufficient free nids.
We evaluated the I/O performance when multiple threads call unlink() in F2FS on a manycore system, and it shows no performance scalability. From this analysis, we identified that a large critical section (CS) in Free nid Scan, protected by a mutex lock, is the leading cause of the scalability bottleneck.
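The simplified C sketch below (hypothetical code, not the actual F2FS source) illustrates the structure of this bottleneck: every unlink() checks the free-nid pool under a single mutex, and when the pool runs low the entire Free nid Scan executes inside that critical section, serializing all threads.

/* Hypothetical, simplified view of the free-nid handling described above. */
#include <pthread.h>

#define FREE_NID_LOW_WATERMARK 128       /* illustrative threshold */

struct free_nid_pool {
    pthread_mutex_t lock;                /* one lock for the whole pool */
    unsigned int    nr_free;             /* cached count of free nids   */
};

static struct free_nid_pool pool = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .nr_free = 0,
};

/* Refill the in-memory pool by scanning the on-disk NAT.
 * In the real file system this walks NAT blocks; here it is a stub. */
static void free_nid_scan(struct free_nid_pool *p)
{
    /* ... long NAT traversal, all inside the caller's critical section ... */
    p->nr_free = 2 * FREE_NID_LOW_WATERMARK;
}

/* Called on every unlink(): check the pool and replenish it if low.
 * The whole scan runs under one mutex -- the large critical section
 * identified as the scalability bottleneck. */
void unlink_check_free_nids(void)
{
    pthread_mutex_lock(&pool.lock);
    if (pool.nr_free < FREE_NID_LOW_WATERMARK)
        free_nid_scan(&pool);
    pthread_mutex_unlock(&pool.lock);
}

int main(void)
{
    unlink_check_free_nids();            /* in practice, called by many threads */
    return 0;
}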
Container frameworks have been gaining popularity in recent years, with container native storage being one of the fastest growing segments. According to an IDC report [1], 90% of applications on cloud platforms and over 95% of new microservices are being deployed in containers. The growth of container native storage is largely driven by stateful applications [2, 3], the mainstay of enterprise IT environments. As organizations increasingly adopt containerized deployments, they must also address data protection to maintain business continuity.
Ransomware is software that uses encryption to disable access to data until a ransom is paid, and such attacks have increased steeply in recent years. The best current practices to minimize the impact of ransomware attacks include periodic backups and air-gapped immutable copies. However, undetected attacks can corrupt data before it is backed up, rendering the backups unusable. Detecting ransomware attacks quickly and flagging the damaged content enables fast recovery and business continuity. We present some features of our ransomware attack detection algorithms, prototyped and run in a sandboxed but realistic environment, which successfully detected live ransomware attacks obtained from open-source repositories.
The ability of medical professionals to efficiently process vast amounts of data is critical. We present a solution that offers cloud-secure analytics on healthcare data, utilizing FHIR, the latest standard from the HL7 organization for the exchange of healthcare data, together with Apache Parquet Modular Encryption.