Tech Research Archives | Weebit

Enabling ‘Few-Shot Learning’ AI with ReRAM
19 June 2025


AI training happens in the cloud because it’s compute-intensive and highly parallel. It requires massive datasets, specialized hardware, and weeks of runtime. Inference, by contrast, is the deployment phase — smaller, faster, and often done at the edge, in real time. The cloud handles the heavy lifting; the edge delivers the result. Now, recent advances in resistive memory technology are making edge AI inferencing more energy-efficient, secure, and responsive.

At the 2025 IEEE Symposium on VLSI Technology and Circuits, researchers from CEA-Leti, Weebit Nano, and the Université Paris-Saclay presented a breakthrough in “on-chip customized learning” — demonstrating how a ReRAM-based platform can support few-shot learning using just five training updates.

Few-shot learning (FSL) is an approach where AI models learn new tasks with only a handful of examples. It is very useful for edge applications, where devices must adapt to specific users or environments and can’t rely on large, labeled datasets.

The team didn’t just train a model — they showed that a memory-embedded chip could adapt in real time, at the edge, without requiring cloud access, long training cycles, or power-hungry hardware. The core enabler is a combination of Model-Agnostic Meta-Learning (MAML) and multi-level Resistive RAM (ReRAM or RRAM).

MAML provides a clever workaround that can enable learning in power-constrained edge devices. Instead of training from scratch, it trains a model to learn. During an off-chip phase, the system builds a general-purpose model by exposing it to many tasks. This “learned initialization” is then deployed to edge devices, where it can quickly adapt to new tasks with minimal effort.

This means:

  • No need for the cloud – minimizing bandwidth and latency
  • Minimal data required – minimizing compute requirements at the edge
  • Massive time and energy savings
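
To make this two-phase structure concrete, below is a minimal sketch of first-order MAML on a toy regression problem, written in PyTorch. It is not the paper’s network or chip setup; the model, task distribution, and hyperparameters are all illustrative assumptions.

    # Minimal first-order MAML sketch (toy sine-regression tasks).
    # Off-chip phase: meta-train an initialization that adapts fast.
    # On-device phase: a handful of gradient updates on a new task.
    import torch
    import torch.nn.functional as F

    def net(p, x):
        # Tiny MLP applied functionally, so adapted ("fast") weights can
        # be held separately from the shared meta-parameters.
        w1, b1, w2, b2 = p
        return F.linear(torch.relu(F.linear(x, w1, b1)), w2, b2)

    params = [torch.randn(40, 1) * 0.5, torch.zeros(40),
              torch.randn(1, 40) * 0.1, torch.zeros(1)]
    for p in params:
        p.requires_grad_()
    meta_opt = torch.optim.Adam(params, lr=1e-3)
    inner_lr, inner_steps = 0.01, 5          # five updates, as in the demo

    def sample_task():
        amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
        def batch(n=10):
            x = torch.rand(n, 1) * 10 - 5
            return x, amp * torch.sin(x + phase)
        return batch

    for _ in range(2000):                    # off-chip meta-training loop
        task = sample_task()
        fast = [p.detach().clone().requires_grad_() for p in params]
        for _ in range(inner_steps):         # inner loop: adapt to the task
            x, y = task()
            g = torch.autograd.grad(F.mse_loss(net(fast, x), y), fast)
            fast = [(w - inner_lr * gi).detach().requires_grad_()
                    for w, gi in zip(fast, g)]
        xq, yq = task()                      # query set from the same task
        qgrad = torch.autograd.grad(F.mse_loss(net(fast, xq), yq), fast)
        meta_opt.zero_grad()
        for p, gq in zip(params, qgrad):     # first-order approximation:
            p.grad = gq                      # apply query grads to meta-init
        meta_opt.step()

At deployment, only the inner loop runs: the meta-trained initialization plus a few gradient updates on local data, which is what keeps the on-device write count so low.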

Executing this on edge hardware requires memory technology that can keep up — and that’s where ReRAM comes in.

Because ReRAM is a non-volatile memory that supports analog programming, it is ideal for low-power and in-memory compute architectures. ReRAM can store information as varying conductance states, which can then represent the weights (numerical values that represent the strength or importance of connections between neurons or nodes in a model) in neural networks.

However, ReRAM also comes with challenges — notably variability and some limits on write endurance. Few-shot learning helps overcome both.

 

Reducing Write Cycles with MAML

In terms of endurance, the key is in leveraging MAML, which enabled the research team to reduce the number of required write operations by orders of magnitude. Instead of millions of updates, they showed that just five updates — each consisting of a handful of conductance tweaks — were enough to adapt to a new task.

The experiments used a chip fabricated on 130nm CMOS with multi-level Weebit ReRAM integrated in the back end of line (BEOL). The network architecture had four fixed convolutional layers and two trainable fully-connected (FC) layers. Weights in the FC layers were encoded using pairs of ReRAM cells, storing the difference in conductance between them.
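
As a rough illustration of this differential encoding, here is a small sketch that maps a signed weight onto a pair of conductances quantized to a fixed set of levels. The level count and conductance range are assumptions for illustration, not values from the paper.

    # Differential weight encoding: w is represented as G_pos - G_neg,
    # with each conductance snapped to one of the device's discrete levels.
    import numpy as np

    G_MIN, G_MAX, N_LEVELS = 1e-6, 50e-6, 8        # siemens; assumed values
    LEVELS = np.linspace(G_MIN, G_MAX, N_LEVELS)

    def encode(w, w_max=1.0):
        # Map |w| into the conductance range, then quantize to a level.
        g = G_MIN + abs(w) / w_max * (G_MAX - G_MIN)
        g = LEVELS[np.abs(LEVELS - g).argmin()]
        return (g, G_MIN) if w >= 0 else (G_MIN, g)  # sign picks the cell

    def decode(g_pos, g_neg, w_max=1.0):
        return (g_pos - g_neg) / (G_MAX - G_MIN) * w_max

    gp, gn = encode(-0.37)
    print(decode(gp, gn))   # ~ -0.43: the weight survives, level-quantized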

Training was carried out using a “computer-in-the-loop” setup, where the system calculated gradients and issued write commands directly to the ReRAM crossbars. In a full deployment, this would be managed by a co-integrated ASIC.

 

The learning task? Character recognition from the Omniglot dataset, a popular benchmark in FSL. The chip was pre-loaded with the MAML-trained parameters and fine-tuned on-device to recognize new characters using only five gradient updates.

The result:

  • Starting at 20% accuracy (random guess)
  • Reaching over 97% accuracy after five updates
  • Energy use of less than 10 μJ for a 2kbit array

For an optical character recognition (OCR) application using AI with a 2kbit array, energy consumption of less than 10 μJ represents excellent energy efficiency compared to typical industry benchmarks. This level of power consumption places such a system in the ultra-low-power category suitable for edge AI applications and battery-powered devices.

 

Programming Strategies to Mitigate Drift

In ReRAM, conductance levels can drift over time, and adjacent states may overlap, introducing noise. To tackle this, the team tested multiple programming strategies:

  • Single-shot Set: Simple, fast, but inaccurate
  • Iterative Set: More precise, but slower
  • Iterative Reset: Useful for low conductance states
  • Hybrid strategy: A blend of both, offering the best balance

The hybrid strategy proved most effective, reducing variability and improving long-term retention. After a 12-hour bake at 150°C (equivalent to 10 years at 75°C), the system still maintained over 90% of its accuracy.

This is critical for commercial deployment, where temperature fluctuations and data longevity are real-world concerns.

 

Looking Ahead

This research points to a compelling future for AI at the edge:

  • Learn locally: Devices can customize their behavior to individual users
  • Stay secure: No data needs to be sent to the cloud
  • Save time and energy: Minimal training and in-memory compute keep power low
  • Scale affordably: Meta-training can be centralized and shared across devices

And because the platform uses ReRAM, the entire system benefits from ultra-low standby power and reduced silicon area.

This work is more than a proof of concept; it’s a signpost. As more AI applications move to the edge, we’ll need memory technologies that support not just inference, but real learning. ReRAM is emerging as one of the few candidates that can deliver on that vision, especially when paired with smart algorithms like MAML.

View the presentation, “On Chip Customized Learning on Resistive Memory Technology for Secure Edge AI” from the 2025 IEEE Symposium on VLSI Technology and Circuits here.

 

Relaxation-Aware Programming in ReRAM: Evaluating and Optimizing Write Termination
28 May 2025

Resistive RAM (ReRAM or RRAM) is the strongest candidate for next-generation non-volatile memory (NVM), combining fast switching speeds with low power consumption. New techniques for managing a memory phenomenon called ‘relaxation’ are making ReRAM more predictable — and easier to specify for real-world applications.

What is the relaxation problem in memory? Short-term conductance drift – known as ‘relaxation’ – presents a challenge for memory stability, especially in neuromorphic computing and multi-bit storage.

At the 2025 International Memory Workshop (IMW), a team from CEA-Leti, CEA-List and Weebit presented a poster session, “Relaxation-Aware Programming in RRAM: Evaluating and Optimizing Write Termination.” The team reported that Write Termination (WT), a widely used energy-saving technique, can make these relaxation effects worse.

So what can be done? Our team proposed a solution: a modest programming voltage overdrive that curbs drift without sacrificing the efficiency advantages of the WT technique.

 

Energy Savings Versus Stability

Write Termination improves programming efficiency by halting the SET (write) operation once the target current is reached, instead of using a fixed-duration pulse. This reduces both energy use and access times, supporting better endurance across ReRAM arrays.

It’s desirable, but problematic in action.
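
A minimal sketch of the two programming modes helps show where the trade-off comes from. The conductance update model and all constants below are illustrative assumptions, not the measured device behavior.

    # Fixed-duration SET vs. Write Termination (WT), toy cell model.
    import random

    def set_step(g):
        # One time-slice of a SET pulse: conductance creeps up toward a
        # ceiling, with cycle-to-cycle variability.
        return min(g + (50e-6 - g) * 0.2 * random.uniform(0.5, 1.5), 50e-6)

    def fixed_write(g, slices=100):
        for _ in range(slices):            # always burns the full pulse
            g = set_step(g)
        return g

    def wt_write(g, g_target=20e-6, slices=100):
        used = 0
        while used < slices and g < g_target:
            g, used = set_step(g), used + 1
        return g, used                     # usually stops after a few slices

    print(f"fixed: g = {fixed_write(1e-6)*1e6:.1f} uS after 100/100 slices")
    g, used = wt_write(1e-6)
    print(f"WT:    g = {g*1e6:.1f} uS after {used}/100 slices")

Unmodified WT stops the moment the target is crossed, which is exactly why many cells end up only barely SET and prone to relax, as the measurements below show.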

Tests on a 128kb ReRAM macro showed that unmodified WT increases conductance drift by about 50% compared to constant-duration programming.

In these tests, temperature amplified the effect: at 125°C, the memory window narrowed by 76% under WT, compared to a fixed SET pulse. Even at room temperature, degradation reached 31%.

Such drift risks destabilizing systems that depend on tight resistance margins, including neuromorphic processors and multi-level cell (MLC) storage schemes, where minor shifts can translate into computation errors or data loss.

The experiments used a test chip fabricated on 130nm CMOS, integrating the ReRAM array with a RISC-V subsystem for fine-grained programming control and data capture.

Conductance relaxation was tracked from microseconds to over 10,000 seconds post-programming. A high-speed embedded SRAM buffered short-term readouts, allowing detailed monitoring from 1µs to 1 second, while longer-term behavior was captured with staggered reads.

This statistically robust setup enabled precise analysis of both early and late-stage relaxation dynamics.

To measure stability, the researchers used a metric called the three-sigma memory window (MW₃σ). It looks at how tightly the memory cells hold their high and low resistance states, while ignoring extreme outliers.

When this window gets narrower, the difference between a “0” and a “1” becomes harder to detect — making it easier for errors to creep in during reads.

By focusing on MW₃σ, the team wasn’t just looking at averages — they were measuring how reliably the memory performs under real-world conditions, where even small variations can cause problems.
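
As a sketch of how such a metric can be computed, assuming it is defined as the gap between the 3-sigma tails of the two state distributions (the paper’s exact formulation may differ):

    import numpy as np

    def mw_3sigma(r_hrs, r_lrs):
        # Gap between the HRS lower 3-sigma tail and the LRS upper
        # 3-sigma tail; <= 0 means the '0' and '1' populations overlap.
        return (r_hrs.mean() - 3 * r_hrs.std()) - (r_lrs.mean() + 3 * r_lrs.std())

    rng = np.random.default_rng(0)
    hrs = rng.lognormal(np.log(200e3), 0.15, 10_000)  # ohms; assumed spread
    lrs = rng.lognormal(np.log(10e3), 0.10, 10_000)
    print(f"MW_3sigma = {mw_3sigma(hrs, lrs) / 1e3:.0f} kOhm")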

 

Addressing Relaxation with Voltage Overdrive

Voltage overdrive is the practice of applying a slightly higher voltage than the minimum required to trigger a specific operation in a memory cell — in this case, the SET operation in ReRAM.

Write Termination cuts the SET pulse short as soon as the target current is reached. That saves energy, but it also means some memory cells are just barely SET. They’re fragile — sitting near the edge of their intended resistance range. That’s where relaxation drift kicks in: over time, conductance slips back toward its original state.

So, the team asked a logical question:

“What if we give the cell just a bit more voltage — enough to push it more firmly into its new state, but not so much that we burn energy or damage endurance?”

Instead of discarding WT, the team increased the SET voltage by 0.2 Arbitrary Units (AU) above the minimum requirement.

Key results:

  • Relaxation dropped to levels comparable to constant-duration programming
  • Memory windows remained stable at both room and elevated temperatures
  • WT’s energy efficiency was mostly preserved, with only a ~20% increase in energy compared to unmodified WT

Modeling predicted that without overdrive, 50% of the array would show significant drift within a day. With overdrive, the same drift level would take more than 10 years, a timescale sufficient for most embedded and computing applications.

 

Balancing Energy and Stability

The modest voltage increases restored conductance stability without negating WT’s energy and speed benefits. Although the overdrive added some energy overhead, overall consumption remained lower than that of fixed-duration programming.

This adjustment offers a practical balance between robustness and efficiency, critical for commercial deployment.

 

As ReRAM moves toward wider adoption, and as it becomes a prime candidate for neuromorphic and multi-bit storage applications, conductance drift will become a defining challenge.

The results presented at IMW 2025 show that simple device-level optimizations like voltage overdrive can deliver major gains without requiring disruptive architectural changes.

Check out more details of the research here.

 

A Complete No-Brainer: ReRAM for Neuromorphic Computing
5 June 2024

In the last 60 years, technology has evolved at such an exponentially fast rate that we are now regularly conversing with AI-based chatbots, and that same OpenAI technology has been put into a humanoid robot. It’s truly amazing to see this rapid development.

 

Above: OpenAI technology in a humanoid robot

 

Continued advancement of AI development faces numerous challenges. One of these is computing architecture. Since it was first described in 1945, the von Neumann architecture has been the foundation for most computing. In this architecture, instructions and data are stored together in memory and communicate via a shared bus to the CPU. This has enabled many decades of continuous technological advancement.

However, there are bottlenecks created by such an architecture, in terms of bandwidth, latency, power consumption, and security, to name a few. For continued AI development, we can’t just make brute-force adjustments to this architecture. What’s needed is an evolution to a new computing paradigm that bypasses the bottlenecks inherent in the traditional von Neumann architecture and more precisely mimics the system it is trying to imitate: the human brain.

To achieve this, memory must be closer to the compute engine for better efficiency and power consumption. Even better, computation should be done directly within the memory itself. This paradigm change requires new technology, and ReRAM (or RRAM) is among the most promising candidates for future in-memory computing architectures.

 

Roadmap for ReRAM in AI

Given its long list of advantages, ReRAM can be used in a broad range of applications, from mixed signal and power management to IoT, automotive, industrial, and many other areas. We generally see ReRAM rolling out in AI applications over time in different ways. For AI-related applications, relevant advantages of ReRAM include its cost efficiency, ultra-low power consumption, scaling capabilities, small footprint and fit into a long-term roadmap to advanced neuromorphic computing.

The shortest-term opportunity for ReRAM is as an embedded memory (10-100 Mb) for edge AI applications. The idea is to bring the NVM closer to the compute engine, therefore massively reducing power consumption. This opportunity can be realized today using ReRAM for synaptic weight storage, replacing the use of external flash and eliminating some of the local SRAM or DRAM. My colleague Gideon Intrater will present on this topic on Monday, June 24th at the Design Automation Conference 2024. If you are planning to attend, don’t miss his presentation as part of the session, ‘Cherished Memories – Exploring the Power of Innovative Memory Architectures for AI applications.’

In the mid-term, ReRAM is a great candidate for in-memory computing where analog behavior is required. In this methodology, ReRAM is used for both computation and weight storage – at first in binary form (one bit per cell) and then moving to multi-level operation (multiple bits per cell). An example of in-memory computing was proposed in 2022 using arrays based on Weebit ReRAM as Content Addressable Memories. This work, done in collaboration with the Department of Electrical Engineering, Indian Institute of Technology Delhi, is highlighted in the article, ‘In-Memory Computing for AI Similarity Search using Weebit ReRAM,’ by Amir Regev.

My colleague Amir Regev also recently wrote an article, ‘Towards Processing In-Memory,’ which explains more about the idea of in-memory computing with Weebit ReRAM, based on work done with the Department of Electrical Engineering at the Technion Israel Institute of Technology and CEA-Leti.

Above: A roadmap for ReRAM in AI – short-term, mid-term and long-term

 

In the longer term, neuromorphic computing comes into play. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Likewise, ReRAM arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

Areas of particular interest include Bayesian architectures and meta learning. Bayesian neural networks hold great potential for the development of AI, particularly where decision-making under uncertainty is critical. These networks actually quantify uncertainty, so such methods can help AI models avoid overconfidence in their predictions, potentially leading to more reliable, safer AI systems. The characteristics of ReRAM make it an ideal solution for these networks.

The aim of meta learning is to create models that can generalize well to new tasks by leveraging prior experience. As they ‘learn to learn,’ they continuously update their beliefs based on new data without needing to re-train from scratch, making them more adaptable and flexible than today’s methods. The idea is to develop a standalone system capable of learning, adapting and acting locally at the edge. A model would be trained on a server and then optimized parameters would be saved on the chip at the edge. The edge system would then be able to learn new tasks by itself – like humans and other animals.

Compared to current machine learning, where models are trained for specific tasks with a fixed algorithm on a huge dataset, this concept has numerous advantages, including:

  • Data is stored locally on the chip and not in the cloud so there is greater security, much faster reaction and lower power consumption
  • Computation is done in-situ so there is no need to transfer data from memory to the computation unit
  • The system could adapt to very different real world situations since it would imitate human learning ability

A recent joint paper from Politecnico di Milano, Weebit and CEA-Leti proposed a bio-inspired neural network capable of learning using Weebit ReRAM. The focus is on building a bio-inspired system that requires hardware with plasticity, in other words the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. You can read about this work in an article by Alessandro Bricalli, ‘AI Reinforcement Learning with Weebit ReRAM.’

This is the future of ReRAM in AI, and I can’t wait!

 

Overcoming hurdles

Like all memory technologies, ReRAM has both pros and cons for neuromorphic applications. On the ‘pros’ side, this includes its non-volatility, ability to scale to smaller nodes, low power consumption and the ability to have multi-level operation.

The ‘cons’ are largely due to phenomena such as limited precision of the programming conductance. ReRAM technologies are also subject to some resistance drift while cycling. Other phenomena, such as relaxation (linked to both time and temperature), can impact resistance values over time.

As we look towards using ReRAM for neuromorphic computing, we won’t let such resistance variability hold us back. There are not only ways to mitigate such factors, but also ways in which these ‘cons’ can be taken advantage of in certain neuromorphic bio-inspired circuits.

 

Mitigating resistance variability

One of the main ways we can mitigate resistance variability is by using Program and Verify (P&V) algorithms. The idea is quite simple: whenever a cell doesn’t satisfy a given criterion in some way, we can reprogram it and then re-verify its resistance state. Such methods allow us to fine-tune resistance levels in a given range to attain more than just the levels of low-resistance state (LRS) and high-resistance state (HRS).

We can do this in multiple ways. One way is to use a gradual method, in which we repeat the same operation over and over until a cell satisfies the condition imposed (or the maximum number of allowed repetitions has been completed). This method can be incremental, in which case the programming control parameter increases at each repetition, or cumulative, in which case the parameter is kept constant each time.

There are numerous knobs we can control, including the programming direction and level of the control parameter. The total number of P&V cycles, as well as what happens before the verify itself, can vary depending on the goal we want to achieve – whether it’s improving retention, resilience or endurance, or achieving other goals.

The Ielmini Group at the Politecnico di Milano has proposed numerous state-of-the-art algorithms which can help with further tuning. One of these is called ISPVA, in which the gate voltage of the transistor is kept constant, therefore fixing the compliance current, while the top electrode voltage is increased until the desired conductance is attained. Conversely, in the IGVVA approach, the top electrode voltage is kept constant (high enough to grant a successful set operation), while the gate voltage is increased to gradually increase the compliance current.
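
A sketch of an ISPVA-style loop might look as follows; the device response, voltage values and conductance targets are illustrative assumptions, not the published algorithm parameters.

    # ISPVA-style program-and-verify: gate voltage (compliance) fixed,
    # top-electrode SET voltage ramped until the verify read passes.
    import random

    def apply_set(g, v_te):
        # Toy response: conductance gain grows with overvoltage, with
        # cycle-to-cycle variability; capped at a physical ceiling.
        gain = 8e-6 * max(v_te - 1.0, 0.0) * random.uniform(0.4, 1.6)
        return min(g + gain, 60e-6)

    def ispva(g, g_target, v_start=1.0, v_step=0.05, v_max=2.5):
        v_te = v_start
        while v_te <= v_max:
            g = apply_set(g, v_te)       # program pulse
            if g >= g_target:            # verify read
                return g, v_te
            v_te += v_step               # incremental: raise the knob
        return g, v_te                   # hit the voltage ceiling

    g, v = ispva(2e-6, g_target=30e-6)
    print(f"reached g = {g*1e6:.1f} uS at V_TE = {v:.2f} V")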

Variability of the programmed levels is a key parameter in in-memory computing and hardware implementation of deep neural networks. Therefore, it’s important to use algorithms that not only achieve the right level of electrical conductance but also make sure this conductance is consistent across multiple attempts. There are many other P&V algorithms we can employ, for example to reach a more stable conductive filament, reduce post programming fluctuations, or achieve another goal.

It’s important to note that P&V algorithms are not the only tools available to mitigate ReRAM variability. For instance, pulse shape can play an important role in reducing variability and therefore improving neural network accuracy. Some industry work has shown that compared to regular square pulses, triangular pulses reduce the number of oxygen vacancies after set operation, therefore improving conductive filament stability. Triangular pulses have also been shown to be effective in improving the resistance state after the reset operation.

Above: Triangular pulse shape reduces the number of oxygen vacancies (VO) after the set operation, therefore improving conductive filament stability (Y. Feng et al., EDL 2021)

 

Taking advantage of ReRAM’s ‘cons’ for neuromorphic computing

In a neural network, we would like synapses to have a linear and symmetric response, a large number of analog states, a high on/off ratio, high endurance and no variability. ReRAM has intrinsic variabilities, and we can at least partly mitigate such non-idealities. For neural networks, we can also use them to our advantage!

One example is in a Bayesian neural network where device variability is actually key to its implementation: the natural differences from one device to another are crucial for how it works. For instance, differences in how memory conducts electricity with each use can actually help by providing randomness, which is useful for generating numbers or for algorithms in AI that need randomness, like Bayesian reasoning.

In Bayesian methods, you don’t just get one answer from a given input; instead, you get a distribution of possible answers. The natural variation in ReRAM can be used to create this distribution. This variation is like having physical random numbers that can help perform calculations directly within the memory. This makes it possible to do complex multiplications right where the data is stored. In addition, Bayesian neural networks are resilient to device-to-device variability and system aging.
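
As a rough sketch of this idea, the snippet below treats read-to-read variation around a programmed conductance as a free source of samples, so repeated reads of the same weights yield a distribution of outputs rather than a single answer. The distribution parameters are illustrative assumptions.

    # Device variability as a sampling mechanism: each read returns a
    # slightly different effective weight, so repeated reads behave like
    # draws from a distribution over weights - the ingredient Bayesian
    # methods need.
    import numpy as np

    rng = np.random.default_rng(1)

    def read_weights(g_mean, sigma_rel=0.05):
        # Read-to-read variation around the programmed conductances.
        return rng.normal(g_mean, sigma_rel * np.abs(g_mean))

    g_prog = np.array([20e-6, -5e-6, 12e-6])   # programmed weights (a.u.)
    x = np.array([1.0, 0.5, -1.0])

    # Many reads -> a distribution of outputs rather than a point answer.
    outputs = [float(read_weights(g_prog) @ x) for _ in range(1000)]
    print(np.mean(outputs), np.std(outputs))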

 

Summary

ReRAM is a good match for neuromorphic applications due to its cost-efficiency, ultra-low power consumption, scaling advantage at 28nm and below, small footprint to store very large arrays, analog behavior and ease of fabrication in the back end of the line. The conductance of ReRAM can also be easily modulated by controlling a few electrical parameters.

We can mitigate the ‘cons’ of ReRAM to make it shine in edge AI and in-memory computing applications in the short- and mid-term, respectively. In the long term, the similarity of ReRAM cells to synapses in the brain makes it a great fit for neuromorphic computing. As we look to new applications such as Bayesian neural networks, the ‘cons’ of ReRAM can not only be mitigated, but can even provide advantages.

I recently presented a tutorial at the International Memory Workshop in Seoul, during which I discussed the requirements of new neuromorphic circuits, why ReRAM is an ideal fit for such applications, existing challenges and possible solutions to improve ReRAM-based neural networks.
Please click here to view the presentation.

 

Towards Processing In-Memory
14 December 2023

One of the most exciting things about the future of computing is the ability to process data inside the memory. This is especially true since the industry has reached the end of Moore’s Law, and scientists and engineers are focused on finding efficient new architectures to overcome the limitations of modern computing systems. Recent advancements in areas such as generative AI are adding even greater pressure to find such solutions.

Most modern computing systems are based on the von Neumann computing architecture. There is a bottleneck that arises in such systems due to the separation of the processing unit and the memory. In the traditional von Neumann architecture, 95% of the energy is consumed by the need to transfer data back and forth between the processing unit and the memory. In systems that need fast response, low latency and high bandwidth, designers are moving the memory closer to the CPU so that data doesn’t need to travel as far. Even better is to do the processing within the memory so the data doesn’t need to travel at all. When computing in memory, logic operations are performed directly in the memory without costly data transfer between the memory and a separate processing unit. Such an architecture promises energy efficiency and the potential to overcome the von Neumann bottleneck.

Computing in-memory can be realized using non-volatile devices, with resistive random access memory (ReRAM or RRAM) as an outstanding candidate due to its various advantages in power consumption, speed, durability, and compatibility for 3D integration.

Above: the evolution of compute towards processing in memory

 

There are various approaches to processing in memory with ReRAM.

One approach for ReRAM-based computing is stateful logic. In this technique, memory cells are used to perform the logic operations without moving any data outside the memory array. The logical states of inputs and outputs are represented as the resistance states of the memristor devices, with logical ’0’ as a High Resistance State (HRS) and logical ’1’ as a Low Resistance State (LRS).

While promising, stateful logic techniques have yet to be demonstrated for large-scale crossbar array implementation. In addition, stateful logic is incompatible with CMOS logic and is limited by a device’s endurance.

Another approach is non-stateful logic. A non-stateful computational operation does not rely on maintaining or remembering the state of previous operations or data. The in-memory logic processes data or performs computations independently of any historical context, performing computations and making decisions quickly for applications such as real-time data processing.

In non-stateful logic, different electrical variables represent the inputs and outputs. For example, the inputs are voltages, and the output is the resistance state of the memristor. Non-stateful logic combines the advantages of computing in-memory with CMOS compatibility. Memristive non-stateful logic techniques can be integrated into a 1T1R memory array, in a similar way to commercial ReRAM products, using a memory like Weebit ReRAM, which is built in a 1T1R configuration where every memory cell has a transistor and a memristive device.

In a new paper by engineers and scientists from Weebit, CEA-Leti and The Technion, “Experimental Demonstration of Non-Stateful In-Memory Logic with 1T1R OxRAM Valence Change Mechanism Memristors,” Weebit ReRAM devices were used to demonstrate two non-stateful logic PIM techniques: Boolean logic with 1T1R and Scouting logic.

The team experimentally demonstrated various logical functions (such as AND, OR and XOR) of the two techniques using Weebit ReRAM to explore their possibilities for various applications. The experiments showed successful operations of both logic types, and correct functionality of the Weebit ReRAM in all cases.

The 1T1R logic technique exhibited notable advantages due to its simple design, employing only a single memristor. Scouting logic demonstrated significant potential as it employs a low voltage and no switching during logical operations, promising reduced power consumption and prolonged device lifespan.
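
To make the scouting-logic idea concrete, here is a small software model of the principle: two cells are read in parallel and the summed read current is compared against reference levels to produce a logic output, without switching any cell. The conductance values and thresholds are illustrative assumptions, not the paper’s circuit parameters.

    # Scouting logic sketch: parallel read of two cells, logic output
    # decided by comparing the summed current against reference levels.
    G_LRS, G_HRS, V_READ = 40e-6, 1e-6, 0.2      # siemens / volts, assumed

    def read_current(a, b):
        # Logical '1' = LRS, '0' = HRS; both cells sensed simultaneously.
        g = (G_LRS if a else G_HRS) + (G_LRS if b else G_HRS)
        return g * V_READ

    def scouting(a, b, op):
        i = read_current(a, b)
        if op == "OR":                           # ref below one-LRS current
            return i > 0.5 * G_LRS * V_READ
        if op == "AND":                          # ref between one and two LRS
            return i > 1.5 * G_LRS * V_READ
        if op == "XOR":                          # two refs: exactly one LRS
            return 0.5 * G_LRS * V_READ < i < 1.5 * G_LRS * V_READ

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, scouting(a, b, "OR"), scouting(a, b, "AND"),
                  scouting(a, b, "XOR"))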

Above: Figure 6 from the paper showing the connection of two cells in parallel in an (a) 1T1R standard array and a (b) pseudo-crossbar array

 

Through additional research and development, the opportunities of this technology will be further explored, ultimately leading to greater efficiency in time and energy. Read the entire paper (with an IEEE subscription) here.

 

ReRAM Gets a Boost from Smart Algorithms
22 June 2023

If you’ve ever watched a Formula 1 race, you may have wondered how the cars reach speeds of up to 360 km/h (223 mph). Part of the magic is of course the very advanced and powerful engines. The design of F1 engines is extremely precise, enabling these sophisticated machines to be compact, lightweight, and highly efficient. However, no less important are all the other elements surrounding the engine that are designed to maximize its efficiency.

Above: A video that breaks down the high-level design of an F1 car.

In the video above, you can see how every single element of each tiny system in the vehicle is painstakingly designed to optimize airflow, decrease heat and weight, maintain a low center of gravity, generate more horsepower, maximize fuel usage, stabilize the driver and, together with the engine, meet the many other goals needed to be an F1 contender. If you’re interested in the math behind how the F1 engines can efficiently reach 1,000 HP, you can check out this video.

While Weebit isn’t designing race cars, we are very focused on optimizing the performance of our ReRAM. As such, we focus not only on the performance of the ReRAM array, but also use a broad range of smart engineering techniques in the Weebit ReRAM module which surrounds our memory array, to maximize that performance.

At the recent 15th IEEE International Memory Workshop (IMW) 2023, Bastien Giraud, a research engineer from CEA-List, presented, “Benefits of Design Assist Techniques on Performances and Reliability of a RRAM Macro,” a new paper written by CEA-List, CEA-Leti and Weebit.

The paper shares various design assist techniques used in development of the Weebit ReRAM module – some of the important methods that help us to optimize performance parameters. This includes state-of-art custom programming strategies including Read-Before-Write (RBW), Current Limiter (CL), Write Termination (WT), Write Verify (WV) and Error Correction Code (ECC), driven by a flexible Smart Write Algorithm (SWA). The authors describe how each one of these techniques enhances the intrinsic performance of the ReRAM.

Above: A die photograph and technology cross section of the test chip (memory module) described in the paper.

 

A key part of the equation is the SWA, which is a programmable algorithm embedded in the design. With the SWA, the writing parameters such as the Set and Reset voltages, the write pulse width, limit current and write termination current can be widely and independently tuned. This means that there is a wide array of different voltages that can be used for Set and Reset (1,024 different combinations). And there is a similarly wide range of pulse width, limitation and termination settings, as well as other variables – all to enable the best possible combinations for a target application. The parameters are programmable to adapt the SWA patterns to the specific requirements.
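
As a rough sketch of how a few of these techniques could compose inside an SWA-style flow, the snippet below combines Read-Before-Write with a write-verify retry loop that escalates the programming voltage. The device model, probabilities and energy numbers are invented for illustration, not the chip’s actual behavior.

    # Smart-write sketch: RBW skips already-correct cells; WV retries a
    # current-limited write with escalating voltage until verify passes.
    import random

    def read(cell):                        # returns stored bit (toy model)
        return cell["bit"]

    def write_pulse(cell, bit, v_set, i_limit):
        # Toy model: a pulse succeeds with a probability that grows with
        # the programming voltage (higher V = firmer state, more energy).
        if random.random() < min(0.5 + 0.4 * (v_set - 1.0), 0.99):
            cell["bit"] = bit
        return v_set * i_limit * 1e-7      # rough energy bookkeeping, J

    def smart_write(cell, bit, v_set=1.2, i_limit=100e-6, max_tries=8):
        energy = 0.0
        if read(cell) == bit:              # Read-Before-Write: skip
            return energy
        for _ in range(max_tries):         # Write-Verify retry loop
            energy += write_pulse(cell, bit, v_set, i_limit)
            if read(cell) == bit:          # verify read
                return energy
            v_set += 0.1                   # escalate on failure
        raise RuntimeError("cell failed to program")

    cell = {"bit": 0}
    print(f"energy = {smart_write(cell, 1):.2e} J, bit = {cell['bit']}")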

For testing purposes, the ReRAM module was integrated into a test chip which emulates system functions. Within the test chip, a RISC-V core optimizes the SWA parameters with software tuning loops during early life of the chip, and the ReRAM itself stores the parameters afterwards. This level of flexibility enables each setting to be determined individually to maximize reliability.

Above: a diagram of the embedded SWA.

 

The paper explores the individual and cumulative benefits from the most common programming techniques deployed in the ReRAM macro. Each technique acts differently on energy consumption, access time, bit error rate (BER) and read margin. Altogether, the techniques, orchestrated by the flexible smart programming algorithm, efficiently counteract the inherent variability of the ReRAM.

Compared to the original performance of the ReRAM macro, using the combined design techniques resulted in overall energy savings of 87% and a 55% reduction in access time. You can read the paper here.

As we get ready for the next Formula 1 races, and as the teams from Alfa Romeo, AlphaTauri, Alpine, Aston Martin, Ferrari, Haas F1, McLaren, Mercedes, Red Bull Racing, and Williams continue to tweak their car designs, at Weebit we are continuing to explore and implement design techniques to further optimize our ReRAM performance.

Setting a Foundation for Security with ReRAM
15 June 2023

Today we are surrounded by an ever-increasing array of connected devices. Electronic payments are becoming more popular, and we are keeping more and more personal information in the cloud. At the same time, risks continue to rise as hackers get better and better at breaking down security schemes.

Regardless of the end application, security must be based on a multi-layered approach starting with the deepest embedded hardware. This starts within the manufacturing process of the chips themselves and includes a broad range of hardware primitives serving as security keys such as True Random Number Generator (TRNG) and physical unclonable functions (PUFs).

Security founded on manufacturing variations

A PUF is a technique that leverages the inherent physical variations in manufacturing processes to generate unique and unpredictable responses for each individual chip. Each instance of a hardware device exhibits slight variations in its electrical characteristics due to manufacturing variations, such as transistor size, doping levels, and other process parameters, and these variations can create a unique ‘fingerprint’ for each device. This hardware-based mechanism can be used to establish trust and ensure the uniqueness of each device without the need for stored secrets or traditional cryptographic algorithms.

PUFs are getting increasing attention as a hardware approach for information security in IoT devices. Since IoT devices have constrained area and power requirements and are often deployed in harsh environments, it is critical that the PUF be cost-efficient in terms of area per bit, and reliable with low bit error rate (BER).

PUFs can be based on a range of characteristics implemented on CMOS technology. Examples of strong PUFs are those based on ring oscillators while weaker PUFs are based on SRAM, Butterfly, and Flip-flops. Since these are all implemented on CMOS, they require large area, and also face difficulties in scaling beyond 10nm. Such conventional PUF schemes generally suffer from poor area efficiency and high BER under process, voltage and temperature (PVT) variations. While stabilization and error correction techniques can help to improve the output reliability, they result in considerable costs in area and power.

ReRAM – an inherently secure technology

ReRAM (or RRAM) has been investigated to implement efficient PUFs by using inherent variations in storage resistance, switching time and nonlinearity.

In addition, ReRAM has inherent physical attributes that enable it to protect its content from hacking attacks and make it more difficult to reverse engineer. Since ReRAM does not use any charges or other particles like flash does, it is more difficult to sense or change its internal state using electron beams. ReRAM can also easily withstand magnetic attacks because it is immune to electromagnetic fields. Because the ReRAM bit cell is deeply embedded between two metal layers integrated at the back-end-of-line (BEOL), it is also more immune to optical (laser) attacks. In addition, careful design of Weebit ReRAM cells and their associated control logic enables a balanced power profile which makes it less vulnerable to power analysis hacking.

 

Above: The ReRAM bit cell is deeply embedded between two
metal layers integrated at the back-end-of-line (BEOL)

 

Given all of this, ReRAM is an ideal solution for PUFs and other security mechanisms. In addition, ReRAM requires less area than traditional PUF schemes because it is integrated in BEOL.

At the recent International Memory Workshop, I presented a poster session based on a new paper written by technologists from the Indian Institute of Technology Delhi, led by Prof Manan Suri and Vivek Parmar, CEA-Leti and Weebit. The paper presents a ReRAM PUF with excellent reliability, demonstrating immunity to modern ML-based SCA (side-channel-attacks) by introducing a secondary low-energy HRS (high-resistance state) programming step.

 

The paper highlights the benefits of using Weebit ReRAM to provide PUF capability. In it, we successfully demonstrate and validate the design of a 2T-2R based PUF over a large array of 16Kb developed using Weebit’s technology by exploiting Vform variability between devices in consecutive rows. The design is validated for two different ReRAM stacks at 130nm and using one stack at 28nm.
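
As a rough illustration of the 2T-2R principle, the sketch below derives each PUF bit by comparing a process-random property (a stand-in for the forming voltage, Vform) of two cells in consecutive rows. The seed plays the role of one chip’s frozen physical variation; the distributions are illustrative assumptions.

    # 2T-2R PUF sketch: pairwise comparison of per-device random
    # parameters yields a repeatable, chip-unique key.
    import numpy as np

    def puf_response(chip_seed, n_bits=128):
        rng = np.random.default_rng(chip_seed)  # one chip's frozen randomness
        vform_a = rng.normal(2.0, 0.1, n_bits)  # row i devices (volts, assumed)
        vform_b = rng.normal(2.0, 0.1, n_bits)  # row i+1 devices
        return (vform_a > vform_b).astype(int)  # pairwise compare -> key bits

    key1 = puf_response(chip_seed=7)            # same chip, same key
    key2 = puf_response(chip_seed=8)            # different chip, different key
    print("".join(map(str, key1[:32])))
    print((key1 != key2).mean())                # ~0.5: uncorrelated chips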

Above: TEM (transmission electron microscope) image of fabricated ReRAM array at 130nm: a) top view; b) cross section with ReRAM device highlighted

 

The arrays show high immunity to the high-temperature SMT (surface mount technology)-reflow process that is critical to electronic component manufacturing. The ReRAM PUF arrays also demonstrate resilience against modern Machine-Learning (ML) based side channel attacks (SCAs), a growing security threat for chips today. In addition, the fabricated arrays exhibit excellent performance in terms of speed, data retention and memory window.

You can read the paper here.

 

AI Reinforcement Learning with Weebit ReRAM
5 June 2023

A paper from Weebit and our partners at CEA-Leti and the Nano-Electronic Device Lab (NEDL) at Politecnico di Milano was recently published in the prestigious journal Nature Communications. It details how bio-inspired systems can learn using ReRAM (RRAM) technology in a way that is much closer to how our own brains learn to solve problems compared to traditional deep learning techniques.

The teams demonstrated this by implementing a bio-inspired neural network using ReRAM arrays in conjunction with an FPGA system and testing whether the network could learn from its experiences and adapt to its environment. The experiments showed that our in-memory hardware not only does this better than conventional deep learning techniques, but that it has the potential to achieve a significant boost in speed and power savings.

Learning by experience

Humans and other animals continuously interact with each other and the surrounding environment to refine their behavior towards the best possible reward. Through a continuous stream of trial-and-error events, we are constantly evolving, learning, improving the efficiency of routine tasks and increasing our resilience in daily life.

The acquisition of experience-based knowledge is an interdisciplinary subject of biology, computer science and neuroscience known as “reinforcement learning,” and it is at the heart of a major objective of the AI community: to build machines that can learn by experience. The goal is machines that can infer concepts and make autonomous decisions in the context of constantly evolving situations.

In reinforcement learning, an agent (the neural network) interacts with its environment and receives feedback based on that interaction in the form of penalties or rewards. Through this feedback, it learns from its experiences and constructs a set of rules that will enable it to reach the best possible outcomes.
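
This agent/penalty/reward loop can be made concrete with plain tabular Q-learning on a toy grid maze, sketched below. It is not the paper’s ReRAM-based network; the maze layout, eight-direction move set and hyperparameters are assumptions chosen to mirror the experiment’s setting.

    # Tabular Q-learning on a toy maze with eight movement directions.
    import random

    MOVES = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
    N, GOAL = 5, (4, 4)                    # 5x5 maze, assumed layout
    Q = {}                                 # state -> list of action values

    def step(s, a):
        r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
        if not (0 <= r < N and 0 <= c < N):
            return s, -1.0                 # penalty: bumped into a wall
        return (r, c), (10.0 if (r, c) == GOAL else -0.1)

    for episode in range(500):             # trial-and-error exploration
        s, eps = (0, 0), max(0.05, 1.0 - episode / 300)
        for _ in range(100):               # limited time per trial
            q = Q.setdefault(s, [0.0] * 8)
            a = random.randrange(8) if random.random() < eps \
                else max(range(8), key=q.__getitem__)
            s2, rwd = step(s, a)
            q2 = Q.setdefault(s2, [0.0] * 8)
            q[a] += 0.5 * (rwd + 0.9 * max(q2) - q[a])  # feedback update
            s = s2
            if s == GOAL:
                break

    print(max(Q[(0, 0)]))                  # learned value of the start state

In the hardware demonstration described below, the role of this Q-table is played by ReRAM conductances, updated in place as the agent explores.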

In developing such resilient bio-inspired systems, what’s needed is hardware with plasticity, i.e., the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. The lack of such commercial hardware is one of the current main limitations in implementing systems capable of learning from experience in an efficient way.

NVMs for in-memory computing

Researchers are now looking at non-volatile memories (NVMs) like ReRAM to enable hardware plasticity for neuromorphic computing. ReRAM is particularly well-suited for use in hardware capable of plastic adaptation, as its conductance can be easily modulated by controlling a few electrical parameters. We’ve talked about this previously in several papers and a recent demonstration.

When voltage pulses are applied, the conductance of ReRAM can be increased or decreased by set and reset processes. This is how ReRAM stores information. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Because of this similarity, ReRAM (RRAM) arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

In addition to their ability to mimic the plasticity of biological synapses, memristors like ReRAM have several other advantages for these systems. ReRAM is small, low-power, and can be fabricated using standard semiconductor manufacturing techniques in the backend-of-the-line (BEOL), making it easy to integrate into electronic systems.

Power and bandwidth

Deep learning is extremely computationally intensive, involving large numbers of computations which can be very power-hungry, particularly when training large models on large datasets. A great deal of power is also consumed through the high number of iterative optimizations needed to adjust the weights of the network.

Deep learning models also require a lot of memory to store the weights and activations of the neurons in the network, and since they rely on traditional computing architectures, they are impacted by communication delays between the processing unit and the memory elements. This can be a bottleneck that not only slows down computations but also consumes a lot of power.

In the brain, there are no such bottlenecks. Processing and storage are inextricably intertwined, leading to fast and efficient learning. This is where in-memory computing with ReRAM can make a huge difference for neural networks. With ReRAM, fast computation can be done in-situ, with computing and storage in the same place.

The maze runner

While memristor-based networks are not always as accurate as standard deep learning approaches, they are very well-suited to implementing systems capable of adapting to changing situations. In our joint paper with CEA-Leti and NEDL we propose a bio-inspired recurrent neural network (RNN) using arrays of ReRAM devices as synaptic elements, that achieves plasticity as well as state-of-the-art accuracy.

To test our proposed architecture for reinforcement learning tasks, we studied the autonomous exploration of continually evolving environments including a two-dimensional dynamic maze showing environmental changes over time. The maze was experimentally implemented using a microcontroller and a field-programmable gate array (FPGA), which ran the main program, enabled learning rules and kept track of the position of the agent. Weebit’s ReRAM devices were used to store information and adjust the strength of connections between neurons, and also to map the internal state of each neuron.

Above: a Scanning Electron Microscope image of the SiOx RRAM devices and
sample photo of the packaged RRAM arrays used in this work

 

Our experiments followed the same procedure used in the case of the Morris Water Maze in biology: the agent has a limited time to explore the environment under successive trials, and once a trial starts, the sequence of firing neurons maps the movement of the agent.

Above: Representation of high-level reinforcement learning for autonomous
navigation considering eight main directions of movement

 

The maze exploration is configured as successive random walks which progressively develop a model of the environment. Here is how it generally progressed:

  • At the beginning, the network cannot find the solution and spends the maximum amount of time available in the maze.
  • As the network progressively maps the configuration of its environment, it becomes a master of the problem trial after trial, and it finally finds the optimum path towards the objective.
  • Once the solution is found, the network decreases the computing time with each successive attempt at solving the same maze configuration, because it remembers the solution.
  • Next, the maze changes shape and a different escape path must be found. As it attempts to find the solution, the network receives a penalty in unexpected positions. After an exploration period, it successfully gets to the target again.
  • Finally, the system comes back to the original configuration and the network easily retrieves the first solution – faster than before. This is thanks to the residual memory of the internal states and to the intrinsic recurrent structure.

Above: (left) the system re-learns quickly when presented with “maze 1” the second time; (right) ReRAM resistance can be easily modulated by using different programming currents, enabling some memory of the original maze configuration due to gradual adaptation of the internal voltage of the neurons

 

You can see a short video here showing the experimental setup and the hardware demonstration of the exploration of the dynamic environment via reinforcement learning.

In our paper, we go into much more detail on the experiments, including testing the hardware for complex cases such as the Mars rover navigation to investigate the scalability and reconfigurability properties of the system.

Saving space with fewer neurons

One of the key features that makes our implementation so effective is that it uses an optimized design based on only eight CMOS neurons, representing the eight possible directions of movement inside the maze. CMOS neurons are generally integrated in the front end of line (FEOL) and require a large amount of circuitry, so an increase in the number of neurons is associated with an increase in area and cost.

In our system, the ReRAM, acting as the threshold modulator, is the only thing that changes for each explored position in the maze, while the remaining hardware of the neurons remains the same. For this reason, the size of the network can be increased with very small costs in terms of circuit area by increasing the amount of ReRAM – which is dense and easily integrated in the back-end-of-line (BEOL).

Our bio-inspired approach shows far better management of computing resources compared to standard solutions. In fact, to carry out an exploration at a certain average accuracy (99%), our solution turns out to be 10 times less expensive, as it requires 10 times fewer synaptic elements (the number of computing elements is directly proportional to the area/power consumption).

Above: Thanks to the reinforcement learning, the energy consumed by
each neuron drastically decreases as more and more trials are allowed

 

Key Takeaways

Deep learning techniques using standard von Neumann processors can enable accurate autonomous navigation but require a great deal of power and a long time to make training algorithms effective. This is because the environmental information is often sparse, noisy and delayed, while training procedures are supervised and require direct association between inputs and targets during backpropagation. This means that complex models of convolutional neural networks are needed to numerically find the best combination of parameters for the deep reinforcement computation.

Our proposed solution overcomes the standard approaches used for autonomous navigation using ReRAM based synapses and algorithms inspired by the human brain. The framework highlights the benefits of the ReRAM-based in-situ computation including high efficiency, resilience, low power consumption and accuracy.

Since biological organisms draw their capability from the inherent parallelism, stochasticity, and resilience of neuronal and synaptic computation, introducing bio-inspired dynamics into neural networks would improve robustness and reliability of artificial intelligent systems.

Read the entire paper here: A self-adaptive hardware with resistive switching synapses for experience-based neurocomputing.

 

 

In-Memory Computing for AI Similarity Search using Weebit ReRAM
22 December 2022

We recently collaborated with our friends at IIT-Delhi, led by Prof. Manan Suri, on a research project demonstrating an efficient ReRAM based in-memory computing (IMC) capability for a similarity search application. The demonstration was done on 28nm ReRAM technology developed by Weebit in collaboration with CEA-Leti. A paper based on this work, “Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search,” was published in IEEE Transactions on Circuits and Systems II: Express Briefs.

 

A bit of background: CAMs in AI/ML search applications

Associative memories, also called Content Addressable Memories (CAMs), are an important component of intelligent systems. CAMs perform fast search operations by accepting a query and performing a search over multiple data points stored in memory to find one or more matches based on a distance metric, and then return the locations of the matches. This information can be potentially used for applications such as nearest neighbor searches for classification or unsupervised labeling. Ternary Content-Addressable Memory (TCAM) is a type of CAM that incorporates a “don’t care condition” to assist searches for partial matches and is therefore the most commonly used type of CAM.

TCAMs offer a powerful in-memory computing paradigm for efficient parallel-search and pattern-matching applications. With the emergence of big data and AI/ML, TCAMs have become a promising candidate for a variety of edge and enterprise data-intensive applications. In the research project, we proposed a scheme that demonstrates the use of TCAMs for performing hyperspectral imagery (HSI) pixel matching in the context of remote-sensing applications. TCAMs can also be used to enable applications such as biometrics (facial/iris/fingerprint recognition) and to assist in string matching for large scale database searches.
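
To illustrate the TCAM search primitive itself, here is a tiny software model: a query is compared against stored ternary words (0, 1, or ‘X’ for don’t-care), returning the locations of all matches. A real TCAM performs this comparison across the whole array in a single parallel access; the stored entries below are invented examples.

    # TCAM search sketch: ternary match with don't-care bits.
    WORDS = ["10X1", "0XX0", "1101"]           # stored entries, 'X' = don't care

    def tcam_search(query):
        def matches(word):
            return all(w in ("X", q) for w, q in zip(word, query))
        return [i for i, w in enumerate(WORDS) if matches(w)]

    print(tcam_search("1011"))                 # -> [0] ("10X1" matches)
    print(tcam_search("0110"))                 # -> [1]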

Traditionally, CAMs/TCAMs are designed using standard memory technologies such as SRAM or DRAM. However, these volatile memory-based circuits have performance limitations in terms of search energy/bit (a metric commonly used for evaluating the performance of CAM circuits), and CAMs based on SRAMs are limited in scale due to relatively large cell areas.

 

ReRAM can overcome performance limitations

CAM performance limitations can be addressed by using an emerging NVM (Non-Volatile Memory) technology like ReRAM instead of volatile memory technologies. Because ReRAM can help reduce power consumption and cell size, it can be used to build compact and efficient TCAMs. Such NVM devices also reduce circuit complexity and provide the opportunity to exploit low-area analog in-memory computing, leading to increased design flexibility.

In the recent paper, the joint IIT-Delhi/Weebit team presented a hardware realization for CAM using Weebit ReRAM arrays. In particular, the researchers proposed an end-to-end engine to realize IMSS (In-Memory Similarity Search) in hardware by using ReRAM devices and binarizing data and queries through a custom pre-processing pipeline. The learning capability of the proposed ReRAM based in-memory computing engine was demonstrated on a hyperspectral imagery pixel classification task using the Salinas dataset, demonstrating an accuracy of 91%.

Above: Figure showing energy efficient classification of agricultural land from hyperspectral imagery using proposed In-Memory Computing Technique.

 

The team experimentally validated the system on fabricated ReRAM arrays, with full-system validation performed through SPICE simulations using an open source SkyWater 130nm CMOS physical design kit (PDK). We were able to significantly reduce the required computations and improve their speed, leading to benefits in terms of both energy and latency. By projecting estimations to advanced nodes (28nm), we demonstrated energy savings of ~1.5x for a fixed workload compared to the current state-of-the-art technology.

You can access the full paper here.

 

 
