Tech Deep Dive Archives | Weebit

Enabling 'Few-Shot Learning' AI with ReRAM
https://www.weebit-nano.com/enabling-few-shot-learningai-with-reram/ | Thu, 19 Jun 2025


AI training happens in the cloud because it’s compute-intensive and highly parallel. It requires massive datasets, specialized hardware, and weeks of runtime. Inference, by contrast, is the deployment phase — smaller, faster, and often done at the edge, in real time. The cloud handles the heavy lifting; the edge delivers the result. Now, recent advances in resistive memory technology are making edge AI inferencing more energy-efficient, secure, and responsive.

At the 2025 IEEE Symposium on VLSI Technology and Circuits, researchers from CEA-Leti, Weebit Nano, and the Université Paris-Saclay presented a breakthrough in “on-chip customized learning” — demonstrating how a ReRAM-based platform can support few-shot learning using just five training updates.

Few-shot learning (FSL) is an approach where AI models learn new tasks with only a handful of examples. It is very useful for edge applications, where devices must adapt to specific users or environments and can’t rely on large, labeled datasets.

The team didn't just train a model — they showed that a memory-embedded chip could adapt in real time, at the edge, without requiring cloud access, long training cycles, or power-hungry hardware. The core enabler is a combination of Model-Agnostic Meta-Learning (MAML) and multi-level Resistive RAM (ReRAM or RRAM).

MAML provides a clever workaround that can enable learning in power-constrained edge devices. Instead of training from scratch, it trains a model to learn. During an off-chip phase, the system builds a general-purpose model by exposing it to many tasks. This “learned initialization” is then deployed to edge devices, where it can quickly adapt to new tasks with minimal effort.

This means:

  • No need for the cloud – minimizing bandwidth and latency
  • Minimal data required – minimizing compute requirements at the edge
  • Massive time and energy savings
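To make this recipe concrete, here is a minimal first-order MAML sketch in Python. Everything in it is an illustrative assumption (a linear toy model, a synthetic task distribution, arbitrary learning rates) rather than the network or settings used in the paper:

```python
# First-order MAML on a toy family of linear-regression tasks.
# All models, tasks and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of mean-squared error for the linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def sample_task():
    """A toy 'task': a weight vector drawn around a shared base, few noisy samples."""
    w_true = np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=3)
    X = rng.normal(size=(20, 3))
    y = X @ w_true + 0.01 * rng.normal(size=20)
    return X, y

w = np.zeros(3)              # meta-parameters: the "learned initialization"
alpha, beta = 0.05, 0.01     # inner (adaptation) / outer (meta) learning rates

for step in range(2000):     # off-chip meta-training phase
    X, y = sample_task()
    w_task = w - alpha * loss_grad(w, X, y)   # inner loop: one adaptation step
    w = w - beta * loss_grad(w_task, X, y)    # outer loop (first-order update)

# 'w' now adapts to a brand-new task in very few gradient updates,
# the property the chip exploits with its five on-device updates.
```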

Executing this on edge hardware requires memory technology that can keep up — and that’s where ReRAM comes in.

Because ReRAM is a non-volatile memory that supports analog programming, it is ideal for low-power and in-memory compute architectures. ReRAM can store information as varying conductance states, which can then represent the weights (numerical values that represent the strength or importance of connections between neurons or nodes in a model) in neural networks.

However, ReRAM also comes with challenges — notably variability and some limits on write endurance. Few-shot learning helps overcome both.

 

Reducing Write Cycles with MAML

For endurance, the key is MAML itself, which enabled the research team to reduce the number of required write operations by orders of magnitude. Instead of millions of updates, they showed that just five updates — each consisting of a handful of conductance tweaks — were enough to adapt to a new task.

For the experiments, the team used a chip fabricated in 130nm CMOS with multi-level Weebit ReRAM integrated in the back end of line (BEOL). The network architecture had four fixed convolutional layers and two trainable fully-connected (FC) layers. Weights in the FC layers were encoded using pairs of ReRAM cells, with each weight stored as the difference in conductance between the two cells.
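As a rough sketch of this differential encoding, the snippet below maps a signed weight onto the nearest pair of quantized conductances (G_plus, G_minus). The eight levels and 100 µS full scale are invented for the example, not Weebit device values:

```python
# Encode a signed weight as the difference of two quantized ReRAM conductances.
# The level count and conductance range are assumptions for illustration.
import numpy as np

LEVELS = np.linspace(0.0, 100e-6, 8)   # 8 programmable levels, 0..100 uS (assumed)

def encode(w, w_max=1.0, g_max=100e-6):
    """Map a weight in [-w_max, w_max] to the nearest (G_plus, G_minus) pair."""
    g = w / w_max * g_max
    g_plus = LEVELS[np.abs(LEVELS - max(g, 0.0)).argmin()]
    g_minus = LEVELS[np.abs(LEVELS - max(-g, 0.0)).argmin()]
    return g_plus, g_minus

def decode(g_plus, g_minus, w_max=1.0, g_max=100e-6):
    """Read the stored weight back as the normalized conductance difference."""
    return (g_plus - g_minus) / g_max * w_max

gp, gm = encode(0.4)
print(decode(gp, gm))   # ~0.43: the residual error is quantization from 8 levels
```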

Training was carried out using a “computer-in-the-loop” setup, where the system calculated gradients and issued write commands directly to the ReRAM crossbars. In a full deployment, this would be managed by a co-integrated ASIC.

 

The learning task? Character recognition from the Omniglot dataset, a popular benchmark in FSL. The chip was pre-loaded with the MAML-trained parameters and fine-tuned on-device to recognize new characters using only five gradient updates.

The result:

  • Starting at 20% accuracy (random guess)
  • Reaching over 97% accuracy after five updates
  • Energy use of less than 10 μJ for a 2kbit array

For an AI-based optical character recognition (OCR) application with a 2kbit array, energy consumption of less than 10 μJ (roughly 5 nJ per bit) represents excellent energy efficiency compared to typical industry benchmarks. This level of power consumption places such a system in the ultra-low-power category suitable for edge AI applications and battery-powered devices.

 

Programming Strategies to Mitigate Drift

In ReRAM, conductance levels can drift over time, and adjacent states may overlap, introducing noise. To tackle this, the team tested multiple programming strategies:

  • Single-shot Set: Simple, fast, but inaccurate
  • Iterative Set: More precise, but slower
  • Iterative Reset: Useful for low conductance states
  • Hybrid strategy: A blend of the Set and Reset approaches, offering the best balance

The hybrid strategy proved most effective, reducing variability and improving long-term retention. After a 12-hour bake at 150°C (equivalent to 10 years at 75°C), the system still maintained over 90% of its accuracy.

This is critical for commercial deployment, where temperature fluctuations and data longevity are real-world concerns.

 

Looking Ahead

This research points to a compelling future for AI at the edge:

  • Learn locally: Devices can customize their behavior to individual users
  • Stay secure: No data needs to be sent to the cloud
  • Save time and energy: Minimal training and in-memory compute keep power low
  • Scale affordably: Meta-training can be centralized and shared across devices

And because the platform uses ReRAM, the entire system benefits from ultra-low standby power and reduced silicon area.

This work is more than a proof of concept; it's a signpost. As more AI applications move to the edge, we'll need memory technologies that support not just inference but real learning. ReRAM is emerging as one of the few candidates that can deliver on that vision, especially when paired with smart algorithms like MAML.

View the presentation, “On Chip Customized Learning on Resistive Memory Technology for Secure Edge AI” from the 2025 IEEE Symposium on VLSI Technology and Circuits here.

 

Relaxation-Aware Programming in ReRAM: Evaluating and Optimizing Write Termination
https://www.weebit-nano.com/relaxation-aware-programming-in-reramevaluating-and-optimizing-write-termination/ | Wed, 28 May 2025


Resistive RAM (ReRAM or RRAM) is the strongest candidate for next-generation non-volatile memory (NVM), combining fast switching speeds with low power consumption. New techniques for managing a memory phenomenon called ‘relaxation’ are making ReRAM more predictable — and easier to specify for real-world applications.

What is the relaxation problem in memory? Short-term conductance drift – known as ‘relaxation’ – presents a challenge for memory stability, especially in neuromorphic computing and multi-bit storage.

At the 2025 International Memory Workshop (IMW), a team from CEA-Leti, CEA-List and Weebit presented a poster session, “Relaxation-Aware Programming in RRAM: Evaluating and Optimizing Write Termination.” The team reported that Write Termination (WT), a widely used energy-saving technique, can make these relaxation effects worse.

So what can be done? Our team proposed a solution: a modest programming voltage overdrive that curbs drift without sacrificing the efficiency advantages of the WT technique.

 

Energy Savings Versus Stability

Write Termination improves programming efficiency by halting the SET (write) operation once the target current is reached, instead of using a fixed-duration pulse. This reduces both energy use and access times, supporting better endurance across ReRAM arrays.
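A toy simulation of the trade-off, with an invented cell-switching model and invented constants (not measured Weebit behavior), shows why WT saves pulse time and hence energy:

```python
# Write Termination (WT) vs. a fixed-duration SET pulse, toy simulation.
# The exponential cell model and all constants are assumptions.
import math
import random

I_TARGET = 50e-6        # target SET current (A), assumed
T_FIXED = 1.0e-6        # fixed-duration pulse width (s), assumed worst case
DT = 10e-9              # simulation time step (s)

def cell_current(t, tau):
    """Toy model: cell current rises exponentially once switching starts."""
    return 100e-6 * (1 - math.exp(-t / tau))

def set_pulse_width(terminate, tau):
    """Pulse width actually applied, with or without Write Termination."""
    t = 0.0
    while t < T_FIXED:
        if terminate and cell_current(t, tau) >= I_TARGET:
            return t    # WT: stop as soon as the target current is reached
        t += DT         # (in silicon this is an analog comparator, not a loop)
    return T_FIXED      # fixed duration: always pay for the full pulse

random.seed(1)
taus = [random.uniform(50e-9, 300e-9) for _ in range(1000)]  # cell variability
wt = sum(set_pulse_width(True, tau) for tau in taus) / len(taus)
fx = sum(set_pulse_width(False, tau) for tau in taus) / len(taus)
print(f"mean pulse width: WT {wt*1e9:.0f} ns vs fixed {fx*1e9:.0f} ns")
```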

It’s desirable, but problematic in action.

Tests on a 128kb ReRAM macro showed that unmodified WT increases conductance drift by about 50% compared to constant-duration programming.

In these tests, temperature amplified the effect: at 125°C, the memory window narrowed by 76% under WT, compared to a fixed SET pulse. Even at room temperature, degradation reached 31%.

Such drift risks destabilizing systems that depend on tight resistance margins, including neuromorphic processors and multi-level cell (MLC) storage schemes, where minor shifts can translate into computation errors or data loss.

The experiments used a testchip fabricated on 130nm CMOS, integrating the ReRAM array with a RISC-V subsystem for fine-grained programming control and data capture.

Conductance relaxation was tracked from microseconds to over 10,000 seconds post-programming. A high-speed embedded SRAM buffered short-term readouts, allowing detailed monitoring from 1µs to 1 second, while longer-term behavior was captured with staggered reads.

This statistically robust setup enabled precise analysis of both early and late-stage relaxation dynamics.

To measure stability, the researchers used a metric called the three-sigma memory window (MW₃σ). It looks at how tightly the memory cells hold their high and low resistance states, while ignoring extreme outliers.

When this window gets narrower, the difference between a “0” and a “1” becomes harder to detect — making it easier for errors to creep in during reads.

By focusing on MW₃σ, the team wasn’t just looking at averages — they were measuring how reliably the memory performs under real-world conditions, where even small variations can cause problems.
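In code, the metric reduces to a simple calculation over the two read-current distributions. The sketch below uses synthetic Gaussians where the real study used measured array data:

```python
# Computing a three-sigma memory window (MW_3sigma) from two state distributions.
# The distributions here are synthetic stand-ins for measured read currents.
import numpy as np

rng = np.random.default_rng(42)
lrs = rng.normal(60e-6, 5e-6, 10_000)   # low-resistance state reads (higher current)
hrs = rng.normal(10e-6, 3e-6, 10_000)   # high-resistance state reads (lower current)

# Worst-case 3-sigma edge of each cloud; a positive gap means the states
# remain separable after ignoring extreme outliers.
mw = (lrs.mean() - 3 * lrs.std()) - (hrs.mean() + 3 * hrs.std())
print(f"MW_3sigma = {mw * 1e6:.1f} uA")
```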

 

Addressing Relaxation with Voltage Overdrive

Voltage overdrive is the practice of applying a slightly higher voltage than the minimum required to trigger a specific operation in a memory cell — in this case, the SET operation in ReRAM.

Write Termination cuts the SET pulse short as soon as the target current is reached. That saves energy, but it also means some memory cells are just barely SET. They’re fragile — sitting near the edge of their intended resistance range. That’s where relaxation drift kicks in: over time, conductance slips back toward its original state.

So, the team asked a logical question:

“What if we give the cell just a bit more voltage — enough to push it more firmly into its new state, but not so much that we burn energy or damage endurance?”

Instead of discarding WT, the team increased the SET voltage by 0.2 Arbitrary Units (AU) above the minimum requirement.

Key results:

  • Relaxation dropped to levels comparable to constant-duration programming
  • Memory windows remained stable at both room and elevated temperatures
  • WT’s energy efficiency was mostly preserved, with only a ~20% increase in energy compared to unmodified WT

Modeling predicted that without overdrive, 50% of the array would show significant drift within a day. With overdrive, the same drift level would take more than 10 years, a timescale sufficient for most embedded and computing applications.

 

Balancing Energy and Stability

The modest voltage increases restored conductance stability without negating WT’s energy and speed benefits. Although the overdrive added some energy overhead, overall consumption remained lower than that of fixed-duration programming.

This adjustment offers a practical balance between robustness and efficiency, critical for commercial deployment.

 

As ReRAM moves toward wider adoption, particularly as a prime candidate for neuromorphic and multi-bit storage applications, conductance drift will become a defining challenge.

The results presented at IMW 2025 show that simple device-level optimizations like voltage overdrive can deliver major gains without requiring disruptive architectural changes.

Check out more details of the research here.

 

A Complete No-Brainer: ReRAM for Neuromorphic Computing
https://www.weebit-nano.com/a-complete-no-brainerreram-for-neuromorphic-computing/ | Wed, 05 Jun 2024


In the last 60 years, technology has evolved at such an exponential rate that we now regularly converse with AI-based chatbots, and that same OpenAI technology has been put into a humanoid robot. It's truly amazing to see this rapid development.

 

Above: OpenAI technology in a humanoid robot

 

Continued advancement of AI development faces numerous challenges. One of these is computing architecture. Since it was first described in 1945, the von Neumann architecture has been the foundation for most computing. In this architecture, instructions and data are stored together in memory and communicate via a shared bus to the CPU. This has enabled many decades of continuous technological advancement.

However, such an architecture creates bottlenecks in terms of bandwidth, latency, power consumption, and security, to name a few. For continued AI development, we can't just make brute-force adjustments to this architecture. What's needed is an evolution to a new computing paradigm that bypasses the bottlenecks inherent in the traditional von Neumann architecture and more precisely mimics the system it is trying to imitate: the human brain.

To achieve this, memory must be closer to the compute engine for better efficiency and power consumption. Even better, computation should be done directly within the memory itself. This paradigm change requires new technology, and ReRAM (or RRAM) is among the most promising candidates for future in-memory computing architectures.

 

Roadmap for ReRAM in AI

Given its long list of advantages, ReRAM can be used in a broad range of applications ranging from mixed signal and power management to IoT, automotive, industrial, and many other areas. We generally see ReRAM rolling out in AI applications over time in different ways. For AI related applications, relevant advantages of ReRAM include its cost efficiency, ultra-low power consumption, scaling capabilities, small footprint and fit into a long-term roadmap to advanced neuromorphic computing.

The shortest-term opportunity for ReRAM is as an embedded memory (10-100 Mb) for edge AI applications. The idea is to bring the NVM closer to the compute engine, therefore massively reducing power consumption. This opportunity can be realized today using ReRAM for synaptic weight storage, replacing the use of external flash and eliminating some of the local SRAM or DRAM. My colleague Gideon Intrater will present on this topic on Monday, June 24th at the Design Automation Conference 2024. If you are planning to attend, please join his presentation as part of the session, ‘Cherished Memories – Exploring the Power of Innovative Memory Architectures for AI applications.’

In the mid-term, ReRAM is a great candidate for in-memory computing where analog behavior is required. In this methodology, ReRAM is used for both computation and weight storage – at first in binary form (one bit per cell) and then moving to multi-level operation (multiple bits per cell). An example of in-memory computing was proposed in 2022 using arrays based on Weebit ReRAM as Content Addressable Memories. This work, done in collaboration with the Department of Electrical Engineering, Indian Institute of Technology Delhi, is highlighted in the article, ‘In-Memory Computing for AI Similarity Search using Weebit ReRAM,’ by Amir Regev.

My colleague Amir Regev also recently wrote an article, ‘Towards Processing In-Memory,’ which explains more about the idea of in-memory computing with Weebit ReRAM, based on work done with the Department of Electrical Engineering at the Technion Israel Institute of Technology and CEA-Leti.

Above: A roadmap for ReRAM in AI – short-term, mid-term and long-term

 

In the longer term, neuromorphic computing comes into play. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Likewise, ReRAM arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

Areas of particular interest include Bayesian architectures and meta learning. Bayesian neural networks hold great potential for the development of AI, particularly where decision-making under uncertainty is critical. These networks actually quantify uncertainty, so such methods can help AI models avoid overconfidence in their predictions, potentially leading to more reliable, safer AI systems. The characteristics of ReRAM make it an ideal solution for these networks.

The aim of meta learning is to create models that can generalize well to new tasks by leveraging prior experience. As they ‘learn to learn,’ they continuously update their beliefs based on new data without needing to re-train from scratch, making them more adaptable and flexible than today’s methods. The idea is to develop a standalone system capable of learning, adapting and acting locally at the edge. A model would be trained on a server and then optimized parameters would be saved on the chip at the edge. The edge system would then be able to learn new tasks by itself – like humans and other animals.

Compared to current machine learning, where models are trained for specific tasks with a fixed algorithm on a huge dataset, this concept has numerous advantages, including:

  • Data is stored locally on the chip and not in the cloud so there is greater security, much faster reaction and lower power consumption
  • Computation is done in-situ so there is no need to transfer data from memory to the computation unit
  • The system could adapt to very different real world situations since it would imitate human learning ability

A recent joint paper from Politecnico di Milano, Weebit and CEA-Leti proposed a bio-inspired neural network capable of learning using Weebit ReRAM. The focus is on building a bio-inspired system that requires hardware with plasticity, in other words the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. You can read about this work in an article by Alessandro Bricalli, ‘AI Reinforcement Learning with Weebit ReRAM.’

This is the future of ReRAM in AI, and I can’t wait!

 

Overcoming hurdles

Like all memory technologies, ReRAM has both pros and cons for neuromorphic applications. On the ‘pros’ side, this includes its non-volatility, ability to scale to smaller nodes, low power consumption and the ability to have multi-level operation.

The ‘cons’ are largely due to phenomena such as limited precision of the programming conductance. ReRAM technologies are also subject to some resistance drift while cycling. Other phenomena, such as relaxation (linked to both time and temperature), can impact resistance values over time.

As we look towards using ReRAM for neuromorphic computing, we won’t let such resistance variability hold us back. There are not only ways to mitigate such factors, but also ways in which these ‘cons’ can be taken advantage of in certain neuromorphic bio-inspired circuits.

 

Mitigating resistance variability

One of the main ways we can mitigate resistance variability is by using Program and Verify (P&V) algorithms. The idea is quite simple: whenever a cell doesn’t satisfy a given criterion in some way, we can reprogram it and then re-verify its resistance state. Such methods allow us to fine-tune resistance levels in a given range to attain more than just the levels of low-resistance state (LRS) and high-resistance state (HRS).

We can do this in multiple ways. One way is to use a gradual method, in which we repeat the same operation over and over until a cell satisfies the condition imposed (or the maximum number of allowed repetitions has been completed). This method can be incremental, in which case the programming control parameter increases at each repetition, or cumulative, in which case the parameter is kept constant each time.

There are numerous knobs we can control, including the programming direction and level of the control parameter. The total number of P&V cycles, as well as what happens before the verify itself, can vary depending on the goal we want to achieve – whether it’s improving retention, resilience or endurance, or achieving other goals.

The Ielmini Group at the Politecnico di Milano has proposed numerous state-of-the-art algorithms which can help with further tuning. One of these is called ISPVA, in which the gate voltage of the transistor is kept constant, therefore fixing the compliance current, while the top electrode voltage is increased until the desired conductance is attained. Conversely, in the IGVVA approach, the top electrode voltage is kept constant (high enough to grant a successful set operation), while the gate voltage is increased to gradually increase the compliance current.
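As a rough illustration of the incremental program-and-verify idea (in the spirit of ISPVA, where the programming voltage is stepped up until the target conductance is reached), here is a toy Python sketch. The cell response model, voltage steps and target value are invented for the example:

```python
# ISPVA-like incremental program-and-verify loop, toy cell model.
# Step sizes, targets and the cell response are assumptions.
import random

def program_pulse(g, v):
    """Toy cell response: each pulse raises conductance, more at higher voltage."""
    return g + v * random.uniform(0.5, 1.5) * 1e-6

def ispva_like(target_g=10e-6, v_start=1.0, v_step=0.1, max_iter=30):
    """Step the programming voltage up until the verify read hits the target."""
    g, v = 0.0, v_start
    for attempt in range(1, max_iter + 1):
        g = program_pulse(g, v)      # program (fixed compliance current)
        if g >= target_g:            # verify: desired conductance attained
            return g, attempt
        v += v_step                  # incremental: raise the control parameter
    return g, max_iter               # stop after the allowed repetitions

random.seed(0)
g, n = ispva_like()
print(f"reached {g * 1e6:.1f} uS after {n} pulses")
```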

Variability of the programmed levels is a key parameter in in-memory computing and hardware implementation of deep neural networks. Therefore, it’s important to use algorithms that not only achieve the right level of electrical conductance but also make sure this conductance is consistent across multiple attempts. There are many other P&V algorithms we can employ, for example to reach a more stable conductive filament, reduce post programming fluctuations, or achieve another goal.

It’s important to note that P&V algorithms are not the only tools available to mitigate ReRAM variability. For instance, pulse shape can play an important role in reducing variability and therefore improving neural network accuracy. Some industry work has shown that compared to regular square pulses, triangular pulses reduce the number of oxygen vacancies after set operation, therefore improving conductive filament stability. Triangular pulses have also been shown to be effective in improving the resistance state after the reset operation.

Above: Triangular pulse shape reduces the number of oxygen vacancies (VO) after the set operation, therefore improving conductive filament stability (Y. Feng et al., EDL 2021)

 

Taking advantage of ReRAM’s ‘cons’ for neuromorphic computing

In a neural network, we would like synapses to have a linear and symmetric response, a large number of analog states, a high on/off ratio, high endurance and no variability. ReRAM has intrinsic variabilities, and we can at least partly mitigate such non-idealities. For neural networks, we can also use them to our advantage!

One example is in a Bayesian neural network where device variability is actually key to its implementation: the natural differences from one device to another are crucial for how it works. For instance, differences in how memory conducts electricity with each use can actually help by providing randomness, which is useful for generating numbers or for algorithms in AI that need randomness, like Bayesian reasoning.

In Bayesian methods, you don’t just get one answer from a given input; instead, you get a distribution of possible answers. The natural variation in ReRAM can be used to create this distribution. This variation is like having physical random numbers that can help perform calculations directly within the memory. This makes it possible to do complex multiplications right where the data is stored. In addition, Bayesian neural networks are resilient to device-to-device variability and system aging.
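A toy numerical sketch of that idea: treat the device-to-device spread of programmed conductances as a physical sampler, so repeated inference passes yield a distribution of outputs rather than a single number. The spread and weight values below are invented:

```python
# Using device-to-device variability as a source of randomness for
# Bayesian-style inference. All numbers are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(7)

nominal_w = 0.3                                        # target weight
programmed = nominal_w + rng.normal(0.0, 0.05, 1000)   # 1000 devices, assumed spread

# Each inference pass effectively samples a slightly different weight,
# so the output is a distribution, as Bayesian methods require:
x = 2.0
outputs = programmed * x
print(f"prediction: {outputs.mean():.2f} +/- {outputs.std():.2f}")
```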

 

Summary

ReRAM is a good match for neuromorphic applications due to its cost-efficiency, ultra-low power consumption, scaling advantage at 28nm and below, small footprint to store very large arrays, analog behavior and ease of fabrication in the back end of the line. The conductance of ReRAM can also be easily modulated by controlling a few electrical parameters.

We can mitigate the ‘cons’ of ReRAM to make it shine in edge AI and in-memory computing applications in the short- and mid-term, respectively. In the long term, the similarity of ReRAM cells to synapses in the brain makes it a great fit for neuromorphic computing. As we look to new applications such as Bayesian neural networks, the ‘cons’ of ReRAM can not only be mitigated, but can even provide advantages.

I recently presented a tutorial at the International Memory Workshop in Seoul, during which I discussed the requirements of new neuromorphic circuits, why ReRAM is an ideal fit for such applications, existing challenges and possible solutions to improve ReRAM-based neural networks.
Please click here to view the presentation.

 

Towards Processing In-Memory
https://www.weebit-nano.com/towards-processing-in-memory/ | Thu, 14 Dec 2023


One of the most exciting things about the future of computing is the ability to process data inside the memory. This is especially true as the industry reaches the end of Moore’s Law, and scientists and engineers focus on finding efficient new architectures to overcome the limitations of modern computing systems. Recent advancements in areas such as generative AI are adding even greater pressure to find such solutions.

Most modern computing systems are based on the von Neumann computing architecture. A bottleneck arises in such systems due to the separation of the processing unit and the memory: in a traditional von Neumann architecture, as much as 95% of the energy is consumed moving data back and forth between the processing unit and the memory. In systems that need fast response, low latency and high bandwidth, designers are moving the memory closer to the CPU so that data doesn’t need to travel as far. Even better is to do the processing within the memory so the data doesn’t need to travel at all. When computing in memory, logic operations are performed directly in the memory without costly data transfer between the memory and a separate processing unit. Such an architecture promises energy efficiency and the potential to overcome the von Neumann bottleneck.

Computing in-memory can be realized using non-volatile devices, with resistive random access memory (ReRAM or RRAM) as an outstanding candidate due to its various advantages in power consumption, speed, durability, and compatibility for 3D integration.

Above: the evolution of compute towards processing in memory

 

There are various approaches to processing in memory with ReRAM.

One approach for ReRAM-based computing is stateful logic. In this technique, memory cells are used to perform the logic operations without moving any data outside the memory array. The logical states of inputs and outputs are represented as the resistance states of the memristor devices, with logical ’0’ as a High Resistance State (HRS) and logical ’1’ as a Low Resistance State (LRS).
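One concrete example, not named in the paper excerpt above but standard in the stateful-logic literature, is material implication (IMPLY): applying a conditioning voltage to memristor P and a set voltage to memristor Q leaves Q storing (NOT p) OR q. A toy truth-table model:

```python
# Toy behavioral model of a stateful IMPLY step between two memristors.
# Logic lives in resistance states: 0 = HRS, 1 = LRS.

def imply(p: int, q: int) -> int:
    """Result overwrites Q: q' = (NOT p) OR q."""
    if p == 1:        # P in LRS clamps the shared node; Q sees too little voltage
        return q      # ... so Q keeps its state
    return 1          # P in HRS: full SET voltage appears across Q, which switches

for p in (0, 1):
    for q in (0, 1):
        print(p, q, "->", imply(p, q))   # material-implication truth table
```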

While promising, stateful logic techniques have yet to be demonstrated for large-scale crossbar array implementation. In addition, stateful logic is incompatible with CMOS logic and is limited by a device’s endurance.

Another approach is non-stateful logic. A non-stateful computational operation does not rely on maintaining or remembering the state of previous operations or data. The in-memory logic processes data or performs computations independently of any historical context, performing computations and making decisions quickly for applications such as real-time data processing.

In non-stateful logic, different electrical variables represent the inputs and outputs. For example, the inputs are voltages, and the output is the resistance state of the memristor. Non-stateful logic combines the advantages of computing in-memory with CMOS compatibility. Memristive non-stateful logic techniques can be integrated into a 1T1R memory array similar to commercial ReRAM products such as Weebit ReRAM, which is built in a 1T1R configuration where every memory cell has a transistor and a memristive device.
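Scouting logic can be pictured as reading two cells onto a shared bitline and comparing the summed current against different sense thresholds. A toy model, with invented conductances and thresholds:

```python
# Toy model of Scouting logic: two 1T1R cells read in parallel; the summed
# bitline current is compared against thresholds to produce OR / AND / XOR.
# Conductance values and thresholds are illustrative assumptions.

G_LRS, G_HRS = 100e-6, 1e-6       # logical '1' and '0' as conductances (S)
V_READ = 0.2                      # low read voltage; no cell switching occurs

def scouting(a: int, b: int):
    g = (G_LRS if a else G_HRS) + (G_LRS if b else G_HRS)
    i = V_READ * g                # both cells sensed on one shared bitline
    i_one = V_READ * G_LRS * 0.5  # threshold: at least one cell is LRS
    i_two = V_READ * G_LRS * 1.5  # threshold: both cells are LRS
    or_, and_ = int(i > i_one), int(i > i_two)
    xor_ = or_ & (1 - and_)       # exactly one cell is LRS
    return and_, or_, xor_

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "-> AND/OR/XOR:", scouting(a, b))
```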

In a new paper by engineers and scientists from Weebit, CEA-Leti and The Technion, “Experimental Demonstration of Non-Stateful In-Memory Logic with 1T1R OxRAM Valence Change Mechanism Memristors,” Weebit ReRAM devices were used to demonstrate two non-stateful logic PIM techniques: Boolean logic with 1T1R and Scouting logic.

The team experimentally demonstrated various logical functions (such as AND, OR and XOR) of the two techniques using Weebit ReRAM to explore their possibilities for various applications. The experiments showed successful operations of both logic types, and correct functionality of the Weebit ReRAM in all cases.

The 1T1R logic technique exhibited notable advantages due to its simple design, employing only a single memristor. Scouting logic demonstrated significant potential as it employs a low voltage and no switching during logical operations, promising reduced power consumption and prolonged device lifespan.

Above: Figure 6 from the paper showing the connection of two cells in parallel in an (a) 1T1R standard array and a (b) pseudo-crossbar array

 

Through additional research and development, the opportunities of this technology will be further explored, ultimately leading to greater efficiency in time and energy. Read the entire paper (with an IEEE subscription) here.

 

How Low Can You Go? An Inside Look at Weebit ReRAM Power Consumption
https://www.weebit-nano.com/how-low-can-you-goan-inside-look-at-weebit-reram-power-consumption/ | Wed, 23 Aug 2023


One of the key advantages of Weebit ReRAM (RRAM) is the technology’s ultra-low power consumption. Some of this advantage is due to the inherent features of the technology, and some of it is due to smart design. In this article we’ll explain why customers need a low power non-volatile memory (NVM) and what makes Weebit ReRAM lower power than other types of NVM. We’ll also explain a bit about some of the design techniques and levers that customers can use to adjust the power.

 

Why is low power consumption important?
In our rapidly warming climate, it has become critical to minimize carbon emissions, and this includes reducing the power consumption of everything we touch – from our homes to our cars to our personal electronic devices and beyond. This is now a key consideration at the government level in many countries, as well as for institutional investors.

At the practical level, for companies developing electronic products, low power consumption is often a key consideration, especially when it comes to battery operated IoT devices with Bluetooth® Low Energy or energy harvesting technology, and medical devices such as wearables and implantables.

Such devices must ensure that data gathered by tiny sensors is regularly and reliably delivered, often from remote or inaccessible locations. For many of these applications, whether in medical, transportation, agriculture, or other applications, reliability can have life or death consequences. Long battery life – supporting applications that last up to 10-15 years on one battery – is critical.

Above: Various ultra-low power attributes of Weebit ReRAM lead to longer device battery life.

Even for products that are plugged into power, designing for low power can be an important consideration. Developers want to avoid costly fans and heat sinks, reduce overall electricity costs, and meet consumer product energy efficiency standards including certifications like Energy Star and LEED for products and buildings, as well as EU energy efficiency labeling. Such guidelines consider not only active power consumption, but also ‘leakage’ power consumed when a product is not in use.

 

The Role of NVM in Reducing Power Consumption
While NVM may not contribute as much to system power consumption as other components such as the CPU, connectivity modules or display, reducing its impact is a key goal for an overall power management strategy.

As part of a system, choosing ultra-low-power NVM helps to enable longer battery life, leading to improved energy efficiency and longer use times between recharges or battery replacements. It can also lead to better thermal management and overall greener technology. Importantly, by reducing power consumption, the memory subsystem can allocate more power to other critical components, such as the processor or display, improving overall system performance.

When it comes to NVM, there are various factors that contribute to power consumption, such as the power consumed by Read and Write operations, standby power, access frequency and overall system design. Let’s look at some of these in a bit more depth.

 

Read Power Consumption
In an NVM ‘Read’ operation, data is retrieved from a specific memory location. This includes decoding the address to identify the specific memory location to be accessed, retrieving the data, and outputting the data for processing elsewhere in the system. The ‘Read’ operation is the most common NVM operation, happening many more times than programming the cells, and thus consuming more power.
The power consumed during a Read operation depends on several key factors. One of these is the power supply used. Flash and some other types of memory require a special high voltage supply. Weebit ReRAM is able to read out of a low-voltage power supply – the same one that any system needs for basic calculations. With Weebit ReRAM, there is also no need for an always-on charge-pump – something that is needed with flash memory.

Another contributing factor is cell reading voltage and current. The cell reading voltage refers to the voltage level applied to a memory cell during the read operation. Different memory technologies have specific voltage requirements for reading data from their cells, and these can vary based on the specific memory technology, fabrication process, and design considerations. With Weebit ReRAM, a Read operation is performed using only the digital core voltage (VDD), and the Read cell voltage for ReRAM is typically a few hundreds of millivolts (mV) or lower.
The typical Read cell voltage requirements for other NVMs are higher, typically in the range of 1 to 3V for flash, and several hundred mV to a few volts for MRAM. Weebit ReRAM also has a dedicated “read-only-mode” during which program voltage can be completely shut-off.

These are just a few considerations in terms of Read power consumption. Other important things that impact Read power include:

  • The number of data bits read in parallel, including error correction code (ECC) bits: Weebit’s ReRAM architecture is flexible to support different word widths based on the system architect’s preference.
  • Memory array capacitance: In Weebit ReRAM, the bitline capacitance is reduced due to array segmentation.
  • Sense-amplifier efficiency: Weebit’s engineers have innovated and optimized the sensing circuitry to consume extremely low power per bit.
  • Control logic and self-timing circuitry: Weebit ReRAM has a single-ended Read operation with self-timing to enable the operation to terminate as soon as it is complete.

Fast Read times in Weebit ReRAM also allow Execute-In-Place (XiP) to further save system power. We will cover this in a future article.

 

Write (Program) Power Consumption
In an NVM Write operation, data or instructions are stored or updated to a specific location. This is a complex operation encompassing many events which, of course, consume power. Power consumed during programming is mainly dependent on:

  • The number of data bits to be programmed
  • The power supply used during SET and RESET operations
  • The current through the cell during the SET/RESET operation
  • The write circuitry (LDO, limitation, termination) efficiency
  • The ability to shut off the power as soon as the operation has completed

In terms of the number of data bits to be programmed, one of the key advantages that ReRAM and other emerging NVMs have over flash memory is that these technologies do not require a sector erase. With flash, even if you just want to erase one bit, you actually must erase the entire sector or segment, and before erasing that sector, you first have to program all the new bits including those that didn’t even need to be programmed. This is obviously a lot of extra work and power. Designers working with flash have found ways to work around the challenge and mitigate the penalties associated with the extra programming and erasing, but even with these workarounds, emerging memories like ReRAM are significantly more power efficient.

With ReRAM, using its direct program/erase capability and byte addressability, the programming is bit-wise: each bit can be independently and selectively SET or RESET. Importantly, with Weebit ReRAM, a programming algorithm does a comparison to existing data to avoid unnecessary writes and then masks out the bits that do not need to be reset.
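A minimal sketch of that compare-and-mask step over one word, as an illustration of the idea rather than Weebit's actual algorithm:

```python
# Compare-and-mask before programming: only bits that change get written.

def plan_write(old: int, new: int):
    """Return bit masks of cells to SET (0 -> 1) and to RESET (1 -> 0)."""
    changed = old ^ new
    set_mask = changed & new          # bits going 0 -> 1
    reset_mask = changed & ~new       # bits going 1 -> 0
    return set_mask, reset_mask

old, new = 0b1100_1010, 0b1001_1010
s, r = plan_write(old, new)
print(f"SET   mask: {s:08b}")    # 00010000: a single cell to set
print(f"RESET mask: {r:08b}")    # 01000000: a single cell to reset
```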

Above: A programming algorithm compares new data to existing data and only resets the new bits.

The programming algorithm also splits Words into Sub-Words to control peak power consumption to mitigate against any issues such as IR-drop or power supply failure. Weebit ReRAM implements smart programming algorithms that control voltage, current and pulse duration during the Write operation, enabling efficient usage of resources.

Flash memory often requires high voltages for programming, sometimes requiring voltages generated by a charge pump or DC-DC converter. These types of converters add area to chip, add cost to the system and waste power. With Weebit ReRAM, programming is ultra-low-power, capable of being done using a lithium cell battery. It also requires low voltages (using a ~3V supply) with no charge pump needed when a ~3V IO voltage is available.

As with a Weebit ReRAM Read operation, Smart algorithms enable the shortest possible Write time and a termination mechanism shuts off the programming pulse as soon as the cell is flipped.

 

Standby/Sleep/Power-Down Modes
Depending on their specific application and operation, a key design consideration for customers is how often the memory can be in programming mode, standby mode, sleep mode, or very deep power-down mode.
During the inactive states of the system, there is significant cell leakage when using a volatile memory such as SRAM. Similarly with DRAM there are “hidden” refresh cycles that consume power during these states. With any non-volatile memory, there is close to zero power consumption used for retaining the data during inactive states. Like other NVMs, Weebit ReRAM is able to be completely powered down to zero leakage while maintaining stored data. The fact that ReRAM does not require an always-on charge pump makes this advantage even more evident.

The wake-up time of a memory from deep power-down mode to active is also a key factor. A memory that can wake up rapidly from power-down mode to read (or programming) mode allows the system architect to put the memory to sleep even during shorter activity breaks. Said another way, waking up quickly means the system can also go to sleep more often. Again, not having a charge-pump makes this advantage even more meaningful as charge-pumps are known for their slow and power-hungry wake-up times.

Weebit engineers are focused on continuing to reduce the time needed for our ReRAM to wake up from very deep power-down mode. The time is already very fast to switch from power-down to standby mode, and we are in the process of further reducing this by orders of magnitude.

We are focused on providing customers with flexibility when it comes to their choices. One of the benefits of working with Weebit is that our designers are experts at optimizing these parameters and we are willing and able to work with customers to help them optimize their designs to balance performance and power consumption as well as other parameters. If you’d like to learn more about Weebit’s design expertise, read this recent blog on our smart algorithms.

 

Ultra-low-power NVM
While the factors impacting NVM power consumption can vary widely based on application, Weebit ReRAM is shown to consume significantly less Read, Write and Standby power than embedded flash and other NVMs, contributing to longer battery life for many devices. The low voltage levels used for memory transactions, coupled with its fast memory access time, greatly reduce the overall power consumed by Weebit ReRAM.

Weebit is also shown to be a ‘greener’ type of NVM compared to other technologies. An environmental initiative we completed with our partner CEA-Leti earlier this year examines the environmental impact of Weebit ReRAM compared to MRAM. You can read about that study here.

AI Reinforcement Learning with Weebit ReRAM
https://www.weebit-nano.com/ai-reinforcement-learningwith-weebit-reram/ | Mon, 05 Jun 2023


A paper from Weebit and our partners at CEA-Leti and the Nano-Electronic Device Lab (NEDL) at Politecnico di Milano was recently published in the prestigious journal Nature Communications. It details how bio-inspired systems can learn using ReRAM (RRAM) technology in a way that is much closer to how our own brains learn to solve problems compared to traditional deep learning techniques.

The teams demonstrated this by implementing a bio-inspired neural network using ReRAM arrays in conjunction with an FPGA system and testing whether the network could learn from its experiences and adapt to its environment. The experiments showed that our in-memory hardware not only does this better than conventional deep learning techniques, but it has the potential to achieve a significant boost in speed and power-saving.

Learning by experience

Humans and other animals continuously interact with each other and the surrounding environment to refine their behavior towards the best possible reward. Through a continuous stream of trial-and-error events, we are constantly evolving, learning, improving the efficiency of routine tasks and increasing our resilience to daily life.

The acquisition of experience-based knowledge is an interdisciplinary subject of biology, computer science and neuroscience known as “reinforcement learning,” and it is at the heart of a major objective of the AI community: to build machines that can learn by experience. The goal is machines that can infer concepts and make autonomous decisions in the context of constantly evolving situations.

In reinforcement learning, an agent (the neural network) interacts with its environment and receives feedback based on that interaction in the form of penalties or rewards. Through this feedback, it learns from its experiences and constructs a set of rules that will enable it to reach the best possible outcomes.
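To make the penalty/reward loop concrete, here is a standard tabular Q-learning toy in Python (a 1-D corridor rather than the paper's maze, and a conventional textbook algorithm rather than the bio-inspired ReRAM network):

```python
# Tabular Q-learning on a 1-D corridor, to make the reward/penalty loop concrete.
import random

N = 8                                # corridor cells; the goal sits at cell N-1
Q = [[0.0, 0.0] for _ in range(N)]   # Q[state][action]; action 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

random.seed(0)
for episode in range(200):
    s = 0
    while s != N - 1:
        if random.random() < eps:                 # explore...
            a = random.randrange(2)
        else:                                     # ...or exploit what was learned
            a = int(Q[s][1] >= Q[s][0])
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == N - 1 else -0.01         # reward at goal, penalty otherwise
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([int(q[1] > q[0]) for q in Q[:-1]])   # learned policy: 1 = "move right"
```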

In developing such resilient bio-inspired systems, what’s needed is hardware with plasticity, i.e., the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. The lack of such commercial hardware is one of the current main limitations in implementing systems capable of learning from experience in an efficient way.

NVMs for in-memory computing

Researchers are now looking at non-volatile memories (NVMs) like ReRAM to enable hardware plasticity for neuromorphic computing. ReRAM is particularly well-suited for use in hardware capable of plastic adaptation, as its conductance can be easily modulated by controlling few electrical parameters. We’ve talked about this previously in several papers and a recent demonstration.

When voltage pulses are applied, the conductance of ReRAM can be increased or decreased by set and reset processes. This is how ReRAM stores information. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Because of this similarity, ReRAM (RRAM) arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

In addition to their ability to mimic the plasticity of biological synapses, memristors like ReRAM have several other advantages for these systems. ReRAM is small, low-power, and can be fabricated using standard semiconductor manufacturing techniques in the backend-of-the-line (BEOL), making it easy to integrate into electronic systems.

Power and bandwidth

Deep learning is extremely computationally intensive, involving large numbers of computations which can be very power-hungry, particularly when training large models on large datasets. A great deal of power is also consumed through the high number of iterative optimizations needed to adjust the weights of the network.

Deep learning models also require a lot of memory to store the weights and activations of the neurons in the network, and since they rely on traditional computing architectures, they are impacted by communication delays between the processing unit and the memory elements. This can be a bottleneck that not only slows down computations but also consumes a lot of power.

In the brain, there are no such bottlenecks. Processing and storage are inextricably intertwined, leading to fast and efficient learning. This is where in-memory computing with ReRAM can make a huge difference for neural networks. With ReRAM, fast computation can be done in-situ, with computing and storage in the same place.

The maze runner

While memristor-based networks are not always as accurate as standard deep learning approaches, they are very well-suited to implementing systems capable of adapting to changing situations. In our joint paper with CEA-Leti and NEDL, we propose a bio-inspired recurrent neural network (RNN) using arrays of ReRAM devices as synaptic elements that achieves plasticity as well as state-of-the-art accuracy.

To test our proposed architecture for reinforcement learning tasks, we studied the autonomous exploration of continually evolving environments, including a two-dimensional dynamic maze showing environmental changes over time. The maze was experimentally implemented using a microcontroller and a field-programmable gate array (FPGA), which ran the main program, enabled learning rules and kept track of the position of the agent. Weebit’s ReRAM devices were used to store information and adjust the strength of connections between neurons, and also to map the internal state of each neuron.

Above: a Scanning Electron Microscope image of the SiOx RRAM devices and a sample photo of the packaged RRAM arrays used in this work

 

Our experiments followed the same procedure used in the case of the Morris Water Maze in biology: the agent has a limited time to explore the environment under successive trials, and once a trial starts, the sequence of firing neurons maps the movement of the agent.

Above: Representation of high-level reinforcement learning for autonomous navigation considering eight main directions of movement

 

The maze exploration is configured as successive random walks which progressively develop a model of the environment. Here is how it generally progressed:

  • At the beginning, the network cannot find the solution and spends the maximum amount of time available in the maze.
  • As the network progressively maps the configuration of its environment, it becomes a master of the problem trial after trial, and it finally finds the optimum path towards the objective.
  • Once the solution is found, the network decreases the computing time with each successive attempt at solving the same maze configuration, because it remembers the solution.
  • Next, the maze changes shape and a different escape path must be found. As it attempts to find the solution, the network receives a penalty in unexpected positions. After an exploration period, it successfully gets to the target again.
  • Finally, the system comes back to the original configuration and the network easily retrieves the first solution – faster than before. This is thanks to the residual memory of the internal states and to the intrinsic recurrent structure.

Above: (left) the system re-learns quickly when presented with “maze 1” the second time; (right) ReRAM resistance can be easily modulated by using different programming currents, enabling some memory of the original maze configuration due to gradual adaptation of the internal voltage of the neurons

 

You can see a short video here showing the experimental setup and the hardware demonstration of the exploration of the dynamic environment via reinforcement learning.

In our paper, we go into much more detail on the experiments, including testing the hardware for complex cases such as the Mars rover navigation to investigate the scalability and reconfigurability properties of the system.

Saving space with fewer neurons

One of the key features that makes our implementation so effective is that it uses an optimized design based on only eight CMOS neurons, representing the eight possible directions of movement inside the maze. CMOS neurons are generally integrated in the front end of line (FEOL) and require a large amount of circuitry, so an increase in the number of neurons is associated with an increase in area and cost.

In our system, the ReRAM, acting as the threshold modulator, is the only thing that changes for each explored position in the maze, while the remaining hardware of the neurons remains the same. For this reason, the size of the network can be increased with very small costs in terms of circuit area by increasing the amount of ReRAM – which is dense and easily integrated in the back-end-of-line (BEOL).

Our bio-inspired approach shows far better management of computing resources compared to standard solutions. In fact, to carry out an exploration at a certain average accuracy (99%), our solution turns out to be 10 times less expensive, as it requires 10 times fewer synaptic elements (the number of computing elements is directly proportional to the area/power consumption).

Above: Thanks to the reinforcement learning, the energy consumed by each neuron drastically decreases as more and more trials are allowed

 

Key Takeaways

Deep learning techniques using standard Von Neumann processors can enable accurate autonomous navigation but require a great deal of power and a long time to make training algorithms effective. This is because the environmental information is often sparse, noisy and delayed, while training procedures are supervised and require direct association between inputs and targets during the backpropagation. This means that complex models of convolutional neural networks are needed to numerically find the best combination of parameters for the deep reinforcement computation.

Our proposed solution overcomes the standard approaches used for autonomous navigation using ReRAM based synapses and algorithms inspired by the human brain. The framework highlights the benefits of the ReRAM-based in-situ computation including high efficiency, resilience, low power consumption and accuracy.

Since biological organisms draw their capability from the inherent parallelism, stochasticity, and resilience of neuronal and synaptic computation, introducing bio-inspired dynamics into neural networks would improve robustness and reliability of artificial intelligent systems.

Read the entire paper here: A self-adaptive hardware with resistive switching synapses for experience-based neurocomputing.

 

 

In-Memory Computing for AI Similarity Search using Weebit ReRAM
https://www.weebit-nano.com/in-memory-computing-for-ai-similarity-search-using-weebit-reram-embedded-rram/ | Thu, 22 Dec 2022

We recently collaborated with our friends at IIT-Delhi, led by Prof. Manan Suri, on a research project demonstrating an efficient ReRAM based in-memory computing (IMC) capability for a similarity search application. The demonstration was done on 28nm ReRAM technology developed by Weebit in collaboration with CEA-Leti. A paper based on this work, “Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search,” was published in IEEE Transactions on Circuits and Systems II: Express Briefs.

 

A bit of background: CAMs in AI/ML search applications

Associative memories, also called Content Addressable Memories (CAMs), are an important component of intelligent systems. CAMs perform fast search operations by accepting a query and performing a search over multiple data points stored in memory to find one or more matches based on a distance metric, and then return the locations of the matches. This information can be potentially used for applications such as nearest neighbor searches for classification or unsupervised labeling. Ternary Content-Addressable Memory (TCAM) is a type of CAM that incorporates a “don’t care condition” to assist searches for partial matches and is therefore the most commonly used type of CAM.

TCAMs offer a powerful in-memory computing paradigm for efficient parallel-search and pattern-matching applications. With the emergence of big data and AI/ML, TCAMs have become promising candidates for a variety of edge and enterprise data-intensive applications. In the research project, we proposed a scheme that demonstrates the use of TCAMs for hyperspectral imagery (HSI) pixel matching in the context of remote-sensing applications. TCAMs can also enable applications such as biometrics (facial/iris/fingerprint recognition) and assist in string matching for large-scale database searches.

Traditionally, CAMs/TCAMs are designed using standard memory technologies such as SRAM or DRAM. However, these volatile memory-based circuits have performance limitations in terms of search energy/bit (a metric commonly used for evaluating the performance of CAM circuits), and CAMs based on SRAMs are limited in scale due to relatively large cell areas.

 

ReRAM can overcome performance limitations

CAM performance limitations can be addressed by using an emerging NVM (Non-Volatile Memory) technology like ReRAM instead of volatile memory technologies. Because ReRAM can help reduce power consumption and cell size, it can be used to build compact and efficient TCAMs. Such NVM devices also reduce circuit complexity and provide the opportunity to exploit low-area analog in-memory computing, leading to increased design flexibility.

In the recent paper, the joint IIT-Delhi/Weebit team presented a hardware realization of a CAM using Weebit ReRAM arrays. In particular, the researchers proposed an end-to-end engine that realizes IMSS (In-Memory Similarity Search) in hardware by using ReRAM devices and binarizing data and queries through a custom pre-processing pipeline. The learning capability of the proposed ReRAM-based in-memory computing engine was demonstrated on a hyperspectral imagery pixel classification task using the Salinas dataset, achieving an accuracy of 91%.
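
As a rough illustration of the IMSS flow, the following Python sketch binarizes pixel spectra by per-band thresholding and classifies a query pixel by minimum Hamming distance to stored references. The toy data, labels, and median-thresholding step are simplifying assumptions; the paper uses a custom pre-processing pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
refs = rng.random((100, 64))           # 100 stored pixels, 64 spectral bands
labels = rng.integers(0, 5, size=100)  # 5 toy land-cover classes

# Binarization: threshold each band at its median across the references
thresholds = np.median(refs, axis=0)
refs_bin = refs > thresholds           # stored binary patterns

def classify(query):
    q_bin = query > thresholds
    # XOR + popcount per row is what the ReRAM array evaluates in
    # parallel, in memory; here it is emulated with NumPy
    dists = np.count_nonzero(refs_bin != q_bin, axis=1)
    return labels[np.argmin(dists)]    # label of the nearest stored pixel

print(classify(rng.random(64)))
```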

Above: Energy-efficient classification of agricultural land from hyperspectral imagery using the proposed in-memory computing technique.

 

The team experimentally validated the system on fabricated ReRAM arrays, with full-system validation performed through SPICE simulations using the open-source SkyWater 130nm CMOS process design kit (PDK). We were able to significantly reduce the number of computations required and improve their speed, yielding benefits in both energy and latency. By projecting our estimates to advanced nodes (28nm), we demonstrated energy savings of ~1.5x for a fixed workload compared to the current state of the art.

You can access the full paper here.

 

 

The post In-Memory Computing for AI <br>Similarity Search using Weebit ReRAM appeared first on Weebit.

]]>
Weebit ReRAM Results: High Temperature Stability at 28nm https://www.weebit-nano.com/weebit-reram-results-high-temperature-stability-at-28nm-embedded-rram/ Thu, 16 Jun 2022 12:09:07 +0000 https://www.weebit-nano.com/?p=12030 As embedded memories move below 28nm process geometries, it is becoming more and more complex and expensive to scale standard memories that are charge-based (those that store data as an electrical charge like flash) and integrate them with advanced CMOS nodes. There is high demand to scale memory nodes further for applications like microcontrollers (MCUs) […]

The post Weebit ReRAM Results: <br>High Temperature Stability at 28nm appeared first on Weebit.

]]>
As embedded memories move below 28nm process geometries, it is becoming increasingly complex and expensive to scale standard charge-based memories (those that store data as an electrical charge, like flash) and integrate them with advanced CMOS nodes.

There is high demand to scale memory nodes further for applications like microcontrollers (MCUs) for automotive and other markets, as well as artificial intelligence (AI) applications. This is driving the industry search for non-volatile memory (NVM) alternatives to embedded flash.

Memories like ReRAM that are integrated in the back end of line (BEOL) of the manufacturing process are attracting growing interest because, unlike flash, they don't interfere with the integration of analog components in the front end and are therefore easier to integrate into a design. With its faster speed, lower power consumption, and lower additional mask count compared to flash memory, ReRAM is increasingly of interest to the industry, and great strides are being made in its development into a mature technology.

Together with CEA-Leti, Weebit recently published a paper, “High temperature stability embedded ReRAM for 2x nm node and beyond,” outlining performance results of Weebit ReRAM in 28nm. The results highlight some of the many advantages of Weebit ReRAM. We presented these results at the recent International Memory Workshop (IMW) 2022.

High temperature stability

For many applications, such as automotive, aerospace, and defense, memory retention and stability at high temperatures are critical. In ReRAM technologies, retention failures are usually caused by filament dissolution resulting from the motion and recombination of oxygen vacancies, which increases at higher temperatures as atoms become more mobile. This can cause the filament to become less conductive or to dissolve entirely, increasing the resistance of the memory cell.

Weebit has optimized the materials and engineered the Weebit ReRAM stack to limit the motion of the oxygen vacancies caused by increased temperatures, making the filament more stable over time.

Our results show that Weebit ReRAM in 28nm achieves a low raw bit error rate (BER) without any need for error-correcting code (ECC) or redundancy. Specifically, we show that Weebit ReRAM maintains a stable memory window after a 15-hour bake at 210°C following 10,000 cycles. To the best of our knowledge, this is one of the best results ever reported for any company's ReRAM.

Image: Weebit ReRAM maintains a stable memory window after 15 hours’ bake at 210°C after 10,000 cycles (without using a Program & Verify algorithm).
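
For a sense of what a 210°C bake implies for real-world operation, the sketch below applies the standard Arrhenius acceleration model used in retention testing. The activation energy is an assumed illustrative value, not a parameter reported in our paper.

```python
import math

K_B = 8.617e-5   # Boltzmann constant, eV/K
E_A = 1.1        # assumed activation energy, eV (illustrative only)

def acceleration_factor(t_stress_c, t_use_c, e_a=E_A):
    # Arrhenius model: time-to-failure scales as exp(Ea / kT)
    t_stress = t_stress_c + 273.15   # convert °C to K
    t_use = t_use_c + 273.15
    return math.exp((e_a / K_B) * (1.0 / t_use - 1.0 / t_stress))

# Project a 15-hour bake at 210°C to operation at 85°C
af = acceleration_factor(210, 85)
print(f"15 h at 210°C ≈ {15 * af / 8760:.0f} years at 85°C")
```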

 

The importance of endurance

While endurance of 10,000 programming operations is sufficient to address many applications today, endurance above 100,000 operations is required for any NVM to be a contender for next-generation applications, with some applications even requiring up to 1 million operations.

In ReRAM technologies, endurance failures happen when the resistance window narrows. This can occur when the dielectric material in the switching layer degrades due to defect generation, which can cause the resistance to drop to an intermediate level between the High Resistive State (HRS) and Low Resistive State (LRS), or to become stuck at the LRS.

Image: In the ReRAM forming step, a positive voltage is applied across the switching layer, creating the filament and changing the resistance of the oxide layer to the LRS (read more here). Endurance failures happen when the resistance window collapses.

 

In any NVM, endurance failures may occur as the number of Program/Erase cycles increases. To avoid this, it's critical to make optimizations such as adjusting programming conditions and optimizing the programming energy, trading these off against the window margin.

Since electrical current flowing through the dielectric degrades that material over time, the key is to provide enough current to create a robust filament, but no more energy than that. Because we know how parameters such as voltage and time impact the resistance window, we've made optimizations in Weebit ReRAM to ensure we provide enough current to create the filament without wasting energy that can degrade the dielectric. We have filed several patents related to how we implement these optimizations.
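
As a back-of-envelope illustration of this energy budget, a single programming pulse dissipates roughly E = V × I × t. The voltage, compliance current, and pulse width below are hypothetical values chosen for illustration, not Weebit's actual programming conditions.

```python
# Toy calculation of per-pulse programming energy, E = V * I * t.
# All values are hypothetical, chosen only to illustrate the scale.
v_prog = 2.0      # programming voltage, volts
i_comp = 100e-6   # compliance current limiting filament growth, amps
t_pulse = 100e-9  # pulse width, seconds

energy_pj = v_prog * i_comp * t_pulse * 1e12
print(f"Programming energy per pulse: {energy_pj:.1f} pJ")  # -> 20.0 pJ
```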

In our paper, we show that Weebit ReRAM can endure more than 10⁵ cycles with no memory degradation, and no failure on 16kb arrays. This was achieved based on a single programming pulse without using a program and verify (P&V) algorithm (which would have enabled adjustments to be made throughout the cycling). The results are based on raw data, reflecting the intrinsic quality of the Weebit ReRAM technology and memory stack. In addition, 10⁶ cycles are achievable with some optimizations.

Other Key Findings

The new paper outlines many other findings based on a wide array of tests. One example is testing for solder reflow compliance. Since most assembly processes require short bursts of very high-temperature soldering (up to three cycles), it's critical that the NVM retain its programmed data during this process. NVM technology must sustain the Pb-free solder reflow profile described in JEDEC standards (IPC/JEDEC J-STD-020D.1), with a 260°C temperature peak. Weebit ReRAM passed basic (3x reflow) and extended (9 cycles) SMT (Surface Mount Technology) tests with zero failures.

In addition to tests showing raw data – highlighting the intrinsic performance of Weebit ReRAM – we showed how technology enhancements and forming-protocol optimizations can further improve device performance. The paper also highlights the results we achieved using a P&V algorithm. In a real product, such an algorithm makes optimization/reprogramming tweaks after each programming operation. Using a P&V algorithm, Weebit ReRAM achieved a clear window margin with zero failures on a 1Mb array.
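
Conceptually, a P&V loop programs a cell, reads it back, and retries with adjusted conditions until the target state is verified. The sketch below is a generic illustration with a hypothetical cell interface and illustrative retry parameters, not Weebit's algorithm.

```python
class MockCell:
    """Hypothetical stand-in for a real memory-controller cell interface."""
    def __init__(self):
        self.level = 0.0
    def apply_pulse(self, v):
        self.level += v * 0.3   # toy response to a programming pulse
    def read_state(self):
        return 1 if self.level > 1.0 else 0

def program_and_verify(cell, target_state, v_start=1.5, v_step=0.1,
                       max_attempts=8):
    # Program, verify by reading back, and retry with a slightly
    # stronger pulse if the target state has not been reached.
    v = v_start
    for _ in range(max_attempts):
        cell.apply_pulse(v)
        if cell.read_state() == target_state:
            return True        # verified: window margin achieved
        v += v_step            # adjust conditions for the next attempt
    return False               # flag the cell for redundancy/ECC handling

print(program_and_verify(MockCell(), target_state=1))  # -> True
```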

All of our tests show that Weebit ReRAM is a highly reliable technology when integrated at 28nm. As we continue our progress toward productization, and also work on 22nm and below, these results go a long way toward showing that customers can be confident in engaging with Weebit for next-generation designs.

Read the paper.

By Gabriel Molas, Weebit Chief Scientist

The post Weebit ReRAM Results: <br>High Temperature Stability at 28nm appeared first on Weebit.

]]>