Design and investigation of computation-in-memory based low power hybrid MTJ/CMOS logic gates

Abstract

Hybrid magnetic tunnel junction (MTJ)/CMOS circuits based on the computation-in-memory (CIM) architecture are regarded as the next generation of digital integrated circuits. CIM overcomes the limitations of the von-Neumann architecture by addressing problems such as the memory wall and standby power dissipation. In this work, we have developed hybrid logic gates, namely AND/NAND, OR/NOR, and XOR/XNOR, for the CIM architecture by integrating three-terminal spin-Hall effect assisted spin transfer torque (SHE + STT) MTJs with standard CMOS. An auto-write-stopping (AWS) circuit is adopted to write the MTJs, whereas an improved sense amplifier (ISA) circuit is employed to perform the logic operations and produce the corresponding outputs. All the hybrid logic gates are investigated for key performance indicators such as power, delay, device count, and power delay product (PDP), and the results are compared with their conventional counterparts. The comparison reveals that the ISA + AWS-based hybrid gates dissipate 50.52% lower total power. The worst-case read delays of the ISA + AWS hybrid AND/NAND, OR/NOR, and XOR/XNOR gates are 27.41%, 13.4%, and 21.28% lower, respectively. Meanwhile, the read PDP (write PDP) of the ISA + AWS hybrid AND/NAND, OR/NOR, and XOR/XNOR gates is 47.64% (37.09%), 25.78% (36.29%), and 39.31% (35.48%) lower, respectively, than that of the conventional counterparts. Hence the ISA + AWS gates are superior in terms of total power dissipation, worst-case read delay, and read/write PDP. Further, we have conducted Monte-Carlo simulations on all the logic circuits to study the effect of parameter variations during fabrication.

Reviewing Editor:

Pham DT, University of Birmingham, United Kingdom

1. Introduction

The standard von-Neumann architecture has been the backbone of computing systems for many decades. However, owing to the memory wall, the von-Neumann architecture cannot meet modern requirements such as high-throughput, low-power computation for rapidly growing fields like artificial intelligence, big data, and 5G technologies (Collaert, 2023; Qin et al., 2023; Rajput et al., 2022). Meanwhile, scaling the CMOS technology node below 90 nm has worsened the standby power dissipation in integrated circuits (ICs) due to the onset of secondary effects (IRDS™ 2022: Executive Summary - IEEE IRDS™, 2023). In addition, although higher clock frequencies have sped up computation, they have also increased the dynamic power dissipation. As an overall effect, total power dissipation in computing systems has risen (Lin et al., 2019). This pushes heat production in the ICs towards their thermal limits, disrupting the expected behavior of the semiconductor devices (Barla et al., 2021). To tackle these problems, many efforts have been reported in the literature at the device and architecture levels (Bläsing et al., 2020; Finocchio et al., 2021). Combining computation-in-memory (CIM) with magnetic tunnel junctions (MTJs) is the most prominent among them (Haensch et al., 2023; Raman et al., 2021; J. Wang et al., 2022; Yu et al., 2021). In CIM, computation capability is embedded into the MTJ itself; the MTJ is the non-volatile storage element of magnetic random access memory (MRAM). MRAM works on the principles of spintronics, where the electron’s spin, in addition to its charge, is employed to store the information (Bandyopadhyay, 2011). 
In CIM, the MTJs store the information and participate in the computation process, breaking the separation between memory and computational units and thereby solving the memory wall problem. Owing to their non-volatility, the power supply to the MTJs can be completely cut off in standby mode without the backup routine required by the standard memory units of the von-Neumann architecture. When the power is restored, the previously stored information is readily available for computation (Barla et al., 2022b). As a result, the MTJs, and hence CIM, dissipate almost zero static power in standby mode. Apart from non-volatility, MTJs have several other advantages over their counterparts, such as high endurance, fast reading capability, high density, 3-D fabrication feasibility, and ease of integration with existing CMOS technology (Hirohata et al., 2020; Joshi et al., 2020). Storing information in the MTJ is also known as the MTJ writing or switching process. Generally, with the conventional spin-transfer torque (STT) (Berger, 1996; Slonczewski, 1996) switching mechanism, the MTJ write circuitry dissipates more than 95% of the total power, whereas the MTJ read circuitry consumes less than 5% (Barla et al., 2022a). To reduce the MTJ write power consumption, various contemporary switching mechanisms have been developed, such as spin-orbit torque (SOT) (D’Yakonov & Perel’, 1971; Grimaldi et al., 2020) and voltage-assisted MTJ switching (VAMS) (Nozaki et al., 2010). From the commercialization point of view, the STT switching mechanism is more attractive (Spin-transfer Torque DDR Products | Everspin, 2023). However, disadvantages such as intrinsic incubation delay and reliability issues restrict STT to low-speed applications (Iga et al., 2012). In the case of VAMS, generating a precise voltage pulse is necessary for the switching process. 
In the IC environment, generating such a precise voltage pulse is challenging due to thermal noise, which raises reliability concerns with VAMS. Compared with STT and VAMS, the SOT switching mechanism in three-terminal devices is the most appropriate. Hence we have chosen SOT MTJ devices for our work. However, the exact cause of SOT, whether it is the spin-Hall effect (SHE) (Hirsch, 1999) or the Rashba effect (Bychkov & Rashba, 1984), still needs more clarity. In this article we adopt the widely accepted viewpoint that SHE causes SOT. Ref. (van den Brink et al., 2014) proposes the SHE-assisted STT (SHE + STT) switching mechanism, with significant improvement in writing speed and power consumption compared to its counterparts. Many applications have been developed using the conventional SHE + STT switching mechanism and its corresponding write circuitry (Amirany & Rajaei, 2019; Wang et al., 2015). However, the conventional SHE + STT write circuitry wastes write power; this was identified in ref. (Barla et al., 2022c), where an auto-write-stopping (AWS) circuit was proposed. With the AWS circuitry, there is a significant saving of write power compared to the conventional one. Meanwhile, although several sense amplifier circuits have been developed to read the information from the MTJs, these circuits suffer from delayed output response and higher power dissipation. These problems were effectively addressed in ref. (Thapliyal et al., 2018) by proposing an improved sense amplifier (ISA). To our understanding, the ISA and AWS circuitry have not yet been combined to develop logic gates and evaluate their behavior. In this regard, we have integrated the ISA and AWS circuitry into the AND/NAND, OR/NOR, and XOR/XNOR logic gates and carried out simulations. 
Performance indicators such as the number of devices used, read/write delay, read/write power dissipation, and read/write power delay product (PDP) have been studied and compared with those of the conventional counterparts. Monte-Carlo (MC) simulations were also carried out on the circuits to understand the effect of the process and mismatch variations that would occur during fabrication, covering both the CMOS devices and the extracted parameters of the MTJ.

The remainder of the article is structured as follows. Section 2 briefly describes the structure of the SHE + STT MTJ device, its switching mechanism, and hybrid MTJ/CMOS CIM. Section 3 provides the design and working of the hybrid SHE + STT MTJ/CMOS logic gates along with details of the sense amplifier and MTJ write circuits. Section 4 presents the simulation results and their discussion. Finally, the conclusion is presented in Section 5.

2. Background

2.1. SHE + STT MTJ

The SHE + STT MTJ is a three-terminal, four-layered device, as shown in Figure 1. Here an oxide (MgO) layer is sandwiched between two ferromagnetic (CoFeB) layers, forming a CoFeB/MgO/CoFeB structure. The entire structure is mounted on a heavy metal (HM) with strong spin-orbit interaction. The upper CoFeB layer is called the fixed layer, and the lower one is the free layer. The fixed layer’s magnetic orientation is pinned to a particular direction. In contrast, the magnetic orientation of the free layer can be changed to point either in the same direction as the fixed layer or in the opposite direction. If the magnetic orientations of the fixed and free layers point in the same direction, the MTJ is in the parallel or low-resistance (R_P) state. Otherwise, when they point in opposite directions, the MTJ is in the antiparallel or high-resistance (R_AP) state. By passing a read current (I_R) between the T1-T2/T3 terminals, the state of the MTJ, that is, either R_AP or R_P, can be read/sensed. The R_AP and R_P states are used to store binary information in a non-volatile manner. The relative difference between R_AP and R_P is termed the tunnel magnetoresistance (TMR) and is given as

$TMR = \frac{R_{AP} - R_{P}}{R_{P}}$ (1)
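As a quick numerical illustration of Equation (1), the sketch below computes the TMR from a pair of resistances and, conversely, recovers R_AP from R_P and the TMR. The resistance values are hypothetical placeholders chosen for round numbers; they are not the parameters of the MTJ model used in this work.

```python
def tmr(r_ap, r_p):
    """Tunnel magnetoresistance ratio as defined in Equation (1)."""
    return (r_ap - r_p) / r_p


def r_ap_from_tmr(r_p, tmr_ratio):
    """Invert Equation (1): R_AP = R_P * (1 + TMR)."""
    return r_p * (1.0 + tmr_ratio)


# Hypothetical resistances, for illustration only.
R_P = 5.0e3    # parallel-state resistance in ohm
R_AP = 10.0e3  # antiparallel-state resistance in ohm

print(f"TMR  = {tmr(R_AP, R_P):.0%}")
print(f"R_AP = {r_ap_from_tmr(R_P, tmr(R_AP, R_P)):.0f} ohm")
```

A larger TMR widens the gap between the two states and therefore the margin available to the sense amplifier.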

Figure 1. SHE + STT device structure and its switching mechanism for (a) AP to P and (b) P to AP configuration (Barla et al., 2021).

Altering the state of the MTJ from R_AP to R_P or vice versa is called MTJ switching/writing, and is achieved by changing the free layer’s magnetic orientation relative to the fixed layer. According to the modified LLGS equation, the free layer’s magnetization dynamics (Wang et al., 2015) are expressed as

$\frac{\partial \vec{m}}{\partial t} = - \gamma \mu_{0} \vec{m} \times \vec{H}_{eff} + \alpha \vec{m} \times \frac{\partial \vec{m}}{\partial t} - \frac{\gamma \hbar}{2 e t_{F} M_{S}} P J_{STT} \vec{m} \times (\vec{m} \times \vec{m}_{r}) - \xi \eta J_{SHE} \vec{m} \times (\vec{m} \times \vec{\sigma}_{SHE})$ (2)

where $\mu_{0}$ is the free-space permeability, $\vec{H}_{eff}$ is the effective magnetic field, $\vec{m}$ and $\vec{m}_{r}$ are the free and fixed layer magnetic moments, respectively, J_SHE and J_STT are the SHE and STT current densities, respectively, M_S is the saturation magnetization, e is the electron charge, t_F is the free layer thickness, $\gamma$ is the gyromagnetic ratio, $\hbar$ is the reduced Planck constant, $\vec{\sigma}_{SHE}$ represents the polarization direction of the pure spin current injected into the free layer, P is the spin polarization, $\eta$ is the spin-Hall angle, and $\alpha$ is the Gilbert damping constant. The four terms on the right-hand side represent, in order, the precession, the Gilbert damping torque, the spin-transfer torque, and the spin-Hall torque. To perform MTJ writing, a write current (I_W) has to be forced through the MTJ. I_W is the combination of the SHE current (I_SHE) and the STT current (I_STT). I_SHE, with current density J_SHE, flows between the T2-T3 terminals, whereas I_STT, with current density J_STT, flows between the T1-T2 or T1-T3 terminals. Figure 1(a) shows the switching of the MTJ from R_AP to R_P with the SHE + STT writing mechanism. Here J_SHE flowing from the T3 to the T2 terminal in the Y-direction (i.e. via the HM) produces a spin current (J_SC) in the Z-direction. J_SC then enters the free layer and shifts its magnetic orientation from the -Z-direction to the XY plane by exerting a SOT. 
At the same time, J_STT flowing from the T1 to the T2 terminal exerts STT on the magnetic orientation, which is now in the XY plane, and shifts it to the Z-direction. As an overall effect, the switching of the MTJ is completed. To switch the MTJ from R_P to R_AP, the I_STT direction is reversed, as shown in Figure 1(b).
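Equation (2) is implicit, because the damping term contains the time derivative itself. The sketch below shows one standard way to make it explicit, assuming a unit-length $\vec{m}$: since every other term is perpendicular to $\vec{m}$, the equation rearranges to $\partial \vec{m} / \partial t = (\vec{A} + \alpha \vec{m} \times \vec{A}) / (1 + \alpha^{2})$, where $\vec{A}$ collects the precession, STT, and SHE terms. The lumped coefficients `a_stt` and `a_she`, the helper names, and all numerical values are hypothetical and dimensionless; this is not the Verilog-A model used in the simulations.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(*vectors):
    """Component-wise sum of any number of 3-vectors."""
    return tuple(sum(components) for components in zip(*vectors))


def scale(k, v):
    """Multiply a 3-vector by a scalar."""
    return tuple(k * c for c in v)


def llgs_rhs(m, h_eff, m_r, sigma, gamma_mu0, alpha, a_stt, a_she):
    """Explicit dm/dt for Equation (2), valid for |m| = 1.

    a_stt lumps gamma*hbar*P*J_STT / (2*e*t_F*M_S) and a_she lumps
    xi*eta*J_SHE (both are assumptions made for this sketch).
    """
    precession = scale(-gamma_mu0, cross(m, h_eff))
    stt = scale(-a_stt, cross(m, cross(m, m_r)))
    she = scale(-a_she, cross(m, cross(m, sigma)))
    a = add(precession, stt, she)
    # Eliminate the implicit damping term: dm/dt = (A + alpha m x A)/(1 + alpha^2)
    return scale(1.0 / (1.0 + alpha ** 2), add(a, scale(alpha, cross(m, a))))


# Hypothetical, dimensionless inputs chosen only to exercise the function.
m = (0.6, 0.0, -0.8)
dm_dt = llgs_rhs(m, h_eff=(0.0, 0.0, 1.0), m_r=(0.0, 0.0, 1.0),
                 sigma=(1.0, 0.0, 0.0), gamma_mu0=1.0, alpha=0.02,
                 a_stt=0.5, a_she=0.3)
print(dm_dt)
```

The returned derivative is perpendicular to $\vec{m}$, so a time-stepping loop built on it preserves the magnitude of the magnetization to first order; a practical integrator would still renormalize $\vec{m}$ after each step.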

2.2. Computation in memory

The idea of CIM was first proposed by Kautz (1969). Later, Stone (1970) developed a structure for a cache-organized computer based on the concept of near-memory computation. However, these ideas were developed with conventional volatile memory and, owing to technological limitations in the following years, did not attract much attention. With the evolution of spintronic devices and the feasibility of integrating them with CMOS devices, CIM has been revived and has drawn the attention of both academia and industry. Spintronic devices, especially MTJs, alleviate the problems of increased leakage current in CMOS and delay in lengthy interconnects because they can be positioned above the CMOS logic (Barla et al., 2021; Spintronics-based Computing, 2015). Furthermore, the 3D stacking feasibility of MTJs helps increase the circuit integration density. By virtue of the non-volatility of MTJs, the power supply to the CIM structure can be cut off without losing the stored information, and when the power is restored, the stored information is readily available for computation without having to be written again. A generic CIM structure with MTJs is presented in Figure 2. It has three major sections: (a) the sense amplifier, (b) the logic network (LN), which is the combination of the MOS tree and MTJs, and (c) the MTJ write circuit. The sense amplifier reads the logic information presented by the LN to produce the output (OUT+) and its complement (OUT−). The LN performs the logic operations on the input data. The MTJ write circuit is used to write the information bit into the MTJs. Here a pair of MTJs is employed to store one bit of information. When the MTJ pair is in the AP-P configuration, we assume that bit ‘0’ is stored. Conversely, when the MTJ pair is in the P-AP configuration, we assume that bit ‘1’ is stored. Integration of MTJs with the MOS tree in the LN is feasible due to the resistance compatibility of MTJs with MOS. 
For example, if R_Close and R_Open are the closed (ON) and open (OFF) MOS resistances, respectively, then $R_{Close} < R_{P}$ and $R_{Open} > R_{AP}$. Hence the total resistances of the left arm (R_LA) and right arm (R_RA) in the LN are the combined resistances of the MOS transistors and MTJs. R_LA and R_RA are inversely related to the read currents in the left and right arms, I_LA and I_RA respectively, and thus determine how quickly each arm discharges to produce OUT+ and OUT−. For example, when I_LA > I_RA, OUT+ = ‘1’ and OUT− = ‘0’ are produced; otherwise, when I_LA < I_RA, OUT+ = ‘0’ and OUT− = ‘1’ are produced.
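The decision rule described above can be sketched as a small behavioural model: each arm current is taken as inversely proportional to the arm resistance, and the larger current decides the output pair. The function name, the read voltage, and the resistance values are hypothetical; the values only respect the stated ordering R_Close < R_P < R_AP.

```python
def sense_outputs(r_la, r_ra, v_read=1.0):
    """Return (OUT+, OUT-) for the given left/right arm resistances.

    Follows the rule in the text: I_LA > I_RA gives OUT+ = 1, OUT- = 0,
    and I_LA < I_RA gives OUT+ = 0, OUT- = 1.
    """
    i_la = v_read / r_la  # arm current is inversely related to resistance
    i_ra = v_read / r_ra
    if i_la == i_ra:
        raise ValueError("balanced arms: output is undefined in this model")
    return (1, 0) if i_la > i_ra else (0, 1)


# Hypothetical resistances in ohm, chosen so that R_CLOSE < R_P < R_AP.
R_CLOSE, R_P, R_AP = 1.0e3, 5.0e3, 10.0e3

print(sense_outputs(R_CLOSE + R_P, R_CLOSE + R_AP))
print(sense_outputs(R_CLOSE + R_AP, R_CLOSE + R_P))
```

Because only the comparison matters, the model is insensitive to the absolute read voltage, which is one reason a differential MTJ pair is convenient for sensing.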

Figure 2. Block diagram of the CIM structure, comprising the sense amplifier and the logic network with MOS tree and MTJs. The MTJ write circuit is used to change the configuration of the MTJs.

3. Hybrid SHE + STT MTJ/CMOS logic gates

3.1. Sense amplifier circuit

One crucial issue in hybrid MTJ/CMOS circuits is the integration of MTJs with CMOS devices. The information stored in the MTJs needs to be sensed quickly to perform the logic operations. Many efforts have been made in this direction, and various sense amplifiers have been proposed for STT MTJs, as shown in Figure 3. Black and Das (2000) proposed a five-transistor SRAM-based sense amplifier (Figure 3(a)) for a magnetic non-volatile flip-flop. Then a ten-transistor/one-capacitor Dynamic Current-Mode (DCM) logic sense amplifier (Figure 3(b)) was proposed by Mochizuki et al. (2005) to develop a non-volatile full adder. However, these sense amplifiers suffer from reduced reliability below the 90 nm CMOS technology node. Further, the MTJs used in these circuits must have a high TMR. These drawbacks restrict the use of these sense amplifiers in hybrid MTJ/CMOS circuits for CIM. To improve the reliability, Zhao et al. (2009) proposed the pre-charged sense amplifier (PCSA) (Figure 3(c)). Later, Kang et al. (2014) developed the separated pre-charge sensing amplifier (SPCSA) (Figure 3(d)), and Zhang et al. (2017) presented the reliability-enhanced SPCSA (RESPCSA) (Figure 3(e)). The PCSA, SPCSA, and RESPCSA significantly improved reliability and reduced the silicon area with nominal sensing power. However, there was not much improvement in the sensing latency. The much-needed reduction in sensing latency and power dissipation for hybrid MTJ/CMOS circuits was achieved with the sense amplifier proposed by Thapliyal et al. (2018) (Figure 3(f)), which also has acceptable reliability for hybrid MTJ/CMOS circuits. Therefore we have adopted the sense amplifier developed in ref. (Thapliyal et al., 2018) and refer to it as the improved sense amplifier (ISA) in this work. The ISA uses only six MOS transistors, whereas the PCSA, SPCSA, and RESPCSA use 7, 15, and 17 transistors, respectively. 
The larger number of transistors in the PCSA, SPCSA, and RESPCSA increases the power dissipation and the device count. Apart from that, the pre-charge phase of the ISA (discussed later in this section) sets OUT+/OUT− to Vdd-Vth. With all the other sense amplifiers, OUT+/OUT− are pre-charged to Vdd, which is higher than in the ISA. Consequently, in the evaluation phase, the lower pre-charge voltage of the ISA, Vdd-Vth, is discharged more quickly, producing a faster output response than the other sense amplifiers.

Figure 3. Various sense amplifier circuits developed to read the information stored in the MTJ pair. MTJ0 and MTJ1 are in opposite states. SS (‘0’ or ‘1’) is used to control the mode of operation. Schematic for (a) conventional SRAM based sense amplifier, (b) ten transistors/one capacitor DCM sense amplifier, (c) PCSA, (d) SPCSA, (e) RESPCSA, (f) ISA. Note that these sense amplifiers were developed for STT MTJs. The same sense amplifier circuits can be used for SHE + STT MTJs, as their reading mechanism is the same.

The ISA works in sync with a sense signal (SS). To understand the operation of the ISA, consider Figure 3(f). When SS = ‘0’ the ISA works in pre-charge mode; otherwise, when SS = ‘1’, the ISA is in evaluation mode. In the pre-charge mode, the inputs are applied and held at a particular voltage level. As SS = ‘0’, transistor MP2 is closed and MN2 is open. MP3 allows the node voltages of OUT+ and OUT− to be shared and to settle at Vdd-Vth. In the evaluation mode, as SS = ‘1’, transistor MN2 is closed and provides discharge paths for both OUT+ and OUT−. If we assume that the MTJ0-MTJ1 pair is in the P-AP configuration, the resistance of the left branch is lower than that of the right one, so the left branch discharge current (I_LB) is larger than the right branch discharge current (I_RB). The OUT− node is therefore pulled down to ground (Gnd). As a result, transistor MN1 is open and MP1 remains closed, raising the OUT+ node voltage to Vdd. Hence bit ‘1’ is read when the MTJ0-MTJ1 pair is in the P-AP configuration. Conversely, when the MTJ0-MTJ1 pair is in the AP-P configuration, the OUT− and OUT+ nodes are pulled to Vdd and Gnd, respectively, to read bit ‘0’.
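The two-phase behaviour of the ISA can be summarized with a purely behavioural sketch that returns the settled OUT+ and OUT− node voltages. It ignores all transient effects; the function names are hypothetical, Vdd = 1 V matches the supply used in Section 4, and Vth = 0.3 V is an assumed placeholder.

```python
def isa_read(ss, pair, vdd=1.0, vth=0.3):
    """Settled (OUT+, OUT-) voltages of the ISA for a given sense signal.

    ss   : 0 for pre-charge, 1 for evaluation
    pair : "P-AP" (bit '1') or "AP-P" (bit '0') for MTJ0-MTJ1
    vth  : assumed threshold voltage (hypothetical value)
    """
    if pair not in ("P-AP", "AP-P"):
        raise ValueError("pair must be 'P-AP' or 'AP-P'")
    if ss == 0:
        # Pre-charge: both outputs are shared and settle at Vdd - Vth.
        return (vdd - vth, vdd - vth)
    if pair == "P-AP":
        # Left branch discharges faster: OUT- goes to Gnd, OUT+ rises to Vdd.
        return (vdd, 0.0)
    # AP-P: OUT+ goes to Gnd, OUT- rises to Vdd.
    return (0.0, vdd)


def decode_bit(out_plus, out_minus):
    """Bit read from the settled evaluation-phase outputs."""
    return 1 if out_plus > out_minus else 0


for pair in ("P-AP", "AP-P"):
    print(pair, "->", decode_bit(*isa_read(1, pair)))
```

The model makes the key property of the ISA visible: the evaluation phase starts from Vdd-Vth rather than Vdd, so less charge has to be removed before the outputs resolve.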

3.2. SHE + STT MTJ write circuit

A conventional write (CW) circuit for SHE + STT MTJ devices was presented in ref. (Wang et al., 2015). In this circuit, I_SHE together with I_STT completes the writing of the information into the MTJs. However, the major drawback of the CW circuit is that I_STT continues to flow even after the MTJ writing is complete, wasting write power. To avoid this wastage, an AWS circuit has been developed (Barla et al., 2022c) and is shown in Figure 4. In this circuit, the flow of I_STT is stopped once the MTJ write is complete, thereby eliminating the wastage of write power. The write circuit has three parts: the write core, the control circuit, and the self-termination circuit. In the write core, I_SHE and I_STT are passed through the MTJ pair, MTJ0-MTJ1, to write the required bit. The control circuit controls the flow of I_SHE and I_STT in the write core. Table 1 shows the state of the MTJ pair for the different input signal combinations. The self-termination circuit works in sync with the write core and control circuit, that is, it continuously monitors the information stored in the MTJ pair. Once the required information is written into the MTJ pair, the self-termination circuit signals the control circuitry to stop the flow of I_STT, which saves a significant amount of write power. To see how the circuit works, assume that bit ‘0’ is stored and we want to write bit ‘1’ into the MTJ pair. This means the configuration of the MTJ pair must change from AP-P to P-AP. According to Table 1, the signals write enable (WE), Input, and ESHE are set to ‘111’. So the control circuit outputs STTP+ and STTN+ are ‘11’, respectively. At the same time, EWRT is at ‘1’ because the node voltage A in the self-termination circuit is ‘0’. As a result, I_SHE and I_STT flow in the paths Vdda-P1-TG4-MTJ1-MTJ0-TG1-N0-Gnd and Vdda-P1-TG3-MTJ1-MTJ0-TG0-N0-Gnd, respectively. Meanwhile, the MTJ pair enters the metastable state. 
After 200 ps, ESHE is disabled to stop the flow of I_SHE, whereas I_STT continues to flow. I_STT switches the MTJ pair from AP-P to P-AP, thereby writing bit ‘1’. Meanwhile, the change of MTJ configuration switches the node voltage N from ‘1’ to ‘0’. This can be understood from Equations (3) and (4). Before the completion of writing, MTJ0-MTJ1 are in the AP-P configuration. Hence the voltage at node N is given by

$V_{H} = \frac{Vdda \times R_{AP}}{R_{AP} + R_{P}}$ (3)

Figure 4. Schematic of the AWS write circuitry, consisting of (a) control circuit, (b) write core and (c) self-termination circuit. The directions of I_STT and I_SHE while writing the information bits ‘0’ and ‘1’ are marked.

Table 1. Various input signals, intermediate signals, write enabler outputs and the corresponding MTJ states for the AWS circuit.

After the completion of the writing process, the MTJ configuration changes to P-AP. As a result, the voltage at node N changes to

$V_{L} = \frac{Vdda \times R_{P}}{R_{AP} + R_{P}}$ (4)

(Note: the ON resistances of the MOS transistors are ignored in Equations (3) and (4).) The self-termination circuit detects the change from V_H to V_L and reflects it at EWRT as ‘0’. When EWRT = ‘0’, TG0 and TG3 are disabled. In addition, STTP+ and STTN+ become ‘1’ and ‘0’, respectively, to stop the flow of I_STT even while WE is high. Thus the wastage of write power is prevented with the AWS circuitry. Writing bit ‘0’ when bit ‘1’ is stored in the MTJ pair can be understood in a similar manner by following Table 1.
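Equations (3) and (4) describe a resistive divider formed by the MTJ pair, which the following sketch evaluates for the bit ‘1’ write discussed above. It assumes, based on the I_STT path listed earlier, that MTJ1 sits on the supply side of node N and MTJ0 on the ground side. Vdda = 1.25 V is the write-core supply used in this work; R_P, the TMR, the mid-rail reference `v_ref`, and the function names are hypothetical.

```python
VDDA = 1.25               # write-core supply voltage in volt
R_P = 5.0e3               # hypothetical parallel resistance in ohm
TMR = 1.0                 # hypothetical TMR ratio (100%)
R_AP = R_P * (1.0 + TMR)  # from Equation (1)


def node_n_voltage(r_mtj0, r_mtj1, vdda=VDDA):
    """Divider voltage at node N; MOS ON resistances are ignored."""
    return vdda * r_mtj0 / (r_mtj0 + r_mtj1)


def ewrt_while_writing_one(v_n, v_ref=VDDA / 2):
    """EWRT stays 1 while node N is at V_H and drops to 0 at V_L.

    v_ref is an assumed detection threshold, not a value from the source.
    """
    return 1 if v_n > v_ref else 0


v_h = node_n_voltage(R_AP, R_P)  # AP-P, before the write completes (Eq. 3)
v_l = node_n_voltage(R_P, R_AP)  # P-AP, after the write completes (Eq. 4)

print(f"V_H = {v_h:.3f} V, EWRT = {ewrt_while_writing_one(v_h)}")
print(f"V_L = {v_l:.3f} V, EWRT = {ewrt_while_writing_one(v_l)}")
```

Under these assumptions the detection margin is V_H − V_L = Vdda × TMR / (TMR + 2), so a higher TMR gives the self-termination circuit a larger voltage swing to detect.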

3.3. Hybrid SHE + STT MTJ/CMOS logic gates

Figure 5. Hybrid SHE+STT MTJ/CMOS logic gates for CIM structure, showing (a) ISA, (b) MOS tree for: XOR/XNOR, (h) NAND/AND and (g) OR/NOR, (c) logic network, (d) write core, (e) self termination circuit.

We have developed two variants of the hybrid SHE + STT MTJ/CMOS logic gates, namely (a) conventional (CW) hybrid logic gates and (b) ISA + AWS logic gates (Figure 5), each of which performs the AND/NAND, OR/NOR, and XOR/XNOR logic operations. The CW gates use the conventional read (PCSA) and write circuits, whereas the ISA + AWS gates use the ISA for reading and the AWS write circuit for writing. The LN is common to both variants. The ISA + AWS gates work in two phases, pre-charge and evaluation. In the pre-charge phase, SS = ‘0’ and all the outputs and their complements are at logic high. In this period, input A is applied to the MOS logic tree and input B is stored in the MTJ pair using the AWS write circuitry, as explained in subsection 3.2. In the evaluation phase, when SS = ‘1’, the inputs in the LN are evaluated to produce the output and its complement. We take the XNOR/XOR gate as an example. Consider A = B = ‘1’, so that transistors MN4 and MN6 are ON whereas MN3 and MN5 are OFF. The MTJ pair is in the P-AP configuration. The XNOR and XOR nodes have discharge paths to ground (Gnd) via LA and RA, respectively. The total resistance R_LA is $(R_{Close} + R_{AP})$, while the total resistance R_RA is $(R_{Close} + R_{P})$, so R_RA < R_LA. The XOR node (via RA) therefore discharges more quickly than the XNOR node (via LA) and reaches the threshold voltage of MN0 first, so MN0 turns OFF (open). Therefore XNOR becomes ‘1’, whereas XOR continues to discharge to Gnd and becomes ‘0’. Hence we obtain XNOR = ‘1’ and XOR = ‘0’. The circuit operation for the remaining input combinations can be understood similarly. Table 2 shows the truth table for the various logic gates and the corresponding resistances of the MOS transistors and MTJs.

Table 2. Truth table for various logic gates along with the corresponding path resistance for the LA and RA current in logic network.

4. Simulation results and discussion

The simulation work was carried out with the Cadence (IC6.1.7-64b.500.19) tool and a 45-nm generic design kit. We have used a supply voltage of 1 V and retained the default transistor parameters of L = 45 nm and W = 120 nm for all transistors except those in the write core. The write core transistor width is set to 360 nm and a supply voltage (Vdda) of 1.25 V is provided, which supports the combined flow of I_SHE and I_STT in the write core. For the MTJ model, we have used the SHE + STT MTJ model written in the Verilog-A language (Wang et al., 2015). The MTJ parameters used for the simulation are presented in Table 3.

Table 3. SHE + STT MTJ parameters set during the simulation. The rest of the parameters are retained as mentioned in ref. (Wang et al., 2015).

The total energy consumption of the CW and AWS write circuits during the writing process is tabulated in Table 4. In the calculation, we summed the write core and control circuit energies and present the sum as the total energy consumption. The table shows that the CW circuit consumes a constant energy of 1.43 pJ during the write process. The AWS write circuit consumes 51.63 fJ, 51.58 fJ, and 37.49 fJ while writing bit ‘1’, writing bit ‘0’, and performing a redundant write, respectively, with an average of 46.93 fJ. The AWS write circuit thus consumes 96.71% less energy than the CW write circuit (Wang et al., 2015), demonstrating its advantage.
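The energy saving quoted above can be cross-checked from the reported figures with a few lines of arithmetic. The sketch below recomputes the average AWS write energy and the relative saving from the rounded values given in the text; because those inputs are rounded, the recomputed numbers (about 46.9 fJ and about 96.7%) agree with the reported 46.93 fJ and 96.71% only to within rounding. The variable names are illustrative.

```python
CW_ENERGY_FJ = 1430.0  # 1.43 pJ, constant CW write energy from the text

# AWS write energies in fJ, as reported in the text.
aws_energy_fj = {
    "write bit 1": 51.63,
    "write bit 0": 51.58,
    "redundant write": 37.49,
}

average_aws_fj = sum(aws_energy_fj.values()) / len(aws_energy_fj)
saving = 1.0 - average_aws_fj / CW_ENERGY_FJ

print(f"average AWS write energy: {average_aws_fj:.2f} fJ")
print(f"saving relative to CW:    {saving:.2%}")
```

The redundant write is the cheapest case because the stored bit already matches the input, so the self-termination circuit can stop I_STT almost immediately.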

Table 4. Write energy consumption of CW and AWS write circuits.

A comparison of the simulated waveforms obtained with the CW and AWS write circuits is presented in Figure 6. Writing of the information bits ‘0’ and ‘1’ is shown between T1 and T3 and between T4 and T6, respectively, whereas the redundant write condition is shown from T7 to 60 ns. Writing bit ‘1’ into the MTJ pair is initiated at T4 and completed at T5. With the AWS write circuit, owing to the continuous monitoring (CM) and AWS processes, I_STT is stopped at T5, that is, soon after the completion of the MTJ writing. With the CW circuit, however, I_STT continues to flow until T6. Hence T6-T5 is the period for which the wastage of write power is avoided compared to the CW write. Similarly, while writing bit ‘0’ and during the redundant write condition, T3-T2 and 60 ns-T8 are the periods during which the AWS write circuit stops the flow of I_STT to save write power.

Figure 6. Waveform for the AWS write circuit showing (a) input, (b) write enable (WE), (c) SHE signal (ESHE), (d) state of MTJ0 and (e) MTJ1, (f) node N voltage, output of (g) I4 inverter (EWT) and (h) write completion detector (EWRT), control circuit outputs (i) STTP and (j) STTN, (k) AWS I_STT and (l) conventional I_STT.

The performance of the ISA + AWS logic gates is compared with that of the CW logic gates in terms of read/write delay, read/write power dissipation, read/write PDP, and transistor count in Table 5. Figure 7 shows the simulated waveforms for the ISA + AWS and CW logic gates. All the ISA + AWS gates, that is, AND/NAND, OR/NOR, and XOR/XNOR, incorporate a continuous monitoring (CM) and auto-write-stopping process, whereas the CW gates do not have the CM and AWS processes. The self-termination circuit executes the CM and AWS processes by utilizing the voltage at node N (between the MTJ pair), as discussed in subsection 3.2. Consequently, to implement these processes, the ISA + AWS based AND/NAND, OR/NOR, and XOR/XNOR gates occupy 30.64% additional area compared to the equivalent CW gates. Further, the control circuitry power dissipation is 346.6%, 351.56%, and 349.94% higher for the AND/NAND, OR/NOR, and XOR/XNOR ISA + AWS gates, respectively, than for the CW gates. Note that we have included the self-termination circuit in the control circuitry when obtaining the control circuitry power dissipation. At first sight, the AWS control circuitry dissipates considerably more power than its CW counterpart, but this power is in the nW range, whereas most of the MTJ write power is dissipated in the write core, which is in the μW range. In the write core, the ISA + AWS based logic gates dissipate 59.56% lower power than their CW counterparts. So the total write circuitry power dissipation (write core + control circuitry) of the ISA + AWS based gates is 51.5% lower than that of their CW equivalents (Figure 8(b)).

Figure 7. Various input and output waveforms for different hybrid logic gates. Between T1 and T3, one of the conditions, in which information bit ‘1’ is written into the MTJ pair, is shown. Initially the MTJ pair (input B) stores information ‘0’. At T1, writing of bit ‘1’ into the MTJ pair is initiated. At T2, writing of bit ‘1’ is complete. Due to the CM and AWS process, in the AWS gates I_STT is stopped at T2, whereas in the CW gates I_STT continues to flow until T3. So T3-T2 is the time for which power is saved with the ISA + AWS gates. In the pre-charge phase (SS = 0), both the output and its complement are at Vdd-Vth for the ISA + AWS gates, whereas for the CW gates they are at Vdd.

Figure 8. Comparison of (a) read circuitry (b) write circuitry and (c) total (read + write circuitry) power dissipation for CW and ISA + AWS logic gates.

Table 5. Performance comparison between hybrid CW and ISA + AWS logic gates.

We have employed the ISA in our hybrid gates to produce the AND/NAND, OR/NOR, and XOR/XNOR outputs after performing the logic operations, whereas the PCSA is employed in the CW gates. The read circuitry (which includes the sense amplifier and LN) power dissipation of the ISA + AWS based AND/NAND, OR/NOR, and XOR/XNOR gates is 27.87%, 14.33%, and 22.94% lower, respectively, than that of their CW equivalents (Figure 8(a)). This is because in the pre-charge mode of the ISA, OUT+ and OUT− attain Vdd-Vth, whereas they attain Vdd in the PCSA. In the evaluation mode, discharging the lower voltage of the ISA (i.e., Vdd-Vth) dissipates less power than in the PCSA. Consequently, the total power dissipation (read and write circuitry) of the AND/NAND, OR/NOR, and XOR/XNOR gates is 50.52% lower in the ISA + AWS based gates than in the CW gates (Figure 8(c)).

We have also analyzed the worst-case delays of the ISA + AWS and CW logic gates. The worst-case read delays of the ISA + AWS based AND/NAND, OR/NOR, and XOR/XNOR gates are 27.41%, 13.4%, and 21.28% lower, respectively, than those of their CW counterparts (Figure 9(a)). The lower worst-case read delay is achieved through the use of the ISA: in the evaluation mode, the ISA requires a shorter duration to pull down the Vdd-Vth voltage (present at the outputs in every pre-charge mode) than the CW logic gates require to pull down Vdd. The worst-case write delays of the ISA + AWS based AND/NAND, OR/NOR, and XOR/XNOR gates are 32.64%, 32.4%, and 33.39% higher, respectively, than those of the CW based gates (Figure 9(b)). The write delay of the ISA + AWS based gates is higher because there is a single common path for the I_STT that switches the MTJ pair, and this current must flow through two MOS transistors, two MTJs, and two transmission gates (TGs). In conventional circuits, there are two separate write current paths for writing the MTJs, and in each of these paths I_STT flows through two MOS transistors and one MTJ. So the magnitude of I_STT in the ISA + AWS gates is smaller than in the CW gates. Further, a part of the AWS I_STT is used in the write completion detector for the CM process. This further reduces the magnitude of the AWS I_STT, delaying the switching of the MTJ pair in the ISA + AWS gates compared to the CW gates.

Figure 9. Comparison of worst case (a) read and (b) write delay for CW and ISA + AWS logic gates.

Using the read/write power and delay, we have analyzed the PDP of the read/write circuitry for all the logic gates. With lower read circuitry power dissipation and worst-case read delay, the ISA + AWS based AND/NAND, OR/NOR, and XOR/XNOR gates produce a read PDP that is 47.64%, 25.78%, and 39.31% lower than that of their CW based equivalents (Figure 10(a)). As the write power in ISA + AWS based gates is lower, the write PDP of the ISA + AWS AND/NAND, OR/NOR, and XOR/XNOR gates is 37.09%, 36.29%, and 35.48% lower, respectively, than that of their CW counterparts (Figure 10(b)).
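
As a quick consistency check, the read PDP reductions can be reproduced from the read power and read delay reductions quoted above. The sketch below assumes that PDP is the plain product of read circuitry power and worst-case read delay; the function name `pdp_reduction` is hypothetical, and small differences from the reported figures are presumably due to rounding.

```python
def pdp_reduction(power_reduction_pct, delay_reduction_pct):
    """Percentage PDP reduction implied by given power and delay reductions.

    Assumes PDP = power * delay, so the PDP ratio is the product of the
    power ratio and the delay ratio. A negative argument models an increase.
    """
    power_ratio = 1.0 - power_reduction_pct / 100.0
    delay_ratio = 1.0 - delay_reduction_pct / 100.0
    return (1.0 - power_ratio * delay_ratio) * 100.0


# Read-side values quoted in the text:
# (power reduction %, delay reduction %, reported PDP reduction %)
reported = {
    "AND/NAND": (27.87, 27.41, 47.64),
    "OR/NOR": (14.33, 13.4, 25.78),
    "XOR/XNOR": (22.94, 21.28, 39.31),
}

for gate, (power, delay, pdp) in reported.items():
    computed = pdp_reduction(power, delay)
    print(f"{gate}: computed {computed:.2f}%, reported {pdp:.2f}%")
```

The same relation explains how the write PDP can fall even though the write delay rises: the reduction in write power has to outweigh the increase in delay.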

Figure 10. Worst case (a) read and (b) write PDP comparison between CW and ISA + AWS logic gates.

Upon exploration, we found that performance metrics of the hybrid logic gates, such as read circuitry power dissipation, read delay, and read PDP, depend on the width of the pull-down (PD) transistor MN2 (). So we have varied the width of the PD transistor of all the ISA + AWS logic gates from its default value of 120 nm up to 1.2 μm and plotted the response in Figure 11. Here we can observe that the read circuitry's power dissipation rises as the PD width increases for all the hybrid gates. At the same time, however, the read delay reduces as the PD width increases. This is because a wider PD provides a quicker discharge path for the reading current, as discussed in subsection 3.1.

Figure 11. Dependence of read circuitry’s power dissipation, delay and PDP on size of PD transistor for ISA + AWS (a) AND/NAND, (b) OR/NOR, and (c) XOR/XNOR logic gates.

Meanwhile, to explore the effect of the write core transistor width, we have varied it from 360 to 720 nm and noted the power dissipation, which is shown in Figure 12. Here we can note that as the width of the write core transistor is increased, the write circuitry power dissipation increases, which in turn increases the total power dissipation of the ISA + AWS logic gates.

Figure 12. Variation of write circuitry and total power dissipation with respect to the width of the write core transistor.

Furthermore, to study the effect of PVT variation during fabrication, we have performed MC simulations on the CW and ISA + AWS gates at an early stage of the design. In the MC simulations we have incorporated CMOS variations and 3% variations in the TMR, the thickness of the oxide layer, and the thickness of the free layer of the MTJ, all following Gaussian distributions. Figures 13 and 14 show the MC simulations of the CW and ISA + AWS logic gates, respectively. In Table 6 we have presented the comparison of power dissipation between the CW and ISA + AWS gates. The table shows that ISA + AWS gates always dissipate lower power than their counterpart CW gates.
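
The parameter sampling behind such an MC run can be sketched as follows. This is a minimal illustration, not the authors' simulation setup: the nominal values are hypothetical placeholders, the 3% variation is assumed to be one standard deviation relative to nominal, CMOS variations are not modeled, and the names `NOMINAL`, `sample_mtj_parameters`, and `run_monte_carlo` are introduced here for illustration. Only the iteration count of 200 comes from the Table 6 caption.

```python
import random
import statistics

# Hypothetical nominal MTJ values, NOT taken from the article.
NOMINAL = {
    "tmr": 1.0,          # TMR ratio (1.0 means 100 %)
    "t_oxide_nm": 0.85,  # oxide barrier thickness in nm
    "t_free_nm": 1.3,    # free layer thickness in nm
}
REL_SIGMA = 0.03  # assumption: 3 % is one standard deviation
ITERATIONS = 200  # iteration count stated in the Table 6 caption


def sample_mtj_parameters(rng):
    """Draw one Gaussian-perturbed set of MTJ parameters."""
    return {
        name: rng.gauss(value, REL_SIGMA * value)
        for name, value in NOMINAL.items()
    }


def run_monte_carlo(iterations=ITERATIONS, seed=0):
    """Return a reproducible list of sampled parameter sets."""
    rng = random.Random(seed)
    return [sample_mtj_parameters(rng) for _ in range(iterations)]


samples = run_monte_carlo()
for name, nominal in NOMINAL.items():
    values = [sample[name] for sample in samples]
    print(
        f"{name}: nominal={nominal}, "
        f"mean={statistics.mean(values):.4f}, "
        f"stdev={statistics.stdev(values):.4f}"
    )
```

In a real flow each sampled parameter set would be passed to the circuit simulator, and the resulting power values would be collected per gate to build distributions like those summarized in the figures and table.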

Figure 13. MC simulation showing the power dissipation in CW: AND/NAND (a) LIM circuitry, (b) Write circuitry, (c) total power; OR/NOR (d) LIM circuitry, (e) Write circuitry, (f) total power; XOR/XNOR (g) LIM circuitry, (h) Write circuitry, (i) total power.

Figure 14. MC simulation showing the power dissipation in ISA + AWS: AND/NAND (a) LIM circuitry, (b) Write circuitry, (c) total power; OR/NOR (d) LIM circuitry, (e) Write circuitry, (f) total power; XOR/XNOR (g) LIM circuitry, (h) Write circuitry, (i) total power.

Table 6. MC simulation revealing the various power dissipation for CW and ISA + AWS logic gates with 200 iterations.

5. Conclusion

This article presents the design and an in-depth analysis of ISA + AWS hybrid logic gates such as AND/NAND, OR/NOR, and XOR/XNOR. We have employed the AWS writing mechanism to write the information into the MTJ pair, whereas ISA is used to perform CIM for the logic gates. Though the AWS logic gates require a few additional transistors, the simulation results suggest that the improvement in the write circuitry's power dissipation and PDP is significant. Meanwhile, the ISA circuitry employed to perform the CIM operation reduces the read circuitry's power dissipation, delay, and PDP. Based on these results, ISA + AWS hybrid logic gates are advantageous for low-power applications such as embedded memory and neuromorphic computing.

Author contributions

Prashanth Barla: Conceptualization (equal); Investigation (equal); Methodology (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). Vinod Kumar Joshi: Conceptualization (equal); Methodology (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). Somashekara Bhat: Supervision (equal); Validation (equal).

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

Data available on request from the authors.

Additional information

Notes on contributors

Prashanth Barla

Prashanth Barla received the M.Tech. degree in Microelectronics & Control Systems from VTU, Belgaum, India, and the Ph.D. degree from MIT, Manipal, India. He is currently an Assistant Professor (Senior Scale) with the Department of Computer Science & Engineering, MIT, Manipal, India. His research interests include VLSI design, spintronics, and its applications.

Vinod Kumar Joshi

Vinod Kumar Joshi received the M.Tech. degree from VIT University, Vellore, India, and the Ph.D. degree from Kumaun University, Nainital, India. He is currently an Associate Professor (Senior Scale) with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal, India. His main research interests include spintronics based VLSI and logic-in-memory-based hybrid nonvolatile logic circuits for low-power applications.

Somashekara Bhat

Somashekara Bhat received the Ph.D. degree in the field of MEMS from IIT Madras, India. He is currently serving as a Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. His research interests include MEMS and electronics for biomedical applications.

References

Amirany, A., & Rajaei, R. (2019). Nonvolatile, spin-based, and low-power inexact full adder circuits for computing-in-memory image processing. SPIN, 9(3), 1950013. https://doi.org/10.1142/S2010324719500139
Bandyopadhyay, C. (2011). Introduction to spintronics. Boca Raton, FL: Taylor & Francis.
Barla, P., Joshi, V. K., & Bhat, S. (2021). Spintronic devices: A promising alternative to CMOS devices. Journal of Computational Electronics, 20(2), 805–837. https://doi.org/10.1007/s10825-020-01648-6
Barla, P., Joshi, V. K., & Bhat, S. (2022a). Design and evaluation of a self write-terminated hybrid MTJ/CMOS full adder based on LIM structure. Journal of Circuits, Systems and Computers, 31(08), 2250146.
Barla, P., Joshi, V. K., & Bhat, S. (2022b). Fully nonvolatile hybrid full adder based on SHE + STT-MTJ/CMOS LIM architecture. IEEE Transactions on Magnetics, 58(9), 1–11. https://doi.org/10.1109/TMAG.2022.3187605
Barla, P., Joshi, V. K., & Bhat, S. (2022c). A novel auto-write-stopping circuit for SHE + STT-MTJ/CMOS hybrid ALU. IEEE Transactions on Magnetics, 58(9), 1–11. https://doi.org/10.1109/TMAG.2022.3187605
Berger, L. (1996). Emission of spin waves by a magnetic multilayer traversed by a current. Physical Review. B, Condensed Matter, 54(13), 9353–9358. https://doi.org/10.1103/physrevb.54.9353
Bläsing, R., Khan, A. A., Filippou, P. C., Garg, C., Hameed, F., Castrillon, J., & Parkin, S. S. P. (2020). Magnetic racetrack memory: From physics to the cusp of applications within a decade. Proceedings of the IEEE, 108(8), 1303–1321. https://doi.org/10.1109/JPROC.2020.2975719
Black, W. C., & Das, B. (2000). Programmable logic using giant-magnetoresistance and spin-dependent tunneling devices (invited). Journal of Applied Physics, 87(9), 6674–6679. https://doi.org/10.1063/1.372806
Bychkov, Y. A., & Rashba, É. I. (1984). Properties of a 2d electron gas with lifted spectral degeneracy. JETP Letters, 39(2), 78.
Collaert, N. (2023). Advancements in IC Technologies: A look toward the future. IEEE Solid-State Circuits Magazine, 15(3), 80–86. https://doi.org/10.1109/MSSC.2023.3280433
D’Yakonov, M. I., & Perel’, V. I. (1971). Possibility of orienting electron spins with current. JETPL, 13, 467. https://ui.adsabs.harvard.edu/abs/1971JETPL.13.467D/abstract
Finocchio, G., Di Ventra, M., Camsari, K. Y., Everschor-Sitte, K., Khalili Amiri, P., & Zeng, Z. (2021). The promise of spintronics for unconventional computing. Journal of Magnetism and Magnetic Materials, 521, 167506. https://doi.org/10.1016/j.jmmm.2020.167506
Grimaldi, E., Krizakova, V., Sala, G., Yasin, F., Couet, S., Sankar Kar, G., Garello, K., & Gambardella, P. (2020). Single-shot dynamics of spin–orbit torque and spin transfer torque switching in three-terminal magnetic tunnel junctions. Nature Nanotechnology, 15(2), 111–117. https://doi.org/10.1038/s41565-019-0607-7
Haensch, W., Raghunathan, A., Roy, K., Chakrabarti, B., Phatak, C. M., Wang, C., & Guha, S. (2023). Compute in-memory with non-volatile elements for neural networks: A review from a co-design perspective. Advanced Materials (Deerfield Beach, Fla.), 35(37), e2204944. https://doi.org/10.1002/adma.202204944
Hirohata, A., Yamada, K., Nakatani, Y., Prejbeanu, I.-L., Diény, B., Pirro, P., & Hillebrands, B. (2020). Review on spintronics: Principles and device applications. Journal of Magnetism and Magnetic Materials, 509, 166711. https://doi.org/10.1016/j.jmmm.2020.166711
Hirsch, J. E. (1999). Spin hall effect. Physical Review Letters, 83(9), 1834–1837. https://doi.org/10.1103/PhysRevLett.83.1834
Iga, F., Yoshida, Y., Ikeda, S., Hanyu, T., Ohno, H., & Endoh, T. (2012). Time-resolved switching characteristic in magnetic tunnel junction with spin transfer torque write scheme. Japanese Journal of Applied Physics, 51(2S), 02BM02. https://doi.org/10.7567/JJAP.51.02BM02
IRDS™ 2022: Executive Summary - IEEE IRDS™. (2023). Retrieved from https://irds.ieee.org/editions/2022/executive-summary
Joshi, V. K., Barla, P., Bhat, S., & Kaushik, B. K. (2020). From MTJ Device to Hybrid CMOS/MTJ Circuits: A Review. IEEE Access, 8, 194105–194146. https://doi.org/10.1109/ACCESS.2020.3033023
Kang, W., Deng, E., Klein, J.-O., Zhang, Y., Zhang, Y., Chappert, C., Ravelosona, D., & Zhao, W. (2014). Separated precharge sensing amplifier for deep submicrometer MTJ/CMOS hybrid logic circuits. IEEE Transactions on Magnetics, 50(6), 1–5.
Kautz, W. H. (1969). Cellular logic-in-memory arrays. IEEE Transactions on Computers, C-18(8), 719–727. https://doi.org/10.1109/T-C.1969.222754
Lin, X., Yang, W., Wang, K. L., & Zhao, W. (2019). Two-dimensional spintronics for low-power electronics. Nature Electronics, 2(7), 274–283. https://doi.org/10.1038/s41928-019-0273-7
Mochizuki, A., Kimura, H., Ibuki, M., & Hanyu, T. (2005). TMR-based logic-in-memory circuit for low-power VLSI. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A(6), 1408–1415. https://doi.org/10.1093/ietfec/e88-a.6.1408
Nozaki, T., Shiota, Y., Shiraishi, M., Shinjo, T., & Suzuki, Y. (2010). Voltage-induced perpendicular magnetic anisotropy change in magnetic tunnel junctions. Applied Physics Letters, 96(2), 022506.
Qin, J., Sun, B., Zhou, G., Guo, T., Chen, Y., Ke, C., Mao, S., Chen, X., Shao, J., & Zhao, Y. (2023). From spintronic memristors to quantum computing. ACS Materials Letters, 5(8), 2197–2215. https://doi.org/10.1021/acsmaterialslett.3c00088
Rajput, P. J., Bhandari, S. U., & Wadhwa, G. (2022). A review on—Spintronics an emerging technology. Silicon, 14(15), 9195–9210. https://doi.org/10.1007/s12633-021-01643-x
Raman, S. R. S., Xie, S., & Kulkarni, J. P. (2021). Compute-in-eDRAM with backend integrated indium gallium zinc oxide transistors [Paper presentation]. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1–5). https://doi.org/10.1109/ISCAS51556.2021.9401798
Slonczewski, J. C. (1996). Current-driven excitation of magnetic multilayers. Journal of Magnetism and Magnetic Materials, 159(1-2), L1–L7. https://doi.org/10.1016/0304-8853(96)00062-5
Spin-transfer Torque DDR Products | Everspin. (2023). Retrieved from https://www.everspin.com/spin-transfer-torque-ddr-products
Spintronics-based Computing. (2015). Cham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-319-15180-9
Stone, H. S. (1970). A logic-in-memory computer. IEEE Transactions on Computers, C-19(1), 73–78. https://doi.org/10.1109/TC.1970.5008902
Thapliyal, H., Sharifi, F., & Kumar, S. D. (2018). Energy-efficient design of hybrid MTJ/CMOS and MTJ/nanoelectronics circuits. IEEE Transactions on Magnetics, 54(7), 1–8. https://doi.org/10.1109/TMAG.2018.2833431
van den Brink, A., Cosemans, S., Cornelissen, S., Manfrini, M., Vaysset, A., Van Roy, W., Min, T., Swagten, H. J. M., & Koopmans, B. (2014). Spin-Hall-assisted magnetic random access memory. Applied Physics Letters, 104(1), 12403.
Wang, J., Bai, Y., Wang, H., Hao, Z., Wang, G., Zhang, K., Zhang, Y., Lv, W., & Zhang, Y. (2022). Reconfigurable bit-serial operation using toggle SOT-MRAM for high-performance computing in memory architecture. IEEE Transactions on Circuits and Systems I: Regular Papers, 69(11), 4535–4545. https://doi.org/10.1109/TCSI.2022.3192165
Wang, Z., Zhao, W., Deng, E., Klein, J.-O., & Chappert, C. (2015). Perpendicular-anisotropy magnetic tunnel junction switched by spin-Hall-assisted spin-transfer torque. Journal of Physics D: Applied Physics, 48(6), 065001. https://doi.org/10.1088/0022-3727/48/6/065001
Wang, Z., Zhao, W., Deng, E., Zhang, Y., & Klein, J.-O. (2015). Magnetic non-volatile flip-flop with spin-Hall assistance. Physica Status Solidi (RRL) – Rapid Research Letters, 9(6), 375–378. https://doi.org/10.1002/pssr.201510097
Yu, S., Jiang, H., Huang, S., Peng, X., & Lu, A. (2021). Compute-in-memory chips for deep learning: recent trends and prospects. IEEE Circuits and Systems Magazine, 21(3), 31–56. https://doi.org/10.1109/MCAS.2021.3092533
Zhang, D., Zeng, L., Zhang, Y., Klein, J. O., & Zhao, W. (2017). Reliability-enhanced hybrid CMOS/MTJ logic circuit architecture. IEEE Transactions on Magnetics, 53(11), 1–5. https://doi.org/10.1109/TMAG.2017.2701407
Zhao, W., Chappert, C., Javerliac, V., & Noziere, J.-P. (2009). High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits. IEEE Transactions on Magnetics, 45(10), 3784–3787. https://doi.org/10.1109/TMAG.2009.2024325
