TECHNOLOGY ROADMAP EMERGING RESEARCH DEVICES 2013 EDITION FOR

Size: px

Start display at page:

Download "TECHNOLOGY ROADMAP EMERGING RESEARCH DEVICES 2013 EDITION FOR"

Madison Spencer
6 years ago
Views:

1 INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS 01 EDITION EMERGING RESEARCH DEVICES THE ITRS IS DEVISED AND INTENDED FOR TECHNOLOGY ASSESSMENT ONLY AND IS WITHOUT REGARD TO ANY COMMERCIAL CONSIDERATIONS PERTAINING TO INDIVIDUAL PRODUCTS OR EQUIPMENT.

3 TABLE OF CONTENTS Emerging Research Devices Scope Difficult Challenges Introduction..... Device Technologies..... Materials Technologies Nano-information Processing Taxonomy Emerging Research Devices Memory Taxonomy and Devices Logic and Alternative Information Processing Devices More-than-Moore Emerging Research Architectures Memory Architectures for program centric architectures Storage Class Memories Evolved Architectures Exploiting Emerging Research Memory Devices Architectures That Can Learn Morphic Architectures Emerging Memory and Logic Devices A Critical Assessment Introduction Quantitative Logic Benchmarking for Beyond Technologies Survey-Based Benchmarking of beyond Memory & Logic Technologies Memory and Logic Technologies Highlighted for Accelerated Development Processing Introduction Grand Challenges Endnotes/References... 69

4 LIST OF FIGURES Figure ERD1 Relationship among More Moore, More-than-Moore, and Beyond (Courtesy of Japan ERD)... 1 Figure ERD A Taxonomy for emerging research information processing devices (The technology entries are representative but not comprehensive.)... 5 Figure ERD Taxonomy of emerging memory devices... 7 Figure ERD4 Taxonomy of memory select devices...16 Figure ERD5 Taxonomy of options for emerging logic devices. The devices examined in this chapter are differentiated according to (1) whether the structure and/or materials are conventional or novel, and () whether the information carrier is electron charge or some non-charge entity. Since a conventional FET structure and material imply a charge-based device, this classification results in a three-part taxonomy.... Figure ERD6 Two variants of learning devices for configuration...7 Figure ERD7 High-level RF functions partitioned into generic lower-level functions implemented in emerging devices...8 Figure ERD8 Taxonomy for traditional and emerging models of computation...41 Figure ERD9 Median delay, energy, and area of proposed devices in NRI benchmark (normalized to ITRS 15-nm ), based on principal investigators data. (a) 011 benchmark results; (b) 010 benchmark results....5 Figure ERD10 Area, energy, and delay of NAND gate of various post- technologies from 011 NRI benchmark, based on principal investigators data....5 Figure ERD11 Transport impact on switch delay, size, and area of control. Circle size is logarithmically proportional to physically accessible area in one delay. Projections for 15nm included as reference. (Based on principal investigators data.)...5 Figure ERD1 (a) Nomenclature and signals in devices benchmarked; (b) summary of bit adder circuit parameters Figure ERD1 (a) Energy vs. delay plot of bit adder built from benchmarked devices; (b) power vs. throughput of bit adders built from these devices, reflecting power-constrained (< 10 W/cm) throughput Figure ERD14 Technology performance evaluation for (a) FeFET memory, (b) ferroelectric tunnel junction (FTJ), (c) RRAM, (d) Mott memory, (e) macromolecular memory, (f) molecular memory, and (g) carbon-based memory....6 Figure ERD15 Technology performance evaluation for (a) carbon nanotube FET, (b) graphene nanoribbon FET, (c) nanowire FET, (d) tunnel FET, (e) n-type Ge FET, and (f) p-type III-V FET....6 Figure ERD16 Technology performance evaluation for (a) spinfet, (b) NEMS, (c) atomic switch, (d) Mott FET, and (e) neg-cg FET Figure ERD17 Technology performance evaluation for (a) spin wave logic, (b) nanomagnet logic (NML), (c) exciton FET, (d) BiSFET, (e) STT logic, and (f) all spin logic....65

5 LIST OF TABLES Table ERD1 Emerging Research Devices Difficult Challenges... Table ERD Current Baseline and Prototypical Memory Technologies... 6 Table ERD Transition Table for Emerging Research Devices... 6 Table ERD4a Emerging Research Memory Devices Demonstrated and Projected Parameters... 6 Table ERD4b Emerging Research Memory Devices Redox RAM Demonstrated and Projected Parameters... 6 Table ERD5 Experimental demonstrations of vertical transistors in memory arrays Table ERD6 Benchmark Select Device Parameters Table ERD7a Experimentally demonstrated two-terminal memory select devices Table ERD7b Experimentally demonstrated self-selecting memory devices Table ERD8 Target device and system specifications for SCM... 7 Table ERD9 Potential of the current prototypical and emerging research memory candidates for SCM applications... 7 Table ERD 10a MOSFETS: Extending MOSFETs to the End of the Roadmap.... Table ERD 10b Charge-Based Beyond : Non-Conventional FETs and Other Charge-Based Information Carrier Devices.... Table ERD 10c Alternative Information Processing Devices.... Table ERD 11 Figure-of-merit of three reconfigurable architectures... 7 Table ERD1 Anticipated Important Properties of Emerging Memories as driven by application need Table ERD 1 Likely desirable properties of M (memory) type and S (Storage) type Storage Class Memories Table ERD14 Current Research Directions for Employing emerging research memory devices to enhance logic Table ERD15 Applications and Development of Neuromorphic Systems Table ERD16 Noise-Driven Neural Processing and its Possible Applications Table ERD17 Potential Evaluation for Emerging Reseach Memory Devices Table ERD18 Potential Evaluation - Extending MOSFETS to the end of the Roadmap Table ERD19 Potential Evaluation - Non-conventional FETs and other Charge-based Devices Table ERD0 Potential Evaluation: Non-FET, Non-Charge-Based "Beyond " Devices... 58

7 Emerging Research Devices 1 EMERGING RESEARCH DEVICES 1. SCOPE Continued dimensional and functional scaling i of is driving information processing ii technology into a broadening spectrum of new applications. Many of these applications are enabled by performance gains and/or increased complexity realized by scaling. Because dimensional scaling of eventually will approach fundamental limits, several new alternative information processing devices and microarchitectures for existing or new functions are being explored to sustain the historical integrated circuit scaling cadence and reduction of cost/function into future decades. This is driving interest in new devices for information processing and memory, new technologies for heterogeneous integration of multiple functions (a.k.a. More than Moore ), and new paradigms for system architecture. This chapter, therefore, provides an ITRS perspective on emerging research device technologies and serves as a bridge between and the realm of nanoelectronics beyond the end of dimensional and equivalent functional scaling. (Material challenges related to emerging research devices are addressed in a complementary chapter entitled Emerging Research Materials.) An overarching goal of this chapter is to survey, assess and catalog viable new information processing devices and system architectures for their long-range potential, technological maturity, and to identify the scientific/technological challenges gating their acceptance by the semiconductor industry as having acceptable risk for further development. Another goal is to pursue long term alternative solutions to technologies addressed in More-than-Moore (MtM) ITRS entries. This is accomplished by addressing two technology-defining domains: 1) extending the functionality of the platform via heterogeneous integration of new technologies, and ) stimulating invention of a new information processing paradigm. The relationship between these domains is schematically illustrated in Figure ERD1. The expansion of the platform by conventional dimensional and functional scaling is often called More Moore. The platform can be further extended by the More-than-Moore approach which was first introduced into ERD chapter in 011. On the other hand, new information processing devices and architectures are often called Beyond technologies and have been the main subjects of this chapter. The heterogeneous integration of Beyond, as well as More-than- Moore, into More Moore will extend the platform functionality to form ultimate Extended. Figure ERD1 Relationship among More Moore, More-than-Moore, and Beyond (Courtesy of Japan ERD). i Functional Scaling: Suppose that a system has been realized to execute a specific function in a given, currently available, technology. We say that system has been functionally scaled if the system is realized in an alternate technology such that it performs the identical function as the original system and offers improvements in at least one of size, power, speed, or cost, and does not degrade in any of the other metrics. ii Information processing refers to the input, transmission, storage, manipulation or processing, and output of data. The scope of the ERD Chapter is restricted to data or information manipulation, transmission, and storage.

8 Emerging Research Devices The chapter is intended to provide an objective, informative resource for the constituent nanoelectronics communities pursuing: 1) research, ) tool development, ) funding support, and 4) investment, each directed to developing a new information processing technology. These communities include universities, research institutes, and industrial research laboratories; tool suppliers; research funding agencies; and the semiconductor industry. The potential and maturity of each emerging research device and architecture technology are reviewed and assessed to identify the most important scientific and technological challenges that must be overcome for a candidate device or architecture to become a viable approach. The chapter is divided into five sections: 1) memory devices, ) information processing or logic devices, ) More-than- Moore device technologies, 4) emerging research information processing architectures, and 5) a critical assessment of each technology entry. Some detail is provided for each entry regarding operation principles, advantages, technical challenges, maturity, and current and projected performance. Also included is a device and architectural focus combining emerging research devices offering specialized, unique functions as heterogeneous core processors integrated with a platform technology. This represents the nearer term focus of the chapter, with the longer term focus remaining on discovery of an alternate information processing technology to eventually replace digital. The memory device section is expanded to include a new technology entry: carbon-based memory. With growing research activities in ReRAM, a separate table is created for this technology entry to track different types/mechanisms. The logic device section is organized based on state variables and novelty of materials/structures. The More-than- Moore section introduces new discussions on devices with learning capabilities and continues covering emerging devices for RF applications. Finally, the critical assessment section continues to cover a survey-based benchmark and quantitative benchmarks reported in literature to provide a balanced assessment of these emerging new device technologies. A brief section is also included to propose a set of fundamental principles that will likely govern successful extension of information processing technology substantially beyond that attainable solely with ultimately scaled. This chapter continues the technology highlights: (1) Carbon-based Nanoelectronics as a rapidly emerging information processing technology; () Spin Transfer Torque Magnetostatic RAM (STT-MRAM) and Redox Resistive RAM as rapidly emerging memory technologies. These three technologies exhibit substantial potential such that they will likely be ready for manufacture within a five ten year period. Highlighting also suggests that a technology is an attractive candidate for accelerated development. As in previous editions, the chapter includes transition tables. The purpose of these transition tables is twofold. The first is to track technologies that have appeared in or have been removed from the 01 tables and so provide a very brief explanation of the reason for this change. The second is to identify technologies that are considered important but do not meet the criteria for full inclusion into the more detailed tables. These may be expected to become more or less visible in future editions of the roadmap and hence the name.. DIFFICULT CHALLENGES.1 INTRODUCTION The semiconductor industry is facing three classes of difficult challenges related to extending integrated circuit technology to new applications and to beyond the end of dimensional scaling. One class relates to propelling beyond its ultimate density and functionality by integrating a new high-speed, high-dense, and low-power memory technology onto the platform. Another class is to extend information processing substantially beyond that attainable by using an innovative combination of new devices, interconnect and architectural approaches for extending and eventually inventing a new information processing platform technology. The third class is to invent and reduce to practice long term alternative solutions to technologies that address existing MtM ITRS topical entries currently in wireless and eventually in power devices, image sensors, etc. These difficult challenges, all addressing the long term period of , are presented in Table ERD1.

9 Emerging Research Devices Table ERD1 Difficult Challenges Scale high-speed, dense, embeddable, volatile/nonvolatile memory technologies to replace SRAM and possibly FLASH for manufacture by 018. Scale to and beyond Extend ultimately scaled as a platform technology into new domains of application. Continue functional scaling of information processing technology substantially beyond that attainable by ultimately scaled. Invent and reduce to practice long term alternative solutions to technologies that address existing MtM ITRS topical entries currently in wireless/analog and eventually in power devices, MEMS, image sensors, etc. Emerging Research Devices Difficult Challenges Summary of Issues and opportunities SRAM and FLASH scaling in D will reach definite limits within the next several years (see PIDS Difficult Challenges). These limits are driving the need for new memory technologies to replace SRAM and possibly FLASH memories by 018. Identify the most promising technical approach(es) to obtain electrically accessible, high-speed, high-density, low-power, (preferably) embeddable volatile and nonvolatile memories. The desired material/device properties must be maintained through and after high temperature and corrosive chemical processing. Reliability issues should be identified & addressed early in the technology development. Develop nd generation new materials to replace silicon (or InGaAs, Ge) as an alternate channel and source/drain to increase the saturation velocity and to further reduce V dd and power dissipation in MOSFETs while minimizing leakage currents for technology scaled to 018 and beyond. Develop means to control the variability of critical dimensions and statistical distributions (e.g., gate length, channel thickness, S/D doping concentrations, etc.) Accommodate the heterogeneous integration of dissimilar materials. The desired material/device properties must be maintained through and after high temperature and corrosive chemical processing Reliability issues should be identified & addressed early in this development. Discover and reduce to practice new device technologies and primitive-level architecture to provide special purpose optimized functional cores (e.g., accelerator functions) heterogeneously integrable with. Invent and reduce to practice a new information processing technology eventually to replace. Ensure that a new information processing technology has compatible memory technologies and interconnect solutions. A new information processing technology must be compatible with a system architecture that can fully utilize the new device. Non-binary data representations or non-boolean logic may be required to employ a new device for information processing, which will drive the need for new system architectures. Bridge the gap that exists between materials behaviors and device functions. Accommodate the heterogeneous integration of dissimilar materials. Reliability issues should be identified & addressed early in the technology development. The industry is now faced with the increasing importance of a new trend, More than Moore (MtM), where added value to devices is provided by incorporating functionalities that do not necessarily scale according to "Moore's Law. Heterogeneous integration of digital and non-digital functionalities into compact systems that will be the key driver for a wide variety of application fields, such as communication, automotive, environmental control, healthcare, security and entertainment.. DEVICE TECHNOLOGIES Difficult challenges gating development of emerging research devices are divided into three parts: those related to memory technologies, those related to information processing or logic devices, and those related to heterogeneous integration of multi-functional components, a.k.a. More-than-Moore (MtM) or Functional Diversification. (Refer to Table ERD1.) One challenge is the need of a new memory technology that combines the best features of current memories in a fabrication technology compatible with process flow scaled beyond the present limits of SRAM and FLASH. This would provide a memory device fabrication technology required for both stand-alone and embedded memory applications. The ability of an MPU to execute programs is limited by interaction between the processor and the memory, and scaling does not automatically solve this problem. The current evolutionary solution is to increase MPU cache

10 4 Emerging Research Devices memory, thereby increasing the floor space that SRAM occupies on an MPU chip. This trend eventually leads to a decrease of the net information throughput. In addition to auxiliary circuitry to maintain stored data, volatility of semiconductor memory requires external storage media with slow access (e.g., magnetic hard drives, optical CD, etc.). Therefore, development of electrically accessible non-volatile memory with high speed and high density would initiate a revolution in computer architecture. This development would provide a significant increase in information throughput beyond the traditional benefits of scaling when fully realized for nanoscale devices. A related challenge is to sustain scaling of logic technology beyond 018. One approach to continuing performance gains as scaling matures in the next decade is to replace the strained silicon MOSFET channel (and the source/drain) with an alternate material offering a higher potential quasi-ballistic-carrier velocity and higher mobility than strained silicon. Candidate materials include strained Ge, SiGe, a variety of III-V compound semiconductors, and graphene. Introduction of non-silicon materials into the channel and source/drain regions of an otherwise silicon MOSFET (i.e., onto a silicon substrate) is fraught with several very difficult challenges. These challenges include heterogeneous fabrication of high-quality (i.e., defect free) channel and source/drain materials on non-lattice matched silicon, minimization of band-to-band tunneling in narrow bandgap channel materials, elimination of Fermi level pinning in the channel/gate dielectric interface, and fabrication of high-κ gate dielectrics on the passivated channel materials. Additional challenges are to sustain the required reduction in leakage currents and power dissipation in these ultimately scaled gates and to introduce these new materials into the MOSFET while simultaneously minimizing the increasing variations in critical dimensions and statistical fluctuations in the channel (source/drain) doping concentrations. The industry is now addressing the increasing importance of a new trend, More than Moore (MtM), where added value to devices is provided by incorporating functionalities that do not necessarily scale according to "Moore's Law. A MtM section was first introduced into the ERD chapter in 011 with the initial coverage on wireless technologies. This version expands the coverage to devices with learning capabilities. Traditionally, the ITRS has taken a technology push approach for roadmapping More Moore, assuming the validity of Moore s law. In the absence of such a law, different methodologies are needed to identify and guide roadmap efforts in the MtM-domain. A longer-term challenge is invention and reduction to practice of a manufacturable information processing technology addressing beyond applications. For example, emerging research devices might be used to realize special purpose processor cores that could be integrated with multiple CPU cores to obtain performance advantages. These new special purpose cores may provide a particular system function much more efficiently than a digital block, or they may offer a uniquely new function not available in a -based approach. Solutions to this challenge beyond the end of scaling may also lead to new opportunities for such an emerging research device technology to eventually replace the gate as a new information processing primitive element. A new information processing technology must also be compatible with a system architecture that can fully utilize the new device. A non-binary data representation and non-boolean logic may be required to employ a new device for information processing. These requirements will drive the need for new system architectures.. MATERIALS TECHNOLOGIES The most difficult challenge for Emerging Research Materials is to deliver materials with controlled properties that will enable operation of emerging research devices in high density at the nanometer scale. To improve control of material properties for high density devices, research on materials synthesis must be integrated with work on new and improved metrology and modeling. These important objectives are addressed in the companion chapter entitled Emerging Research Materials.. NANO-INFORMATION PROCESSING TAXONOMY Information processing systems to accomplish a specific function, in general, require several different interactive layers of technology. The objective of this section is to carefully delineate a taxonomy of these layers to further distinguish the scope of this chapter from that of the Emerging Research Materials chapter and the Design chapter. One comprehensive top-down list of these layers begins with the required application or system function, leading to system architecture, micro- or nano-architecture, circuits, devices, and materials. As shown in Figure ERD below, a different bottom-up representation of this hierarchy begins with the lowest physical layer represented by a computational state variable and ends with the highest layer represented by the architecture. In this schematic representation focused on generic information processing at the device/circuit level, a fundamental unit of information (e.g., a bit) is represented by a computational state variable, for example, the position of a bead in the ancient abacus calculator or the charge (or voltage) state of a node capacitance in logic. A device provides the physical means of representing and

11 Emerging Research Devices 5 manipulating a computational state variable among its two or more allowed discrete states. Eventually, device concepts may transition from simple binary switches to devices with more complex information processing functionality perhaps with multiple fan-in and fan-out. The device is a physical structure resulting from the assemblage of a variety of materials possessing certain desired properties obtained through exercising a set of fabrication processes. An important layer, therefore, encompasses the various materials and processes necessary to fabricate the required device structure, which is the domain of the ERM chapter. The data representation is how the computational state variable is encoded by the assemblage of devices to process the bits or data. Two of the most common examples of data representation are binary digital and continuous or analog signal. This layer is within the scope of the ERD chapter. The architecture plane encompasses three subclasses of this taxonomy: 1) nano-architecture or the physical arrangement or assemblage of devices to form higher level functional primitives to represent and execute a computational model, ) the computational model that describes the algorithm by which information is processed using the primitives, e.g., logic, arithmetic, memory, cellular nonlinear network (CNN); and ) the system-level architecture that describes the conceptual structure and functional behavior of the system exercising the computational model. Subclass 1) is within the scope of the ERD chapter, and subclasses ) and ) above are within the scope of the Design chapter. The elements shown in the red-lined yellow boxes represent the current platform technology that is based on electronic charge as a binary computational state variable. This state variable serves as the foundation for the von Neumann computational system architecture. The other entries grouped in these five categories summarize individual approaches that, combined in some yet to be determined highly innovative fashion, may provide a new highly scalable information processing paradigm. Figure ERD A Taxonomy for emerging research information processing devices (The technology entries are representative but not comprehensive.)

12 6 Emerging Research Devices 4. EMERGING RESEARCH DEVICES 4.1 MEMORY TAXONOMY AND DEVICES The emerging research memory technologies tabulated in this section are a representative sample of published research efforts (circa ) describing attractive alternative approaches to established memory technologies. iii The scope of this section also includes updated subsections addressing the Select Device required for a crossbar memory application and an updated treatment of Storage Class Memory (including Solid State Disks). Figure ERD is a taxonomy of the prototypical and emerging memory technologies discussed in Table ERD and ERD4. An overarching theme is the need to monolithically integrate each of these memory options onto a technology platform in a seamless manner. Fabrication technologies are sought that are modifications of or additions to established platform technologies. A goal is to provide the end user with a device that behaves similar to the familiar silicon memory chip. The emerging research memory technology entries in the current version of the roadmap differ in several respects from the 011 edition. These changes are described in the Transition Table for emerging research devices (Table ERD). The motivations for these changes are also given in the table. Table ERD Transition Table for Emerging Research Devices Because each of these new approaches attempts to mimic and improve the capabilities of a present day memory technology, key performance parameters are provided in Table ERD for existing baseline and prototypical memory technologies. These parameters provide relevant benchmarks against which the current and projected performance of each new research memory technology may be compared. Table ERD Current Baseline and Prototypical Memory Technologies Major changes in emerging memories are as follows. First, due to significant research activity in the area of Redox memory (ReRAM) section has been expanded into a separate table (ERD4b) with subcategories describing four distinct device categories. Second, Nanoelectricalmechanical Memory has been removed from Table ERD4a and an entry for Carbon memory added. This memory portion of this section is organized around a set of five technology entries shown in the column headers of Table ERD4a and four in ERD4b. These entries were selected using a systematic survey of the literature to determine areas of greatest worldwide research activity. Each technology entry listed has several sub-categories of devices that are grouped together to simplify the discussion. Key parameters associated with the technologies are listed in the table. For each parameter, three values for performance are given: 1) theoretically predicted performance values based on calculations and early experimental demonstrations, ) up-to-date experimental values of these performance parameters reported in the cited technical references. The last row in Tables ERD4a contains the number of papers on the particular device technology published in the last two years. It is meant to be a gauge of the amount of research activity currently taking place in the research community and was a primary metric in determining which of the candidate devices were included in this table. The tables have been extensively footnoted and details may be found in the indicated references. The text associated with the table gives a brief summary of the operating principles of each device and as well as significant scientific and technological issues, not captured in the table, but which must be resolved to demonstrate feasibility. Table ERD4a Emerging Research Memory Devices Demonstrated and Projected Parameters Table ERD4b Emerging Research Memory Devices Redox RAM Demonstrated and Projected Parameters The purpose of many memory systems is to store massive amounts of data, and therefore memory capacity (or memory density) is one of the most important system parameters. A functional memory cell in aan array usually comprises two components: the storage node and the select device, the latter of which allows a given memory cell in an array to be addressed for read or write. Both components impact scaling limits for memory 1. In a two-dimensional layout using planar select transistors the cell layout area is A cell =(6-8)F. In order to reach the highest possible -D memory density of iii Including a particular approach in this section does not in any way constitute advocacy or endorsement. Conversely, not including a particular concept in this section does not in any way constitute rejection of that approach. This listing does point out that existing research efforts are exploring a variety of basic memory mechanisms.

13 Emerging Research Devices 7 4F, a vertical select transistor can be used. Table ERD5 shows several examples of vertical transistor approaches currently being pursued for select devices. Another approach to obtaining a select device with a small footprint is a twoterminal nonlinear device, e.g. a diode, either as a separate device or a strong nonlinearity intrinsic to a resistive memory element. Table ERD6 displays benchmark parameters required for a -terminal select device and Table ERD7 summarizes the operating parameters for several candidate -terminal select devices. Table ERD5. Experimental demonstrations of vertical transistors in memory arrays. Table ERD6. Benchmark Select Device Parameters Table ERD7a. Experimentally demonstrated two-terminal memory select devices Table ERD7b. Experimentally demonstrated self-selecting memory devices (self-rectifying) Storage-class memory (SCM) describes a device category that combines the benefits of solid-state memory, such as high performance and robustness, with the archival capabilities and low cost per bit of conventional hard-disk magnetic storage. Such a device requires a nonvolatile memory technology that can be manufactured at a very low cost per bit. Table ERD8 lists a representative set of target specifications for SCM devices and systems, which are compared against benchmark parameters offered by existing technologies (HDD, NAND Flash, and DRAM). Two columns are shown, one for the slower S-class SCM and the other for fast M-class SCM, as described in Section These numbers describe the performance characteristics that will likely be required from one or more emerging memory devices in order to enable the emerging application space of SCM. Table ERD9 illustrates the potential for storage-class memory applications of a number of prototypical memory technologies (Table ERD) and emerging research memory candidates (Table ERD4). The table shows qualitative assessments across a variety of device characteristics, based on the target system parameters from Table ERD8. These tables are discussed in more detail in Section Table ERD8. Target device and system specifications for SCM Table ERD9. Potential of the current prototypical and emerging research memory candidates for SCM applications MEMORY TAXONOMY Memory Volatile SRAM Baseline DRAM Flash Stand-alone NOR Embedded NAND Table ERD Nonvolatile Prototypical FeRAM PCM MRAM STT-RAM Emerging Ferroelectric Memory FeFET FTJ ReRAM Electrochemical Metallization Bridge Metal Oxide - Bipolar Filamentary Metal Oxide - Unipolar Filamentary Metal Oxide - Bipolar Nonfilamentary Mott Memory Carbon Memory Macromolecular Memory Molecular Memory Table ERD4 Figure ERD Taxonomy of emerging memory devices Figure ERD provides a simple visual method of categorizing memory technologies. At the highest level, memory technologies are separated by the ability to retain data without power. Nonvolatile memory offers essential use advantages, and the degree to which non-volatility exists is measured in terms of the length of time that data can be expected to be retained. Volatile memories also have a characteristic retention time that can vary from milliseconds to (for practical purposes) the length of time that power remains on. Nonvolatile memory technologies are further categorized by

14 8 Emerging Research Devices their maturity. Flash memory is considered the baseline nonvolatile memory because it is highly mature, well optimized, and has a significant commercial presence. Flash memory is the benchmark against which prototypical and emerging nonvolatile memory technologies are measured. Prototypical memory technologies are at a point of maturity where they are commercially available (generally for niche applications), and have a large scientific, technological, and systematic knowledge base available in the literature. These prototypical technologies are covered in Table ERD and in the PIDS Chapter. The focus of this section is Emerging Memory Technologies. These are the least mature memory technologies in Fig. ERD4, but have been shown to offer significant potential benefits if various scientific and technological hurdles can be overcome. This section provides an overview of these emerging technologies, their potential benefits, and the key research challenges that will allow them to become viable commercial technologies MEMORY DEVICES Redox Memory The redox-based nanoionic memory operation is based on a change in resistance of a MIM structure caused by ion (cation or anion) migration combined with redox processes involving the electrode material or the insulator material, or both,,4. The material class for redox memory is comprised of oxides, chalcogenides (including glasses), semiconductors, as well as organic compounds including polymers. In many cases, the conduction is of a filamentary nature, and hence a one-time formation process is required before the bi-stable switching can be started. If this effect can be controlled, memories based on this bi-stable switching process can be scaled to very small feature sizes. The switching speed is limited by the ion transport. If the active distance over which the anions or cations move is small (in the < 10 nm regime) the switching time can be as low as a few nanoseconds. Many of the finer details of the ReRAM switching mechanisms are still unknown. Developing an understanding of the physical mechanisms governing switching of the redox memory is a key challenge for this technology. Nevertheless, recent experimental demonstrations of scalability 5, retention, 6 and endurance 7 are encouraging. In this edition of the ERD Chapter, ReRAM has been split into four categories based on physical mechanisms and electrical characteristics. These are broadly split into Electrochemical Metallization Bridge (EMB), which typically switch due to cation motion, and Metal Oxide ReRAM, where switching is due to anion reconfiguration. There are three varieties of Metal Oxide ReRAM: Bipolar-Filamentary, Unipolar-Filamentary, and Bipolar-Nonfilamentary. Bipolar versus unipolar behavior is distinguished by the requirement of opposite polarities for the SET and RESET operations (bipolar requires both polarities). Filamentary versus non-filamentary is characterized by the area through which electrical conduction and resistance switching take place. In the more common filamentary ReRAM, conduction occurs through a filament, which is typically a small, fixed size for a given material system and forming conditions. Hence, the current through the device is not strongly dependent on device area. Conversely, in non-filamentary ReRAM, conduction takes place over a significant portion of the device area. In this case, the device current is directly proportional to device area. In the following four sections, the operating principles, current status, and challenges of each ReRAM category is described Electrochemical Metallization Bridge ReRAM Electrochemical metallization bridge (EMB) ReRAM, also referred in literature as Conductive Bridge RAM (CBRAM) or Programmable Metallization Cell (PMC), utilizes electrochemical control of nano-scale quantities of metal in thin dielectric films or solid electrolytes to perform the resistive switching operation. 8 The basic EMB cell is a metal ion conductor metal (MIM) system consisting of an electrode made of an electrochemically active material such as Ag, Cu or Ni, an electrochemically inert electrode such as W, Ta, or Pt, and a thin film of solid electrolyte sandwiched between both electrodes 9. Large, non-volatile resistance changes are caused by the oxidation and reduction of the metal ions by the application of low bias voltages. Key attributes are low voltage, low current, rapid write and erase, good retention and endurance, and the ability for the storage cells to be physically scaled to a few tens of nm. The material class for the dielectric film or the solid electrolyte is comprised of oxides, chalcogenides (including glasses), semiconductors, as well as organic compounds including polymers. EMB ReRAM is a strong emerging memory candidate primarily due to scalability (~10nm) 10, ultra-low energy operations due to fast read, write and erase times, and low voltage requirements 11. Maturity of the EMB technology development can be assessed by the fact that many companies are either shipping products based on EMB or in advance stages of commercialization. A survey of publications from 011 to 01 shows that CBRAM technology application in various markets including SSDs 1, embedded NVM 1, and serial interface nonvolatile memory replacement 14. In 01, an EMB based serial NVM replacement product became commercially available 15. Research is being conducted on large arrays of 1Mb or more, integrated in advanced technologies (65nm) in 00mm wafer platforms. Such efforts are critical to

15 Emerging Research Devices 9 identify core technology challenges 16 and also on fundamental materials and mechanisms 17. Novel applications such as the reconfigurable switch 18 and synaptic elements in Neuromorphic systems 19 based on EMB ReRAM are also gaining attention and are expected to expand the application base for this technology. As with other filamentary ReRAM technologies, EMB ReRAM is challenged by bit level variability 16, the random nature of reliability failure such as retention or endurance and random telegraph noise (RTN) potentially contributing to read disturbs 0. Such issues require large populations of bits to be studied, which suggests collaboration between universities and industry may be beneficial. A focus on fundamental understanding which simultaneously addresses a mitigation path such as error correction schemes, redundancy, and algorithm development will move the technology forward. Engineering hurdles include the availability and integration of new materials used in EMB ReRAM at advanced process nodes especially when there could be issues with compatibility of thermal budgets and process tooling. The availability of integrated array level information suggests that some of these challenges are being resolved in the recent years 14,18. Active participation from semiconductor equipment vendors and material suppliers would assist overcoming manufacturing hurdles rapidly Metal Oxide-Bipolar Filamentary ReRAM Metal oxide-bipolar filamentary (MO-BF) ReRAM is an emerging bipolar resistance switching memory, often referred to in the literature as Valence Change Memory (VCM). The structure of MO-BF ReRAM cell is an asymmetric electrode/insulator/electrode stack. One electrode serves to create the interface where switching occurs, which is sometimes referred to as the active electrode. The other electrode serves as an ohmic contact and as a reservoir for the oxygen anions during the switching process4. The most common switching metal oxides are TaO x 6 and HfO x 1 due to their excellent performance and compatibility. However, bipolar filamentary switching has been reported in numerous transition metal oxides including TiO x, AlO x, WO x, 4 SrTiO x, 5 and even in certain metal oxynitrides (e.g. AlO x N y 6 ) and nitrides (e.g. AlN 7 ). These oxides are typically oxygen deficient (sub-stoichiometric), which is why a subscript x is used in the chemical description of the material. In addition to single uniform, sub-stoichimetric oxide layers, it is also common to form a MO-BF ReRAM cell from a bilayer combination of oxides, where one contains a significantly higher oxygen stoichiometry than the other (e.g. Ta O 5-x /TaO -x 8 ). Prior to switching, a MO-BF typically requires a one-time electroforming pulse which creates a switching filament with a high concentration of oxygen vacancies (V O ). Switching in MO-BF is thought to occur due to the regulation of charged V O in this switching channel, due to electric field in combination with thermal effects,4,16. The specific details of the bipolar switching mechanism tend to vary between materials and structures, and are the subject of intense, ongoing scientific research. Rapid improvement at the device level has been made in the past several years for MO-BF ReRAM. Individual cell endurance of 10 1 cycles has been demonstrated in a Ta O 5-x /TaO -x structure7,9. Ten year retention at 85 C has been extrapolated for TaO x from measurements up to 000 hr at 150 C6. In HfO x cells, 10 year retention at 105 C has been extrapolated from a retention study including bakes ranging from 150 C to 00 C (using a ~0 hr bake at each temperature) 0. In this study it was noted that lowing the SET compliance current from 100µA to 10µA dropped the maximum temperature at which 10 year retention could be achieved to 9 C. This demonstrates an important tradeoff between SET current and retention time. Excellent progress in scaling of MO-BF ReRAM has been made: single HfO x devices with dimensions of 10 nm were demonstrated in and 8nm devices in 01 with good endurance and retention in both cases. Switching times of less than 1 ns have been demonstrated in a TaO x cell. Furthermore, a SET switching energy of 115 fj and RESET switching energy of 1 pj was demonstrated (for an R OFF /R ON of ) in Ta/TaO x ReRAM 4. More recently, an AlO x device was demonstrated with ~5nm carbon nanotube electrodes, which was estimated to switch at 10 fj or less 5. MO-BF ReRAM has rapidly progressed toward a commercial product. In 01, Panasonic demonstrated an 8 Mb TaO x ReRAM macro with write pulse speed of 8.ns and throughput of 44MB/s 6. Early in 01, Toshiba published basic read/write circuitry details of prototype -layer Gb ReRAM memory integrated with 4 nm, although details of the switching material were not given 7. In July 01, Panasonic announced the first commercial 8-bit microcontroller which replaced EEPROM of an integrated TaO x ReRAM cell with a 0.18µm process 8. Despite rapid progress in the advancement of MO-BF ReRAM, there are still significant scientific and technological hurdles which must be overcome before it can become a viable high density NAND flash replacement or Storage Class Memory technology. One hurdle is that there is still not a complete understanding of the details of the switching mechanism in these devices, although there has been progress resulting from intense research in the basic science and

16 10 Emerging Research Devices technological aspects in recent years. An improved understanding of the bipolar filamentary switching mechanism may solve one of the most significant technological issues with MO-BF ReRAM: variability. For example, it is pointed out that in a 1 MB test chip based on Ti/HfO x ReRAM, the resistance distribution of the high resistance state varies over several orders of magnitude and overlaps the distribution of the low resistance state 16. Error correction circuitry can address some of this, but adds overhead and degrades the speed of the memory system, especially as the error-rate to be corrected rises above 0.1%. It is also important to note that many of the single cell parameters demonstrated have not been demonstrated in the same device or in large arrays due to design tradeoffs. For example, although it is possible to achieve sub-nanosecond programming speeds, much longer programming times are necessary in a large array in order to separate high and low resistance states. There is also a tradeoff between write current and retention 0. Hence, it will be important in coming years to demonstrate a functional ReRAM array that demonstrates excellent speed, fast readout, high endurance, scalability, low switching energy, high reliability, and low variability characteristics simultaneously Metal Oxide-Unipolar Filamentary ReRAM Metal Oxide - Unipolar Filamentary (MO-UF) ReRAM is another resistive switching device, also referred to in the literature as thermochemical memory (TCM) due to the primary physical switching mechanism. The device structure consists of a metal/insulator/metal (MIM) structure. Typical insulator materials are metal-oxides such as NiO x, HfO x, etc., and common metal electrodes include TiN, Pt, Ni, and W. In general, the device can be asymmetric (i.e. top and bottom electrode materials are different), but unlike other types of ReRAM, asymmetry is not required. The first reported resistive switching in these MIM structures was unipolar in nature (see reference 9 for the first integrated device work that put metal oxide ReRAM in the spotlight). Unipolar switching allows use of the same voltage polarity for changing the resistance from high to low (SET) or from low to high (RESET). Note that in the general case, polarity is still important (e.g. repeatable SET/RESET switching only occurs for one polarity of voltage with respect to one of the electrodes 40 ). Only in symmetric structures (e.g. Pt/HfO /Pt), a-polar behavior can be obtained, where SET and RESET occur irrespective of voltage polarity 41. The switching process is generally understood as being filamentary, where conduction is caused by a filamentary arrangement of defects (oxygen vacancies) throughout the thickness of the insulator film. As with other filamentary ReRAM, an initial high voltage forming step is required to form this conductive filament, while subsequent RESET/SET switching is thought to occur through local breaking/restoration of this conductive path. The unipolar character of the switching indicates that drift (of charged defects) in an electric field does not play a role (as it does in bipolar switching resistive memory), but that thermal effects probably dominate 4,4. On the other hand, polarity effects indicate anodic oxidation (e.g. at Ni or Pt electrodes) is responsible for RESET 40. These findings suggest a thermochemical fuse model for describing this unipolar switching. It has been shown for different MIM structures that both unipolar and bipolar switching mechanisms can be induced, depending on the operation conditions 44,45,46,47. An interesting work on the scaling effect on unipolar and bipolar resistive switching of metal oxides was recently published 48. Unipolar switch is seen as advantageous scaled memory arrays, as it enables the use of simple selector devices as a diode that can be stacked vertically with the memory device in a dense crossbar array. Also, the use of a single program voltage polarity greatly simplifies the circuitry. On the other hand, as has been exemplified in mixed mode (unipolar/bipolar) operation of memory cells, there are important tradeoffs between the unipolar and bipolar switching modes. On the plus side, unipolar switching mode typically shows a higher on/off resistance ratio. The key drawback is that unipolar switching is typically obtained at higher switching power (higher currents) than the bipolar mode, and also endurance is lower. As a result, major research and development work on resistive memories has shifted towards bipolar switching mechanisms. Yet, some interesting recent development has been reported 49, 50, 51, 5, 5, 54. Reference 49 shows an endurance of over 10 6 cycles with a resistive window over five orders of magnitude (and a reset current ~1mA). References 50, 51, and 5 demonstrate how unipolar ReRAM elements can be integrated in a very simple way in an existing process (known as Contact ReRAM technology). This may provide an inexpensive embedded ReRAM technology. Recently, integration of unipolar ReRAM in a 9 nm process was reported. 5 The key attributes were a small cell size (0.0 µm ), switching voltage of less than V, RESET current of less than 60 µa, endurance over 10 6 cycles, and short SET and RESET times of 500 ns and 100 µs, respectively. Reference 5 shows 4Mb array data using this same Contact-RRAM technology, fabricated using a 65 nm process. To accommodate for the low logic V DD process, on-chip charge pump was applied. Set and reset voltages are less than V. Reference 54 reports on a novel approach using thermal assisted switching to lower the switching current.

17 Emerging Research Devices 11 As stated above, large on/off ratio is an attribute of unipolar switching. The low resistance window and large intrinsic variability of bipolar switching ReRAM may require complex and time consuming verify schemes. Further study of the stability and control of the large resistance window (at low current levels) is required to determine if unipolar ReRAM variability can be improved, potentially even allowing for multi-level cell operation. Major challenges to be resolved are the high switching current that seems inherent to the unipolar operation mode. As shown in references 50, 51,and 5, reset currents less than 100 µa are achieved but need further reduction to less than 10 µa. Recently, a possible solution incorporating thermally assisted switching has been presented Metal Oxide-Bipolar Non-Filamentary ReRAM Metal Oxide Bipolar Non-Filamentary (MO-BN) ReRAM is a non-volatile bipolar resistive switching device composed of oxide layers. This has been referred to in the industry as interfacial switching, CMO x, MVO or non-filamentary. Research from different groups has yielded a variety of material stacks, with widely varying results and models. The memory effect has been shown to occur uniformly at or near the interface of at least two layers, typically within or nm. One layer is a conductive metal oxide (CMO), which is usually a perovskite such as PrCaMnO or Nb:SrTiO 55. In contrast to Filamentary ReRAM devices typically based on binary oxides such as TiO x, NiO x, HfO x, TaO x or combinations thereof the resistance change effect of MO-BN ReRAM is uniform. Depending on materials choice and structure, the current is conducted across the entire or majority of electrode area. A forming step to create a conductive filament is not needed. Non-volatile memory functionality is achieved by the field-driven redistribution of oxygen vacancies close to the contact resulting in a change of the electronic transport properties of the interface (e.g. by modifying the Schottky barrier height). Oxygen can be exchanged between layers due to the exponential increase in ion mobility with field. Low current densities, uniform conduction, and bipolar switching imply that substantial self-heating is not involved. Typical R OFF to R ON ratios are on the order of 10. One class of the Bipolar Non-Filamentary ReRAM includes a deposited ion conductive tunnel layer (Tunnel ReRAM), e.g. ZrO. Here, a redistribution of oxygen vacancies causes a change of the electronic transport properties of the tunnel barrier. Low current densities and area scaling of device currents enable ultra-high density memory applications. SET, RESET, and read currents scale with device area. In addition, write current is controlled by the tunnel oxide and hence can be adjusted by changing the tunnel barrier thickness. Both SET and RESET IV characteristics are highly nonlinear enabling true 1R cross-point architectures without the need for an additional selector device for asymmetric arrays up to 51x4096 bit. No external circuitry is needed for current control during the SET operation. A continuous transition between on and off state allow straightforward multi-level programming without the need for precise current control. The typical thickness of the CMO is greater than 5 nm and the tunnel barrier is typically - nm. If a tunnel barrier is present, the adjacent electrode has to be an inert metal such as Pt to prevent oxidation during operation. For the case of PCMO cell, low deposition temperatures less than 45 C of all layers enables back end integration schemes. Bipolar Non-Filamentary ReRAM technology is less mature than other ReRAM categories. Depending on the material system and structure cycling endurance over 10,000 cycles and up to a billion cycles as well as data retention from days to months at 70 C has been achieved on single devices 56,57,58. Within MO-BN ReRAM device family the Tunnel ReRAM is probably the most mature technology. Single device functionality is demonstrated down to 0nm. SET, RESET, and read currents scale with area and tunnel oxide thickness facilitating sub µa switching currents with read currents in the order of a few na to a few 100 na. BEOL integration schemes and /RRAM functionality are verified for 00nm devices on 00mm wafers. True cross-point array (1R) functionality utilizing the self-selecting non-linear device IV characteristics and transistor-less array operation is demonstrated on fully decoded 4kb true crosspoint arrays (1R) build on top of base wafers. SLC and MLC operations are demonstrated within 4kb arrays. Major challenges to be resolved towards the commercialization of MO-BN ReRAM are, in order of priority, a) improvement of data retention, b) the integration of conductive metal oxide layer (perosvkites) via ALD or the replacement of CMO by more process friendly materials, and c) the replacement of Pt electrodes by a non-reactive, compatible electrode material. The most important issue is the improvement of retention and the voltage-time dilemma. This dilemma hypothesizes physical reasons as to why it is difficult in a particular device and material system to simultaneously obtain a long retention, with short low read voltages, and fast switching at moderate write voltages 59. Even though the exact mechanism is still under investigation there is a common agreement that oxygen vacancies are moved by the external electric field resulting in different resistance states of the memory cell. Vacancy drift at room temperature is possible due to a field dependent mobility, which increases exponentially at fields of 1 MV/cm and larger. However, current models based on a field dependent mobility underestimate the experimentally observed ratio between SET/RESET times and data retention

18 1 Emerging Research Devices indicating that the mechanism is only partially understood. More theoretical work is needed to understand the kinetics of programming and retention mechanisms. Once understood, materials have to be chosen to maximize the ratio between SET/RESET and retention times. The goal is to write devices at low temperatures and meet retention requirement of 10 years at 70 C, 85 C, and 15 C, depending on the application. Memory cells with a conductive perovskites as an electrode have demonstrated excellent device-to-device and wafer-towafer reproducibility with yields close to 100%. One of the reasons might be that perovskites display high oxygen vacancy mobilities and tolerate large variations in the oxygen content while maintaining their crystal structure. From an integration perspective, ALD is the method of choice for advanced technology nodes and future D integration schemes. Key issues are the control of the ratio of metals (perovskites are ternary or quaternary oxides), the control of the oxygen stroichiometry in the cell, oxygen loss in the presence of reducing atmospheres like H, as well as high temperatures required for crystallization. Eventually a migration to binary oxides with comparable properties might be required to resolve the integration challenges. Platinum or other noble electrodes display superior device performance over compatible electrodes like TiN. On the one hand it was observed that the oxidation resistance of TiN is not sufficient to prevent oxidation and the formation of TiO during operation. On the other hand, inert electrodes such as Pt or Pt-like metals are difficult to integrate. New oxidation resistant electrodes and Pt alternatives are required to reduce integration challenges and enable D integration schemes Ferroelectric Memory Emerging Ferroelectric Memory consists of two exploratory classes: 1) Ferroelectric FET and ) Ferroelectric polarization ReRAM. This entry should not be confused with the conventional ferroelectric capacitor-based memory (FeRAM or FRAM), which is addressed in tables in PIDS and Table ERD Ferroelectric FET The Ferroelectric FET (FeFET) memory cell is a 1T memory device where a ferroelectric capacitor is integrated into the gate stack of a FET. The remnant polarization of the ferroelectric directly affects the charges in the channel, and leads to a defined shift of the output characteristics of the FET. An important challenge with FeFET is short retention. An analysis has shown that two major causes are the depolarization field and charge trapping due to leakage current. It was also found that the solution to these problems would require the fulfillment of conflicting requirements 60. They then proposed the capacitor-less DRAM as the most viable approach 61. Before 011, the most popular ferroelectric materials used in the gate stack of a FeFET were perovskites used in the storage capacitors of the commercialized ferroelectric memory (with 1T-1C or 1T-C cell structure), such as PbZr x Ti 1-x O (PZT) and SrBa Ta O 9 (SBT), which require buffer layers to suppress chemical reactions at the interface of the FeFET on Si substrates. However, despite the debilitating effects of the resulting depolarization field from buffer layers 60, good retention for a 10 year memory window (projected) and promising endurance up to 10 1 cycles have been demonstrated 6,6,64. This suggests that the FeFET has potential for high-end memory/storage technology, such as enterprise storage, and capacitor-less DRAM. 61 However, these perovskite-based ferroelectrics have poor scalability due to the relatively large physical thicknesses for both the ferroelectric and the buffer layers, and problems with manufacturability plagued these perovskites as they are not compatible with standard fabrication. Notably, since 011, ferroelectricity in a variety of doped and polycrystalline HfO has been reported 65,66,67,68,69, and HfO -based FeFETs have been fabricated using standard high-k metal gate (HKMG) processes. The use of HfO -based ferroelectrics significantly reduces the physical thickness of the gate stack, and in turn scales down the channel length to the current technology node 70. Follow the typical HKMG process, SiO serves as the buffer layer between HfO and Si with a sub-nanometer thickness, yielding low depolarization field 60. The HfO based FeFETs show promising write speed (down to a few ns), retention (projected to 10 years), and endurance (up to 10 1 ), which all match the best performances of its perovskite counterparts (refer to ERD4a). On the other hand, organic ferroelectrics 71, such as polyvinylidene fluorides (PVDF), have the advantage of simplicity, large-area fabrication, non-toxicity, and flexibility when they are made on flexible substrates, including polymers 7,7, IGZO 74, graphene 75 and CNT 76. The drawback is the low spontaneous polarization (P s ) and high gate leakage compared to their inorganic counterparts. Recently, some molecular-based ferroelectrics 77,78,79 presented high values of P s that are comparable to those in perovskite ferroelectrics 77, 80.

19 Emerging Research Devices 1 The FeFET memory device has the ability to realize multi-level-cells (MLC) with demonstrated retention of 10 5 s (some even projected to 10 years) for intermediate states with partially saturated polarization 81,8,8, which are usually inferior to saturated states 84,85. Tradeoffs exist between write time, voltage, and retention, because short and low-voltage write pulses result in less saturated polarization states, and in turn, shorter retention. These tradeoffs can be optimizd to use the FeFET in either M-SCM or S-SCM (see Section 4.1.4). The development of memory FeFET arrays is in its early stages. A 64 kb SBT-based FeFET array has been demonstrated with the merits of long retention, low program voltage, and high endurance 6. The variation of V th in the array is mainly attributed to the differences in the P/E response from cell to cell. The 16x16 array of flexible organic FeFETs with PVDF suffers from large variations in on/off ratios due to gate leakage 86. The ferroelectricity in doped polycrystalline HfO could potentially solve the most significant roadblocks of FeFET technology. Although the fabrication of HfO -based FeFET is compatible with the HKMG process, demonstration of memory cells and arrays will be necessary before there is a realistic opportunity for commercialization. Research efforts are also necessary to determine the physical origins of ferroelectricity in doped HfO and justify continued research on material optimizations 69, Ferroelectric Tunnel Junction A ferroelectric tunnel junction (FTJ) is a trilayer structure with two conductive electrodes sandwiching a very thin ferroelectric layer which electrons cross by quantum mechanical tunneling. Any asymmetry between the two metal/ferroelectric interfaces (different electrode materials, different interface terminations, etc.) will give rise to an asymmetry in the electrostatic potential distribution caused by the ferroelectric polarization in the barrier 88,89. When switching the polarization orientation by an external voltage, the average height of the tunnel barrier changes, resulting in potentially large variations of the tunnel current, as it depends exponentially on the square root of the barrier height. This is the tunnel electroresistance (TER) effect. Because ferroelectrics have a remnant spontaneous polarization, the tunnel resistance change associated with polarization switching is non-volatile. Importantly, while write voltages on the order of a few volts are usually required to switch the polarization, the resistance state can be read at a much lower voltage (typically 100 mv) that does not affect the polarization state. The read out process is thus nondestructive, in contrast to the situation in conventional ferroelectric RAMs. Although earlier attempts date back to the early 1970's 90, the first demonstrations of TER came in ,9,9. These used the conductive tip of an atomic force microscope as the top electrode, and on/off ratios of 750 were reported at room temperature 91. In 01, solid-state junctions with lithographically defined top metallic electrodes were produced 94. Up to now, FTJs have almost exclusively used epitaxially grown perovskite ferroelectrics as the barrier material (BaTiO 91,9,94,95,96,97 BiFeO 98,99, Pb(Zr,Ti)O 9,100,101 ). In most cases, the bottom electrode was a conductive epitaxial perovskite as well (manganites, SrRuO ) and top electrodes simple metals or alloys such as Co or Pt. The largest on/off ratios now exceed 10 4 (R ON = Ohm, R OFF = Ohm) 94,10. Scalability has been demonstrated down to junction diameters of 180 nm in devices defined using conventional (electron-beam) lithography 94 and preliminary results were obtained in junctions of about 50 nm defined by nanoindention 94. The fundamental scaling limit is set by the ferroelectric domain size, which can be as small as a few nm in ultrathin ferroelectric films 10. Endurance has been tested up to several thousands of cycles 94,98,10, with unpublished reports exceeding Retention over several days has been demonstrated 94,98,10. Extrapolating from the performance of ferroelectric RAMs, endurance of cycles and retention over 10 years can be expected 10,104. The minimum reported write energy is 10 fj/bit 94. The thermal stability of the TER has not been investigated, but conventional inorganic ferroelectrics can retain their properties up to at least 00 C 105,106. The field of FTJs is recent and various standard tests required to fully benchmark their potential as non-volatile memory elements remain to be properly performed. Most importantly, longer endurance and retention must be demonstrated in devices integrated in an advanced node. The second key challenge is the integration with technology and the replacement of epitaxial oxides by conventional metals as the bottom electrode 95. This also includes the fabrication of high quality, ultrathin ferroelectric films by a process compatible with, in terms of growth technique and temperature. If confirmed, the recent report of ferroelectricity in (Zr, Hf)O films grown by atomic layer deposition 107 may offer a relatively simple integration process. Organic ferroelectrics may be another interesting route to follow. 108 In terms of architectures, it is necessary to move from single devices to cross-bar architectures. Scientific investigation of the precise role of the electrode/barrier interface atomic 109 and electronic structure 110 on the TER as well as on the dynamics of ferroelectric domains 98,111 at nanosecond timescales and below is also needed Carbon Based Memory

20 14 Emerging Research Devices Recently, various forms of carbon, such as amorphous carbon (α-c) and diamond-like (DLC) carbon, carbon nanotubes and graphene were explored for memory applications 11. Several different mechanisms were reported for carbon-based memory devices. In α-c and DLC devices the resistive switching (both bipolar 11 and unipolar 114 ) is likely due to the transition between the electronic orbital states of sp (diamond-like high resistive state) and sp (graphite-like low resistive state). However, the detailed process of sp /sp conversion carbon based memory is not well understood. While some studies attribute the sp /sp transition to the thermal effect 115, other works suggest that low-temperature transition between the sp and sp phases can occur in a high electric field. Also, a nanoelectromechanical mechanism 116 with breaking and closing of nanogaps is proposed for two terminal graphitic and graphene structures. This mechanism also needs further investigation. For example, according to a recent study, the switching rate is strongly temperature dependent, which rules out a purely electromechanical switching mechanism. This observed switching in suspended graphene devices strongly suggests a switching mechanism via atomic movement and/or chemical rearrangement 117. The mechanism can be thus related to thermochemical and redox process. The latter mechanism likely dominates in the graphene oxide structures 118. Also, there are a number of reports of resistive switching in carbon structures due to electron trap charging and discharging effects 119, which have a mechanism similar to some electronic effect mechanisms previously recognized on RRAM. This mechanism, while interesting scientifically is unlikely to yield practically useful retention in combination with sufficiently high read current, and is therefore not considered by ERD. In addition to -terminal carbon based devices, there are a number of reports of -terminal FET-like memory structures. Many of them are unconventional kinds of floating gate/charge trapping flash memories, e.g. with graphene FET channel 10. Also, a new -terminal memory structure was proposed where the metal/graphene contact is switched between ohmic-like and space charge region-limited states when gate electric bias is applied. The suggested mechanism is based on a density of states (DOS) limited contact model Mott Memory Mott memory is based on the electronic phase transition between metallic and insulative states of correlated electron materials. Devices have a metal/insulator/metal (MIM) capacitor structure, with a correlated-electron insulator (or Mott insulator). Since the Mott transition is a first-order phase transition, the material shows a bistability in its resistance at the Mott transition 1. A mechanism of the Mott memory has been theoretically proposed in terms of the interface Mott transition induced by the carrier accumulation at a Schottky-like interface between a metal electrode and a correlated electron insulator 1. This theory also predicted that the resistive switching due to the interfacial Mott transition has nonvolatile memory functionality. The Mott transition induced by electric field or carrier injection has been experimentally demonstrated in a correlated electron material of Pr 1-x Ca x MnO 14. After this demonstration, two-terminal devices such as switches and memories have been intensively studied with using such correlated electron oxides as Pr 1- xca x MnO, 15,16 VO, 17,18,19 SmNiO, 10 NiO, 11,1 Ca RuO 4, 1 and NbO 14,15. Recently, nonvolatile resistive switching has been also demonstrated in Mott-insulator chalcogenides of AM 4 X 8 (A=Ga, Ge; M=V, Nb, Ta; X=S, Se) 16,17. In addition to the two-terminal devices, it has been recently reported that the three-terminal devices, i.e., field effect transistors (FETs), consisting of VO show a nonvolatile behavior in their transfer characteristics, which offers a promising prospect for nonvolatile memory applications. 18 It is argued that the observed nonvolatile switching is attributable to the first-order phase transition accompanied with a structural change of VO. Using NbO which exhibits a temperature-driven Mott transition from a low-temperature insulator phase to a hightemperature metal phase, Mott memristors with a size of nm have been demonstrated. The switching speed and energies of the NbO -Mott memristors were less than. ns and of the order of 100 fj, respectively 14,15. However, the resistive switching in the NbO -Mott memristors is volatile, because the Mott transition in the devices is induced by Joule heating. The nonvolatile resistive switching of AM 4 X 8 single crystals was induced by the electric field of less than 10 kv/cm 16,17. This suggests that if the device consisting of a 10-nm-thick AM 4 X 8 film is fabricated, the switching voltage will be less than 0.01 V. The retention of the Mott memory is an open question, because its retention characteristics have been rarely reported. The control of crystallinity and chemical-composition in the thin films of correlated electron materials are crucial, because disorders, defects, and spatial variation of chemical composition have significant impact on the Mott transition. For example, the Mott transition can be driven even by a small amount of carrier doping to the integer-filling or half-filling valence states of transition element 1. However, because of disorder, defects, and spatial variation of the chemical composition, rather large amount of carriers of more than 10 cm - are required to drive the Mott transition in actual correlated electron materials, resulting in a relatively large switching voltage required in the Mott memory. As pointed out

21 Emerging Research Devices 15 in theoretical work 1, precise control of the interface between a metal electrode and a correlated electron material is a key challenge for Mott memory Macromolecular Memory Macromolecular memory focuses on metal/insulator/metal diode structures incorporating a layer of polymer. The polymer layers contain mainly carbon atoms and are largely amorphous. Via the two terminals, the resistance of the memory is modified and read out using pulsed voltage. Nonvolatile memory effects with non-destructive read have been reported for a large variety of polymeric materials and blends of polymers with nanoparticles. Both bipolar and unipolar switching has been demonstrated. Depending on the structure of the polymer, a variety of mechanisms can be operative. For polymers supporting transport of inorganic ions, the formation of metallic filaments is reported. In semiconducting polymers supporting ion transport, dynamic doping due to migration of inorganic ions may occur. Ferroelectric polymers in blends with semiconducting material give rise to a memory effect based on the modification of charge injection barriers by the ferroelectric polarization. However, for many polymeric materials, the origin of the resistive switching is not well understood. To date no specific design criteria for the polymer are known, although clear correlations between memory effect and electronic properties of the polymer have been demonstrated. A number of studies have indicated the importance of certain interfaces and interlayers of the polymers with other conducting and semiconducting materials. In some cases, the mechanism and its relationship with polymer structure seems obvious, but in many cases the mechanism of operation is largely unknown. Stability of the memory states at high temperatures (85 o C, 10 4 s) has been demonstrated 19,140. Programming at very low power (70 nw) has been realized 140. Assuming a 15 ns switching time determined for the same system, one might acheive write energy of J/bit. Furthermore low programming voltages have been realized: +1.4 and -1. V for the two states with good retention time (>10 4 s) 141. Scaling of polymer resistive memory cell to 100 nm has been reported 14. At this length scale, integration of memory cells into an 8 8 array was demonstrated. Polymer memory cells on flexible substrates have been demonstrated 14. For amorphous carbon, scaling to a nanometer sized cell has been published (1 10 nm ) 144. Using carbon nanotubes as macromolecular electrodes and aluminum oxide as interlayer, isolated, non-volatile, rewriteable memory cells with an active area of ~ 6 nm have been achieved, requiring a switching power less than 100 nw, with estimated switching energies below 10 fj per bit 5. Regarding the mechanism of operation, extensive work on the class of polyimide polymers has shown clear correlations between electronic structure of the polymer and memory effects, although the exact mechanism is still unclear. A number of studies have indicated an active role of the interface between macromolecular material and (native) oxide layers in memory operation involving charge trapping 145,146. In macromolecular memory, a large variety of operation mechanisms can be operative. A key research question concerns distinguishing different mechanisms and evaluating the potential and possibilities of each mechanism. A second step is to identify model systems for each mechanism. Having such a model system provides a possibility to benchmark the operation of the macromolecular materials. These research steps are crucial for establishing and securing the collaboration of the chemical industry. For design, synthesis, and development of next generation macromolecular materials for memory applications, clear guidelines on the required structural and electronic properties of the macromolecular material are needed. For instance, a memory effect originating from the formation and dissolution of metallic filaments requires macromolecular materials that support the transport of ions and have appropriate internal free volume for ion conduction. Here the field could benefit from interaction with the field of polymer batteries. Ferroelectric polymers have been shown to give rise to resistive memory 147 and could benefit enormously from development of new macromolecular polymeric materials with combined ferroelectric switching and semiconducting structural units. Finally, a number of macromolecular memories involve oxide layers. At the macromolecular/oxide interface trap states can be engineered by tuning the electron levels of the macromolecular material Molecular Memory Molecular memory is a broad term encompassing different proposals for using individual molecules or small clusters of molecules as building blocks of memory cells. In molecular memory, data are stored by applying an external voltage that causes a transition of the molecule into one of two possible conduction states. Data is read by measuring the resulting resistance change of the molecular cell. The concept emphasizes extreme scaling; in principle, one bit of information can be stored in the space of a single molecule 148. Computing with molecules as circuit building blocks is an exciting concept with several desirable advantages over conventional circuit elements. Because of their small size, very dense circuits could be built, and bottom-up self-assembly of molecules in complex structures could be applied to augment top-down lithography fabrication techniques. As all molecules of one type are identical, molecular switches should have identical

22 16 Emerging Research Devices characteristics, thus reducing the problem of variability of components. However, the success of molecular electronics depends on our understanding of the phenomena accompanying molecular switching, where currently many questions remain. Early experiments on the reversible change in electrical conductance generated considerable interest 149,150. However, further studies revealed several serious challenges for single/few molecule devices due to extreme sensitivity of the device characteristics to the exterior parameters such as contacts, reproducible nanogap, and environment, etc. Also, there are multiple mechanisms contributing to the electrical characteristics of the molecular devices, e.g. the conductivity switching as an intrinsic behavior of molecular switches may often be masked by other effects, such as formation of metal filaments along the molecule attached between two metal electrodes 151. In other cases, intrinsic molecular switching has been reported, and a 160-kbit molecular memory has been fabricated 15. Molecular memory is viewed as a long term research goal. The field of molecular electronics needs further fundamental work, which is currently in progress 15, MEMORY SELECT DEVICE The Capacity (or density) is one of the most important parameters for memory systems. In a typical memory system, memory devices (cells) are connected to form an array. A memory cell in an array can be viewed as being composed of two components: the Storage node, which is usually characterized by an element with switchable states, and the Select Device, which allows the storage node to be selectively addressed for read and write. Both components impact memory scaling. It should be noted that for several advanced concepts of resistance-based memories, the storage node could in principle be scaled below 10 nm, 155 and the memory density is often limited by the select device. The most commonly used memory select devices are transistors (e.g., FET or BJT). Flash memory is an example of a storage node (floating gate) and a select device (transistor) combined in one device. Planar transistors typically have a footprint of around (6-8)F. In order to reach the highest possible D memory density of 4F, a vertical select transistor should be used. However, transistors as select devices are generally unsuitable for D memories. Two-terminal memory select devices and crossbar memory arrays are preferred for scalability. 156,157 The function of select devices is essentially to minimize leakage through unselected paths ( sneak paths ). Two-terminal select devices can achieve this through asymmetry (e.g., diodes) or nonlinearity (e.g., nonlinear devices). Volatile switches can also be used as select devices. Figure ERD4 shows a taxonomy of memory select devices. In addition, inherent self-selecting properties (e.g., nonlinearity or self-rectification) may enable crossbar arrays without external select devices. Memory Select Devices Transistor Diodes Volatile switch Nonlinear devices Planar Si diodes Threshold switch External nonlinear select devices Vertical Oxide/oxide heterojunctions Mott switch MIEC Metal/oxide Schottky junctions Reverse-conduction diodes Complementary structures Intrinsic nonlinearity Vertical transistors Self-rectification Figure ERD4: Taxonomy of memory select devices Several examples of experimental demonstrations of vertical select transistors used in memory arrays are presented in Table ERD5. While a vertical select transistor allows for the highest planar array density (4F ), this technology is more difficult to integrate into stacked D memory than the conventional 8F technology using planar FETs. For example, to avoid thermal stress on the memory elements on the existing layers, the processing temperature of any vertical transistor intended for use as a selection device in D stacked memory must be low. Also, making contact to the third terminal (gate) of vertical FET constitutes an additional integration challenge, which usually results in cells size larger than (4F ) 158 ; although true 4F arrays can, in principle, be implemented with -terminal select devices Two-terminal select devices

23 Emerging Research Devices 17 General requirements for two-terminal select devices are sufficient ON currents at proper bias to support read and write operations and sufficient ON/OFF ratio to enable selection. The minimum ON current required for fast read operation is ~ 1µA (Table ERD6). The required ON/OFF ratio depends on the size of the memory block, m m: for example using a standard scheme of array biasing the required ON/OFF ratio should be in the range of for m= , in order to minimize the sneak currents 160. These specifications are quite challenging, and the experimentally demonstrated select devices have yet to meet them. Thus, select devices are becoming a critical part of emerging memory. It should be noted that resistance based memories could target different applications, which may impact requirements for the select devices. The simplest realizations of two-terminal memory select devices use semiconductor diode structures, such as a pnjunction diode, Schottky diode, or heterojunction diode. Such devices are suitable for a unipolar memory cell. For bipolar memory cells, selectors with bi-directional conduction are needed. Proposed examples include Zener diodes 161, BARITT diodes 16, reverse breakdown Schottky diode Si diode select devices Both single-crystal Si 164 and poly-si 165,166,167 diodes have been developed as select devices for PCM arrays. To provide a high ON current, the contact resistivity needs to be reduced to < 10-7 Ω cm, which can be achieved by engineering the metal electrodes and electrode-si interface. 165 A short-time annealing technique helps to reduce the OFF current and enlarge the ON/OFF ratio. Poly-Si technology can achieve ON current density of 10 7 A/cm (at ~ 1.8V) and ON/OFF ratio of It is believed that Si diode can be scaled beyond 0nm. Poly-Si diode select devices have been integrated in PCM crossbar arrays 165, D vertical chain-cell type PCM 166, and a 1Gb PCM test chip 167. A major challenge of Si diodes is the high processing temperature (above 1000 C) required to crystallize Si to reduce contact resistivity and OFF current Oxide diode select devices Oxide-based Heterojunction 168,169,170,171 or Schottky junction 174,175,176,177 diodes can be fabricated at lower temperature. They are particularly suitable for oxide-based RRAM devices. A p-nio x /n-tio x diode has demonstrated a rectification ratio of 10 5 at ±V and ON current density of 5 10 A/cm (at ~.5V) 168. A p-cuo x /n-inzno x diode achieved higher ON current density of 10 4 A/cm (at ~ 1.V) and was integrated with NiO x RRAM in a -layer 8 8 crossbar array 169,170 and with Al O antifuse in a one-time-programmable (OTP) memory 171. Si substrates can be used as a part of heterojunction diodes as demonstrated in n-zno/p-si 17 and n-ge-nanowire/p-si diodes 17. In a TiO x -based diodes with Pt electrodes, temperature-dependent current-voltage (I-V) characteristics confirms a Schottky barrier of ~0.55eV at the TiO x /Pt interface 174. The rectification ratio is ~ at ±1V but ON current density is low (~ 1A/cm ) due to large size. A Pt/TiO /Ti diode with Pt as the Schottky contact and Ti the ohmic contact achieved higher rectification ratio of at ±1V 175. Another demonstration of Pt/TiO /Ti Schottky diodes improved ON current density to ~ 10 5 A/cm at V on a 4µm area 176. Measurement showed that current is not uniform across the diode area, probably due to edge leakage. Therefore, current density is higher at smaller diode size. An Ag/n-ZnO Schottky diode with non-alloyed Ti/Au ohmic contact demonstrated a rectification ratio of 10 5 and forward current density over 10 4 A/cm at V 177. In addition to oxide Schottky diodes, Si Schottky diodes are also utilized as select devices, e.g., Al/p-Si Volatile switch as select devices Volatile resistive switching devices can also work as select devices: provide access to a selected memory element in ON state and block sneak paths in OFF state. They are similar to the storage nodes. The main difference is that nonvolatility is required for the storage node, while select devices need volatile switching characteristics to turn on and off quickly Mott switch This device is based on metal-insulator transition (i.e., Mott transition) and exhibits a low resistance above a critical electric field (threshold voltage, V th ). It recovers to a high-resistance state if the voltage is below a hold voltage (V hold ). If the electronic conditions that triggered Mott transition can relax within the memory device operation time scale, the Mott transition device is essentially a volatile resistive switch and can be utilized as a select device. A VO -based device has been demonstrated as a select device for NiO x RRAM 179. It should be noted that VO undergoes a phase transition to the metal state at temperature around 68 C, and thus its operation is restricted to this transition temperature. This limits practical applications of VO in memory devices as current specifications require operational temperature of 85 C. Suitable Mott materials with higher transition temperatures need to be investigated. Recently, metal insulator transitions at ~ 10 C and electrically driven switching were observed in SmNiO Threshold switch This type of device is based on the threshold-switching effect observed in thin-film MIM structures caused by electronic charge injection. Significant resistance reduction occurs at V th and this low-resistance state quickly recovers to the

24 18 Emerging Research Devices original high-resistance state when the applied voltage falls below V hold. It was reported that chalcogenide-based threshold switches could be used as access devices in PCM arrays 181. Niobium oxide is found to possess both memory switching and threshold switching properties at different compositions, based on which a hybrid memory (W/bi-layer- NbO x /Pt) was demonstrated in a 1kb array 18. In Si-As-Te ternary alloy, the composition (controlled by the sputtering power during deposition) determines the emergence of threshold switching 18. Both V th and V hold vary with composition, which may provide a method to optimize the select device operation window. Another threshold switch device based on AsTeGeSiN was shown to be scalable to 0nm with current density exceeding 10MA/cm and endurance over 10 8 cycles 184. It was integrated with TaO x -based RRAM devices Nonlinear select devices Devices with nonlinear characteristics can be used as select devices for both unipolar and bipolar memory elements Nonlinear select devices Nonlinearity in device characteristics can be introduced with non-ohmic transport mechanisms, e.g., tunneling. A Ni/TiO /Ni nonlinear select device is integrated with HfO -RRAM to demonstrate a 1S1R memory structure 185. Another select device with Pt/TiO /TiN structure is combined with a bi-layer Pt/TiO -x /TiO /W RRAM for a functional memory device 186. A so-called varistor select device is based on a sandwiched TaO x /TiO /TaO x structure 187. It was found that the substitution of Ti 4+ in TiO by Ta 5+ ions increases the conductivity of the initially insulating TiO layer. The on current of nonlinear select devices can be modulated by oxide thickness and oxidation conditions. A back-to-back diode structure, n + /p/n + poly-si, is also suggested as a select device, where the middle p-layer is fully depleted and a drain induced barrier lowering (DIBL) effect causes exponential current increase with applied voltage MIEC switch The device is made from Cu-containing mixed ionic and electronic conduction (MIEC) materials sandwiched between an inert top electrode (TE) (e.g., TiN, W) and a bottom electrode (BE). Negative voltage applied on TE pulls Cu + in MIEC away from the BE and create vacancies near BE. The vacancy concentrations depend exponentially on the applied voltage. Symmetrical diode-like I-V characteristics is achieved with two inert electrodes. Large fraction of mobile Cu + enables high current density (> tens of MA/cm ) 189. Endurance above 10 8 cycles has been demonstrated on MIEC devices in small arrays 190. The MIEC select devices were also integrated with PCM in a 51kb testing array using 180nm process 191. The scalability of MIEC select devices was tested to below 0nm in diameter and below 1nm in thickness Complementary resistive switches Complementary resistive switch (CRS) provides a self-selecting memory by connecting two bipolar RRAM devices antiserially 19. It may be considered as constructed nonlinearity. Both states 0 and 1 have high resistance in CSR, which helps to minimize leakage through sneak paths. In either state, one of the two RRAMs is in LRS and the other in HRS. When reading a 1 state, the HRS device is switched to LRS and both devices end up in LRS. When reading a 0 state, no switching occurs and CSR remains in HRS. The read operation is destructive, although a non-destructive readout method was also proposed 194. CRS has been demonstrated in different resistive switching devices, e.g., Cu/SiO /Pt bipolar resistive switches 195, amorphous carbon-based RRAM 196, TaO x based ReRAM, 197 a multi-layer TiO x device, 198 HfO x ReRAM, 199 ZrO x /HfO x bi-layer ReRAM 00, a Cu/TaO atomic switch 01, and Nb O 5 x /NbO y RRAM, 0 etc. To date, these implementations have generally required fairly high switching currents, and decreasing this to target values (0-100µA) needs to be addressed. Table ERD7a summarizes experimentally demonstrated parameters of some two-terminal select devices, including diodes, volatile switches, and nonlinear devices. Memory devices with self-selecting properties (e.g., self-rectifying) are summarized in Table ERD7b. It remains a great challenge for these devices to meet all the requirements in Table ERD6. For scaled two-terminal select devices, two fundamental challenges are contact resistance 165 and lateral depletion effects. 0,04 Very high doping concentrations are needed to minimize both effects. However, high doping concentrations result in increased reverse bias currents in classical diode structures and therefore a reduced I on /I off ratio. For switch-type select devices the main challenges are identifying the right material and the switching mechanism to achieve the required drive current density, I on /I off ratio, and reliability. Endurance is also a critical parameter for select devices that are rarely reported in literature. ERD will continue tracking select devices with more comprehensive set of parameters STORAGE CLASS MEMORY Traditional storage: HDD and Flash solid-state drives

25 Emerging Research Devices 19 Conventionally, magnetic hard-disk drives are used for nonvolatile data storage. The cost of HDD storage in $/GB is extremely low and continues to decrease. Although the bandwidth with which contiguous data can be streamed is high, the poor random access time of HDDs limits the maximum number of I/O requests per second (IOPs). Also, HDDs have relatively high energy consumption, large form factor, and are subject to mechanical reliability failures in ways that solid state technologies are not. However, the sheer number and growth in HDD shipments per year (80,000 Petabytes in 01, growing at % per year) means that magnetic disk storage is highly unlikely to be replaced by solid-state drives at any time in the foreseeable future. 05 Nonvolatile semiconductor memory in the form of NAND flash has become a widely-used alternative storage technology, offering faster access times, smaller size and lower energy consumption when compared to HDD. However, there are several serious limitations of NAND flash for storage applications, such as poor endurance ( erase cycles), modest retention (typically 10 years on a new device, but only 1 year at the end of rated endurance lifetime), long erase time (~ms), and high operating voltage (~15V). Another difficult challenge of NAND Flash SSD is due to its page/blockbased architecture. By not allowing for direct overwrite of data, sophisticated garbage collection, wear-leveling and bulk erase procedures are required. This in turn requires additional computation (thus limiting performance and increasing cost and power associated with a local processor, RAM, and logic), and over-provisioning which further increases cost per effective user-bit of data. 06 Although flash memory technology continues to project for further density scaling, inherent performance characteristics such as read, write and erase latencies have been nearly constant for over a decade. 07 While the introduction of multilevel cell (MLC) flash devices extended flash memory capacities by a small integral factor (-4), the combination of scaling and MLC have resulted in the degradation of both retention time and endurance, two parameters critical for storage applications. The migration of NAND Flash into the vertical dimension above the silicon is expected to continue this trend of improving bit density (and thus cost-per-bit) while maintaining or even slightly degrading the latency, retention, and endurance characteristics of present-day NAND Flash. This outlook for existing technologies has opened interesting opportunities for prototypical and emerging research memory technologies to enter the non-volatile solid-state-memory space What is Storage Class Memory? Storage-class memory (SCM) describes a device category that combines the benefits of solid-state memory, such as high performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage. 08,09 Such a device requires a nonvolatile memory (NVM) technology that could be manufactured at a very low cost per bit. A number of suitable NVM candidate technologies have long received research attention, originally under the motivation of readying a replacement for NAND Flash, should that prove necessary. Yet the scaling roadmap for NAND Flash has progressed steadily so far, without needing any replacement by such technologies. So long as the established commodity continues to scale successfully, there would seem to be little need to gamble on implementing an unproven replacement technology instead. However, while these NVM candidate technologies are still relatively unproven compared to Flash, there is a strong opportunity for one or more of them to find success in applications that do not involve simply replacing NAND Flash. Storage Class Memory can be thought of as the realization that many of these emerging alternative nonvolatile memory technologies can potentially offer significantly more than Flash, in terms of higher endurance, significantly faster performance, and direct-byte access capabilities. In principle, Storage Class Memory could engender two entirely new and distinct levels within the memory and storage hierarchy. These levels would be differentiated from each other by access time, with both levels located within the more than two orders of magnitude between the latencies of off-chip DRAM (~80ns) and NAND Flash (0µs) Storage-type SCM The first new level, identified as S-type storage-class memory (S-SCM), would serve as a high-performance solid-state drive, accessed by the system I/O controller much like an HDD. S-SCM would need to provide at least the same data retention as Flash, allowing S-SCM modules to be stored offline, while offering new direct overwrite and random access capabilities (which can lead to improved performance and simpler systems) that NAND flash devices cannot provide. However, it would be absolutely critical that the eventual device cost for S-SCM be no more than 1.5-x higher than NAND Flash. While such costs need not be realized immediately at first introduction, it would need to be very clear early on that costs would steadily approach such a level relative to Flash, in order to guarantee large unit volumes and justify the sizeable up-front capital investment in an unproven new technology. Note however that such system cost reduction can come from other sources than the raw cost of the device technology: a slightly-higher-cost NVM technology that

26 0 Emerging Research Devices enabled a simple, low-cost SSD by eliminating or simplifying costly and /or performance-degrading overhead components would achieve the same overall goal. If the cost per bit could be driven low enough through ultrahigh memory density, ultimately such an S-SCM device could potentially replace magnetic hard-disk drives in enterprise storage server systems as well as in mobile computers Memory-type SCM The second new level within the memory and storage hierarchy, termed M-type storage-class memory (M-SCM), should offer a read/write latency of less than ~00 ns. These specifications would allow it to remain synchronous with a memory system, allowing direct connection from a memory controller and bypassing the inefficiencies of access through the I/O controller. The role of M-SCM would be to augment a small amount of DRAM to provide the same overall system performance as a DRAM-only system, while providing moderate retention, lower power-per-gb and lower cost-per-gb than DRAM. Again, as with S-SCM, the cost target is critical. It would be desirable to have cross-use of the same technology in either embedded applications or as a standalone S-SCM, in order to spread out the development risk of an M-SCM technology. The retention requirements for M-SCM are less stringent, since the role of non-volatility might be primarily to provide full recovery from crashes or short-term power outages. Particularly critical for M-SCM will be device endurance, since the time available for wear-leveling, error-correction, and other similar techniques is limited. The volatile portion of the memory hierarchy will have effectively infinite endurance compared to any of the non-volatile memory candidates that could become an M-SCM. Even if device endurance can be pushed well over 10 9 cycles, it is quite likely that the role of M-SCM will need to be carefully engineered within a cascaded-cache or other Hybrid Memory approach 10. That said, M-SCM offers a host of new opportunities to system designers, opening up the possibility of programming with truly persistent data, committing critical transactions to M- SCM rather than to HDD, and performing commit-in-place database operations Target Specifications for SCM Since the density and cost requirements of SCM transcend the straightforward scaling application of Moore s Law, additional techniques will be needed to achieve the ultrahigh memory densities and extremely low cost demanded by SCM, such as (1) -D integration of multiple layers of memory, currently implemented commercially for write-once solid-state memory 11, and/or () Multiple level cell (MLC) techniques. Table ERD8 lists a representative set of target specifications for SCM devices and systems compared with benchmark parameters of existing technologies (HDD and NAND Flash). As described above, SCM applications can be expected to naturally separate based on latency. Although S-class SCM is the slower of these two targeted specifications, read and write latencies should be in the 1-5 µsec regime in order to provide sufficient performance advantage over NAND Flash. Similarly, endurance of S-class SCM should offer at least 1 million program-erase cycles, offering a distinct advantage over NAND Flash. In order to support off-line storage, 10 year retention at 85 o C should be available. In order to make overall system power usage (as shown in Table ERD8) competitive with NAND Flash and HDD, and since faster I/O interfaces can be expected to consume considerable power, the device-level power requirements must be extremely minimal. This is particularly important since low latency is necessary but not sufficient for enabling high bandwidth high parallelism is also required. This in turn mandates a sufficiently low power per bit access, both in terms of peripheral circuitry and device-level write and read power requirements. Finally, standby power should be made extremely low, offering opportunities for significant system power savings without loss of performance through rapid switching between active and standby states. In order to achieve the desired cost target of within 1.5-x of the cost of NAND Flash, the effective areal density will similarly need to be very close to NAND Flash. This low cost structure would then need to be maintained by subsequent SCM generations, through some combination of further scaling in lateral dimension, by increasing the number of multiple layers, or by increasing the number of bits per cell. Also shown in Table ERD8 are the target specifications for M-type SCM devices. Given the faster latency target (which enables coherent access through a memory controller), program-erase cycle endurance must be higher, so that the overall non-volatile memory system can offer a sufficiently large lifetime before needing replacement or upgrade. Although some studies have shown that a device endurance of 10 7 is sufficient to enable device lifetimes on the order of -10 years 1, we anticipate that the need for sufficient engineering margin would suggest a minimum cycle endurance of 10 9 cycles. While such endurance levels support the use of M-class SCM in memory support roles, significantly higher endurance values would allow M-class SCM to be used in more varied memory applications, where the total number of memory accesses may become very large.

27 Emerging Research Devices 1 As was discussed above, the expected use of M-class SCM within an on-line system reduces the requirement of nonvolatility greatly. Here the minimum retention that would be required would be sufficient to allow recovery of transaction or other critical data from the system, within a week or so of loss of power either due to a localized or wide-scale loss of power to the overall system. Obviously, a larger retention time would support even more varied applications, including even long-term off-line storage. Active power should be competitive with or better than DRAM, with standby power again being significantly lower. While M-class SCM will not need to be refreshed, it should be noted that this may not necessarily lead to large power and efficiency improvements in existing DRAM systems, the power used and overhead associated with refresh has typically been fairly modest. However, this trend may change as DRAM itself scales to more aggressive technology nodes. As with S-class SCM, the areal density of M-class SCM will need to be high in order to support a low cost as compared with DRAM. Note that S-class SCM could potentially fail to match NAND Flash in cost yet still succeed in the market, because it can offer inherently higher performance, higher endurance, and the additional benefits of direct byte-accessibility. In contrast, M-class SCM will almost certainly have to offer slower performance and lower cycle endurance than DRAM. While non-volatility is an attractive feature that DRAM cannot provide, it is unlikely that M-class SCM will meet with widespread success unless it can also prove attractive in some combination of lower cost, higher bit density, and/or lower power usage, with respect to DRAM Device Candidates for SCM As discussed above, numerous non-volatile memory candidates have been researched as potential replacements for NAND Flash, or more recently, as potential enablers of Storage Class Memory. Some of these memory candidates have been successfully commercialized, yet are still unsuitable for enabling SCM. This is not surprising since the required combination of attributes high non-volatility (ranging from 1 week to 10 years), very low latencies (ranging from hundreds of nanoseconds up to tens of microseconds), physical durability during practical use, and most important, ultralow cost per bit is non-trivial to attain. Necessary attributes of a memory device for the storage-class memory applications, mainly driven by the requirement to minimize the cost per bit, are scalability, the potential for high areal density through either MLC (Multilevel Cells) and/or D integration, low fabrication cost, long-term retention, low latency, low power, high cycle endurance, and low variability. Table ERD9 shows the potential of the prototypical memory entries (Table ERD) and the current emerging research memory entries (Table ERD4) for storage-class memory applications based on the above parameters. Each of the abovementioned categories are qualitatively described as either a strength of that technology (represented with a green face), a non-strength (yellow face), or a decided weakness (red face), in terms of suitability for SCM applications. The three prototypical memory candidates found on Table ERD are FeRAM, STT-MRAM, and PCRAM. While FeRAM was the first to be commercialized, its difficulties in scalability, MLC, and D integration make it a poor candidate for SCM, despite its excellent latency, power, and endurance characteristics. STT-MRAM can be expected to be particularly good in latency and endurance, but achieving scalability while maintaining thermal stability, low write power, and sharply defined resistance distributions (e.g. low variability) are current challenges. In particular, implementing MLC by stacking multiple STT-MRAM cells with different and carefully tuned characteristics can be expected to be quite difficult to implement. While PCRAM has been shown to be scalable, capable of MLC and only requires a unipolar selection device for D integration, reducing power and maintaining good endurance, retention, latency, and low fabrication costs are still a work in progress. However, the recent release of PCRAM-as-NORreplacement products could potentially provide an opportunity to improve these aspects for SCM applications. Of the seven emerging research memory entries, the emerging ferroelectric memories such as FeFETs are quite similar to FeRAM but can be expected to be more scalable, while offering lower endurance due to charge-trapping effects. For the new category of carbon-based memory, and the older categories of macro-molecular and molecular memory, data from aggressive attempts to implement integrated device arrays is either missing or not yet available, making it difficult to establish these emerging technologies suitability for SCM applications. The large amount of work over the past -5 years in the area of ReRAM have made it clear that these memories need to be further sub-categorized into electrochemical metallization bridge (EMB), metal oxide-bipolar filamentary (MO-BF), metal oxide-unipolar filamentary (MO-UF), and metal oxide-bipolar non-filamentary (MO-BN). While the unipolar nature of MO-UF memories (such as NiO) makes it more suitable for D integration, these memories also tend to exhibit high switching power and poor endurance. The metallic filaments of copper or silver in EMB devices provide a large resistance contrast suitable for MLC, but also tend to lead to poor retention characteristics. Since three of the four sub-

28 Emerging Research Devices categories typically involve filaments, the potential for scalability for the ReRAM should be strong. However, while MO-BF memories (such as HfOx or TaOx) have no strong weaknesses, it is not yet clear whether the broad cycle-to-cycle variations in both resistances and switching voltages will support aggressive scaling to future technology nodes. So far, the low switching currents required at such nodes have tended to lead to increased variability. Finally, MO-FN memories offer low variability, but need to improve retention. Section 5. will discuss Storage Class Memory further, in its context as an Emerging Research Architecture. This discussion will touch on the system design and potential application spaces for SCM. 4. LOGIC AND ALTERNATIVE INFORMATION PROCESSING DEVICES One of the central objectives of this chapter is to review recent research in devices beyond silicon transistors and to forecast the development of novel logic switches that might replace the silicon transistor as the device driving technological development within the semiconductor industry. Such a replacement is thought potentially to be viable if one or more of the following capabilities is afforded by a novel device: (1) an increase in device density (and corresponding decrease in cost) beyond that achievable by ultimately scaled ; () an increase beyond in switching speed, e.g., through improvements in the normalized drive current or reduction in switched capacitance; () a reduction beyond in switching energy, associated with a reduction in overall circuit energy consumption; or (4) the enabling of novel information processing functions that cannot be performed as efficiently using conventional. Figure ERD5 Taxonomy of options for emerging logic devices. The devices examined in this chapter are differentiated according to (1) whether the structure and/or materials are conventional or novel, and () whether the information carrier is electron charge or some non-charge entity. Since a conventional FET structure and material imply a charge-based device, this classification results in a three-part taxonomy. The organization of this section is intended to reflect a progression of options that might enable an orderly transition from to devices that depart increasingly from in terms of structure, materials, or operation. That organization is depicted in Figure ERD5. The resulting taxonomy of emerging logic devices is conveyed in the three tables associated with this section. Table ERD10a is titled Extending MOSFETs to the End of the Roadmap and tabulates characteristics of the devices in the lower-left quadrant of Figure ERD5. These devices are reviewed below in Section Table ERD10b, titled Charge- Based Beyond : Non-Conventional FETs and Other Charge-Based Information Carrier Devices, consists of those devices in the lower-right quadrant of the figure, which are reviewed in Section 4... Finally, Table ERD10c details the Alternative Information Processing Devices that are listed in the upper-right quadrant of Figure ERD5 and that are reviewed below in Section 4... Table ERD 10a MOSFETS: Extending MOSFETs to the End of the Roadmap. Table ERD 10b Charge-Based Beyond : Non-Conventional FETs and Other Charge-Based Information Carrier Devices

29 Emerging Research Devices Table ERD 10c: Alternative Information Processing Devices Extending MOSFETs to the End of the Roadmap Carbon Nanotube FETs The most often mentioned advantages of carbon nanotube FETs are the high mobility of charge carriers and the potential to minimize the subthreshold slope (i.e., minimize short channel effects) by a surround-gate geometry. However, there are multiple challenges to realizing these advantages, including (1) the ability to control bandgap energy, () positioning of the nanotubes in required locations and directions, () control of charge-carrier type and concentration, (4) deposition of a gate dielectric, and (5) formation of low-resistance electrical contacts. In the past two years, significant advances have been made in fabricating CNT FETs. These include (1) maintaining performance as the channel length is scaled down to 9 nm without observing short channel effects 1, () fabricating complementary gate-all-around FETs 14, () fabricating an FET with an intrinsic f T of 15 GHz 15, (4) fabricating inverters and pass-transistor logic operating at 0.4 V with a non-doped CNT 16, (5) fabricating a carbon nanotube computer composed of 178 FETs 17, (6) depositing arrays with density >500µm -1 of aligned and purified semiconductor CNTs with full surface coverage by the Langmuir-Schaefer method 18, and (7) fabricating CNT FETs with ON current of 15 ma/mm 18. In addition, continuous progress has been achieved for the remaining challenges, including sorting carbon nanotubes with chemical techniques to achieve a distribution of nanotubes with controlled bandgap energy, improving the purity of semiconducting nanotubes to ~99.9% 19, and large-scale single-chirality separation 0. The purity is still, however, many orders of magnitude less than would be required for fabrication of VLSI. A technology to remove dispersants is also needed to form good contacts and reduce unintentional doping. A technique for controlling carrier type utilizing interface charges incorporated in a high-k gate insulator 1 has been improved 14. The reliability, controllability, and the carrier trapping effect still need to be understood. Improved subthreshold swing of 6 mv/dec has been realized, using LaO x as the gate dielectric. Good contacts have been formed for both p-type and n-type FETs, but -compatible material is needed for n-type FETs. Even though significant progress has been made, the ultimate goal of depositing dense, aligned array of only semiconducting nanotubes as a high mobility channel replacement material on a silicon wafer remains elusive, as does fabricating nanotube circuits in an otherwise conventional process flow Graphene FETs Graphene materials offer the potential of extremely high carrier mobilities that can exceed those of carbon nanotubes, and also offer the promise of patterning graphene nanoribbons using conventional top down processes. Work on graphene field effect transistors (FETs) is proceeding at a rapid pace, but there are still many issues to address. Beginning with the first description of the electric field effect in graphene in 004 4, graphene FETs using bottom gating 4, topgating 5, 6, 7 dual-gating 8, 9, and side-gating 0 have now been demonstrated using exfoliated 1, epitaxial 5, 6, and CVD-grown graphene 4, 5. Research on graphene FETs started with use of exfoliated graphene to form a transistor channel. Recently, there have been many studies using epitaxial graphene on SiC substrates and CVD-grown graphene. Exfoliated graphene still offers the highest mobility 6, 7, 8 but is not manufacturable. Back-gated graphene FETs with SiO dielectric were typically shown to have field-effect mobilities up to around 10,000 cm /Vs 4 (note that although back-gating is not desirable for FET-based circuitry, this mobility value does act as a useful comparison to top-gate mobility values). It has been predicted that the room-temperature mobility of graphene on SiO is limited to ~40,000 cm /Vs due to scattering by surface phonons at the SiO substrate 9. In fact, the highest field effect mobilities were obtained using suspended graphene. Values as high as 10,000 cm /Vs and 1,000,000 cm /Vs at 40 K and liquid-helium temperature, respectively 6, 7, 8 have been measured. Recently, boron nitride, an inert and flat material, was used as a substrate for a graphene channel 40,41, and it was shown that the field effect mobility of such devices can exceed 100,000 cm /Vs at room temperature. Epitaxial graphene on SiC has exhibited carrier mobilities as high as 15,000 cm /Vs and 50,000 cm /Vs at room and liquid helium temperatures respectively 4, 4. On the other hand CVD graphene has shown carrier mobilities as high as 5,000 cm /Vs at room temperature 44. As for top-gated graphene-channel transistors, field-effect mobilities are in general lower than those shown above, because the deposition of gate dielectric can degrade the electrical properties of graphene 45. In order to avoid such degradation, a buffer layer is often used between graphene and high-k materials 46, 47. It was also shown that SiO layer formed by vacuum evaporation did not degrade graphene channel so much, and field-effect mobilities as high as 5,400 cm /Vs were achieved for top-gated transistors 48. In another case, naturally-oxidized thin aluminum layer was used as a

30 4 Emerging Research Devices seeding layer of Al O by atomic layer deposition and field effect mobilities as high as 8600 cm /Vs were achieved 49. The highest field effect mobility of top-gated graphene transistors was obtained using Al O nanowire as gate dielectric, which was as high as,600 cm /Vs 50. In this case, exfoliated graphene was used as a channel. An important limitation of graphene for digital applications is its zero bandgap energy which in turn will result in a very small I on /I off ratio. As mentioned above, however, several methods to open a bandgap have been proposed. One is to build devices with graphene nanoribbons 51, 5, 5, 54, 55. Carrier transport through a nanoribbon was first demonstrated using nanoribbons fabricated by a top-down approach 51. An energy gap of ~00 mev was obtained for a nanoribbon with a width of 15 nm. Recently, devices with multiple nanoribbons with a sub-10 nm half-pitch were fabricated using patterning with directed self-assembly of block copolymers 56. The on-off ratios of these devices were, however, still ~10 at room temperature. Nanoribbons were also made by several other methods 5, 5, 54, 55. Specifically, it was recently demonstrated that graphene nanoribbon with a precisely-controlled uniform width was synthesized by a bottom-up approach using 10,10 -dibromo-9,9 -bianthryl precursor monomers 57. The band gap of this nanoribbon was estimated to be ~.7 ev 58, 59. Formation of wider nanoribbons with a smaller band gap was also demonstrated using a similar approach recently 60. These bottom-up nanoribbons are typically formed on a clean Au(111) surface in ultra-high vacuum, and it is still difficult to control their direction and position, which is an obstacle for device fabrication. Devices using nanoribbons with a -nm width made by sonication of exfoliated graphite in a chemical solution showed an on-off ratio of 10 7, with a field effect mobility of ~00 cm /Vs 5. The relatively low mobility was considered to be caused by scattering at the edges of nanoribbons. In fact, recent theoretical studies showed that obtaining smooth edges is essential to obtain good electrical properties 61. In addition, recent experimental studies suggest that transport in graphene nanoribbon is greatly affected by defects at the edges and charged impurities 6, 6, 64. More efforts are required to realize graphene nanoribbon transistors for. Another approach to open a band gap in graphene is applying an electric field perpendicular to bilayer graphene 65, 66, 67, 68, 69. In fact, a transport gap of 10 mev was obtained at an electrical displacement of. V nm -1, providing an on-off ratio of ~100 at room temperature 69. With this approach, a high electric field would be required. Alternatively, a band gap can also be created by molecular doping on one side or both sides of bilayer graphene 70, 71, 7, 7, 74. It is predicted that a band gap exceeding 00 mev is formed by sandwiching bilayer graphene between n-type-dopant molecules and p-type-dopant molecules 7. A transport gap can also be obtained by making holes in graphene, namely by making graphene nanomesh 75, 76, 77. The transport gap can be controlled by the neck (distance between holes) width. An on-off ratio up to ~100 was obtained at room temperature using this method. Recently, it was demonstrated that a transport gap is created in graphene by introducing low-density defects with helium ion irradiation 78, 79. An on-off ratio of ~100 was obtained at room temperature. The approaches introduced here have not yet provided an on-off ratio large enough for logic applications; therefore, more efforts are required for this approach to be used in applications. An important application space for graphene may be RF with discrete elements and high linearity requirements. There have been many studies aiming at such high-frequency applications, 5, 80. A unity current gain cut-off frequency of 47 GHz has been obtained by a self-aligned- processe using exfoliated graphene with a channel length of 67 nm 81. Cutoff frequencies as high as 50 GHz and 00 GHz have also been obtained for devices using epitaxial and CVD graphene, respectively (channel length: 40 nm) 8. Achieving a high maximum frequency of oscillation is the next important step for realizing RF applications Nanowire Field-Effect Transistors (NWFETs) Nanowire field-effect transistors are structures in which the conventional planar MOSFET channel is replaced with a semiconducting nanowire. Such nanowires have been demonstrated with diameters as small as 0.5 nm 8. They may be composed of a wide variety of materials, including silicon, germanium, various III-V compound semiconductors (GaN, AlN, InN, GaP, InP, GaAs, InAs), II-VI materials (CdSe, ZnSe, CdS, ZnS), as well as semiconducting oxides (In O, ZnO, TiO ), etc. 84 Importantly, at low diameters, these nanowires exhibit quantum confinement behavior, i.e, 1-D ballistic conduction 85, and match well with the gate-all-around structure that may permit the reduction of short channel effects and other limitations to the scaling of planar MOSFETs. Important progress has been made in the fabrication of semiconducting nanowires for use as FET channels, for which there are two principal methods. The first method is litho-and-etch, by which semiconducting channels are formed through lithography and etch, followed by a printing or stamping process 86. The second is catalyzed chemical vapor deposition 87, 88. In particular, the vapor-liquid-solid (VLS) growth mechanism has been used to demonstrate a variety of nanowires, including core-shell and core-multishell heterostructures 89, 90. Heterogeneous composite nanowire structures have been configured in both core-shell and longitudinally segmented configurations using group IV and compound

31 Emerging Research Devices 5 materials. The longitudinally segmented configurations are grown epitaxially so that the material interfaces are perpendicular to the axis of the nanowire. This allows significant lattice mismatches without significant defects. Vertical transistors have been fabricated in this manner using Si 91, InAs 9, 9 and ZnO 94, with quite good characteristics. Coreshell gate-all-around configurations 95 display excellent gate control and few short channel effects. Circuit and system functionality of nanowire devices have been demonstrated, including individual logic gates with 50 GHz switching speed 96, a vertically integrated three stage ring oscillator with 108 MHz operating frequency 97, and extended, programmable arrays (a.k.a. tiles ) of nonvolatile nanowires that were used to demonstrate full-adder, full-subtractor, multiplexer, demultiplexer, and clocked D-latch operations 98. The measured operating speed of these various nanowire test circuits was limited by off-chip interconnect capacitance and thus did not achieve the THz operation that is predicted to be the intrinsic capability of nanowire devices 99. Nearly ideal subthreshold swing of 60 mv/dec has been obtained in gate-all-around nanowire transistors for both n and p type devices 00,01. Intrinsic switching energy as low as J has also been demonstrated 0, although it is still ~ orders of magnitude higher than predicted theoretically. Despite the promising results that are mostly obtained from academic research labs, nanowire transistors still face significant challenges before commercialization is feasible. For both nanoimprint and grown nanowires, there is an urgent need to improve device yield and uniformity, as well as position registry if the nanowires are to be transferred to a different substrate. Due to the high surface-volume ratio, the performance of nanowire transistors likely will be strongly influenced by surface roughness and surface defects, so proper surface treatment and passivation techniques need to be developed. Additionally, although individual devices with gate length < 40 nm has been demonstrated 0, it is unclear whether high density circuits with pitch size comparable to state-of-the-art can be fabricated. Instead of channelreplacement, nanowires may instead be better suited for alternative architectures such as crossbar memory and logic to fully take advantage of the 1D structure these devices offer. However, effective interface with peripheral circuitry still needs to be developed and serial resistance can significantly limit the array size. Another option is to integrate vertical nanowire devices directly on top of circuits, but this approach still remains to be demonstrated on a large scale P-type III-V Channel Replacement Devices III-V compound semiconductors as replacements for n-type channel materials have attracted considerable attention because of their excellent bulk electron mobilities 0. To create high performance circuits, high mobility p-channel materials are also required. Among III-V compound semiconductors, Sb-based semiconductors exhibit high hole mobilities when used as bulk materials; for example, InSb and GaSb show mobilities of 850 cm V -1 s -1 and 800 cm V - 1 s -1, respectively 04, which are significantly greater than the bulk Si hole mobility of 500 cm V -1 s -1. Moreover, hole mobility can be further increased by introducing biaxial compressive strain. This can be accomplished using pseudomorphic growth on a material with a smaller lattice constant 04, 05, 06, 07. Another method of improving hole mobility is the application of uniaxial strain 06, 08, which can be used in a similar manner as used in Si MOSFETs. The piezoresistance coefficient of p-ingasb is 1.5 times greater than that of Si when uniaxial strain is applied 09. Another advantage of InGaSb is its superior characteristics when used as an n-channel material. InGaSb has an electron mobility of over 4000 cm V -1 s -1. Thus, circuits can be fabricated using a single channel material system 10. The highest hole mobilities when using compressive strain were found in GaSb/AlAsSb heterostructures; they exhibit a mobility 07 of 1500 cm V -1 s -1. In the case of an InGaSb/AlGaSb system, 1500 cm V -1 s -1 was also observed 05. In InSb, the highest mobility 09 is 10 cm V -1 s -1. However, the thickness of the quantum well was 5 nm; a wider well might provide higher mobility 07. Regarding device performance, compressively strained InSb quantum well pfets with a gate length of 40 nm have an f T value of 140 GHz, a g m of 510 ms/mm, and an I on of 150 µa/µm at a power supply voltage of 0.5 V. When the gate length was 15 nm, the sub-threshold slope 09 was 90 mv/dec. A device on a Si substrate is essential as a replacement material for Si channels in MOSFETs. To realize III-V p-fets on a Si substrate, a transferred InGaSb nano-ribbon 11 is reported. The peak effective mobility of the device is 80 cm V -1 s -1, achieved using an InAs cladding layer for surface passivation and hole confinement. By employing high-k materials (10-nm-thick ZrO ), the obtained sub-threshold slope was 10 mv/dec, even at an interface surface density of cm - ev -1. In combination with an InAs channel using the same technique, gates were fabricated and the logic operations of NOT and NAND were demonstrated 1. In the case of this trial, the peak mobility was reduced to 70 cm V -1 s -1 because of the thin well (5 nm) in the enhancement mode. The obtained sub-threshold slope was 156 mv/dec. To realize a p-fet using an Sb-based channel, one notable problem is the interface property of metal-insulatorsemiconductor structures. At present, the best I-V properties have been reported for compressively strained InSb quantum well p-fets (with a gate length of 40 nm) using an HEMT structure to create a Schottky gate 09 ; a MOS structure is a preferred choice to achieve low gate leakage currents. In MOSFETs, the lowest sub-threshold slope is limited to 10

32 6 Emerging Research Devices mv/dec when the gate length is 5 µm 1. Although the interface state density is approximately cm - ev -1 in the mid-bandgap region, it rises to cm - ev -1 near the conduction band edge. Moreover, the insulator thickness (Al O ) in this trial was limited to 10 nm. Thus, improvements to the method of obtaining a low interface state density using a thinner insulator are required, because the interface state density strongly depends on the fabrication process 14. The requirement of having a high on-current is another problem. For a 5-µm-long MOSFET, an on-current of 70 µa/µm was reported 1 ; however, shorter gate lengths were not studied in this report. The shortest reported gate length in a p- MOSFET is 750 nm 14 and the channel dependence shows an inverse relationship between the gate length and the oncurrent. In the 750 nm-long device, the highest on-current was 70 µa/µm. Moreover, the estimated on-current in the InGaSb HEMT structure was 00 µa/µm, even when the gate length was 0 nm 15. Taking both the significant degradation of mobility at a high sheet carrier concentration and carrier starvation due to insufficient carrier concentration due to poor ion implantation in a III-V semiconductor into account, the feasibility of high on currents must be verified N-type Ge Channel Replacement Devices Ge-nMOSFETs have been demonstrated as conventional bulk planar MOSFETs and non-planar structures such as FinFETs and triangular-channel FETs 16, 17. High electron mobility in bulk Ge and (111)-inversion layer, which are.4 times and.1 times 18 the counterpart of Si, respectively, is considered a fundamental advantage of Ge-nMOSFETs. Although there is no advantage in Ge electron mobility compared to that of III-V compound semiconductors, comparable or even higher ballistic current has been estimated for (111)Ge-nMOSFETs due to the higher density of states 19. Strain technologies to enhance the electron mobility using source-drain stressors 0 and stress liners 1 have been demonstrated and simulated, respectively. GeSn semiconductor alloy with a Sn fraction of around 10%, in which direct band gap was predicted,, has also been examined for and Si-photonics devices. GeSn-nMOSFETs have been examined as a counterpart of a GeSn-pMOSFET in which higher hole mobility than in a Ge-pMOSFET is expected. Theoretical calculations for a lattice-relaxed GeSn-nMOSFET with a Sn fraction around 10% predicted slightly higher carrier injection velocity than in a Ge-nMOSFET thanks to the higher electron population to the Γ valley with a low effective mass. Note that the advantage is expected to disappear for a compressively strained GeSn device pseudomorphically grown on a Ge substrate. An advantage of introducing Ge(Sn)-nMOSFETs into devices can be recognized as the relatively lower cost to form all Ge(Sn)- devices than forming III-V nfet/ge pfet dual-channel devices. The best reported inversion electron mobility at a high inversion electron density (N s ) is 49 cm /Vs at 4 Ns= cm - or 488 cm /Vs at 5 Ns = cm -. A major advancement in the last few years is the gate stacks offering both highelectron mobility and low equivalent oxide thickness (EOT). It is significant that these high-mobility values were observed for devices with an EOT of around 1 nm for implementing Ge-nMOSFETs in scaled. 4 Gate length and EOT scaling have been demonstrated independently down to 50 nm 0 and 0.9 nm 6, respectively. The highest reported drain current, however, is only 80 µa/µm (V G -V T =0.9V, V D =1V). 17 Demonstration of pairs of Ge p-and n-mosfets has been reported 17, 7 for Ge-. Although circuit operation has not been demonstrated yet, a circuit simulation based on TCAD results for p- and n-channel Ge-on-insulator (GOI) MOSFETs predicted that the GOI- outperforms Si-on-insulator in most elemental circuits and conditions. 8 One of the most challenging issues is to reduce parasitic resistance in Ge-nMOSFET. Notable progress or attempts have been reported after the last ITRS publication in the aspect. Low temperature activation of n-type impurity at less than 400 C by means of microwave annealing 7 and chalcogen (S, Se, Te) co-implantation 9 have been reported. These results are meaningful for finding another application of Ge- devices which are required to form in low thermal budget such as sequentially stacked -LSIs. 0 Reduction in a specific contact resistivity (ρ c ) down to ohm.cm for NiGe/n + -Ge contact 1 and enhancement of electron density up to over 10 0 cm - by Sb ion implantation and laser annealing processes have been reported. Theoretical calculation predicted the specific contact resistivity required in ITRS (< 1x10-8 ohm.cm ) can be satisfied for a metal/0.7-nm-zno/n + -Ge contact if an electron density of over cm - is attained in the Ge layer. The most significant issue to implement Ge-nMOSFETs into the market is the low current drivability of the short-channel devices, which is even lower than that of conventional Si-nMOSFETs. Reducing the high parasitic resistance in the GenMOSFETs is a key technology to obtain acceptable current drivability as a counterpart of high-performance GepMOSFETs. One of the major problems is the low activation of n-type dopants in Ge, which is usually saturated up to around 1x10 19 cm - by means of conventional ion-implantation and RTA techniques. Suppressing dopant diffusion against the high diffusivity of the n-type dopant in Ge, which is the other problem, should also be examined. Enhancing the activation of n-type dopants by means of a laser annealing or an in-situ doping epitaxial growth is a possible solution., 4 However, an ultra-shallow junction with a depth of less than 10 nm has not been demonstrated, although values of electron concentration over 10 0 cm - have already been reported as mentioned above. On the other hand, reduction of ρ c down to ohm.cm may be attainable if the high activation of dopants is realized. These technologies

33 Emerging Research Devices 7 should be consistent with integration processes for technology generations of 14 nm node and beyond. A common gate stack process for p- and n-mosfets is also desirable due to cost merit against III-V/Ge dual channel devices. The first priority to motivate the Ge-nMOSFET research is to demonstrate high performance of short-channel devices comparable to or higher than that of Si-nMOSFET by integrating the process to reduce parasitic resistance and the gate stack technology to realize sub-nm-eot and low interface state density Tunnel FETs Tunnel FETs are gated reverse-biased p-i-n junctions that are expected to have I on I off transitions much more abrupt than conventional MOSFETs, whose 60mV/dec subthreshold swing limit at room temperature is set by the thermal injection of carriers from the source to the channel 5, 6, 7. By reducing the subthreshold swing below 60mV/decade, Tunnel FET can operate at much lower V DD and thereby dissipate substantially lower power. At low gate bias, the width of the energy barrier between the intrinsic region and the p + region is wide enough to have insignificant band-to-band tunneling probability, and the device is in the OFF-state. As the positive gate voltage increases, the bands in the intrinsic region are pushed down in energy, narrowing the tunneling barrier and allowing tunneling current to flow. Band-to-band tunneling (BTBT) is a quantum-mechanical phenomenon, providing a much more abrupt transition between I on I off states of a three terminal switch compared to the 60mV/decade MOSFET limit. The subthreshold swing is not constant, with the smallest values at low current levels. Tunnel FETs are actively investigated due to their potential for low standby leakage current, as enablers for logic circuits operating with a supply voltage below 0.5V. Recent reports suggest that Tunnel FETs could also be considered for high performance switch, by using appropriate heterostructure architectures 8 and/or exploiting low band-gap materials such as III-V compound semiconductors, Ge, SiGe, or graphene. Tunnel FETs are expected to match the speed performance of (CV/I metrics) at the same supply voltage but with 9 lower I on. Many detailed device simulations have predicted that BTBT FETs could produce subthreshold swings below the thermal limit in devices based on conventional semiconductor materials such as silicon 40 or SiGe 41, carbon-nanotube (CNT) 9 or graphene 4. Since the tunneling current is determined by the bandgap and effective mass of the material, the silicon TFET is limited by its low on-state current density, which may only be improved by high amounts of strain at the tunneling junction (>GPa) 4. Subthreshold swings below 60mV/dec have been measured for several different fabricated Tunnel FETs. The first reported results were on a carbon nanotube Tunnel FET 44 whose swing was 40mV/dec over a limited range of small currents. In 008, two separate Tunnel FETs in the Si/Ge system exhibited a point swing (a subthreshold swing in a small gate voltage range) of 50mV/dec 45 and 4mV/dec 46. Silicided source devices with 46mV/dec 47, 48 with high I on /I off (>10 7 for 0.5V operation) for moderate I on ~100µA/µm at V DS =1V have been reported. In 009, long channel VLS grown silicon nanowire Tunnel FETs 48 were reported with high-k dielectric, I on /I off of the order of 10 7 and an average subthreshold swing over two decades of 10mV/dec at V DS =0.5V. Performance improvements in 5nm narrow fin-like device architectures have been shown 49, with a point slope of 46 mv/dec at low biases and an I on /I off of 10 6 at a supply voltage of 1. V (I on =46μA/μm and I off =5pA/μm) with a MuGFET technology using a high-k dielectric and a metal gate inserted gate stack. All these advanced studies reveal that high-performance tunnel FET may not be achieved with allsilicon architectures and may require Ge or III-V material integrated on silicon platforms. Key challenges for tunnel FETs include optimization of the device architecture for high I on combined with an average subthreshold swing lower than 60mV/decade over at least four decades of current. The engineering of the source tunneling region (junction abruptness, bandgap energy, carrier effective mass) and enhancement of gate control on internal electric field are important for realizing tunnel FET device, which is also predicted by numerical simulations. In this respect, tunnel FETs can benefit from heterostructures with low bandgap materials on silicon platforms, which is still a technological challenge. Critical research is needed to demonstrate staggered bandgap heterostructure tunnel FETs exploiting narrow bandgap materials on silicon substrates 50 with sub-0.5v power supply and I on performance enabling operation at switching speed on the order of GHz. Significant progress in the future design of integrated circuits based on tunnel FETs or their co-design with requires the development of specific compact models. 4.. Charge-Based Beyond : Non-Conventional FETs and Other Charge-Based Information Carrier Devices Spin FET and Spin MOSFET Transistors Spin-transistors are classified as Non-Conventional Charge-based Extended Devices. 51 Spin-transistors can be divided into two categories, i.e., the spin-fets proposed by Datta and Das 5 and spin-mosfets proposed by Sugahara and Tanaka. 5 Both types of spin-transistors have structures with ferromagnetic source and drain that act as a spin

34 8 Emerging Research Devices injector and a detector, respectively. Although these device structures are similar, their operating principles are quite different. 51, 54 In the spin-mosfets, the gate function is the same as for the ordinary transistors (the current switching function). On the other hand, in the spin-fets, the gate function is the control of a spin direction by utilizing the Rashba spin-orbit interaction. Both devices exhibit transistor behavior with the functionalities of a magnetoresistive device. The important features of spin-transistors are variable current drivability controlled by the magnetization configuration of the ferromagnetic electrodes (spin-mosfets) or the spin direction of the carriers (spin-fets) and nonvolatile information storage using the magnetization configuration. These features are very useful functionalities for energy-efficient, lowpower circuit architectures unachievable by ordinary circuits. Nonvolatile logic and reconfigurable logic circuits using the spin-mosfet and the pseudo-spin-mosfets, which are suitable for power-gating systems with low static 54,55, 56, 57, 58, 59, 60 energy, have been proposed. Operation of spin-fets 54, 61 and spin-mosfets 54, 6, 6, 64 has not yet been fully verified experimentally. However, there has been important progress in the underlying technologies. To realize fully functional spin transistors, spin injection, detection and manipulation are essential 65. A key development is half-metallic ferromagnetic materials for highly efficient spin injector/detectors. Theories 66, 67, 68, 69 also predict that the insertion of a tunnel barrier between ferromagnet (FM) and semiconductor (SC) to optimize the interface resistance between FM and SC is a promising method for demonstrating highly efficient spin injection and detection. Spin injection and detection in Si have been achieved using hot-electron transport 70 or spin-polarized tunneling. 71, 7, 7, 74, 75, 76 The electrical creation and detection of spin accumulation in n-type and p-type Si were demonstrated using FM/tunnel contacts 71, 7, 74, 75, 76, 77, 78 and FM/Schottky-tunnel-barrier contacts 79 up to room temperature (RT). Electrical spin injection to Si channels was also demonstrated up to 500K 7 by using a spin relaxation 7, 75, measurement method (Hanle measurement). A relatively long spin lifetime in heavily doped Si at RT was observed. 76, 80 The spin transport in SC has been intensively studied in various kinds of tunnel and channel materials. Recently, the observation of spin signals in heavily doped Ge 81, 8 and GaAs 8, 84 at RT was also reported. Another recent study 85 showed low-resistance spin injection and detection into Si using a graphene tunnel barrier. Localmagnetoresistance signal up to RT was observed for long channel devices with FM/MgO tunnel barrier/si lateral spin 86, 87 valve structures. For the development of half-metallic ferromagnetic materials, there have been many breakthroughs as shown in the 011 ITRS edition and references. 51 A half-metallic full-heusler alloy 88, 89, 90, 91, 9 is one of the most promising halfmetallic ferromagnetic materials, because of the relative ease of preparation. High quality full-heusler alloy thin films have been formed on semiconductors. 9, 94, 95 Efficient spin injection into semiconductors using half-metallic full- Heusler alloy electrode has not yet been reported; however, recently efficient spin injection into nonmagnetic metal was observed. 9 Improved quality of FM/SC interfaces is required to observe efficient spin injection into semiconductors. It should be noted that the technology of the half-metallic materials is a requisite technique for spin-fets, because the current switching function must be controlled by the spin precession induced by the gate voltage operation. On the other hand, in spin-mosfets, the relative magnetization configurations of the source and drain are used to modify output current. Therefore, the half-metallic materials may not be required for spin-mosfet. Alternative approaches for realizing spin-mosfets have been proposed. 54, 59, 6,96 A pseudo-spin-mosfet is a circuit for reproducing the functions of spin-mosfets using an ordinary MOSFET and a magnetic tunnel junction (MTJ) connected to the MOSFET with a negative feedback configuration. The pseudo-spin-mosfet can realize the spintransistor behaviors such as variable current drivability; however, the resistance of the pseudo-spin-mosfet is larger than that of spin-fet or spin-mosfet. In terms of spin manipulation, spin precession of polarized carriers in the channel controlled by a gate voltage was observed experimentally at low temperature. 97 This result confirms that spin-orbit interaction can be controlled by gate voltage. Materials with strong spin-orbit interaction, e.g., InGaAs, InAs and InSb, are required 54 for the channel to sufficiently induce the Rashba spin-orbit interaction. However, materials with strong spin-orbit interaction decrease spin life time. In order to increase spin life time, it is proposed to use a narrow wire channel structure 98, 99, 400 and the socalled persistent spin helix condition. 401, 40, 40, 404 The experimental proof of spin injection, detection and manipulation at RT are necessary for realizing spin-fet. On the other hand, the spin-mosfet 64, 65 54, 96 and the pseudo-spin-mosfet use the spin-transfer-torque (STT) switching like STT-MRAM that has recently been commercialized. 405 Up to now, long channel devices with center-to-center spacing of the FM source/drain electrodes larger than l FM-FM > 1000 nm have been investigated. For fully functional spin-mosfets and spin-fets at RT, experimental studies using short channel devices with l FM-FMl < 100 nm would be necessary Negative Gate Capacitance FET Based on the energy landscapes of ferroelectric capacitors, it has been suggested 406 that by replacing the standard insulator of a MOSFET gate stack with a ferroelectric insulator of appropriate thickness, it should be possible to

35 Emerging Research Devices 9 implement a step-up voltage transformer that will amplify the gate voltage. Such a device is called a negative gate capacitance FET. The gate operation in this device would lead to subthreshold swing (STS) lower than 60 mv/decade and might enable low voltage/low power operation. The main advantage of such a device 407 is that it is a relatively straightforward replacement of conventional FET. Thus, high I on levels, similar to advanced would be achievable with lower voltages. An experimental attempt to demonstrate a low-sts Fe-FET, based on a P(VDF-TrFE)/SiO organic ferroelectric gate stack, was reported in However, this particular experiment demonstrated <60 mv/decade only at very low currents (~1nA) that could be prone to noise. A 010 demonstration 409 showed <60 mv/decade subthreshold swing at higher current levels (~50nA) and over a few decades of current. In addition, careful measurements were done to characterize the noise levels properly (these were found to be in the 10pA range). This experiment thus established the proof of concept of <60mV/decade operation using the principle of negative capacitance. In addition to the above experiments on polymer based ferroelectrics, negative differential capacitance was demonstrated in a crystalline capacitor stack 410. Essentially, it was demonstrated that in a bi-layer of dielectric Strontium Titanate (SrTiO : STO) and Lead Zirconate Titanate (Pb x Zr 1-x TiO : PZT), the total capacitance is larger than what it would be for just the STO of the same thickness as used in the bi-layer. Note that capacitance of two serially connected capacitors would normally be smaller than either constituent. It can be larger only if one of those constituent capacitors has a negative differential capacitance. Thus the enhanced capacitance demonstrates the stabilization of PZT at a state of negative differential capacitance. At present, a major research challenge concerns identification of appropriate materials (ferroelectrics and/or oxides) that can provide the best swing with minimal hysteresis. The crystalline demonstration 410 occurred only at high temperatures (>00 0 C). Materials optimization is needed to bring it down to room temperature. In a MOSFET structure, if the negative capacitance can be properly matched with the device capacitance 406, 407, a very steep swing can be achieved without any significant hysteresis. However, due to the fact that capacitance varies significantly with voltage for a MOSFET, matching capacitance over a large range of voltage may prove to be challenging. Recent theoretical analysis has shown how to obtain an improved match using retrograde doping 411, 41. A second significant challenge comes from the integration of high quality single crystalline ferroelectric oxides on Si. One possible pathway could be to use Strontium Titanate (STO) as a template to grow single crystalline ferroelectric materials like PZT on Si. In principle, the scalability of the device should be similar to the one of a MOSFET. However, no study on scalability has been performed as yet NEMS Switch Micro/Nano-Electro-Mechanical (M/NEM) switches are devices which employ electrostatic force to actuate a movable structure to make a conductive path between two electrodes. Mechanical switches feature two fundamental properties which are unattainable in MOSFETs: zero off-state leakage and zero subthreshold swing 41. The first property provides for zero standby power dissipation, while the second property suggests the potential to operate at very low voltage for low dynamic power dissipation as well. Another advantage is that a mechanical switch can be operated with either positive or negative voltage polarity due to the ambipolar nature of the electrostatic force, so that an electrostatic relay can be configured as a pull-down or pull-up device, respectively 414. Additional advantages of mechanical devices include robustness against high temperature 415, immunity to ionizing radiation and compatibility with inexpensive substrates such as glass or even plastic. Since M/NEM switches can be fabricated at relatively low process temperatures, they can be integrated with in a modular fashion. Possible applications for hybrid NEMS- technology include power gating 416 and configuration of FPGAs 417 using relays. M/NEM switches can be fabricated using conventional planar processing techniques (thin-film deposition, lithography and etch steps), with a final release step in which a sacrificial material such as silicon dioxide, photoresist, polyimide or silicon is selectively removed to form the actuation and contact air-gaps. (The gap in the contact region(s) can be made smaller than the actuation gap to reduce the switching delay and energy as well as the contact velocity for reduced wear and contact bounce.) The smallest actuation and contact gap demonstrated so far for a functional NEM structure fabricated using a top-down approach is 4 nm. 418 The device consists of a vertically actuated -terminal switch featuring a 0 nm-thick, 00 nm-wide, 1.4 µm-long doubly clamped TiW beam with a pull-in voltage of approximately 0.4 V. As expected, the off-state current is immeasurably low and the sub-threshold swing is practically zero. In a digital circuit it is necessary to connect multiple switches in series to implement various logic functions. To avoid having the state of a switch depend undesirably on the state of other switches in the series stack, a four-terminal logic relay design 419 is needed: the voltage applied between the movable (gate) electrode and a fixed (body) electrode determines the state of the relay, i.e. whether current can flow between the two output (source and drain) electrodes. Similarly as for MOSFETs, a constant-field scaling methodology can be applied to logic relays to improve device density, switching delay and power consumption. 41 The ultimate device density achievable with relays may be comparable to that

36 0 Emerging Research Devices achievable with MOSFETs, in principle. The switching delay of a NEM switch is much longer than that of a MOSFET, however, because it is dominated by the mechanical (motional) delay, ~1 ns, 41 rather than the electrical (charging/discharging) delay. Given the large ratio between the mechanical and electrical delays of a relay, an optimized relay-based IC design should arrange for all mechanical movement to happen simultaneously, i.e. relay-based digital logic circuits should be comprised of single-stage complex gates so that the delay per operation is essentially one mechanical delay. 414 This generally results in significantly lower device counts as compared to the optimal implementations, especially since a relay can pass both low and high logic levels and the body electrode also can be connected to a logic signal. By incorporating multiple pairs of source and drain electrodes into each gated structure, the device count and hence the area required can be further reduced. A variety of relay-based computational and memory building blocks have been experimentally demonstrated to date. 40,41 Since optimized relay-based logic uses very different circuit topologies, an assessment of the prospective benefits of NEM switch technology must be made at the level of complete circuit blocks. Such circuit-level assessments (e.g. of full adder and multiplier circuits) indicate that relays can potentially provide for more than 10 reduction in energy per operation as compared with MOSFETs, and can reach clock speeds in the GHz regime. 41,4 Thus, the main potential advantages of NEM switches are improved energy efficiency as well as greater functionality and suitability for threedimensional integration to provide for increased functional density per unit chip area. Moreover, features such as electromechanical hysteresis, which can be enhanced by a built-in electric field and/or surface adhesive force, make scaled mechanical switches attractive for embedded non-volatile memory application. 4 Reliable operation is critical for practical application of M/NEM switches in electronic circuits. Due to their extremely small mass (less than 1 ng), mechanical vibration/shock is not an issue. Structural fatigue is easily avoided by designing the movable electrode such that the maximum induced strain is well below the yield strength. Mechanical wear and Joule heating at the contacting points leads to increased real contact area and eventually stiction-induced device failure. This issue can be mitigated by using a refractory material to minimize wear and material transfer, and by reducing the device operating voltage. (It should be noted that the higher associated contact resistance would not significantly compromise performance, since the speed of an optimally designed relay-based logic circuit is limited by the mechanical delay rather than the electrical delay. The on-state resistance (R ON ) of a logic relay can be as high as ~10 kω. 414 ) MEM switches with tungsten contacts have been demonstrated to have endurance up to 1 billion on/off cycles at.5 Volts, for a relatively large load capacitance of 00 pf (i.e. exaggerated electrical delay). 44 Endurance exceeding on/off cycles is projected for operating voltage below 1 Volt and load capacitance below 1 pf. A gradual increase in contact resistance caused by surface oxidation or the formation of friction polymers during the course of device operation can lead to circuit failure and is the primary reliability challenge for M/NEM logic switches today. 44 Stable conductive oxide contact materials and/or hermetic packaging are potential solutions to this issue. There are multiple factors which will eventually limit the dimensional scaling of M/NEM switches. First, quantum mechanical tunneling limits reduction of the contact gap to ~ nm. Second, contact adhesive force sets the minimum spring restoring force of the movable electrode (to ensure that the switch turns off), which in turn sets a lower limit on the electrostatic actuation force (to ensure that the switch turns on) and hence the switching energy. For ultimately scaled contacts, the surface adhesion energy will be set either by metallic bonding or by van der Waals force (for oxidized contacting surfaces). The minimum switching energy for a nanoscale relay is anticipated to be on the order of 1 aj, which is more than 10 lower than for an ultimately scaled MOSFET. 45 In conclusion, M/NEM switches are interesting candidates for LSTP applications because they provide for zero standby power dissipation. If contact adhesive force is reduced with scaling, they can also provide much lower active power dissipation than MOSFETs. Stability of contact resistance is a key requirement that warrants fundamental research. A circuit-level assessment of energy vs. delay performance indicates that nanoscale relays can potentially provide more than 10 improvement in energy efficiency as compared with transistors, for applications requiring clock speeds in the 100 MHz range. Thus, they are poised to lead a resurgence in mechanical computing for energy-efficient electronics Atomic Switch The atomic switch is classified as an electrochemical switch using the diffusion of metal cations and their reduction/oxidation processes to form/dissolve a metallic conductive path 46, similar to Resistance Random Access Memories (ReRAMs) in which oxygen vacancies are controlled to make a conductive path 47. The difference appears in their respective electrode materials. The atomic switch has a reversible electrode for introducing metal atoms (cations) into the ionic conductive materials 47, and is both experimentally 48 and theoretically 49 confirmed. On the other hand, both electrodes of ReRAMs are inert and use oxygen vacancies to enable formation of a conductive path.

37 Emerging Research Devices 1 The atomic switch was initially developed as a two-terminal device using sulfide 48,49,40,41 materials embedded in a crossbar architecture 4 with scalability down to 0 nm 4,44. Later, an atomic switch using fully compatible materials was developed to enable the formation of these devices in the metal layers of 45,46,47,48,49,440,441. This configuration resulted in the development of new type of programmable logic device 44,44. One advance in this field was development of three-terminal atomic switches 444,445, characterized by high I on /I off ratio, low ON-resistance, nonvolatility, and low power consumption. Several operating mechanisms have been proposed, including gate-controlled formation and annihilation of a metal filament 444, and gate-controlled nucleation of a metal cluster 445. The latter mechanism can be utilized for volatile operation by controlling the metal cation density to be less than that required for nucleation of stable clusters. The long retention time of the metal filament has been confirmed 46. Switching time is on the order of ns 446,447 and the switching repetition of times have been confirmed by two-terminal atomic switches 48. One of the major proposed applications of these two-terminal atomic switches has been for use as programmable switches in FPGAs, utilizing their low on-resistance and high I on /I off ratio. Integration of the two-terminal atomic switches, operated at 1.8 V, on 00-mm wafers was recently demonstrated 448. Other notable progress in two-terminal atomic switches is their application for neuromorphic systems, in which their resistive memory (memristive) effect has been used for demonstrating synaptic functions 449, which can be used in STDP-based neuromorphic systems. High I on /I off ratio (10 8 ) and low power consumption (pw) have been demonstrated using three-terminal atomic switches based on the gate-controlled nucleation of a metal cluster 445, in which a positive gate bias was used. For the purpose of replacing, a three-terminal atomic switch operated with negative gate bias has been developed using Ta O 5 as an ionic transfer material and Pt as a gate electrode 450. Although the operating mechanism is still unclear, linear I sd /V sd characteristics in the on-state clearly suggest a formation of a metallic conductive channel. The gate bias used in the demonstration was quite large (-10 V). Switching speed, cyclic endurance, uniformities of the switching bias voltage and resistances both for the on-state and the off-state should be improved for general usage as a logic device, especially in the development of three-terminal atomic switches. Although the device physics of the two-terminal type have been established both theoretically and experimentally in a past few years 451,45, those of the three-terminal type are relatively unknown and fundamental researches on the switching phenomena is needed. In addition, development of a system architecture for these nonvolatile devices is desired. Application for neuromorphic systems has become a potential target of atomic switches because a single atomic switch can emulate complicated neuromorphic functions, which have been emulated by a circuit consisting of many and analog devices. Although the characteristics of single atomic switches have been extensively studied 45, as with the other neuromorphic devices 454,455, system integration and the development of compatible architectures are still needed Mott FET Mott field-effect transistor (Mott FET) utilizes a phase change in a correlated electron system induced by a gate as the fundamental switching paradigm 456,457. Mott FETs could have a similar structure as conventional semiconductor FET, with the semiconductor channel materials being replaced by correlated electron materials. Correlated electron materials can undergo Mott insulator-to-metal phase transitions under an applied electric field due to electrostatically doped carriers. 458,459 Besides electric field excitation, the Mott phase transition can also be triggered by photo- and thermalexcitations for potential optical and thermal switches. Mott FET structure has been explored with different oxide channel materials 457. Among several correlated materials that could be explored as channel materials for Mott FET, vanadium dioxide (VO ) has attracted much attention due the sharp metal-insulator transition temperature (nearly five orders in single crystals) 460. The phase transition time constant in VO materials is in sub-picosecond range determined by optical pump-probe methods 461. Device modeling indicates that the VO -channel-based Mott FET lower bound switching time is of the order of 0.5 ps at a power dissipation of 0.1 µw 46. VO Mott channels have been experimentally studied with thin film devices and the field effect has been demonstrated in preliminary device structures 46, 464, 465. Recently, prototypes of VO transistors using ionic liquid gating have shown larger ON/OFF ratio at room temperature than with solid gate dielectrics like hafnia. 466, 467 The conductance modulation happens at a slow speed however, due to the large charging time constants. 468 The possibility of electrochemical reactions must also be carefully examined in these proof-of-principle devices due to the instability of the liquid-oxide interfaces and 468, 469, 470 the ease of cations in such complex oxides to change valence state. Experimental challenges with correlated electron oxide Mott FETs include fundamental understanding of gate oxidefunctional oxide interfaces and local band structure changes in the presence of electric fields. Methods to extract quantitatively properties (such as defect density) of the interfaces are an important topic. The relatively large intrinsic carrier density in many of the Mott insulators requires the growth of ultra-thin channel materials and smooth gate oxide-

38 Emerging Research Devices functional oxide interfaces for optimized device performance. It is also important to understand the origin of low roomtemperature carrier mobility in these materials. 458 Theoretical studies on the channel/dielectric interfacial electronic band structure are needed for the modeling of subthreshold behaviors of Mott FETs. Understanding the electronic transition mechanisms while de-coupling from structural Peierls (lattice) distortions is also of interest and important in the context of energy dissipation for switching. While the electric field-induced transitions are typically explored with Mott FET, nanoscale thermal switches with Mott materials could also be of substantial interest. Recent simulation studies of ON and OFF times for nanoscale twoterminal VO switches indicate possibility of sub-ns switching speeds in ultra-thin device elements in the vicinity of room temperature. Rise time on the order of ns (faster than what is expected for Joule heating) is very promising for ultra-fast switches, which suggests electronic correlation effects 471. Such devices could also be of interest to Mott memory 47. One can in a broader sense visualize such correlated electron systems as threshold materials wherein the conducting state can be rapidly switched by a slight external perturbation, and hence lead to potential applications in electron devices. Electronically driven transitions in perovksite-structured oxides such as rare-earth nickelates 47,474 or cobaltates 475 with minimal lattice distortions would also be relevant in this regard. Three-terminal devices are being investigated using these materials and will likely be an area of growth. 476, 477, 478, 479 SmNiO with its metal-insulator transition temperature near 10ºC and nearly hysteresis-free transition is particular interesting due to the possibility of direct integration onto platforms. Floating gate transistors have recently been demonstrated on silicon ALTERNATIVE INFORMATION PROCESSING DEVICES Spin Wave Device Spin Wave Device (SWD) is a type of magnetic logic devices exploiting collective spin oscillation (spin waves) for information transmission and processing 481,48. The basic elements of the SWD include: (i) magneto-electric cells (e.g. multiferroic elements) aimed to convert voltage pulses into the spin waves and vice versa; (ii) magnetic waveguides - spin wave buses for spin wave signal propagation between the magneto-electric cells, (iii) magnetic junctions to couple two or several waveguides, and (iv) phase shifters to control the phase of the propagating spin waves. SWD converts input voltage signals into the spin waves, computes with spin waves, and converts the output spin waves into the voltage signals. Computing with spin waves utilizes spin wave interference, which enables functional nanometer scale logic devices. Since the first proposal on spin wave logic 481, SWD concept has evolved in different ways encompassing volatile 481,48,48 and non-volatile 484, Boolean 484 and non-boolean 485, single-frequency 481 and multi-frequency circuits 486. The primary expected advantage of SWD over Si are the following: (i) the ability to utilize phase in addition to amplitude for building logic devices with a fewer number of elements than required for transistor-based approach; (ii) power consumption minimization by exploiting built-in non-volatile magnetic memory, and (iii) parallel data processing on multiple frequencies in a single core structure by exploiting each frequency as a distinct information channel. During the past five years, - and 4-terminal SWDs have been experimentally demonstrated 487,488. The micrometer scale prototypes with Ni 81 Fe 19 waveguides operating at 1-GHz frequency range demonstrate signal-to-noise ratio aroud 10 at Room Temperature. The internal delay of SWD is defined by the spin wave group velocity (e.g cm/s in Ni 81 Fe 19 waveguides). Power dissipation in SWD is mainly limited by the efficiency of the magneto-electric cell translating input voltage into spin wave. Recent experiments 489 with synthetic multiferroics comprising piezoelectric (lead magnesium niobate-lead titanate PMN-PT) and magnetostrictive (Ni) materials have demonstrated spin wave generation by relatively low electric field (e.g. 0.6MV/m for PMN-PT/Ni). An important experimental result obtained in CoTaZr magnetic cross junction 490 shows that spin wave propagation through the junction is strongly dependent on the junction magnetization, which provides an additional degree of freedom for logic circuits construction. Magnonic phase shifters combining a nanomagnet with a spin wave bus have been studied theoretically, and the feasibility of non-volatile element was demonstrated by micromagnetic simulations 491. A detailed progress report on SWDs is presented in ref. 49. There are several important milestones to be achieved for further SWD development: (i) nanomagnet switching by spin wave (e.g. by the combined effect of the voltage-assisted anisotropy change and a magnetic field produced by the incoming spin wave); (ii) integration of several magneto-electric cells on a single spin wave bus; and (iii) demonstration of SWD operating on more than one frequency. In order to have an advantage over Si in functional throughput, the operational wavelength of SWDs should be scaled down below 100nm 484. All the demonstrated SWD prototypes so far utilized spin waves of micrometer scale wavelength, which makes them immune to waveguide structure imperfections. It is not clear if the scaling to deep sub-micrometer range would significantly affect the signal to noise ratio and the speed of spin wave propagation. Defect tolerance and noise immunity are the most critical challenges. The success of the SWD will also depend on the ability to restore/amplify spin waves (e.g. by multiferroic elements or spin torque oscillators) Nanomagnetic Logic

39 Emerging Research Devices Nanomagnetic Logic (NML) uses fringing field interactions between magnetic islands to perform Boolean 49 and non- Boolean 494,495,496 logic operations. Binary information is represented via magnetization state. Most work with NML has focused on devices that couple in-plane (inml). Recent work has also employed devices with out-of-plane, perpendicular magnetic anisotropy (PMA). Perpendicular magnetic logic (PML) devices switch through nucleation and domain wall propagation 497,498. This reversal behavior is markedly different from in-plane permalloy nanomagnets, which remain nearly single-domain during switching and rotate coherently. In all implementations of NML, externally supplied switching energy is generally needed to re-evaluate (i.e., clock ) a magnet ensemble with new inputs 499. A clock modulates the energy barriers between magnetization states of the devices that comprise an NML circuit. With inml, subgroups of devices are usually placed in a metastable state when clocked, and an input signal at one end is allowed to propagate through a part of said subgroup 500. The other end of the subgroup is held in a metastable state to prevent back propagation of data. While PML devices could be clocked in a similar manner, it is also possible to globally clock a large ensemble of PML devices (e.g., with an oscillating, out-of-plane magnetic field 496 ). Notably, both NML implementation are capable of retaining state without power and could be radiation hard. Moreover, it is possible that NML devices would dissipate less than 40 kt per switching event for a gate operation 49. When clock overhead is considered, projections suggest that NML ensembles could still best low power equivalents in terms of metrics such as energy delay product, etc. Finally, NML appears to be scalable to ultimate limits using individual atomic spins 501. For inml, fanout structures 50, 1-bit full adder 50, etc. have been experimentally demonstrated and successfully reevaluated with new inputs. Both field-coupled 504,505 and spin transfer torque (STT) 506,507,508 electrical inputs have been realized. For electrical output, multiple inml-magnetic tunnel junction hybrids 509 have been proposed and simulated. Field based, compatible line clock structures have been used to simultaneously switch the states of multiple magnetic islands 510, as well as to re-evaluate inml lines and gates with new inputs 511. Additionally, recent experiments have explored materials-based solutions to further reduce field/energy requirements for line clock structures. Some recent results 51,51 suggest that present state-of-the-art component energy metrics could be further reduced by as much as 16X. Voltage controlled clocking e.g., multiferroics 514 and magnetostriction 515 have also been proposed as potential clocking mechanisms for inml. With respect to magnetostriction, recent work 515 suggests that stress-based clocking would lead to clock energy dissipation of just ~00 kt per device. The Spin Hall Effect could be employed as a path to reduced energy clocking (i.e., X) 516. For PML, experimental results suggest that appropriately irradiated PML structures 517 (to define dataflow directionality) can be controlled by a uniform, homogeneous, oscillating, global clock field 518. Majority gates 519, AF-lines 518, multiplane signal crossings 50, and full adders 496,51 have all been experimentally demonstrated with this approach. Field coupled electrical inputs have also been demonstrated 5. Large arrays of PML devices could be controlled with on-chip inductor structures 5 that could be coupled with a capacitance in an LC oscillator which opens the door to adiabatic energy recycling. Devices with PMA are also amenable to voltage controlled clocking 54. Individual PML devices have been clocked with on-chip inductor structures at 10 MHz 55. Simulation-based studies 55 suggest that a NAND/NOR operation would require just.8 aj (assuming 00 nm x 00 nm devices and 50 MHz frequency and 10 functional layers of devices.) There are three primary areas in which research issues remain to be addressed: reliable switching, fault-tolerant architectures, and energy-efficient clocking. Reliable switching: Work from 008 suggests that in a soliton operating mode, dipole-to-dipole coupling could be insufficient to prevent thermal noise from inducing premature/random switching 56. At the device-level, magnetocrystalline biaxial anisotropy could further promote hard axis stability of an inml ensemble 56. Alternatively, adiabatic switching 57 and/or field gradients could potentially mitigate the effects of premature switching. Simulations suggest that NML circuits could tolerate some field misalignment 58. Whether or not a circuit ultimately exhibits reliable and deterministic switching is very much a function of how it is clocked and requires additional study. Fault tolerant architectures should be explored. As one example, stochastic computing (SC) could be an effective architectural strategy. Here, serial bit streams or parallel bundles of wires are used to encode probability values to represent and processes information 59. While bit streams/wire bundles are digital, information is conveyed through the statistical distribution of the logical values. With physical uncertainty, the fractional numbers correspond to the probability of occurrence of a logical one versus a logical zero. Computations in the deterministic Boolean domain are transformed into probabilistic computations in the real domain. Complex functions with simple logic are possible. Bit flips afflict all the bits in the stream with equal probability. The result of bit flips is a mere fluctuation in the statistics of the stream, not a catastrophic error. Thus, SC enables meaningful computation even with high device error rates 59. Furthermore, the SC architecture is pipelineable by nature. As the length of the stochastic bit stream increases, the precision of the value represented by it also increases. This allows a system to tradeoff precision with computation time.

40 4 Emerging Research Devices Thus, higher error rates that NML might experience can be tolerated by SC architectures. Pipelined SC architectures are a natural fit for inherently pipelined NML. Energy efficient clocking: Voltage controlled clocking should continue to be pursued not only for the benefits with respect to energy 50, but also with respect to the fine-grained control it provides for an NML ensemble (which is useful in reducing error rates 51 as well as architecturally 5 ). Domain-wall based switching could be another viable alternative for clocking Excitonic Field Effect Transistor The Excitonic field-effect transistor (ExFET) as currently envisioned allows for an ultra-steep inverse subthreshold slope (SS) by the creation of an energy gap through the gate controlled formation of an excitonic insulating state. While in the on-state, charge is the relevant state variable as in a conventional FET, the state variable characterizing the device offstate is an excitonic insulator that is formed through the interaction between two oppositely doped parallel segments of a device channel. Coulomb interaction between electrons in the n-type branch and holes in the p-type branch allows condensation of the system into an excitonic phase under certain gate voltage conditions. This in turn opens an energy gap in the single particle excitation spectrum. Recombination of electrons and holes is suppressed through the spatial separation of charges with opposite polarity. Since the gate field is used to create a previously non-existent energy gap to suppress current flow from source to drain, the above limitation of SS no longer applies and low-power device operation becomes feasible. The device switches as a function of V gs from a highly conductive state into the off-state once the conditions for the formation of an insulating excitonic phase are satisfied. In the 1970s, so-called direct excitons were theoretically discussed in the context of bulk (D) materials forming an insulating state 5,54,55,56,57,58. It was not until 1988 that an excitonic insulator was experimentally verified 59,540,541,54,54. Typical excitonic binding energies in these systems are of the order of several mev 544. Exciton formation between spatially separated electrons and holes was predicted in Optical experiments were used to prove the existence of an excitonic state in these two-dimensional (D) systems 546,547,548, and theoretical work predicts a phase transition to an excitonic insulator 549, or a crystalline state 550. Indirect exciton formation was also predicted for onedimensional (1D) nanowires 551. In 1D systems as carbon nanotubes excitonic binding energies of the order of several 100meV have in fact been experimentally verified supporting the notion of room-temperature operation of the proposed ExFET 55, BiSFET The bilayer pseudo-spin Field Effect Transistor (BiSFET) has been proposed as an ultra-low-power yet fast device concept. 554 In principle, switching energies orders of magnitude below end-of-the-roadmap are possible with clocking frequencies comparable to or better than end-of-the-roadmap. The BiSFET concept is based on the possibility of room temperature exciton (paired electron and hole) superfluid condensation in two oppositely charged (ntype and p-type) layers of graphene separated by a thin dielectric 555 (where pseudospin refers to the which-layer degree of freedom). Such collective many-body effects have long been observed in III-V semiconductor double quantum wells, but only at ultra-low temperatures and high magnetic fields. 556,557,558,559 What separates the graphene system from III-V systems is a synergy of favorable graphene properties: atomically thin layers, symmetric electron and hole band structures, low density of states and zero band gap energy. The BiSFET would behave much like a gateable Josephson junction for similar physical reasons with a critical voltage beyond which the DC current collapses, as predicted by theory 560 and seen experimentally in III-V systems 561. Thus, logic circuits would work in a manner more akin to that proposed for gated resonant-tunneling-diode logic 56 than that used for, including the use of a four-phase clocked power supply. However, as a collective effect, critical voltages comparable to or below the thermal voltage kbt/q are possible, in principle, providing the basis for low-power switching. Moreover, input and output of the BiSFET remain charged based and compatible with existing electron-based interconnects and, with amplification, devices. Circuit simulations of two-bisfet based inverters with a 5 mev clocked power supply and a fan-out of four continue to suggest switching energies per device on the 10 zepto-joule (10-0 Joules) including the metal-to-graphene contact resistances, a value between two and three orders of magnitude below end-of-the-road-map, while maintaining 10 GHz clocking frequencies (or larger with lesser contact resistances). 56,564 Simulated low voltage clock circuits, suggest that generation of the clocked powers supplies may add an additional 50% overhead. 565 A full range of Boolean logic gates 566 has been simulated, as well as ripple-carry adders. 567 It also is noted that the 5 mv power supply used in this simulation is illustrative and does not represent a theoretical lower limit; the actual clock voltage would likely be set by material and fabrication considerations. Recently, the possibility of forming a room-temperature condensate in such graphene systems was re-examined considering dynamic self-consistent screening using the best theoretical models currently available. 568 The results continue to point toward the possibility of forming the condensate, but also point to need for an even lower-k dielectric

41 Emerging Research Devices 5 environment than previously anticipated, and, thus, have altered the direction of experimental efforts. In another effort, atomistic tight-binding full-band quantum-transport simulations, including the non-local Fock exchange interaction between layers that produces the condensate, have been performed. 569 These simulations support the possibility of subkbt/q critical voltages for current flow through nanoscale (~15 nm in simulation) regions of condensation. To realize BiSFETs, in addition to graphene, novel dielectrics will be required, such as hexagonal boron-nitride, graphene intercalants or transition metal dichalcogenides between the graphene layers, and low-k dielectrics including perhaps air gaps beyond the graphene layers. Fabrication with the necessary degree of control of graphene, dielectric and surface quality, work functions, lithography, etc. impose numerous challenges. However, the highest priority at the time of writing is the experimental search for the condensate in such a dielectrically separated bilayer graphene system or perhaps in such bilayers of another D material system that could be used for BiSFETs to confirm, or refute, theory Spin Torque Majority Logic Gate Spin torque nano-oscillators (STNOs) are nanomagnetic microwave current-controlled oscillators based on spin transfer torque effect in nanoscale spin valves and magnetic tunnel junctions 570. Direct current applied to a STNO induces autooscillatory precession of the free ferromagnetic layer magnetization and thereby generates microwave voltage 571. The precession frequency is tunable by the applied current due to strong non-linearity of the STNO. When several STNOs share a common extended free layer, spin waves propagating in the free layer provide coupling between the STNOs, which can give rise to phase locking 57,57,574. Due to injection locking to the input microwave signals and spin wave interaction in the common free layer, the entire free layer of the device precesses at the frequency applied to the majority of the inputs (majority logic operation with the signal frequency as the state variable). Spin torque majority gate devices are radiation hard, low power (~500 aj per operation) and non-volatile, which offers performance advantages over Si. At present, phase locking between two point-contact STNOs has been experimentally demonstrated 57,57. Also, phase locking among four magnetic vortex-based oscillators has been experimentally shown as well 575. Phase locking in a linear array of spin torque oscillators has been theoretically predicted via micromagnetic simulations 576. Another type of spin torque majority logic gate is based on spin torque switching 577 in a multi-terminal magnetic tunnel junction has been proposed and theoretically studied 578. At present, smallest STNOs have linear dimensions of 40 nm 57. The highest frequency of operation experimentally demonstrates is 65 GHz 579. The operation current is still relatively high and needs to be reduced for practical devices. The lowest STNO operation current reported is ~ 0.1 ma 580. Recently, STNOs based on spin Hall effect in ferromagnetic/nonmagnetic metal bilayers have been demonstrated 581,58. STNOs based on spin Hall hold promise of lower current density and energy consumption. One of the main research challenges is to understand STNO majority logic gate parameters that give rise to robust phase locking and injection locking to external microwave signals. Another key challenge is the need to reduce the SNTO operation current to values below 10 µa in order to minimize power consumption by the device. Pathways towards reduction of the operation current include: (i) reduction of magnetic damping of the free layer, (ii) perpendicular magnetic anisotropy of the free layer, (iii) use of pure spin currents instead of spin-polarized charge currents and (iv) use of ferromagnetic insulators instead of ferromagnetic metals All Spin Logic The recently proposed concept of all spin logic (ASL) 58 uses magnets to represent non-volatile binary data while the communication between magnets is achieved using spin currents in spin coherent channels with the energy coming from the power supply. The ASL concept is based on key scientific advancements of the last decade 584,585,586,587,588,589,590,591. These advancements have blurred the distinction between spintronics and magnetics, creating the possibility of a device capable of providing a low power alternative to charge-based information processing. In particular, the two key recent advances are (1) the demonstration of spin injection into metals 584,591,59 and semiconductors 70,588,589,59 from magnetic contacts and () the switching of a second magnet by the injected spins 590,591. These demonstrations suggest an all-spin approach to information processing. Magnets inject spins and spins turn magnets (digital bits) forming a closed ecosystem which takes advantage of both analog (spin currents) and digital (bistable magnets) properties without the need to convert to charge. It has been shown that ASL can potentially reduce the switching energy-delay product 594 by a significant amount, but there are major challenges to be overcome. One is the room temperature demonstration of switching in multi-magnet networks interacting via spin currents. The other is the introduction of high anisotropy magnetic materials 595 into relevant experiments which can improve energy-delay. Issues such as current density and proper choice of channel materials also have to be carefully considered. The analog nature of ASL communication can be efficiently coupled with median function 596 to develop an architecture called Functionality Enhanced ASL (FEASL) to realize low-power, lower delay and lower area circuits. FEASL is especially suited for adder and multiplier circuits which are an integral part of arithmetic logic units (ALU). Moreover, it should be mentioned that ASL could also provide a

42 6 Emerging Research Devices natural implementation for biomimetic systems with architectures that are radically different from the standard von Neumann architecture. 4. MORE-THAN-MOORE DEVICES INTRODUCTION This section covers emerging devices that are neither targeted for Boolean logic nor memory. The target application may be either improved information processing (Emerging Research Architectures) or extensions to information processing to include peripheral or non-core functions (More-than-Moore). Most of ITRS covers devices expected to be used for Boolean logic in a target application comprising a microprocessor system. The ERD chapter also covers Emerging Research Architectures and More-than-Moore, the latter being essentially an architecture comprising a microprocessor core surrounded by analog devices for I/O. However, the microprocessor may also be augmented by other non-core functions, like power harvesting. This chapter will describe how new devices could raise the performance of microelectronic systems either through more functional diversity or greater performance of the core. Non-memory uses of the Redox Random Access Memory (ReRAM) or memristors comprise the main new device class in this edition. ITRS sees a possibility that fabs will add capability to produce these devices. Initially, the devices will be as a replacement for Flash in memory sticks and solid state disk drives. ITRS is responding by enhanced coverage of ReRAM in the PIDS and ERD memory sections. However, there is nothing about the devices that confines them only for memory. In fact, the devices can become the basis of reconfigurable logic and neuromorphic architectures and thereby raise information processing performance substantially. ReRAM is not a good acronym because the devices are not used as memory, so we will use the term memristor. Requirements for memristors in these applications will be described below. This section will continue to support the approach outlined in the ITRS More-than-Moore White Paper iv. The concept of a one cubic centimeter cell phone is used as a representative or emulator application for More-then-Moore. A one cubic centimeter cell phone would become more feasible if the RF front-end block which translates the incoming modulated electromagnetic wave into a digital stream of information could be improved by integration of new RF devices into other chips. This integration would be assisted by new devices for a switch, filter, local oscillator, mixer, etc DEVICES WITH LEARNING CAPABILITIES Devices That Learn Logic Configurations The ITRS traditionally views computer performance as the product of device performance multiplied by a factor related to computer architecture but it is possible to raise computer performance in a different way. An information processing function can be realized by the placement of gates and wires on a custom ASIC or by coding the function in software and running it on a microprocessor. The ASIC implementation will be several orders of magnitude faster and more power efficient because moving a bit from one gate or functional block to another via a wire is much more efficient than having a microprocessor move the bit in software, incurring as overhead the movement of hundreds of bits of instruction, opcode, cache access, etc. to interpret the machine s instruction set for the software. However, customers demand the ability to change a system s function in a split second. While the function of a microprocessor can be changed just by loading different software, changing the function of an ASIC requires redesign and refabrication of a chip that may take a year and cost a million dollars. The direction under discussion would implement programmable wire. The vision is a chip that could be rewired and thus realize the flexibility of a microprocessor and the efficiency of an ASIC at the same time. The Field Programmable Gate Array (FPGA) achieved the flexibility objective of the last paragraph long ago 597, but memristor-class devices show the possibility meeting the efficiency objective as well 598. As illustrated in figure ERD , there are several approaches in the literature where memristors are used as essentially Single Pole Single Throw (SPST) switches that connect some form of general wiring array into specific wiring interconnections. By altering the pattern of switches, a series of gates in a base layer can be reconnected into an arbitrary configuration. This is essentially the same architectural concept as an FPGA, but the traditional FPGA implements a non-volatile switch with a sizeable circuit. The efficiency of memristors in these types of reconfigurable architectures will depend on figures of merit in Table ERD11. Note that Table ERD11 covers a device called both a ReRAM and memristor in the ITRS. If the device is used for memory, it is called a ReRAM and covered in other sections of the ITRS. If the same device is used in reconfigurable iv

43 Emerging Research Devices 7 logic (and also for neurmorphic applications) it is called a memristor, where it will be subject to the more stringent requirements in Table ERD11. ITRS considers memristors an emerging research device for this application because the device technology is less mature when measured against the more stringent requirements of Table ERD11. Table ERD 11: Figure-of-merit of three reconfigurable architectures Reconfigurable (CMOL/FPNI-like): Memristor Artificial neural network: Memristor Array interconnect Crossbar Equivalent logic diagram: SPST switch Natural neural network deformed to show equivalence: Dendrite Synapse Axo n Reconfigurable-memristor functional mapping. Memristors fill the role of a SPST switch in a crossbar wiring network. The logic circuit can be rewired by appropriate switch/memristor settings Artificial-natural neural network mapping. Memristors fill the role of synapses with row/column wires filling the role of dendrites and axons. Figure ERD6: Two variants of learning devices for configuration The key issues for reconfigurable logic are described below: The low, on resistance requirement is due to the memristor switches being in series with the signal wires. This means the low, on resistance of the memristor R on must be on the order of the on resistance of a drive transistor R Ton otherwise the switches will lengthen transition times and reduce the throughput of the system. The high, off resistance of a memristor R off must be high enough to avoid generating too much leakage current and thus raising static power dissipation. For a gate, the ratio of dynamic to static power dissipation will be approximately equal to R Toff /R Ton ~ 10,000, where R T is the resistance of the transistors. Say the circuits in figure Table ERD11 have f ~ 0 (for fanout) memristors on each of the array wires. In this case, f-1 of the memristors with resistance R off will have the system power supply voltage across their leads. Thus, f R off must be on the order of R Toff otherwise the memristors will significantly exacerbate leakage current and power Devices That Learn by Examples Brain-inspired, neural systems are capable of performing tasks close to the level of human intellect and there is strong indication that these systems can be made more efficient using physical devices that support learning. Computers that learn by examples have been successfully applied to activities that have previously been beyond the reach of computers, such as the Watson computer playing Jeopardy 601. To the extent a new type of computer can address formerly humanonly tasks, the proper cost comparison becomes computer cost versus the cost of a human being paid a salary, working in an office, and so forth. The Jeopardy-playing computer demonstration shows that computer methods are known for performing at least some formerly human-only tasks, but the Watson computer was actually a 500-core microprocessor cluster. If we assume the artificial intelligence community has correctly identified the artificial neural network as the necessary computational primitive for this new type of computing, it would be important to find a way to build an artificial neural network that is more efficient than a microprocessor cluster. The motivating example of an artificial neural network is shown in figure ERD6, where a new nanodevice fills the role of a synapse. Many mechanisms and devices have been proposed to implement the synapse, including software emulation, memristors, and other devices. As stated at the beginning of this section, the memristor is covered by ITRS (in the form of ReRAM for memory) and seems destined to be much more efficient than software emulation.

44 8 Emerging Research Devices The basis of learning by example is illustrated in the example in figure ERD6. The example shows a device in the architecture configuration of a neural synapse. Information flowing across the device when it is performing its function causes a change in the non-volatile state of the device. Table ERD11 includes rows for three variants of neural networks: There are projects in the literature that simulate artificial neural networks using software models for synapse behavior. Some models are biologically inspired 60 whereas others are device inspired 60. The authors have tried to capture the common behavioral attributes of the device-inspired models. The memristor models are not derived from real memristors but may form a goal for device designers. The authors have tried to capture what would appear to be a reasonable target for artificial synapses. We include the final row on hybrid neuromorphic systems to emphasize that many projects in the literature have to accommodated the unavailability of acceptable devices by using methods such as multiple memristors per synapse, offline learning and other methods that produce working systems but at reduced efficiency. See reference 604 for an example MORE THAN MOORE FOR RF The More than Moore idea is to enhance the logic family of from strictly Boolean logic to additionally include a family of analog RF devices. In this instance, the term family implies both manufacturing and use of the manufactured device. For manufacturing, the objective would be to enhance fab lines so the new RF devices could either be fabricated with using the as few additional materials or process steps as possible. An alternative would be separate fabrication of RF components and subsequent integration into the same package. For usage, it implies that the RF components would operate with compatible voltages and signals. The representative application would be the RF front end for a 1 cubic centimeter cell phone. A function view of the augmentation appears in Figure ERD7 below. In this attempt to address the wide field of More-than-Moore emerging RF devices, this section is focused on a few devices and functional blocks. rf wave Higher level function control Intermediate level function LNA PA LO ADC DAC RF front-end antenna switch filter oscillator mixer etc. Lower level functions LO spin-torque oscillator etc. NEMS nanoresonator C-based electronics Figure ERD7 High-level RF functions partitioned into generic lower-level functions implemented in emerging devices Graphene RF Transistors Owing to the potentially very high carrier velocity of graphene, RF transistors using this material possibly can reach very high unity current gain cutoff frequency, f T : graphene RF transistors have been reported with higher f T and lower f Max than Si transistor at the same gate length. The maximum cutoff frequency reported for graphene is 00 GHz using Co Sinanowire gate and an exfoliated graphene. 605 A f T of 40 GHz was reported using wafer scale epitaxial graphene 606 and a f T of 00 GHz was reported using a CVD-grown graphene layer 607

45 Emerging Research Devices 9 To obtain higher f T, the device configuration needs to be optimized. The source and drain of the graphene transistor are usually defined by depositing a metal film, which can induce parasitic capacitances: in top-gate configuration, where source, drain and gate are on the same side of graphene channel, gate-source or gate-drain capacitances can be large, which results in a reduced f T. However, in back-gate configuration, where the gate is on the opposite side of the graphene layer than the source and drain, the gate-source or gate-drain capacitances could be lower even for an overlapped situation (L gs <0). CVD graphene can be fabricated easily on the back-gate configuration; epitaxial graphene could not. Methods to fabricate a local embedded back gate on a SiC wafer are, however, suggested, although high growth temperature could make it difficult 608 Since the cutoff frequency is inversely proportional to the channel length, the limit of f T in graphene transistor has not been fully assessed by the reported wafer scale devices. Using a nanowire gate instead of a patterned metal, f T was estimated down to a 45 nm channel length based on transition time. High Fermi velocity of carriers in graphene, resulting in high drift velocity (~ 4 x 10 7 cm/s) in a channel, makes 1THz f T achievable for a sub-70 nm channel length device 609. This is greater than sub-0 nm channel length of HEMT and Si RF transistors 610. The unity power gain frequency or maximum frequency of oscillation, f Max, which is around GHz even for devices with f T of 00 GHz, could be increased by improving the device structure and reducing the parasitics. This is a field which is presently less investigated than the intrinsic properties of graphene Spin Torque Oscillators Spin transfer torque in metallic spin-valves and magnetic tunnel junctions using nano-sized magnetic multilayer structures can drive uniform precession of the free layer magnetization under external magnetic field conditions 611,61. When combined with the giant magnetoresistance (GMR) or tunneling magnetoresistance (TMR) effect, this precession produces voltage responses that make those magnetic multilayers high frequency spin torque oscillators. The oscillation frequency in a spin torque oscillator can be tuned by adjusting the electric current or the external magnetic field. Due to its high compactness, extremely wide tunability, and compatibility with standard process, spin torque oscillator has the potential to be an agile RF oscillator 61,614 Currently, oscillation frequencies ranging from several hundred MHz to tens of GHz have been demonstrated depending on the magnetic structures, magnetic field and input current levels 615. The output power in spin torque oscillators based on metallic spin valve structures is about several hundred pw and it was improved up to about tens of nw in MTJ based spin torque oscillators 616,617. Despite these experimental advancements in spin torque oscillators, there are several challenges to be overcome for the practical application of spin torque oscillator. These challenges include 1) autooscillation structures, ) increase of output power, and ) high spectral purity (low phase noise). Auto-oscillation structures are required to eliminate the need for an external magnetic field that is used in most recent experimental demonstrations. Spin torque oscillator structures with a perpendicular polarizer and a planar free layer 618, vortex magnetization state in free layer 619, and wavy angular dependence of spin torque 60 have been suggested. For the spin torque oscillator to be useful, the output power of RF oscillation should be improved above a few microwatts. Although achieving higher magnetoresistance (MR) ratios through high spin polarization of magnetic layers or large precession angle of free layers could be a first approach for higher output power, phase locking of many weakly coupled oscillators is further needed for a sufficient increase of the output power. Many theoretical predictions and experimental demonstrations of synchronization of electrically connected spin torque oscillators have been reported so far 61,6,6. Among the remaining challenges, obtaining spectral purity that is compatible to the levels of existing current oscillator devices may be the biggest obstacle for the spin torque oscillators to be applied in the telecommunication field. Issues in the spin torque oscillation linewidth are reportedly coming from the lack of temporal coherence 64 and from the nonlinearity of the oscillation frequency 65,66. The adoption of PLL circuits and synchronization of several spin torque oscillators can be one of the solutions for the higher spectral purity NEMS Resonators There is an increasing interest to miniaturize and integrate the off-chip RF components, especially the quartz crystal used in the reference oscillator whose quality factor Q (> ) and temperature stability (better than 1 ppm/ C) are difficult to achieve with integrated devices. The quality factors of integrated LC-tank circuits are limited by the poor quality factor values of integrated inductors and capacitors (from 10 s to 100 s). As a consequence, the most promising solutions to miniaturize reference oscillators with uncompromised quality factors 67 are related to the classes of vibrating devices.

46 40 Emerging Research Devices Among the most promising of these vibrating structures are the capacitively transduced micro- and nano-electromechanical (M/NEM) resonators. In recent years tremendous progress has been achieved in the frequency x quality factor product, the main figure of merit for MEM/NEM resonators. The general trend of increasing their resonance frequency to values exceeding the GHz domain pushed such resonators towards very small, very stiff and low mass NEM systems. However, their ability to conserve a high-q at low dimensions is questionable when the main energy dissipation mechanisms are the gas friction, the clamping and surface losses 68. Another important issue is how the (in)stabilities of these resonators scale with dimensions as the effects of fluctuations in numbers of photons, phonons, electrons and adsorbed molecules can significantly affect the noise characteristics NEM resonators based on silicon nanowires, carbon nanotubes and graphene At micrometer scale, recent successful demonstrations of very high frequency resonators are the extensional wine-glass resonators with frequencies ranging from 400 MHz to 1.5 GHz (and Q > 700) 60 and the dielectrically actuated and piezoresistively sensed 4.41 GHz silicon bar resonator exploiting internal dielectric actuation 61. Piezoresistive sensing 6 of capacitively actuated resonator exceeding 4GHz with Q>8000 and using the 9 th harmonic longitudinal model was implemented. Very high frequency (VHF) NEM resonators were described using platinum nanowires, resonating at frequencies higher than 100MHz and with a quality factor of 8500 at 4K 6. The same group later reported 64,65 VHF NEM resonators based upon single-crystal Si nanowires. Carbon nanotubes attracted major interest for building NEM resonators, due to their high stiffness (Young s Elastic Modulus, E, near 1Tpa), low density, defect-free structure and ultra-small cross section. Resonator responses have been 66 reported varying from to 00MHz with voltage-tunable characteristics, in CNTs with 1-4nm in diameter, suspended over a trench. The NEM resonator with resonance frequencies up to 4 GHz were reported with a similar CNT device loaded in an abacus style with inertial metal clamps, yielding very short effective beam lengths 67,68. One issue of such small vibrating SiNWs and CNTs is the early onset, at very low applied power, of nonlinearities characterized by frequency bistability arising from the effect of tension built-up in the wire at large vibration amplitudes. Recently, graphene material attracted further attention for its extremely high strength, stiffness, and thermal conductivity along the basal plane. In Ref. 69 exfoliated graphene sheets are suspended to form two-dimensional NEM resonators with resonance frequencies from 1MHz to 170MHz NEM resonators based on resonant gate or vibrating body transistors The capacitively transduced signals of MEM resonators are very small and the impedance matching can be limiting. The movable gate and body FET transistor structures can operate as M/NEM resonators, with the main difference that the output is the drain current of the transistor, offering the possibility of building active resonators. Resonant gate transistors were reported with out-of-plane AlSi resonant gate MOSFETs 640,641 and with in-plane resonant silicon gate transistors 64. Aggressively scaled versions of the in-plane resonant gate transistors have been reported 64 based on Silicon-On-Nothing technology to achieve sub-100nm gaps and 400nm-thick single crystal resonators with a front-end process. The lateral MOS transistor could suffer from poor carrier mobility due to the roughness of the vertically etched sidewalls and have shown very little gain, but can be integrated with advanced 644 to minimize the influence of parasitic capacitances. An alternative resonant transistor, called Vibrating-Body FET (VB-FET) has been proposed 645,646 ; the movable body modulates both the inversion or accumulation charge in the lateral channels and the piezoreistance of the structure (carrier mobility and mass). Silicon nanowires show an unusually large piezoreistance effect compared with bulk Si. An outstanding gain of more than +0dB for the output signal was obtained in micrometer scale double-gate VB-FET when the output is taken from the transistor drain. Moreover, the device motional resistance is reduced from 16kΩ to below 100Ω, which enables excellent conditions for 50Ω matching in RF applications. Another active resonator has been proposed 647 by using the mechanical strain rather than an electric field to modulate the conductivity of the silicon. This resonator can provide a gain higher than unity at 15MHz, similarly to the VB-FET RF MIXERS An RF mixer is an important building block of the RF front end and many emerging solutions seek attention 648 Resonant tunneling diodes were explored for decades. Owing to their negative differential resistance and fast response, they had some potential in the RF domain, and subharmonic mixers were demonstrated 649,650. The potential advantages of

47 Emerging Research Devices 41 such an approach are a wide range of operating temperature, frequencies ranging up to 10 THz, and a reduced noise figure due to RTD shot noise suppression. While this field wasn t very active in the recent years it could regain interest due to the increased demand for THZ applications and the coming integration of III-V materials on Si. For the same reason single electron transistors were considered with resonant frequencies in the range 1 10 GHz. A SET-based mixer with fully tunable band selection from 0 to 00 MHz was demonstrated but at cryogenic temperature 651,65. Recently the ambipolar I-V characteristic of the graphene transistor which mimics the response of full-wave rectifier has been used to demonstrate a frequency-doubler circuit 65,654. Finally the nonlinear I-V characteristic of a carbon nanotube can be used to demodulate AM signal. However the demonstrations were limited to <100 khz due to the external bias circuitry and to < GHz due to intrinsic parasitics (chip bond pads, etc.) 655, EMERGING RESEARCH ARCHITECTURES The objective of the ERA section is to identify possible applications for emerging research memory and logic devices. This is a difficult challenge because in many cases, circuit-level models and/or architecture-level models for these devices and their interconnect systems either do not exist or they are primitive. Moreover, the envisioned applications for these new devices can take many forms; i.e., i) as a drop-in replacement for conventional circuits, ii) as supplemental devices that complement and coexist with devices, or iii) as devices whose unusual properties can provide unique functionality for selected information processing applications. With this in mind, this section has been organized to reflect the potential application space for emerging research devices from an architecture perspective. Section 5.1 is focused on the application of emerging devices in conventional computing. Sections 5. to 5.4 address evolved architectures that still exploit conventional computation paradigms. Section 5.5 is focused on utilizing emerging devices in morphic computing paradigms, i.e. computing not based on conventional approaches but on approaches that are inspired by other paradigms, such as neuromorphic computing. This chapter can be summarized, in part, by Figure ERD8. This figure shows a taxonomy for traditional and emerging archictectures. At a high level computer architectures can be thought of as either program-centric or data-centric. In a program-centric computer, a human details a set of instructions, or a specific configuration that permits a specific set of tasks to be performed. The most ubiquitous example is the stored program digital computer. However, analog computers in which analog configurations are pre-wired or pre-programmed have been of interest in the past and have growing interest again due to new ways of using devices, which can be used to design a hybrid analog/digital computer. Section 5.1 explores memory architectures in programmed computers while Section 5. discusses the emerging concept of Storage Class Memory and the architectural challenges that evolve. Section 5. discusses how emerging devices can be of relevance to the logic portion of programmed computers. Ever since the concept of neuromorhic computing was introduced, there has been growing interest in data-centric computers where data is used to determine the equivalent of a program. The most classic example is a neural network that goes through a training phase in which inputs and their associated expected outputs are used to determine the weights. A supervised learning phase is used to determine the weights. Numerous devices, configurations and algorithms have been proposed in the context of architectures that can learn, which are summarized in Section 5.4. Models of Computing Program Centric Designed computation Data Centric- Learned computation Digital Analog Hybrid Analog/Digital Supervised Learning Unsupervised Learning Figure ERD8. Taxonomy for traditional and emerging models of computation

48 4 Emerging Research Devices Of particular emerging interest is the concept of architectures that can learn in an unsupervised fashion, which is learning as data is processed. There is potential for existing algorithms, such as HTM to enable unsurpervised learning in addition to supervised learning, and new algorithms are starting to be explored. 5.1 MEMORY ARCHITECTURES FOR PROGRAM CENTRIC ARCHITECTURES INTRODUCTION In traditional computing, since no technology provides capacity and speed simultaneously, SRAM is used to build caches, while DRAM is designed to refill a cache line as fast as possible. Furthermore, the entire system image is stored in a nonvolatile medium, traditionally a hard drive, and swapped to and from memory as needed. Solid State non-volatile storage (flash) has enabled cost effective small disk-drive replacements or can serve as a cache for larger hard drive. Some ASICs employ SRAM as local, fast managed store, and occasionally use Content Addressable Memories (CAM) for associative lookups. FPGAs use SRAM to build look up tables for small logic and to program look-up tables. However, this situation is changing rapidly. With continued scaling, application needs are also scaling, as well as evolving, and are rapidly exhausting the capabilities of the traditional memory hierarchy. Simultaneously, new memory technologies create opportunities to solve these problems and create new memory hierarchies. The main objective of this section is to explore this landscape CHALLENGES IN MEMORY SYSTEMS Current memory systems range in size from Gigabytes (low-volume ASIC systems, FPGAs and mobile systems) through Terabytes (multicore systems that manage execution of many threads for personal or departmental computing), to Petabytes (for database, Big Data, cloud computing, and other data analytics applications), and up to Exabytes (nextgeneration, exascale scientific computing). In all cases, speed (both in terms of latency of data reads and writes as well as bandwidth), power consumption, and cost are absolutely critical. However, the importance of other system aspects can vary across these different application spaces. Table ERD1 presents a summary of memory needs by application space. The table is organized as a cross-matrix of application drivers indexed against desired memory properties. It is not indexed by year but is meant to be read in the context of computing in the timeframe. Table ERD1. Anticipated Important Properties of Emerging Memories as driven by application need. Roughly one-third of the power in a large computer system is consumed in the memory sub-system 657. Some portion of this is refresh power, required by the volatile nature of DRAM. As a result, modern data servers consume considerable power even when operating at low utilization rates. For example, Google 658 reports that servers are typically operating at over 50% of their peak power consumption even at very low utilization rates. The requirement for rapid transition to full operation precludes using a hibernate mode. Thus a persistent memory that did not require constant refresh would be valuable. Many computer systems are not running at peak load continuously. Such systems (including mobile or data analytics) become much more efficient if power can be turned off rapidly while maintaining persistent stored data, since power usage can then become proportional to the computational load. This provides additional incentive for the nonvolatile storage aspect of SCM. Some applications such as data analytics and ASIC systems can benefit from having associative memories or content addressability, while other applications might gain little. Mobile systems can become even more compact if many different memory tiers can be combined on the same chip or package, including non-volatile M-class or even S-class Storage Class Memory. Total cost of ownership is influenced by cost-to-purchase, cost-to-maintain, and system lifetime. Current cost-to-purchase trends are that Hard Disk Drives (HDD) cost roughly an order of magnitude less per bit than flash memory, which in turn costs almost an order of magnitude less per bit than DRAM. However, cost-to-purchase is not the only consideration. It is anticipated that S-class SCM will consume considerably less power than HDD (both directly and in terms of required cooling), and will take up considerably less floorspace. By 00, if the main storage system of a data center is still built solely from HDD, the target performance of 8.4 G-SIO/s could consume as much as 9 MW and require 98,568 square feet of floor space 659. In contrast, the improved performance of emerging memories could supply this performance with only 4 kw and 1 square feet. Given the cost of energy, this differential can easily shift the total cost advantage to emerging memory, away from HDD, even if a cost per bit differential still exists.

49 Emerging Research Devices 4 These requirements have led to considerable early investigation into new memory architectures, exploiting emerging memory devices, often in conjunction with DRAM and HDDs in novel architectures. These new Storage Class Memories (SCM) are differentiated as whether they are intended to be close to the CPU (M-class), or to largely supplement the harddrives and SSDs (S-class). The emergence of SCM leads to the need to resolve issue beyond the device level, including software organization, wear leveling management, and error management. Because of the inherent speed in SCMs, the software can easily limit the system performance. Thus IO software from the file system, through the operating system and up to applications will have to be redesigned in order to best leverage SCMs. The number of software interactions must be reduced and disk-centric features will need to be removed. Conventional software can account for anywhere from 70% to 94% of the total IO latency 660. It is likely to be valuable to give application software direct access to the SCM interface, although this can then require additional considerations to protect the SCM device from malicious software. However, this approach is not typically used in current operating systems that use some form of File Address Table as an intermediate index mechanism. Access patterns in data-intensive computing can vary substantially. While some companies continue to use relational databases, others have switched to flat databases that must be separately indexed to create connections amongst entries. In general, database accesses tend to be fairly atomic (as small as a few Bytes), and can be widely distributed across the entire database. This is true for both reads and writes, and since the relative ratio of reads and writes varies widely by application, the optimality of any one design can depend strongly on the particular workload. A specific issue that arises with SCMs is wear leveling. While DRAMs and HDDs can support a large number of writes to the same location without failure, most of the emerging non-volatile memory device technologies can not. Thus there is a need for low-overhead mechanisms to spread the writes around uniformly, generally referred to as wear leveling. An important issue in any file system is that certain data, mainly metadata, is written to quite frequently. It is important to make sure that such storage locations are not subject to fast wearout, e.g. by using a more robust technology for such portion s of the file system. Error management is a broader problem than just wear leveling. While DRAM has traditionally benefited from simple methods such as ECC and EDCs, flash with its large page sizes and slow accesses can afford more sophisticated algorithms such as LDPC. Unfortunately, SCM will need more error correction than DRAM but will need faster error correction than flash, especially for M-class SCMs. This is an open area for research. Some possible options include exploring codes that exploit specifics of error patterns, such as Tensor codes, the use of in-situ scrub 661 (periodic elimination of accumulated errors so that one or two bit error correction can remain useful). 5. STORAGE CLASS MEMORIES INTRODUCTION In traditional computing, SRAM is used as a series of caches, which DRAM tries to refill as fast as possible. The entire system image is stored in a non-volatile medium, traditionally a hard drive, and is then swapped to and from memory as needed. However, this situation has been changing rapidly. Application needs are both scaling in size and evolving in scope, and thus are rapidly exhausting the capabilities of the traditional memory hierarchy. By combining the reliability, fast access, and endurance of a solid-state memory together with the low-cost archival capabilities and vast capacity of a magnetic hard disk drive, Storage Class Memory (SCM) offers several interesting opportunities for creating new levels in the memory hierarchies that could help with these problems. SCM offers compact yet robust non-volatile memory systems with greatly improved cost/performance ratios relative to other technologies. S- class SCM represents ultra-fast long-term storage, similar to an SSD but with higher endurance, lower latencies, and byteaddressable access. M-class represents dense and low-power nonvolatile memory at speeds close to DRAM. In order to implement SCM, both the emerging memory technologies as well as new interfaces and architectures will be needed, in order to fully use the potential, and to compensate for the weaknesses of various new memory technologies. In this section, we explore the Emerging Research Architecture implications and challenges associated with Storage Class Memory EMERGING MEMORY ARCHITECTURES FOR M-CLASS SCM Storage Class Memory architectures that are intended to replace, merge with, or support DRAM, and be close to the CPU, are referred to as M-type or Memory-type SCM (M-SCM). The required properties of this memory have many similarities

50 44 Emerging Research Devices to DRAM, including its interfaces, architecture, endurance, and read and write speed. Since write endurance of an emerging research memory is likely to be inferior to DRAM, considerable scope exists for architectural innovation. It will necessary to choose how to integrate multiple memory technologies to optimize performance and power while maximizing lifetime. In addition, advanced load leveling that preserves the word level interface and suitable error correction will be needed. The interface is likely to be a word-addressable bus, treating the entire memory system as one flat address space. Since the cost of adapting to a new memory interfaces is sizeable, an interface standard that could support multiple generations of M-SCM devices would be highly preferred. Many systems (such as in automobiles) might be deployed for a long time, so any new standard should be backward-compatible. Such a standard should compatible to DRAM interfaces (though with simpler control commands), and should reuse existing controllers and PHY (physical layers), as well as power supplies, as much as possible. It should be power efficient, e.g. supporting small page sizes, and should support future directions, such as D Master/slave configurations. The M-SCM device should indicate when writes have been completed successfully. Finally, an M-SCM standard should support multiple data rates, such as a DDR-like speed for the DRAM and a slower rate for the NVRAM 66. While wear-leveling in a block-based architecture requires significant overhead to track the number of writes to each block, simple techniques such as Start-Gap Wear-Leveling are available for direct-byte-access memories such as PCM (Phase Change Memory) 66. In this technique, a pair of registers are used to identify the location of the start point and an empty gap within a region of memory. After some threshold number of write accesses, the gap register is moved through the region, with the start register incrementing each time the gap register passes through the entire region. Additional considerations can be added to defend against detrimental attacks intended to intentionally wear out the memory. With such techniques, even an M-class SCM that is markedly slower than DRAM can offer improved performance by increasing available capacity and by reducing the occurrence of costly cache misses 664. The presence of a small DRAM cache helps keep the slower speed of the M-class SCM from affecting overall system performance in many common workloads. Even with an endurance of 10 7, device lifetime has been shown to be on the order of years. Techniques for reducing the write traffic back to the SCM device can help improve this by as much as a factor of under realistic workloads. Direct replacement of DRAM with a slightly slower M-class SCM has also been considered, for the particular example of STT-MRAM 665. Since individual byte-level writes to STT-MRAM consume more power than in DRAM, a direct replacement is not competitive in terms of energy or performance. However, by re-architecting the interaction between the output buffer and the STT-MRAM, unnecessary writes back to the NVM can be eliminated, producing a sizeable energy improvement at almost no loss in performance. However, the use of write buffers means that the device must be able to complete all writes back to non-volatile memory in the event of power loss. Integrating PCM into the mobile environment, together with a redesigned memory management controller, is predicted to deliver a six times improvement in speed and also extends the memory lifetime six times 666. Caches are intended to ensure that frequently-needed data is located near to the processor, in nearby, low-latency memory. In storage architectures, hot or frequently-accessed data is identified and then moved to faster tiers of storage. However, as the number of tiers or caches increases, a significant amount of time and energy is being spent moving data. An alternative approach is to completely rethink the hardware/software interface. By organizing the computational system around the data, data is not brought to the processor but instead processing is performed in proximity to the stored data. One such emerging data-centric chip architectures, termed Nanostores 667, is predicted to offer 10-60x improvements in energy efficiency EMERGING MEMORY ARCHITECTURES FOR S-CLASS SCM S (Storage) type SCMs are intended to replace or supplement hard-disk drive as main storage. Their main advantage will be speed, avoiding the seek time penalty of main drives. However, to succeed, their total cost of ownership needs to approach that of HDDs. Research issues include whether the SCM serves as a disk cache or is directly managed, how load leveling is implemented while retaining a sufficiently fast and flexible interface, how error correction is implemented, and what is the optimal mix of fast-yet-expensive and slow-yet-inexpensive storage technologies. The effective performance of flash SSD, itself slower than S-SCM, has been strongly affected by interface performance. The standard SATA (Serial Advanced Technology Attachment) interface, which is a commonly used interface for SSD, was originally designed for HDD, and is not optimized for flash SSD 669. There are several approaches to employ novel interface or architecture to take advantage of the flash SSD performance 670, 671, include PCIe, Thunderbolt, and Infiniband.

51 Emerging Research Devices 45 A likely introduction of these new memory devices to the market would be as hybrid solid-state discs, where the new memory technology complements the traditional flash memory to boost the SSD performance. Experimental implementations of FeRAM/flash 67 and PCRAM/flash 67 have been explored. It was shown that the PCRAM/Flash hybrid improves SSD operations by decreasing the energy consumption and increasing the lifetime of flash memory. Additional open questions for S-SCM include storage management, interface and architectural integration, such as whether such a system should be treated like a fast disk drive, or as a managed extension of main memory. To date, disklike systems built using non-volatile memories have had disk-like interfaces, with fixed-sized blocks and a translation layer used to obtain block addresses. However, since the file system also performs a table lookup, some portion of SCM performance is sacrificed. In addition, non-nand-flash SCMs have randomly accessible bits and do not need to be organized as fixed-size blocks 674. While preserving this two table structure means that no changes to the operating system are required to use, or to switch between, new S-SCM technologies, the full advantages of such fast storage devices cannot be realized. There are two alternative approaches to eliminate one of these lookup tables. In the Direct Access mode, the translation table is removed, so that the operating system must then understand how to address the SCM devices. However, any change in how table entries are calculated (such as improvements in garbage collection or wear-leveling) would then require changes in the operating system. In contrast, in an Object-Based access model, the file system is organized as a series of (key, value) objects. While this requires a one-time change to operating systems, all specific details of the SCM could be implemented at a low level. This model leads to greater efficiency in terms of both speed and effective file density, and also offers the potential for enhanced reliability. Even first-generation PCM chips, although implemented without a DRAM cache, compare favorably with state-of-the-art SSDs implemented with NAND Flash, particularly for small (<KB) writes and for reads of all sizes 675. The CPU overhead per input-output operation is also greatly reduced. Another observation for even first-generation PCM chips is that while the average read latency is similar to NAND Flash, the worst-case latency outliers for NAND Flash can be many orders of magnitude slower than the worst-case PCM access. This is particularly important considering that such S- class SCM systems will typically be used to increase system performance by improving the delivery of urgently-needed hot data. Another new software consideration for both S- and M-class SCM is the increased importance of avoiding memory corruption, either through memory leaks, pointer errors, or other issues related to memory allocation and deallocation 676. Since part of the memory system is now non-volatile, such issues are now pervasive and may be difficult to detect and remove without affecting stored user data. Table ERD1. Likely desirable properties of M (Memory) type and S (Storage) type Storage Class Memories 5. EVOLVED ARCHITECTURES EXPLOITING EMERGING RESEARCH MEMORY DEVICES Emerging non-volatile memory devices are being considered for implementing logic functions. The potential density of nano memory devices makes this option particularly attractive. A common approach is to investigate the potential for emerging memory devices to replace functions within FPGAs. Field Programmable Gate Array architectures rely on numerous small SRAMs to act as Look Up Tables (LUTs) for small combinational logic functions (typically a few inputs and outputs) and to program the pass gates used in the interconnect Programmable Switch Matrices (PSM) used for interconnect. Replacing these SRAMs with NV memories has been suggested numerous times and was even once implemented in a commercial product using Floating Gate FETS. Recently researchers have suggested using STT-RAM, ReRAM or Nanocrystal Floating Gate FETs instead [e.g. Ref. 677 ]. Typically, these substitutions reduce the 6T SRAM cell to one or two devices when used in the LUT and are used instead of the SRAM cell and pass gate in the PSM. This decreases the size of each, leading typically to a -x improvement in the performance/power ratio at any particular node. Additional advantage can be obtained by using an NVRAM technology with high cycle endurance, such as STT-RAM, to implement reconfigurable dynamic logic 678. Another interesting direction is to use emerging memories to build high density associative memories 679. The high power and low density of SRAM based associative memories limits their employability today. An open question is how to better leverage emerging devices to obtain improvements larger than -x. Directions being pursued include using nano crossbars as programmable logic arrays, merging such arrays with (CMOL a

52 46 Emerging Research Devices neuromorphic architecture see Section 5.5 below), reconfigurable computing using nano memories, and associative memories or content addressable memories (discussed above). One possible approach to obtain substantial improvements is via the employment of spintronics. A significant area of recent focus is the application of STT-RAM to logic, and the potential for broadening these classes of devices into emerging logic structures. The most straightforward implementation is to use STT-RAM arrays as look-up tables and PLAs, for example reference 680. A more aggressive employment is to mix STT-RAM and so as to create stateful logic; leveraging the non-volatile aspect of the STT-RAM device to retain state information. One example reference 681 is a stateful adder. An even more aggressive utilization is to build and use MTJ devices wherein spin state can be communicated directly between devices. This gives potential for dramatic reductions in power. A nano-magnetic channel can act as a layer to communicate transfer logic information spintronically 68, leading to a complete ALU design consisting of only 0 MTJ cells. Table ERD14. Current Research Directions for Employing emerging research memory devices to enhance logic. 5.4 ARCHITECTURES THAT CAN LEARN An exciting prospect is to create architectures that can learn or be trained from the available data. The most successful demonstration of such a system is the IBM Watson 68, which was able to defeat human champions in the TV game Jeopardy, and which is now being further developed towards medical applications. Watson parsed large corpuses of knowledge in order to be able to answer a set of wide-ranging and difficult questions i.e. it went through an intensive training process. It is worthy to note that Watson is implemented entirely without any specific morphic features - i.e. there are no features that are built so as to imitate how the human brain learns. Such morphic architectures were introduced in the ERA section of ITRS 007, to refer to biologically-inspired architectures that embody a new kind of computation paradigm in which adaptation plays a key role. Similarly the process of automated speech recognition draws little on how humans learn to recognize speech. In general there are few deployed morphic systems that deliberately imitate how the human neuron system or human brain works, and these are generally limited to audio functions, for example cochlear implants. Thus the open challenge remains are there morphic approaches to building computing machines that can leverage our understanding of how the human brain learns and then uses that knowledge to solve complex tasks. The next section surveys the state of art in this area. 5.5 MORPHIC ARCHITECTURES Biological systems give us examples of amorphous, unstructured, devices capable of noise- and fault-tolerant information processing. They excel in massively parallel spatial problems, as opposed to digital processors, which are rather weak in that area. The morphic architecture was thus introduced in the ERA section of ITRS 007, to refer to biologically-inspired architectures that embody a new kind of computation paradigm in which adaptation plays a key role to effectively address the particulars of problems 684. This subsection focuses on recent progress of a number of morphic architectures that offer opportunities for emerging research device. It starts with circuit level architectures that are intended in some manner to mimic how the brain behaves at the circuit or sub-cortical level; specifically neuromorphic architectures and cellular automaton architectures. Then it briefly turns to attempts to mimic the brain at a higher, or cortical, level NEUROMORPHIC ARCHITECTURES The term neuromorphic was introduced by Carver Mead in the late 1980s, to describe VLSI systems containing electronic analog circuits that mimic neuro-biological architectures in the nervous system 685. Traditional neurocomputers employ components that are biologically rather implausible, such as static threshold elements that represent neurons, whereas neuromorphic architectures are closer to biology. An example of a neuromorphic VLSI system is the silicon retina 686 that is modeled after the neuronal structures of the vertebrate retina. The appeal of neuromorphic architectures lies in i) their potential to achieve (human-like) intelligence based on unreliable devices similar to those found in neuronal tissue, ii) their strategies for dealing with anomalies, emphasizing not only tolerance to noise and faults, but also the active exploitation of noise to increase the effectiveness of operations, and iii) their potential for low-power yet high-density operation. Traditional von Neumann machines are less suitable with regard to item i), since for these types of tasks they require a machine complexity (the number of gates and computational power), that tends to increase exponentially with the complexity of the environment (the size of the input). Neuromorphic systems, on the other hand, exhibit a more gradual increase of their machine complexity with respect to the environmental complexity 687. Therefore, at the level of human-like computing tasks, neuromorphic machines have the potential to be superior to von Neumann machines. Points ii) and iii) are strongly related to each other in von Neumann machines, since

53 Emerging Research Devices 47 tolerance to noise runs counter to lowering power supply voltage and bias currents. Neuromorphic architectures, on the other hand, suffer much less from this trade-off. Unlike von Neumann machines, which can correct bit-errors only to a certain extent through the use of error-control techniques, neuromorphic machines tend to keep working even under high error rates. Conversely, however, even when all devices are working correctly and noise is low, the accuracy and errorrates of a neuromorphic system rarely reach the performance levels (near 100% accuracy and <<10-9 error-rate) for which von Neumann machines are typically designed. Like the areas of the human brain, neuromorphic machines (VLSIs) are application-specific. Significant performance benefits can be achieved by employing them as supplemental, as an addition to a von Neumann system, which provides universal computation ability. Neuromorphic systems thus have a diversified functionality, and consequently can be categorized as more-than-moore candidates. Table ERD15 shows trends in the development of neuromorphic systems and their applications. The application category Information processing is only a single item in the table, which understates the potentially large benefits in terms of machine complexity of human-like information processing, since functionalities like prediction, associative memory, and inference offer new opportunities left unexplored by von Neumann architectures. The ERA section of ITRS 009, for example, introduced an inference engine based on Bayesian neural networks 688, and in 010 Lyric Semiconductor introduced NAND gates where the input probabilities are combined using Bayesian logic rather than the binary logic of conventional processors. Lyric Error Correction (LEC) uses this probabilistic logic to perform error detection and correction with about percent of the circuitry and 8 percent of the power that would be needed for the equivalent conventional error correction scheme 689. Table ERD15. Applications and Development of Neuromorphic System ITRS 007 (ERA section) did not address neuromorphic Intelligent sensors, since they were considered ancillary to the central focus on information processing of ITRS at the time. However, intelligent sensors were revived in the table, because there exist vast opportunities for high performance architectures that combine them with emerging research devices. So far implementations (in the items titled: Vision and Others ; see Table ERD15) and singleelectron-transistor (SET) implementations (in the item titled Vision ) have been proposed. Another approach to neuromorphic systems is inspired by biochemical reactions in living organisms. Reaction-diffusion computers, for example, are based on a biochemical reaction 690, and they are able to efficiently solve combinatorial problems through the use of natural parallelism. Electronic implementations of this type of information processing require strong nonlinear I-V characteristics to mimic the chemical reactions involved, and there exist many opportunities for emerging research devices in this respect. On the technology side, one of the key issues for neuromorphic systems is how neuronal elements are implemented. An important consideration in this respect is the level of abstraction of a biological neuron, which can range from (almost) physically representative to a very simple model, such as an integrate-and-fire neuron. Depending on the technology used (SET neurons, resonant-tunnel-diode (RTD), memristors, etc.), this level may vary, and opportunities for emerging research devices exist in this respect. Another important issue is how non-volatile analog synapses are implemented. Many attempts have been made to design analog synaptic devices based on existing flash-memory technologies, but these efforts have experienced difficulties in designing appropriate controllers for electron injection and ejection as well as increasing the limit on the number of rewrites. Memristive devices (e.g., resistive RAMs and atomic switches) offer a promising alternative for the implementation of non-volatile analog synapses. They can be applied in the CMOL architecture, which combines memristive nano-junctions with neurons and their associated controllers. In ITRS 007, CMOL (CrossNets) was introduced in terms of nanogrids of (ideally) single molecules fabricated on top of a traditional layer, but the concept has since been expanded to use a nanowire-crossbar add-on as well as memristive (two-terminal) crosspoint devices like nanowire resistive RAMs 691. The CMOL architecture may be further expanded to include multiple stacks of layers and crossbar layers. This may result in the implementation of largescale multi-layer neural networks, which have thus far evaded direct implementations by devices only. However, while CMOL remains an interesting concept it has yet to be reduced to practice or even demonstrated. In contrast, other projects rely on conventional devices, architectures and close variants thereof. One approach is to build a multiprocessor system purposed to modeling the brains circuits in software. An example of this is Spinnaker 69. While many of the technologies above are analog in nature, IBM is running a substantial project investigating the construction and application of building spiking neurons using purely digital techniques 69. In contrast, researchers at Georgia Tech are using conventional with capacitively coupled gates to build analog spiking neurons 694. A final important issue is noise tolerance and noise utilization in neural systems and their possible application in electronics. Noise and fluctuations are usually considered obstacles in the operation of both analog and digital circuits and systems, and most strategies to deal with them focus on suppression. Neural systems, on the other hand, tend to employ

54 48 Emerging Research Devices strategies in which the properties of noise are exploited to improve the efficiency of operations. This concept may be especially useful in the design computing systems with noise-sensitive devices (e.g., extremely low-power devices like SET and subthreshold analog devices). Table ERD16. Noise-Driven Neural Processing and its Possible Applications Table ERD16 shows examples of noise-driven neural processing and their possible applications in electronics. Stochastic resonance (SR) is a phenomenon where a static or dynamic threshold system responds stochastically to a subthreshold or suprathreshold input with the help of noise. In biological systems SR is utilized to detect weak signals under a noisy environment. Stochastic resonance for some emerging research devices (a SET network and GaAs nanowire FETs) has been demonstrated. Stochastic resonance can be observed in many bi-stable systems, and will be utilized to facilitate the state transitions in emerging logic (bi-stable) memory devices. Noise-driven fast signal transmission is observed in neural networks for the vestibulo ocular reflex, where signals are transmitted with an increased rate over a neuronal path when non-identical neurons and dynamic noise are introduced. Implementation in terms of a SET circuit has demonstrated that when several non-identical pulse-density modulators were used as noisy neurons, performances on input-output fidelity of the population increased significantly as compared to that of a single neuron circuit. Phase synchronization among isolated neurons can be utilized for skew-free clock distribution where independent oscillators are implemented on a chip as distributed clock sources, while the oscillators are synchronized by a common temporal noise. Noise in synaptic depression can be used to facilitate the operation of a neuromorphic burst-signal detector, where the output range of the detector is significantly increased by noise. Noise-shaping in inhibitory neural networks has been demonstrated in subthreshold, where static and dynamic noises can be used constructively if one could not remove a certain level of noise or device mismatches. The circuits exploit properties of device mismatches and external (temporal) noise to perform noise-shaping 1-bit AD conversion (pulse-density modulation) CELLULAR-AUTOMATA ARCHITECTURES Cellular Automata represent a concept of computing in which logic and memory are integrated and organized in a regular grid of tiny simple cells. Having the functionality of a Finite Automaton, each cell can be in one of a finite number of states from a predefined state set. The state of each cell is updated according to transition rules, which determine the cell s next state from its current state as well as from the states of the neighboring cells. The neighbors of a cell are usually the cells directly adjacent in orthogonal directions of a cell, like the north, south, east, and west neighbor in the case of a two-dimensional grid, but other neighborhoods have also been demonstrated. The transition rules are usually the same for all cells, but heterogeneous sets of rules have also been considered, as well as programmable rules. The appeal of cellular automata as emerging research architectures lies in a number of factors. First, their regular structure has the potential for manufacturing methods that can deliver huge numbers of cells in a cost-effective way, in particular by processes that include bottom-up manufacturing. Second, regularity facilitates reuse of logic designs. The design of a cell is relatively simple as compared to that of a microprocessor unit, so design efforts are greatly reduced for cellular automata. Third, errors are more easily managed in the regular structures of cellular automata, since a unified approach for all cells can be followed. Fourth, wire lengths between cells are short, or wires are completely unnecessary if cells can interact with their neighboring cells through electrostatic or magnetic interactions. Fifth, cells can be used for multiple purposes, from logic or memory to the transfer of data. This makes cellular automata configurable in a flexible way. Sixth, cellular automata are massively parallel, offering a huge computational power, especially for applications of which the logical structure fits the topology of the grid of cells. A cellular automaton can be considered a memory in which the individual memory elements interact with their direct neighbors. It is these interactions that give a cellular automaton its functionality, which may vary from applicationspecific, like matrix calculations or image processing operations, to universal computation, which is the domain of von Neumann architectures. Cellular automata are configured by changing the states of cells. Writing access to cells may be implemented in similar ways as in conventional memory architectures. This implies that a cellular automaton requires a layer on top to configure it, in the form of a processor optimized for memory access. The configurability of cellular automata comes at a cost of an overhead in terms of hardware. Cells require a minimum level of complexity in order to be useful for computation 695. The density of functionality per unit of area tends to be lower than in von Neumann architectures, but it depends on how well an application can be mapped on a cellular automaton structure. A large hardware overhead may be acceptable, if cells are available in huge numbers at a low cost. Depending on the design of a cellular automaton, the complexity of a cell may vary from fine-grained (hundreds of bits per cell) to tiny-grained (less than one hundred bits per cell). Systems that are coarser grained are not considered cellular automata, but rather multi-core architectures. Fine-grained cellular automata have cells that can be configured each as one

55 Emerging Research Devices 49 or a few logic gates, or as a simple hub for data transfer. Cells are usually addressed on an individual basis for input and output, or for configuration. Typically, the transition rules governing the functionality of a cell are programmable. An example of a fine-grained approach is the Cell Matrix 696, which is a model capable of universal computation. Finegrained cellular automata have a good degree of control over configuration and computation, but it comes at the price of relatively complex cells, limiting the use of cost-effective manufacturing methods that can exploit the regularity of these architectures. The other end of the spectrum in the design of cellular automata is tiny-grained. Cells in this approach have extremely simple functionality, in the order of a few states per cell and a limited number of transition rules. This and the nonprogrammable nature of the rules drastically limit the complexity of cells. An example of a tiny-grained approach is the cellular automaton in Ref. 697, which is capable of universal computation and the correction of errors. Tiny-grained cellular automata have the potential of efficient realizations of cells on nanometer scales. The challenge is to design models with as few states and transition rules per cell as possible, while achieving sufficient functionality. The theoretical minimum is two states and one transition rule per cell. In synchronously timed models this has been approached by the Game-of-Life cellular automaton (two states, two rules), and in asynchronously timed models (no clock) by the Brownian Cellular Automata 698 (three states, three rules). Both models are computationally universal. The number of states and rules should be considered only as rough yardsticks, since ultimately the most important measure is the efficiency at which cells can be realized in a technology. A general approach to carry out operations efficiently on a cellular automaton is to configure it such that it emulates a logic circuit. In fine-grained cellular automata a cell will then function as one or a few gates or it will be used to transfer data between cells. In tiny-grained cellular automata, on the other hand, clusters of cells need to work together in order to obtain logic gate functionality. A cluster typically consists of up to 10 cells, its size depending on the functionality covered. This may seem a large overhead, but cells tend to be much less complex than in fine-grained cellular automata, making this approach feasible. Furthermore, cells used merely to transfer data and this is the majority of the cells see much less of their hardware unused when carrying out this simple task. Most hardware realizations of cellular automata to date are application specific. In this context cellular automata are used as part of a larger system to conduct a specific set of operations with great efficiency. Applications typically have a structure that can be mapped efficiently on cellular automaton hardware, and the approach followed is generally tinygrained, since cells are optimized for one or a few simple operations. Image processing applications are the most common in such realizations 699,700,701, since they can be mapped with great efficiency onto two-dimensional cellular automata. Though the focus in the past was mostly on operations like filtering, thinning, skeletonizing, and edge detection, recent applications of cellular automata include the watermarking of images with digital image copyright 70,70. Cellular automata have also been used for the implementation of a Dictionary Search Processor 704, memory controllers 705, and the generation of test patterns for Built-In Self-Test (BIST) of VLSI chips 706,707. An overview of application-specific cellular automata is given in Ref Cellular Neural Networks (CNNs) 709 resemble cellular automata in terms of local connectivity, but they differ in that their cells functionality is defined by a nonlinear function rather than a finite automaton. As the name implies, CNNs should be considered neuromorphic architectures, and it is mainly in this realm that they find applications. Though CNNs have low power consumption and high speed as advantages, they suffer from noise and variability due to being analog. They cannot be implemented in MOS as efficiently as digital systems, and they are sometimes realized on FPGAs or simulated on GPUs. Recently there have also been proposals to implement CNN by spin torque nano-oscillators 710, with a claimed potential for integration densities and power consumption beyond the reach of analog MOS circuitry. Most applications of CNNs revolve around image processing and pattern recognition. Cellular automata have seen only limited attempts at realizations at nanometer scales. Molecule Cascades 711 use CO molecules on a Cu(111) grid to conduct simple logic operations. The CO molecules jump from grid point to grid point, triggering each other sequentially, like dominos. This process is quite slow and error-prone, though there appears to be potential for improvement. The mechanical nature of the operations, though, means that this cellular automaton is unlikely to reach competitive speeds. Another proposal uses layers of organic molecules on a gold grid 71. Interactions between molecules take place via the tunneling of electrons between them. The rules that have been identified as governing those interactions appear to be influenced by the local presence of excess electrons in the grid. This may limit the control over the operation of the cellular automaton, but it also carries the promise of efficient ways to configure the grid.

56 50 Emerging Research Devices 5.5. CORTICAL ARCHITECTURES Whereas most of the morphic architectures above are trying to model or imitate the brain at the circuit, or neuron, level, there are a growing body of approaches trying to model the learning and analysis functions of the brain at a higher, or cortical level. While to date all these approaches have been run as software on conventional parallel computers, they have potential to leverage conventional, or unconventional, electronics to give a speedup. Common themes to all of these are adaptive multi-level representations and large amounts of interconnected information. Grok Solutions 71, formerly Numenta, has developed an algorithm called Hierarchical Temporal Memory (HTM). In an HTM, information is learned as both spatial and temporal patterns, much like the brain forms percepts. This is then used to create predictions, or other outputs, based on the learned information. Their model includes concepts of both proximal and distal connections that are highly plastic, again imitative of the brain. However, unlike most neuromorphic circuits, they do not model spiking behavior. HTM has been successfully applied to several problems in energy management, server optimization and predictive maintenance. Another cortical model gaining attention is cogent confabulation. This model is purely a spatial model that is connection based and includes a multi-level representation. Recently AFRL used this model to build a text recognition system on a parallel computer. As in the case of the HTM, it has potential for hardware acceleration 714. Like HTM, it depends conceptually on a highly interconnected granular structure.

57 Emerging Research Devices EMERGING MEMORY AND LOGIC DEVICES A CRITICAL ASSESSMENT 6.1 INTRODUCTION The purpose of this section is to assess the potential of each emerging research technology entry considered in this chapter to perform its intended memory or information processing function benchmarked against current memory or technologies, all at their full maturity. These targeted functions are: 1) eventually replace with a highly scalable, high performance, low power, information processing device technology, or ) provide a memory or storage technology capable of scaling either volatile and/or nonvolatile memory technology beyond the 15nm generation. Two independent methods are used to perform this assessment. In one method, referred to as Quantitative Logic Benchmarking, each emerging logic device is evaluated by its operation in conventional Boolean Logic circuits, e.g., a unity gain inverter, a -input NAND gate, and a -bit shift register. Metrics evaluated include speed, areal footprint, power dissipation, etc. Each parameter is compared with the performance projected for high performance and low power 15nm applications. The second method, referred to as Survey-Based Benchmarking, is to conduct a survey in the ERD working group to evaluate each technology entry against eight evaluation criteria normalized to high-performance at full maturity for logic or to the memory technology targeted for replacement. The rating for each criterion reflects consensus opinions on the potential and maturity of these technology entries. An important issue regarding emerging charge-based nanoelectronic switch elements is related to the fundamental limits to the scaling of these new devices, and how they compare with technology at its projected end of scaling. An analysis 715 concludes that the fundamental limit of scaling an electronic charge-based switch is only a factor of smaller than the physical gate length of a silicon MOSFET in 04. Furthermore, the density of these switches is limited by maximum allowable power dissipation of approximately 100W/cm, and not by their size. The conclusion of this work is that MOSFET technology scaled to its practical limit in terms of size and power density will asymptotically reach the theoretical limits of scaling for charge-based devices. Most of the proposed beyond- replacement devices are very different from their counterparts, and often pass computational state variables (or tokens) other than charge. Alternative state variables include collective or single spins, excitons, plasmons, photons, magnetic domains, qubits, and even material domains (e.g. ferromagnetic). With the multiplicity of programs characterizing the physics of proposed new structures, it is necessary to find ways to benchmark the technologies effectively. This requires a combination of existing benchmarks used for and new benchmarks which take into account the idiosyncrasies of the new device behavior. Even more challenging is to extend this process to consider new circuits and architectures beyond the Boolean architecture used by today, which may enable these devices to complete transactions more effectively. 6. QUANTITATIVE LOGIC BENCHMARKING FOR BEYOND TECHNOLOGIES The first method for benchmarking emerging information processing devices is based on their quantitative evaluation in conventional circuits mentioned in Section 6.1 above. The Nanoelectronics Research Initiative ( has been benchmarking several diverse beyond- technologies over the past four years, trying to balance the need for quantitative metrics to assess a new device concept s potential with the need to allow device research to progress in new directions which might not lend themselves to existing metrics. Several of the more promising NRI devices have been described in detail in the Logic and Emerging Information Processing Device Section 4., and the intermediate results on the benchmarking efforts were outlined in a 010 IEEE Proceedings article 716. The benchmark results were updated again based on refined device data in Another independent benchmark effort reported in 01 pursued uniform methodology for benchmark on the same set of NRI device using similar reference circuits and parameters 718. While all these efforts are still very much a work in progress and no concrete decisions have been made on which devices should be chosen or eliminated as candidates for significantly extending or augmenting the roadmap as scaling slows this section summarizes some of the data and insights gained from these studies. Further benchmark may alter some of the conclusions here and the outlook on some of these devices, but the overall message on the challenge of finding a beyond device which can compete well across the full spectrum of benchmarks of interest remains.

realization of a wide range of applications.

58 5 Emerging Research Devices 6..1 ARCHITECTURAL REQUIREMENTS FOR A COMPETITIVE LOGIC DEVICE The circuit designer and architect depend on the logic switch to exhibit specific desired characteristics in order to insure successful realization of a wide range of applications. These characteristics 719,which have since been supplemented in the literature, include: Inversion and flexibility (can form an infinite number of logic functions) Isolation (output does not affect input) Logic gain (output may drive more than one following gate and provides a high I on /I off ratio) Logical completeness ( the device is capable of realizing any arbitrary logic function) Self restoring / stable (signal quality restored in each gate) Low cost manufacturability (acceptable process tolerance) Reliability (aging, wear-out, radiation immunity) Performance (transaction throughput improvement) Span of control is an important means of connecting device performance and area to communication performance, relating time to space. The metric measures how other devices may be contacted within a characteristic delay of the switch, and is dependent on not only switch delay but also switch area, as well as communication speed 70. Successful architectures also need effective fan-out. Devices with intrinsic properties supporting the above features will be adopted more readily by the industry. Moreover, devices which enable architectures that address emerging concerns such as computational efficiency, complexity management, self-organized reliability and serviceability, and intrinsic cyber-security 71 are particularly valuable. 6.. QUANTITATIVE RESULTS Preliminary analyses sponsored by SRC/NRI 716, 717 and the independent study 718 surveyed the potential logic opportunities afforded by major emerging research switches using a variety of information tokens and communication transport mechanisms. Specifically, the projected effectiveness of these devices used in a number of logic gate configurations was evaluated, and normalized to at the 15nm generation as captured by the ITRS. The initial work has focused on standard Boolean logic architecture, since the equivalent is readily available for comparison. It should also be noted that the majority of this data is based on simulations only, since many of these structures have not yet been built, so it should be considered only a snapshot in time of any given device s potential, as the research on all of them is at a very early stage and hence the data is evolving. (a) (b) Figure ERD9 Median delay, energy, and area of proposed devices in NRI benchmark (normalized to ITRS 15-nm ), based on principal investigators data. (a) 011 benchmark results 717 ; (b) 010 benchmark results 716. At a high level, the data from these studies corroborates qualitative insights from earlier works, suggesting that many new logic switch structures are superior to in energy and area, but inferior to in delay, as shown in the plot of median data for the device (Figure ERD9). This is perhaps not surprising; the primary goal for nanoelectronics and NRI is to find a lower power device 7 since power density is a primary concern for future scaling. The power-speed tradeoffs commonly observed in is also extended into the emerging devices. The difference between panel (a) and panel (b) shows that the benchmarking results were evolving from 010 to 011, due to more refined device data input

delay. Figure ERD10 Area, energy, and delay of NAND gate of various post- technologies from 011 NRI benchmark, based on principal investigators data 717.

59 Emerging Research Devices 5 from NRI researchers in 011. In the area-energy-delay characteristics of a NAND gate (Figure ERD10), several devices have significantly lower power (even lower than a low voltage version of ), while maintaining a reasonable delay. Figure ERD10 Area, energy, and delay of NAND gate of various post- technologies from 011 NRI benchmark, based on principal investigators data 717. Figure ERD11 Transport impact on switch delay, size, and area of control. Circle size is logarithmically proportional to physically accessible area in one delay. Projections for 15nm included as reference 717. (Based on principal investigators data.) Moving beyond the logic gate, it is important to understand the potential impact of the transport delay for the different information tokens these devices employ. As shown in Figure ERD11, communication with many of the non-charge tokens can be significantly slower than moving charge, although this may be balanced in some cases with significantly lower energy for transport. Moreover, the combination of the new balance between switch speed, switch area, and interconnect speed can lead to advantages in the span of control for a given technology. Finally, for some of the technologies, such as nanomagnetic logic, there is no strong distinction between the switch and the interconnect, indicating the need for additional thinking on the appropriate architecture to exploit some of these attributes.

54 Emerging Research Devices (a) Nomenclature and signals (b) bit adder parameters Figure ERD1 (a) Nomenclature and signals in devices benchmarked; (b) summary of bit adder circuit parameters 718.

throughput of bit adders built from these devices, reflecting power-constrained (< 10 W/cm ) throughput 718.

60 54 Emerging Research Devices (a) Nomenclature and signals (b) bit adder parameters Figure ERD1 (a) Nomenclature and signals in devices benchmarked; (b) summary of bit adder circuit parameters 718. (a) bit adder energy vs. delay (b) bit adder power vs. throughput Figure ERD1 (a) Energy vs. delay plot of bit adder built from benchmarked devices; (b) power vs. throughput of bit adders built from these devices, reflecting power-constrained (< 10 W/cm ) throughput 718. Figure ERD1 lists the devices benchmarked in the independent study and summarizes key circuit parameters of bit adders built from these devices (except BisFET) 718. The key switching parameters are also plotted in Figure ERD1. The Energy-delay plot clearly shows disadvantages of spin-based devices in terms of speed, while no obvious energy advantage is demonstrated in this study. However, it should be emphasized that the study is only based on active switching energy without considering other unique advantages of these devices for creative designs to reduce overall power consumption, e.g., nonvolatility to eliminate standby power. Figure ERD1(b) shows power-constrained throughput of these devices, an important parameter for low-power design. It should be pointed out that the evaluations from the NRI benchmark and the independent study do not always agree with each other on the same types of devices and logic gates, due to different assumptions on device parameters, interpretations of physical mechanisms, and logic gate designs. At the architecture level, the ability to speculate on how these devices will perform is still in its infancy. While the ultimate goal is to compare at a very high level e.g. how many MIPS can be produced for 100mW in 1 mm? the current work must extrapolate from only very primitive gate structures. One initial attempt to start this process has been to look at the relative logical effort 7 for these technologies, a figure of merit which ties fundamental technology to a resulting logic transaction. Several of the devices appear to offer advantage over in logical effort, particular for more complex functions, which increases the urgency of doing more joint device architecture co-design for these emerging technologies.

Advanced Flash and Nano-Floating Gate Memories

Advanced Flash and Nano-Floating Gate Memories Mater. Res. Soc. Symp. Proc. Vol. 1337 2011 Materials Research Society DOI: 10.1557/opl.2011.1028 Scaling Challenges for NAND and Replacement Memory Technology