A list of ICs and IPs for AI, Machine Learning and Deep Learning.
AI Chip (MKTs & ICs)


Editor: S.T. (LinkedIn)

AI IC in Car


| Company Type | Companies |
| --- | --- |
| OEM | BMW, Benz, Audi, Volkswagen |
| Tier 1 | Bosch, Cruise |

Latest updates


Shortcut


| Category | Companies | Count |
| --- | --- | --- |
| IC Vendors | Intel, Qualcomm, Nvidia, Samsung, AMD, Xilinx, IBM, STMicroelectronics, NXP, Marvell, MediaTek, HiSilicon, Rockchip, Renesas Electronics, Ambarella | 15 |
| Tech Giants & HPC Vendors | Google, Amazon_AWS, Microsoft, Apple, Aliyun, Alibaba Group, Tencent Cloud, Baidu, Baidu Cloud, HUAWEI, Fujitsu, Nokia, Facebook, HPE, Tesla, LG | 16 |
| IP Vendors | ARM, Synopsys, Imagination, CEVA, Cadence, VeriSilicon, Videantis | 7 |
| Startups in China | Cambricon, Horizon Robotics, Bitmain, Chipintelli, Thinkforce, Unisound, AISpeech, Rokid, NextVPU, Canaan, Enflame, EEasy Tech, WITINMEM, TSING MICRO, Unisoc, Black Sesame | 16 |
| Startups Worldwide | Cerebras, Wave Computing, Graphcore, PEZY, Tenstorrent, Blaize, Koniku, Adapteva, Knowm, Mythic, Kalray, BrainChip, AImotive, LeapMind, Krtkl, NovuMind, REM, TERADEEP, DEEP VISION, Groq, KAIST DNPU, Kneron, Esperanto Technologies, Gyrfalcon Technology, SambaNova Systems, GreenWaves Technology, Lightelligence, Lightmatter, ThinkSilicon, Innogrit, Kortiq, Hailo, Tachyum, AlphaICs, Syntiant, Habana, aiCTX, Flex Logix, Preferred Network, Cornami, Anaflash, Optalysys, Eta Compute, Achronix, Areanna AI, Neuroblade, Luminous Computing, Efinix, AISTORM, SiMa.ai, Untether AI, GrAI Matter Lab, Rain Neuromorphics, Applied Brain Research, XMOS, DinoPlusAI, Furiosa AI | 57 |

Application Category


- Both Datacenter and Edge/Terminal: Intel, Nvidia, IBM, Xilinx, HiSilicon, Google, Baidu, Alibaba Group, Cambricon, Bitmain, Wave Computing, Tachyum, AlphaICs, Marvell, Achronix
- Datacenter: AMD, Microsoft, Apple, Tencent Cloud, Aliyun, Baidu Cloud, HUAWEI, Fujitsu, Nokia, Facebook, HPE, Thinkforce, Cerebras, Graphcore, Groq, SambaNova Systems, Adapteva, PEZY, Habana, Enflame
- Edge/Terminal: Qualcomm, Samsung, STMicroelectronics, NXP, MediaTek, Tesla, Rockchip, Amazon_AWS, ARM, Synopsys, Imagination, CEVA, Cadence, VeriSilicon, Videantis, Horizon Robotics, Chipintelli, Unisound, AISpeech, Rokid, Tenstorrent, Blaize, Koniku, Knowm, Mythic, Kalray, BrainChip, AImotive, LeapMind, Krtkl, NovuMind, REM, TERADEEP, DEEP VISION, KAIST DNPU, Kneron, Esperanto Technologies, Gyrfalcon Technology, GreenWaves Technology, Lightelligence, Lightmatter, ThinkSilicon, Innogrit, Kortiq, Hailo, Syntiant, NextVPU, aiCTX, Cornami, Anaflash, EEasy Tech, Optalysys, Eta Compute, LG, Renesas Electronics, WITINMEM, Ambarella, TSING MICRO, Black Sesame, Areanna AI, Neuroblade, SiMa.ai, Untether AI, GrAI Matter Lab, XMOS

I. IC Vendors


Nervana

Intel® Nervana™ Neural Network processors

Mobileye EyeQ

> Mobileye is currently developing its fifth generation SoC, the EyeQ®5, to act as the vision central computer performing sensor fusion for Fully Autonomous Driving (Level 5) vehicles that will hit the road in 2020. To meet power consumption and performance targets, EyeQ® SoCs are designed in most advanced VLSI process technology nodes – down to 7nm FinFET in the 5th generation.

Plans to launch robotaxis as early as 2022.

Regulation and pricing: a full self-driving system currently costs roughly USD 15,000 to 40,000, which is acceptable for a robotaxi but far too expensive for ordinary consumers. If the cost of the self-driving system can be brought down to around USD 5,000, which we believe is achievable by 2025, consumers will start to take an interest in autonomous vehicles.

Smart city: Ordnance Survey, the UK's national mapping agency, partners with Mobileye on a new model of data collection

In May 2019, Mobileye and Ordnance Survey launched the Road Infrastructure Asset Collection Trial (RIACT). The project uses Mobileye 8 Connect, a cloud-enabled collision-avoidance system, to build an accurate, continuously updated GIS layer of road-network assets.

Movidius

New Intel Vision Accelerator Solutions Speed Deep Learning and Artificial Intelligence on Edge Devices

Today, Intel unveiled its family of Intel® Vision Accelerator Design Products targeted at artificial intelligence (AI) inference and analytics performance on edge devices, where data originates and is acted upon. The new acceleration solutions come in two forms: one that features an array of Intel® Movidius™ vision processors and one built on the high-performance Intel® Arria® 10 FPGA.

FPGA

Intel FPGA OpenCL and Solutions.

Loihi

Intel's Loihi test chip is the First-of-Its-Kind Self-Learning Chip.

The Loihi research test chip includes digital circuits that mimic the brain’s basic mechanics, making machine learning faster and more efficient while requiring lower compute power. Neuromorphic chip models draw inspiration from how neurons communicate and learn, using spikes and plastic synapses that can be modulated based on timing. This could help computers self-organize and make decisions based on patterns and associations.
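
A minimal sketch of the spiking idea described above (illustrative only, not Intel's Loihi implementation): a leaky integrate-and-fire neuron whose input synapse is strengthened whenever an input spike arrives shortly before an output spike, a simplified form of spike-timing-dependent plasticity. All names and constants here are made up for illustration.

```python
# Leaky integrate-and-fire neuron with a toy timing-based plasticity rule.
def simulate(input_spikes, steps=60, threshold=1.0, leak=0.9,
             weight=0.5, lr=0.05, window=5):
    potential = 0.0
    last_input = None
    for t in range(steps):
        if t in input_spikes:          # pre-synaptic spike arrives
            potential += weight
            last_input = t
        potential *= leak              # membrane leak
        if potential >= threshold:     # post-synaptic (output) spike
            potential = 0.0
            # plasticity: a recent input spike preceded the output spike,
            # so strengthen the synapse
            if last_input is not None and t - last_input <= window:
                weight += lr
            print(f"t={t:2d}  output spike, weight={weight:.2f}")
    return weight

simulate(input_spikes={5, 6, 7, 20, 21, 22, 40, 41, 42})
```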

2020/3/19: Intel announced Pohoiki Springs, its largest neuromorphic computing system to date, containing 100 million neurons.

The system integrates 768 Loihi chips, draws below 500 W, and fits in a chassis the size of five standard servers, forming a more powerful rack-scale data-center system. Loihi combines training and inference on a single chip, fusing memory and compute; it offers massive parallelism and asynchronous signaling, and supports scalable on-chip learning with multiple learning modes, meaning it can learn while it operates. It computes with a novel asynchronous spiking scheme and shares the brain's low-power character: compared with the general-purpose chips used to train AI systems, Loihi is up to 1,000x more energy efficient. Intel's neuromorphic work builds on decades of research and collaboration, beginning with Caltech professor Carver Mead, who is famous for his foundational work in semiconductor design; the combination of chip expertise, physics and biology made the idea feasible. Gartner predicts that by 2025 neuromorphic chips could replace GPUs as the primary compute architecture for advanced AI deployments: ultra-low latency, ultra-low power, and likely to reach production before quantum systems.

Qualcomm Brings Power Efficient Artificial Intelligence Inference Processing to the Cloud

Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated (NASDAQ: QCOM), announced that it is bringing the Company’s artificial intelligence (AI) expertise to the cloud with the Qualcomm® Cloud AI 100. Built from the ground up to meet the explosive demand for AI inference processing in the cloud, the Qualcomm Cloud AI 100 utilizes the Company’s heritage in advanced signal processing and power efficiency.

Snapdragon Ride

The Snapdragon Ride highway autonomous-driving system is designed and optimized around two key modules: the Snapdragon Ride autonomous-driving software stack and the Snapdragon Ride autonomous-driving hardware platform.

Snapdragon 855 Mobile Platform

Our 4th generation on-device AI engine is the ultimate personal assistant for camera, voice, XR and gaming – delivering smarter, faster and more secure experiences. Utilizing all cores, it packs 3 times the power of its predecessor for stellar on-device AI capabilities... Greater than 7 trillion operations per second (TOPS)

GPU

Five key capabilities of the A100:

  • More than 54 billion transistors, the largest 7nm processor to date, using 3D stacking.
  • Third-generation Tensor Cores supporting NVIDIA's own TF32 (Tensor Float 32) format, which speeds up single-precision AI training by up to 20x with no code changes, plus FP64 double precision at 2.5x the previous generation; as Jensen Huang put it, the widely used Tensor Cores become more flexible, faster and easier to use. It also supports TF32 and BF16, has 432 third-generation Tensor Cores, and can be virtualized into up to 7 GPUs running different tasks.
  • Structural sparsity acceleration, a new efficiency technique that exploits the inherent sparsity of neural networks to deliver higher performance.
  • Multi-Instance GPU (MIG), which lets a single A100 be partitioned into as many as seven independent GPUs, each with its own resources.
  • Third-generation NVLink, which doubles the high-speed connectivity between GPUs, allowing multiple A100 servers to act as one giant GPU.
  • The A100 offers on the order of 2,000 TOPS of compute, but its power consumption reaches 800 W. Such large, power-hungry compute units are unlikely to be used in individual vehicles in the future; they are more likely to serve as the core compute of roadside units in V2X / vehicle-road cooperation deployments.

Orin SoC (Arm): 10 TOPS / 5 W

NVDLA Deep Learning Inference Compiler is Now Open Source

With the open-source release of NVDLA’s optimizing compiler on GitHub, system architects and software teams now have a starting point with the complete source for the world’s first fully open software and hardware inference platform.

NVIDIA TESLA T4 TENSOR CORE GPU

Powering the TensorRT Hyperscale Inference Platform.

NVIDIA Reveals Next-Gen Turing GPU Architecture: NVIDIA Doubles-Down on Ray Tracing, GDDR6, & More

At NVIDIA's SIGGRAPH 2018 keynote presentation, company CEO Jensen Huang formally unveiled the company's much awaited (and much rumored) Turing GPU architecture. The next generation of NVIDIA's GPU designs, Turing will be incorporating a number of new features and is rolling out this year.

Nvidia’s DGX-2 System Packs An AI Performance Punch

Building Bigger, Faster GPU Clusters Using NVSwitches

Nvidia launched its second-generation DGX system in March. In order to build the 2 petaflops half-precision DGX-2, Nvidia had to first design and build a new NVLink 2.0 switch chip, named NVSwitch. While Nvidia is only shipping NVSwitch as an integral component of its DGX-2 systems today, Nvidia has not precluded selling NVSwitch chips to data center equipment manufacturers.

Nvidia's latest GPU can do 15 TFlops of SP, or 120 TFlops with its new Tensor Core architecture, which performs an FP16 multiply with FP32 accumulate (or add) to suit ML.
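
A minimal NumPy sketch of that mixed-precision idea (an illustration of the numerics, not NVIDIA's implementation): operands are stored in FP16, but the running sum is kept in FP32 to limit rounding error.

```python
import numpy as np

# Mixed-precision dot product: FP16 operands, two different accumulators.
a = np.random.rand(4096).astype(np.float16)
b = np.random.rand(4096).astype(np.float16)

acc_fp16 = np.float16(0.0)   # naive: accumulate in FP16
acc_fp32 = np.float32(0.0)   # Tensor-Core style: accumulate in FP32
for x, y in zip(a, b):
    p = x * y                            # FP16 product
    acc_fp16 = np.float16(acc_fp16 + p)
    acc_fp32 = np.float32(acc_fp32 + np.float32(p))

ref = np.dot(a.astype(np.float64), b.astype(np.float64))
print("FP16-accumulate error:", abs(float(acc_fp16) - ref))
print("FP32-accumulate error:", abs(float(acc_fp32) - ref))
```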

Nvidia packs 8 of these boards into their DGX-1 for 960 Tensor TFlops.

Nvidia Volta - 架构看点 (architecture highlights) gives some insight into the Volta architecture.

SoC

On the edge, Nvidia provides NVIDIA DRIVE™ PX, the AI Car Computer for Autonomous Driving, and the JETSON TX1/TX2 MODULE, "the embedded platform for autonomous everything".

NVIDIA tops the rankings in an autonomous-vehicle industry report

With NVIDIA DGX systems and advanced AI learning tools, developers can efficiently train deep neural networks in the data center on petabytes of data and then run them in the vehicle. Using the same hardware that is deployed in the car, those DNNs can be tested and validated on the bit-accurate, cloud-based DRIVE Constellation simulation platform.

NVDLA

Nvidia announced "XAVIER DLA NOW OPEN SOURCE" at GTC 2017. We did not see an Early Access version yet. Hopefully, the general release will be available in September as promised. For more analysis, you may want to read 从Nvidia开源深度学习加速器说起.
Now the open source DLA is available on Github and more information can be found here. > The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome.

Samsung Brings On-device AI Processing for Premium Mobile Devices with Exynos 9 Series 9820 Processor > Fourth-generation custom core and 2.0Gbps LTE Advanced Pro modem enables enriched mobile experiences including AR and VR applications
Samsung recently unveiled the Exynos 9810: "The new Exynos 9810 brings premium features with a 2.9GHz custom CPU, an industry-first 6CA LTE modem and deep learning processing capabilities".

The soon-to-be-released AMD Radeon Instinct MI25 is promising 12.3 TFlops of SP or 24.6 TFlops of FP16. If your calculations are amenable to Nvidia's Tensor Cores, then AMD can't compete. Nvidia also has twice the bandwidth, at 900 GB/s versus AMD's 484 GB/s. > AMD has put a very good x86 server processor into the market for the first time in nine years, and it also has a matching GPU that gives its OEM and ODM partners a credible alternative for HPC and AI workloads to the combination of Intel Xeons and Nvidia Teslas that dominate hybrid computing these days.

Tesla is reportedly developing its own processor for artificial intelligence, intended for use with its self-driving systems, in partnership with AMD. Tesla has an existing relationship with Nvidia, whose GPUs power its Autopilot system, but this new in-house chip reported by CNBC could potentially reduce its reliance on third-party AI processing hardware.

Xilinx provides MIPI CSI/DSI (camera/display serial interface) IP bundled with the Vivado IDE.

Xilinx Launches the World's Fastest Data Center and AI Accelerator Cards

Xilinx launched Alveo, a portfolio of powerful accelerator cards designed to dramatically increase performance in industry-standard servers across cloud and on-premise data centers.

Xilinx provides "Machine Learning Inference Solutions from Edge to Cloud", and naturally claims its FPGAs are best for INT8 in one of its white papers.

Whilst performance per watt is impressive for FPGAs, the vendors' larger chips have long carried extremely high prices. Finding a balance between price and capability is the main challenge with FPGAs.

TrueNorth is IBM's Neuromorphic CMOS ASIC developed in conjunction with the DARPA SyNAPSE program.

It is a manycore processor network-on-chip design, with 4096 cores, each one simulating 256 programmable silicon "neurons" for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (2^28). In terms of basic building blocks, its transistor count is 5.4 billion. Since memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von Neumann architecture bottleneck and is very energy-efficient, consuming 70 milliwatts, about 1/10,000th the power density of conventional microprocessors. Wikipedia
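
The neuron and synapse counts above follow directly from the per-core figures; a quick check:

```python
cores = 4096
neurons_per_core = 256
synapses_per_neuron = 256

neurons = cores * neurons_per_core        # 1,048,576 (just over a million)
synapses = neurons * synapses_per_neuron  # 268,435,456 = 2**28
print(neurons, synapses, synapses == 2**28)
```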

With IBM POWER9, we’re all riding the AI wave

"With POWER9, we’re moving to a new off-chip era, with advanced accelerators like GPUs and FPGAs driving modern workloads, including AI...POWER9 will be the first commercial platform loaded with on-chip support for NVIDIA’s next-generation NVLink, OpenCAPI 3.0 and PCI-Express 4.0. These technologies provide a giant hose to transfer data."

ST preps second neural network IC

STMicroelectronics is designing a second iteration of the neural networking technology that the company reported on at the International Solid-State Circuits Conference (ISSCC) in February 2017.

ISSCC2017 Deep-Learning Processors文章学习 (一) (a study of the ISSCC 2017 deep-learning processor papers, part 1) is a reference.

NXP 5nm

NXP invests in Kalray. Massively parallel processing architectures deliver strong performance on compute-intensive real-time tasks. Requirements that a few years ago appeared only in data-center applications are increasingly showing up in in-vehicle embedded systems, driving a major shift in the automotive chip market. NXP announced a strategic investment of EUR 8 million (about USD 9 million) in the AI processor company Kalray (a parallel-computing chip vendor founded in 2008 that initially served the aerospace sector) to accelerate the development of safe, reliable and scalable intelligent computing solutions. Kalray's MPPA processors support ASIL B/C and comply with ISO 26262; crucially, their power consumption can be as low as one tenth of existing solutions, and because the MPPA is programmable, customers can easily customize and update it.

Kalray has already taken part in pre-development projects with several automakers; Renault is one publicly disclosed partner. The Kalray processor used there is built on TSMC's 16nm process and consumes only 20 to 30 W. The MPPA intelligent processor supports ASIL B/C, complies with ISO 26262, and runs QNX, Linux or a real-time operating system (RTOS). The latest processor, Coolidge, performs massively parallel processing and integrates 5 compute clusters, each with 16 cores and 16 co-processors. Development of the third-generation Coolidge processor is underway, with mass production planned to start in Q2 2020: 25x the performance of the second generation, expanded AI capability, easier programming and higher speed. Combined with the decision-making capability and functional-safety experience of NXP processors, it will reach volume production on NXP's BlueBox in-vehicle autonomous-driving platform. Kalray's revenue last year was EUR 1.265 million, up 63% from EUR 0.775 million in 2018.

S32 AUTOMOTIVE PLATFORM

The NXP S32 automotive platform is the world’s first scalable automotive computing architecture. It offers a unified hardware platform and an identical software environment across application domains to bring rich in-vehicle experiences and automated driving functions to market faster.

ADAS Chip
S32V234: Vision Processor for Front and Surround View Camera, Machine Learning and Sensor Fusion Applications

The S32V234 is our 2nd generation vision processor family designed to support computation intensive applications for image processing and offers an ISP, powerful 3D GPU, dual APEX-2 vision accelerators, security and supports SafeAssure™. S32V234 is suited for ADAS, NCAP front camera, object detection and recognition, surround view, machine learning and sensor fusion applications. S32V234 is engineered for automotive-grade reliability, functional safety and security measures to support vehicle and industrial automation.

Marvell Demonstrates Artificial Intelligence SSD Controller Architecture Solution

Marvell will demonstrate today at the Flash Memory Summit how it will provide artificial intelligence capabilities to a broad range of industries by incorporating NVIDIA’s Deep Learning Accelerator (NVDLA) technology in its family of data center and client SSD controllers.

MediaTek announced Helio P90, highlighting AI processing.

This article, "MediaTek Announces New Premium Helio P90 SoC", from AnandTech has more in-depth analysis.

Kirin for Smart Phone
Kirin 980, the World's First 7nm Process Mobile AI Chipset

Introducing the Kirin 980, the world's first 7nm mobile phone SoC chipset, the world's first Cortex-A76 based chipset, the world's first dual-NPU design, and the world's first chipset to support LTE Cat.21. The Kirin 980 combines multiple technological innovations and leads the AI trend to provide users with impressive mobile performance and to create a more convenient and intelligent life.

The HiSilicon Kirin 970 Processor was announced, featuring a dedicated Neural-network Processing Unit.
In this article, we can find more details about the NPU in the Kirin 970.

Mobile Camera SoC
According to a brief data sheet of the Hi3559A V100ES ultra-HD mobile camera SoC, it has:

Dual-core CNN@700 MHz neural network acceleration engine

Rockchip Released Its First AI Processor RK3399Pro -- NPU Performance up to 2.4TOPs

The RK3399Pro adopts a dedicated AI hardware design. Its NPU delivers 2.4 TOPS and leads on both performance and power consumption: performance is 150% higher than other NPU processors of the same type, and power consumption is less than 1% of solutions that use a GPU as the AI computing unit.

Renesas Electronics Develops New Processing-In-Memory Technology for Next-Generation AI Chips that Achieves AI Processing Performance of 8.8 TOPS/W

Renesas Electronics Corporation (TSE: 6723), a premier supplier of advanced semiconductor solutions, today announced it has developed an AI accelerator that performs CNN (convolutional neural network) processing at high speeds and low power to move towards the next generation of Renesas embedded AI (e-AI), which will accelerate increased intelligence of endpoint devices. A Renesas test chip featuring this accelerator has achieved the power efficiency of 8.8 TOPS/W (Note 1), which is the industry's highest class of power efficiency. The Renesas accelerator is based on the processing-in-memory (PIM) architecture, an increasingly popular approach for AI technology, in which multiply-and-accumulate operations are performed in the memory circuit as data is read out from that memory.

Intelligent Vision Processors For Edge Applications

II. Tech Giants & HPC Vendors


Google begins selling the $150 Coral Dev Board, a hardware kit for accelerated AI edge computing

If you’re a software dev looking to get a head start on AI development at the edge, why not try on Google’s new hardware for size? The search company today made available the Coral Dev Board, a $150 computer featuring a removable system-on-module with one of its custom tensor processing unit (TPU) AI chips.

Google's original TPU had a big lead over GPUs and helped power DeepMind's AlphaGo victory over Lee Sedol in a Go tournament. The original 700MHz TPU is described as having 95 TFlops for 8-bit calculations or 23 TFlops for 16-bit whilst drawing only 40W. This was much faster than GPUs on release but is now slower than Nvidia's V100, but not on a per W basis. The new TPU2 is referred to as a TPU device with four chips and can do around 180 TFlops. Each chip's performance has been doubled to 45 TFlops for 16-bits. You can see the gap to Nvidia's V100 is closing. You can't buy a TPU or TPU2.
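
A quick sanity check on the TPU2 figures quoted above (4 chips per device, 45 TFlops per chip):

```python
tflops_per_chip = 45     # 16-bit TFlops per TPU2 chip, as stated in the text
chips_per_device = 4
print(tflops_per_chip * chips_per_device)   # 180 TFlops per TPU2 device
```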

Lately, Google has been making Cloud TPUs available for use in Google Cloud Platform (GCP). Here you can find the latest benchmark results for Google's TPU2.

Pixel Visual Core is Google’s first custom-designed co-processor for consumer products. It’s built into every Pixel 2, and in the coming months, we’ll turn it on through a software update to enable more applications to use Pixel 2’s camera for taking HDR+ quality pictures.

Tearing Apart Google’s TPU 3.0 AI Coprocessor

Google did its best to impress this week at its annual IO conference. While Google rolled out a bunch of benchmarks that were run on its current Cloud TPU instances, based on TPUv2 chips, the company divulged a few skimpy details about its next generation TPU chip and its systems architecture. The company changed from version notation (TPUv2) to revision notation (TPU 3.0) with the update, but ironically the detail we have assembled shows that the step from TPUv2 to what we will call TPUv3 probably isn’t that big; it should probably be called TPU v2r5 or something like that.

Edge TPU

AI is pervasive today, from consumer to enterprise applications. With the explosive growth of connected devices, combined with a demand for privacy/confidentiality, low latency and bandwidth constraints, AI models trained in the cloud increasingly need to be run at the edge. Edge TPU is Google’s purpose-built ASIC designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge.

Other references are:
Google TPU3 看点

Google TPU 揭密

Google的神经网络处理器专利

脉动阵列 - 因Google TPU获得新生

Should We All Embrace Systolic Arrays?

Amazon may be developing AI chips for Alexa

The Information has a report this morning that Amazon is working on building AI chips for the Echo, which would allow Alexa to more quickly parse information and get those answers.

AWS Inferentia. High performance machine learning inference chip, custom designed by AWS.

AWS Inferentia provides high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput. AWS Inferentia will be available for use with Amazon SageMaker, Amazon EC2, and Amazon Elastic Inference.

AWS FPGA instance

Amazon EC2 F1 is a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your application. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI and Hardware Developer Kit (HDK). Once your FPGA design is complete, you can register it as an Amazon FPGA Image (AFI), and deploy it to your F1 instance in just a few clicks. You can reuse your AFIs as many times, and across as many F1 instances as you like.

Inside the Microsoft FPGA-based configurable cloud is also a good reference if you want to know Microsoft's vision for FPGAs in the cloud.

This article "智慧云中的FPGA" gives and overview about FPGA used in AI aceleration in the cloud.

Drilling Into Microsoft’s BrainWave Soft Deep Learning Chip shows more details based on Microsoft's presentation on Hot Chips 2017.

Real-time AI: Microsoft announces preview of Project Brainwave

At Microsoft’s Build developers conference in Seattle this week, the company is announcing a preview of Project Brainwave integrated with Azure Machine Learning, which the company says will make Azure the most efficient cloud computing platform for AI.

Microsoft is hiring engineers to work on A.I. chip design for its cloud

Microsoft is following Google's lead in designing a computer processor for artificial intelligence, according to recent job postings.

Think Different: Lidar (AR, dToF) in the iPad Pro.

The core value of lidar is undoubtedly reliable, accurate ranging plus 3D mapping capability.

  • Automotive lidar: most autonomous-driving test vehicles initially used mechanical (spinning) lidar, which is expensive (tens of thousands of dollars), bulky, and less stable because of its moving parts, but is known for precise ranging (about 150 m) and modeling. Companies pursuing high-level autonomy, such as Google's Waymo and GM's Cruise, regard it as an indispensable vehicle sensor.
  • Solid-state lidar (mechanical lidar is non-solid-state and less stable) pursues different technical routes, including MEMS, Flash (focal-plane array) and optical phased array, to address the cost, reliability and safety concerns automakers care about.
  • Star lidar startups such as LeddarTech, Sense Photonics and Ouster are actively trying different techniques to develop automotive-grade Flash lidar with longer detection range. Flash is a solid-state approach whose structure and photon-emission principle differ from mechanical and MEMS designs: instead of moving parts, the laser emission angle is steered by digital signals from electronic components. Its biggest advantages are a simple, stable internal system, controllable size and high precision, and it can be packaged as a chip and embedded into other hardware. It is cheap but short-range (beyond about 50 m it suffers heavy interference).
    Flash only describes how the laser is emitted. For a lidar to actually measure distance, it needs a depth-ranging technique such as TOF (Time of Flight), triangulation, or FMCW (frequency-modulated continuous wave); see the sketch after this list.
  • Besides working reliably only within about 30 m, iToF has other drawbacks: it cannot accurately distinguish two nearby objects at the same time, and it consumes more power.
  • dToF can solve these problems, but it requires integrating more precise sensing devices along with better illumination control, data processing and optical computation, at low power. One weakness of this approach is interference from strong outdoor light, but given Apple's claim that it is usable outdoors, they may have increased the dynamic range of the SiPM photodetector to cope with bright sunlight.
  • VCSEL suppliers to Apple include ams (AMS); notably, in September 2019 ams launched the TMF8801, the world's smallest integrated module for direct time-of-flight (dToF) distance measurement, while SiPM photodetector makers are represented by Sony and others. The lidar has to blend just right with Apple's camera, the A12Z chip and a series of vision algorithms, while also allowing for production capacity and assembly precision. Some low-profile startups have been developing this kind of Flash lidar detection chip since 2017 and have already received orders. 飞芯电子 has taken two rounds of investment from Bosch, and CEO 雷述宇 told us: "We are currently building iToF for automakers and phone makers, and will build dToF for phones in the second half of the year."
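
A one-line sketch of the direct time-of-flight (dToF) ranging principle mentioned above: the sensor measures the round-trip time of a light pulse, so distance = c * t / 2.

```python
C = 299_792_458  # speed of light, m/s

def dtof_distance(round_trip_seconds):
    # Divide by 2 because the pulse travels to the target and back.
    return C * round_trip_seconds / 2

# A pulse that returns after 100 nanoseconds corresponds to roughly 15 m.
print(dtof_distance(100e-9))
```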

A12 Bionic: The smartest, most powerful chip in a smartphone.

A whole new level of intelligence. The A12 Bionic, with our next-generation Neural Engine, delivers incredible performance. It uses real-time machine learning to transform the way you experience photos, gaming, augmented reality, and more.


Apple unveiled the new processor powering the new iPhone 8 and iPhone X - the A11 Bionic. The A11 also includes dedicated neural network hardware that Apple calls a "neural engine", which can perform up to 600 billion operations per second.
Core ML is Apple's current solution for machine learning applications.

Pingtouge (T-Head), Alibaba's semiconductor subsidiary, released the Xuantie 910 (玄铁910), claimed to be the industry's highest-performance RISC-V processor, and announced that the Xuantie 910 IP will be opened up to lower the barrier to high-performance chips. Built on a 12nm process, the Xuantie 910 has 16 cores (four clusters of four cores each), runs at 2.5 GHz, and delivers 7.1 CoreMark/MHz, more than 40% better than the best previous RISC-V processors. (https://mp.weixin.qq.com/s/2hVoTirXYaDup0Un1eCQiA)

Alibaba’s New AI Chip Can Process Nearly 80K Images Per Second

At the Alibaba Cloud (Aliyun) Apsara Conference 2019, Pingtouge unveiled its first AI dedicated processor for cloud-based large-scale AI inferencing. The Hanguang 800 is the first semiconductor product in Alibaba’s 20-year history.


Tencent Cloud introduces an FPGA instance (beta) with three different specifications based on the Xilinx Kintex UltraScale KU115 FPGA. More options equipped with Intel FPGAs will be provided in the future.


AN EARLY LOOK AT BAIDU’S CUSTOM AI AND ANALYTICS PROCESSOR

We’ve written much over the last few years about the company’s emphasis on streamlining deep learning processing, most notably with GPUs, but Baidu has a new processor up its sleeve called the XPU. For now, the device has just been demonstrated in FPGA, but if it continues to prove useful for AI, analytics, cloud, and autonomous driving the search giant could push it into a full-bore ASIC.

Baidu creates Kunlun silicon for AI

A pair of chips from the Chinese search giant are aimed at cloud and edge use cases. The company said it started developing a field-programmable gate array AI accelerator in 2011, and that Kunlun is almost 30 times faster. The chips are made with Samsung's 14nm process, have 512GBps memory bandwidth, and are capable of 260 tera operations per second at 100 watts.



Chinese tech giant Huawei unveils A.I. chips, taking aim at giants like Qualcomm and Nvidia

Huawei unveils two new artificial intelligence (AI) chips called the Ascend 910 and Ascend 310. The two chips are aimed at uses in data centers and internet-connected consumer devices, Rotating Chairman Eric Xu says at the Huawei Connect conference in Shanghai. The move pits the Chinese tech giant against major chipmakers including Qualcomm and Nvidia.

FPGA Accelerated Cloud Server, high performance FPGA instance is open for beta test.

The FPGA cloud server provides a direct PCIe interconnect of up to 100 Gbps between CPU and FPGA, eight Xilinx VU9P FPGAs per node, and a dedicated mesh optical interconnect of up to 200 Gbps between FPGAs, so your application's acceleration needs are no longer limited by the hardware.

The DLU that Fujitsu is creating is designed from scratch: it is not based on either the SPARC or ARM instruction set and, in fact, has its own instruction set and a new data format created specifically for deep learning. Japanese computing giant Fujitsu, which knows a thing or two about making very efficient and highly scalable systems for HPC workloads, as evidenced by the K supercomputer, does not believe that the HPC and AI architectures will converge. Rather, the company is betting that these architectures will diverge and will require very specialized functions.

Nokia has developed the ReefShark chipsets for its 5G network solutions. AI is implemented in the ReefShark design for radio and embedded in the baseband to use augmented deep learning to trigger smart, rapid actions by the autonomous, cognitive network, enhancing network optimization and increasing business opportunities.

Facebook Is Forming a Team to Design Its Own Chips

Facebook Inc. is building a team to design its own semiconductors, adding to a trend among technology companies to supply themselves and lower their dependence on chipmakers such as Intel Corp. and Qualcomm Inc., according to job listings and people familiar with the matter.

HPE DEVELOPING ITS OWN LOW POWER “NEURAL NETWORK” CHIPS

In the context of a broader discussion about the company's Extreme Edge program focused on space-bound systems, HPE's Dr. Tom Bradicich, VP and GM of Servers, Converged Edge, and IoT systems, described a future chip that would be ideally suited for high performance computing under the intense power and physical space limitations characteristic of space missions. To be clear, he told us as much as he could; very little is known about the architecture, but there were some key elements he described.

Tesla-branded ventilators

Tesla built ventilators out of its own car parts, including the Model 3's center touchscreen and onboard computer and suspension components from the Model S.
Tesla EE evolution: the software-defined car. In the Model 3 topology, the Autopilot & Infotainment Control Module now takes over every driver-assistance sensor (cameras and millimeter-wave radar), except the ultrasonic sensors used mainly for low-speed parking, which are handled by the right body controller (BCM RH); that controller appears to also integrate automatic park / autonomous pull-out (AP), thermal management, torque control and more. This is where Tesla excels: hardware abstraction, the separation of hardware from software, which is essentially the next-generation EE architecture that BMW, itself aggressive and open about electrical/electronic architecture, has been planning.

  • Coming soon: recognizing and responding to traffic lights and stop signs, and automatic assisted driving on city streets.
  • Per the original plan, a robotaxi fleet will launch by the end of this year. The 500,000 vehicles already on the road, and the even larger Tesla "virtual" fleet to come, keep feeding data to the core neural network, delivering users a refreshed driving experience and improved performance at regular intervals.

Dojo / Training D1

Tesla expects its 2021 vehicle deliveries to grow more than 50% year over year, from 499,500 last year to roughly 750,000.
Tesla is moving toward an in-vehicle supercomputer; its ultimate form will be a central super-computer, shaped by four key trends: centralized compute, decoupling of hardware and software, platform standardization, and function customization. Commercial deployment is expected around 2025. The new smart-vehicle architecture will be built on a central computer / layer / zone concept.

Tesla’s new self-driving chip is here, and this is your best look yet

...And today, at Tesla’s Autonomy Investor Day in Palo Alto, California, the company gave the world its first, detailed glimpse at what Musk is now calling “the best chip in the world” — a 260 square millimeter piece of silicon, with 6 billion transistors, that the company claims offers 21 times the performance of the Nvidia chips it was using before.

Tesla’s new AI chip isn’t a silver bullet for self-driving cars

Processing power is important, but building chips could be an expensive distraction for Tesla

LG TO ACCELERATE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE WITH OWN AI CHIP

New AI Processor with LG Neural Engine Designed for Use in Various Products Including Robot Vacuum Cleaners, Washing Machines and Refrigerators

III. Traditional IP Vendors


MIPS/ARM IP updates: ARM once held about 50% of the IP market, 44.7% in 2018 and 40.8% in 2019. Revenue declined slightly, with royalty revenue down 6-7%, although licensing revenue (including physical IP) rose 13%. 65% of the chipsets already shipping in volume ADAS products are based on ARM IP. The data show the global semiconductor IP market totaled about USD 3.94 billion in 2019, up 5.2% from USD 3.74 billion in 2018, a clear slowdown from the previous year (10% growth in 2018).

Interview with former ARM China CEO 谭军

DynamIQ is the embedded IP giant's answer to the AI age. It may not be a revolutionary design, but it is important for sure.

ARM also provides an open-source Compute Library containing a comprehensive collection of software functions implemented for the Arm Cortex-A family of CPU processors and the Arm Mali family of GPUs.

Arm Machine Learning Processor

Specifically designed for inference at the edge, the ML processor gives an industry-leading performance of 4.6 TOPs, with a stunning efficiency of 3 TOPs/W for mobile devices and smart IP cameras.

ARM Details "Project Trillium" Machine Learning Processor Architecture

Arm details more of the architecture of what it now seems to more consistently call the "machine learning processor", or MLP, from here on. The MLP IP started off as a blank sheet in terms of architecture implementation, and the team consists of engineers pulled from the CPU and GPU teams.

DesignWare EV6x Embedded Vision Processors

处理器IP厂商的机器学习方案 - Synopsys (machine learning solutions from processor IP vendors: Synopsys)

The BAIC-Imagination joint venture is officially established. 北京核芯达科技有限公司 is the first automotive chip design company jointly founded by a Chinese state-owned automaker and an international chip giant. It will focus on application processors for autonomous driving and voice-interaction chips for smart cockpits, providing advanced automotive-chip solutions for domestic automakers represented by BAIC Group. The voice-interaction smart-cockpit chip and the L3-L5 multi-level environment-perception solution are expected to tape out and reach volume production in 2021 and 2022 respectively.

Imagination aims to win back mobile GPU market share and is increasing its investment in China

2018 revenue was USD 105 million: 50-60% from mobile and 20-30% from automotive, with about one third of the GPU market overall (43% of the emerging automotive GPU market); China accounts for roughly 10% (35 staff as of July 2019).

  • Once held about 50% of the mobile GPU market; lost Apple in April 2017; acquired for GBP 500 million by Canyon Bridge (backed by China Reform/国新).
  • 2018: 52.6% of smartphones used ARM's Mali GPU, while 33% used SoCs and Adreno GPUs from Qualcomm, the world's largest mobile chip designer.

PowerVR Series2NX Neural Network Accelerator

Imagination Announces First PowerVR Series2NX Neural Network Accelerator Cores: AX2185 and AX2145

the company is announcing the first products in the 2NX NNA family: the higher-performance AX2185 and lower-cost AX2145.

RISC-V

RISC-V 路在何方 (Where is RISC-V headed?)

  • In May 2010, Professor Krste Asanović and graduate students Yunsup Lee and Andrew Waterman at UC Berkeley started the RISC-V instruction set. RISC-V supports Chisel, an agile hardware construction language that greatly shortens development cycles compared with traditional Verilog.
  • As of May 8, 2020, the RISC-V Foundation had 193 members, about 33 of them (an incomplete count, roughly 17%) from mainland China, including chip veterans such as Huawei and ZTE Microelectronics as well as newcomers such as Canaan, Bitmain, 睿思芯科 and 芯来科技.

  • According to a report published late last year by the analyst firm Semico Research, the market is expected to consume a total of 62.4 billion RISC-V CPU cores by 2025, with the industrial segment the largest at 16.7 billion cores. Semico forecasts an average compound annual growth rate (CAGR) of 146.2% for RISC-V CPU cores between 2018 and 2025 across segments including the computer, consumer, communications, transportation and industrial markets.

  • First, the performance gap. Take SiFive, the leading RISC-V company: its latest RISC-V CPU, the U8 series (announced 2019-10-24), is merely comparable in performance to the Arm Cortex-A72 (while claiming 1.5x the power efficiency in half the area of the A72). Note that SiFive's official site currently only offers CPU IP licensing up to the U7 series, with no sign of the U8 yet, while the Arm A72 is a mature, market-proven design that shipped back in 2016.

  • Second, the tooling gap. Chisel, the language favored by RISC-V developers, still lags far behind mainstream chip design and verification tools, and has no EDA tool support. In other words, a chip designed in Chisel cannot generate hardware circuits directly: it must first be compiled to a FIRRTL file with the firrtl compiler, then to Verilog, before the final circuit can be produced.

  • Third, the software and DSP gap. 李兴仁 told a Semiconductor Industry Observer reporter that chip performance depends on targeted software optimization and DSP acceleration. Because RISC-V started so recently (the architecture was merged into Linux on 2017-07-01), neither the compiler toolchain nor the DSP production and test tools for image and audio processing are mature, which means optimization on top of the base architecture still has a long way to go.

  • Finally, the gap in market expectations. The data show that Arm still holds most of the market share; by comparison, RISC-V is still in its infancy.

CEVA-XM6 Fifth-generation computer vision and deep learning embedded platform

处理器IP厂商的机器学习方案 - CEVA (machine learning solutions from processor IP vendors: CEVA)

CEVA Announces NeuPro Neural Network IP

Ahead of CES CEVA announced a new specialised neural network accelerator IP called NeuPro.

Tensilica Vision DSPs for Imaging, Computer Vision, and Neural Networks

VeriSilicon’s Vivante VIP8000 Neural Network Processor IP Delivers Over 3 Tera MACs Per Second

神经网络DSP核的一桌麻将终于凑齐了 (the mahjong table of neural-network DSP cores is finally complete)

The v-MP6000UDX processor from Videantis is a scalable processor family that has been designed to run high-performance deep learning, computer vision, imaging and video coding applications in a low power footprint.

IV. Startups in China


Montage Technology (澜起科技) sticks to a strategy of getting strong first, then getting big

Montage has started building server CPUs and will go on to build AI chips for the data center.

  • After more than a decade of work we hold a solid position in the memory-interface niche, so expanding the product line is a natural step.
  • Montage focuses on the cloud and the data center, which is why we moved into server CPUs.
  • At the same time we have started studying what AI computing in the data center and the cloud will need next, and what solutions our chips should provide.

Cambricon's STAR Market IPO values the company at RMB 34.2 billion

Chinese AI Chip Maker Cambricon Unveils New Cloud-Based Smart Chip

Chinese artificial intelligence chip maker Cambricon Technologies Corp Ltd has unveiled two new products, a cloud-based smart chip Cambricon MLU100 and a new version of its AI processor IP product Cambricon 1M, at a launching event in Shanghai on May 3rd.

Cambricon release new product page, including IP, Chip and Software tools

AI Chip Explosion: Cambricon’s Billion-Device Ambition

On November 6 in Beijing, China’s rising semiconductor company Cambricon released the Cambrian-1H8 for low power consumption computer vision application, the higher-end Cambrian-1H16 for more general purpose application, the Cambrian-1M for autonomous driving applications with yet-to-be-disclosed release date, and an AI system software named Cambrian NeuWare.

Horizon Robotics wins the Waymo Open Dataset Challenges

  • The Waymo Open Dataset Challenges comprised five tasks; Horizon took first place in four of them (2D tracking, 3D detection, 3D tracking and domain adaptation) and second place in 2D detection.
  • Changan's all-new SUV, the UNI-T, equipped with the Journey 2 chip, has officially launched, marking the first front-fitted volume production of Horizon's automotive-grade Journey 2.

Chinese AI chip maker Horizon Robotics raises $600 million from SK Hynix, others

Chinese chip maker Horizon Robotics said on Wednesday it had raised $600 million in its latest funding round, bringing its valuation to $3 billion, amid a push from Chinese companies and the government to boost the semiconductor industry.

On Dec. 20, Horizon Robotics announced two chip products: "Journey" for ADAS and "Sunrise" for smart cameras.

Bitcoin Mining Giant Bitmain is developing processors for both training and inference tasks.

Bitmain’s newest product, the Sophon, may or may not take over deep learning. But by giving it such a name Zhan and his Bitmain co-founder, Jihan Wu, have signaled to the world their intentions. The Sophon unit will include Bitmain’s first piece of bespoke silicon for a revolutionary AI technology. If things go to plan, thousands of Bitmain Sophon units soon could be training neural networks in vast data centers around the world.

On Nov.8, Bitmain announced its Sophon BM1869 Tensor Computing Processor, Deep Learning Accelerating Card SC1 and IVS server SS1.

Chipintelli's first IC, CI1006, is designed for automatic speech recognition application.

Sequoia, Hillhouse, Yitu Technology Join $68M Series A Round In Chinese AI Chip Maker ThinkForce

Unisound raises US$100 million to fund AI, chip development

China’s AISpeech Raises $76M on Advanced Speech Tech; Eyes AI Chips

Chinese AI startup Rokid will mass produce their own custom AI chip for voice recognition

The world-leading computer vision processing IC and system company NextVPU today unveiled the AI vision processing IC N171. The N171 is the flagship IC of NextVPU's N1 series of computer vision chips. As a VPU, the N171 pushes the edge AI computing limit further in many respects. With powerful computing engines embedded, the N171 has unprecedented geometry calculation and deep neural network processing capabilities, and can be widely used in surveillance, robots, drones, UGVs, smart home, ADAS applications, etc.

Canaan's Kendryte is a series of AI chips which focuses on IoT.

Biren

Biren Technology (壁仞科技) raised RMB 1.1 billion in Series A funding.

Enflame Tech is a startup company based in Shanghai, China. It was established in March 2018 with two R&D centers in Shanghai and Beijing. Enflame is developing the deep learning accelerator SoCs and software stack, targeting AI training platform solutions for the Cloud service provider and the data centers.

Enflame Technology Announces CloudBlazer with DTU Chip on GLOBALFOUNDRIES 12LP FinFET Platform for Data Center Training

SHANGHAI, China, Dec. 12, 2019 – In conjunction with the launch of Enflame’s CloudBlazer T10, Enflame Technology and GLOBALFOUNDRIES (GF) today announced a new high-performing deep learning accelerator solution for data center training. Designed to accelerate deep learning deployment, the accelerator’s core Deep Thinking Unit (DTU) is based on GF’s 12LP FinFET platform with 2.5D packaging to deliver fast, power-efficient data processing for cloud-based AI training platforms.

Chinese tech startups Cloudpick, EEasy Tech snag Intel Capital funding

EEasy Technology Co. Ltd is an AI system-on-chip (SoC) design house and total solution provider. Its offerings include AI acceleration; image and graphic processing; video encoding and decoding; and mixed-signal ULSI design capabilities.

Founded in Oct. 2017, WITINMEM focuses on Low cost, low power AI chips and system solutions based on processing-in-memory technology in NOR Flash memory.

Qingwei Intelligent Technology (Tsing Micro) is an AI chip company spun off from Tsinghua University.

HS A1000 Release

  • AI acceleration is handled by an NPU running the DynamAI NN engine.
  • The NPU integrates up to four 3D convolution MAC arrays, one 2D GEMM array, one EDP compute unit and five DSPs; it supports 4/8/16-bit precision and runs at 1.2 GHz.
  • Only a comparison against the A500 has been disclosed.

Black Sesame Technologies Nearly Completes 100 Million Series B Financing Round

Black Sesame Technologies (黑芝麻智能科技) has nearly completed its 100 million Series B Financing round which will be used to expand cooperation with OEMs, accelerate mass production, reference design development of autopilot controllers, and software-vehicle integration.

  • The Unisoc Tiger T710 (虎贲T710) is built around four Cortex-A75 cores at 2 GHz plus four Cortex-A55 cores at 1.8 GHz (GPU unspecified). For AI it integrates a dedicated NPU core that supports FP16, INT8, INT4 and other data widths.

  • The T710 development board is based on the T710 chip platform and complies with the 96Boards open specification. The board offers rich external interfaces such as USB, GPIO and PCIe, and thanks to the mature 96Boards ecosystem, product prototypes can be developed and validated quickly. The T710 gives the board strong compute and provides 2G/3G/4G and Wi-Fi connectivity.

  • The T710 can also be paired with the Ivy 510 (春藤510) platform for 5G connectivity, enabling a variety of product forms; it has already been validated in smart-healthcare, smart-retail and other products. The software platform supports Android, Ubuntu, Yocto and more, with Debian, AGL, ROS and other operating systems to follow, further strengthening support for products in different domains and giving users a flexible, mature development platform.

A rising Chinese AI chip company, Corerain (鲲云科技), ships what it calls the world's first mass-produced dataflow AI chip

  • Last June, Corerain became a global flagship FPGA partner of Intel and entered a strategic cooperation with Inspur under its Yuannao (元脑) program, working together on AI compute acceleration.
  • Product launches:
    • The X3 edge inference card in June 2020, and the cloud-oriented X9 in August.
    • CAISA 3.0 reached commercial deployment faster than the dataflow rivals Wave Computing and Groq; versions 1.0/2.0 were FPGA-based.
    • 28nm, 10.9 TOPS INT8, with 98.4% compute utilization.
    • Supports mainstream frameworks: Caffe, TensorFlow, ONNX.
    • Supported models: VGG, ResNet, YOLO.
    • Industrial temperature range: -40 to 125°C.

V. Startups Worldwide


The Cerebras CS-1 computes deep learning AI problems by being bigger, bigger, and bigger than any other chip

Today, the company announced the launch of its end-user compute product, the Cerebras CS-1, and also announced its first customer, Argonne National Laboratory.

TO POWER AI, THIS STARTUP BUILT A REALLY, REALLY BIG CHIP

New artificial intelligence company Cerebras Systems is unveiling the largest semiconductor chip ever built. The Cerebras Wafer Scale Engine has 1.2 trillion transistors, the basic on-off electronic switches that are the building blocks of silicon chips. Intel’s first 4004 processor in 1971 had 2,300 transistors, and a recent Advanced Micro Devices processor has 32 billion transistors.

Cerebras Systems unveils a record 1.2 trillion transistor chip for AI

Computer chips are usually small. The processor that powers the latest iPhones and iPads is smaller than a fingernail, and even the beefy devices used in cloud servers aren’t much bigger than a postage stamp. Then there’s a new chip from startup Cerebras: It’s bigger than an iPad all by itself. The silicon monster is almost 22 centimeters—roughly 9 inches—on each side, making it likely the largest computer chip ever, and a monument to the tech industry’s hopes for artificial intelligence. Cerebras plans to offer it to tech companies trying to build smarter AI more quickly.

Wave Computing files for bankruptcy

Wave Computing has not shut down; it filed for bankruptcy protection to restructure its assets and did not lay off all employees, although its China operations have been closed entirely. In June 2018 Wave Computing acquired the veteran semiconductor IP company MIPS, planning to combine its dataflow architecture with MIPS embedded RISC multithreaded CPU cores and IP to power next-generation AI. Its dynamically software-reconfigurable computing ("software-defined silicon") offering was discontinued in November 2019.

Wave's Compute Appliance is capable of running TensorFlow at 2.9 PetaOPS/sec on their 3RU appliance. Wave refers to their processors as DPUs, and an appliance has 16 DPUs. Wave uses processing elements it calls Coarse Grained Reconfigurable Arrays (CGRAs). It is unclear what bit width the 2.9 PetaOPS/s refers to. Some details can be found in their white paper.

After Hot Chips 2017, more details were discussed in The Next Platform article "First In-Depth View of Wave Computing's DPU Architecture, Systems".

A Microsoft ML scientist ran SONIC, which recognizes COVID-19 in chest X-ray images, on Graphcore hardware, roughly 10x faster than a traditional NVIDIA chip

At the Intelligent Health 2020 summit, Microsoft machine learning scientist Sujeeth Bharadwaj demonstrated his use of Graphcore chips. Bharadwaj ran the SONIC neural network on a Graphcore chip and used it to identify chest X-ray images of COVID-19 patients. The results showed the Graphcore chip completing in 30 minutes a training workload that takes NVIDIA's traditional chip five hours, roughly a 10x speedup, with the second-generation IPU claimed to be better than the A100.

Graphcore, the AI chipmaker, raises another $150M at a $1.95B valuation

Graphcore, the Bristol-based startup that designs processors specifically for artificial intelligence applications, announced it has raised another $150 million in funding for R&D and to continue bringing on new customers. Its valuation is now $1.95 billion.

Microsoft and Graphcore Collaborate to Accelerate Artificial Intelligence

解密又一个xPU:Graphcore的IPU gives some analysis of its IPU architecture.

Graphcore AI芯片:更多分析 More analysis.

深度剖析AI芯片初创公司Graphcore的IPU In-depth analysis after more information was disclosed.

The 2,048-core PEZY-SC2 sets a Green500 record

The SC2 is a second-generation chip featuring twice as many cores – i.e., 2,048 cores with 8-way SMT for a total of 16,384 threads. Operating at 1 GHz with 4 FLOPS per cycle per core as with the SC, the SC2 has a peak performance of 8.192 TFLOPS (single-precision). Both prior chips were manufactured on TSMC’s 28HPC+, however in order to enable the considerably higher core count within reasonable power consumption, PEZY decided to skip a generation and go directly to TSMC’s 16FF+ Technology.
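
The peak figure quoted above follows directly from the core count, clock and per-core throughput:

```python
cores = 2048
clock_hz = 1e9        # 1 GHz
flops_per_cycle = 4   # single-precision FLOPS per cycle per core

peak_tflops = cores * clock_hz * flops_per_cycle / 1e12
threads = cores * 8   # 8-way SMT

print(peak_tflops, "TFLOPS")        # 8.192, matching the quoted peak
print(threads, "hardware threads")  # 16384
```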

Tenstorrent is a small Canadian start-up in Toronto claiming, like most, an order-of-magnitude improvement in efficiency for deep learning. There are no real public details, but they are on the Cognitive 300 list.

Blaize emerges from stealth with $87 million for its custom-designed AI chips

The fierce competition isn’t deterring Blaize (formerly Thinci), which hopes to stand out from the crowd with a novel graph streaming architecture. The nine-year-old startup’s claimed system-on-chip performance is impressive, to be fair, which is likely why it’s raised nearly $100 million from investors including automotive component maker Denso.

Founded in 2014, Newark, California startup Koniku has taken in $1.65 million in funding so far to become “the world’s first neurocomputation company“. The idea is that since the brain is the most powerful computer ever devised, why not reverse engineer it? Simple, right? Koniku is actually integrating biological neurons onto chips and has made enough progress that they claim to have AstraZeneca as a customer. Boeing has also signed on with a letter of intent to use the technology in chemical-detecting drones.

Adapteva has taken in $5.1 million in funding from investors that include mobile giant Ericsson. The paper "Epiphany-V: A 1024 processor 64-bit RISC System-On-Chip" describes the design of Adapteva's 1024-core processor chip in 16nm FinFet technology.

Knowm is actually set up as a .ORG, but they appear to be pursuing a for-profit enterprise. The New Mexico startup has taken in an undisclosed amount of seed funding so far to develop a new computational framework called AHaH Computing (Anti-Hebbian and Hebbian). The gory details can be found in this publication, but the short story is that this technology aims to reduce the size and power consumption of intelligent machine learning applications by up to 9 orders of magnitude.

A battery powered neural chip from Mythic with 50x lower power.

Founded in 2012, Texas-based startup Mythic (formerly known as Isocline) has taken in $9.5 million in funding with Draper Fisher Jurvetson as the lead investor. Prior to receiving any funding, the startup has taken in $2.5 million in grants. Mythic is developing an AI chip that “puts desktop GPU compute capabilities and deep neural networks onto a button-sized chip – with 50x higher battery life and far more data processing capabilities than competitors“. Essentially, that means you can give voice control and computer vision to any device locally without needing cloud connectivity.

Kalray Releases the Kalray Neural Network 3.0

Kalray (Euronext Growth Paris – ALKAL), a pioneer in processors for new intelligent systems, has announced the launch of the Kalray Neural Network 3.0 (KaNN), a platform for Artificial Intelligence application development. KaNN allows developers to seamlessly port their AI-based algorithms from well-known machine learning frameworks including Caffe, Torch and TensorFlow onto Kalray’s Massively Parallel Processor Array (MPPA) intelligent processor.

BrainChip Showcases Vision and Learning Capabilities of its Akida Neural Processing IP and Device at tinyML Summit 2020

BrainChip Holdings Ltd. (ASX: BRN), a leading provider of ultra-low power, high-performance edge AI technology, today announced that it will present its revolutionary new breed of neuromorphic processing IP and Device in two sessions at the tinyML Summit at the Samsung Strategy & Innovation Center in San Jose, California February 12-13.

BrainChip Inc (CA. USA) was the first company to offer a Spiking Neural processor, which was patented in 2008 (patent US 8,250,011). The current device, called the BrainChip Accelerator is a chip intended for rapid learning. It is offered as part of the BrainChip Studio software. BrainChip is a publicly listed company as part of BrainChip Holdings Ltd.

aiWare3 Hardware IP Helps Drive Autonomous Vehicles To Production.

Latest technology enables scalable, low-power automotive inference engines with >50 TMAC/s NN processing power.

MOUNTAIN VIEW, Calif., October 30, 2018 – AImotive™, the global provider of full stack, vision-first self-driving technology, today announced the release of aiWare3™, the company’s 3rd generation, scalable, low-power, hardware Neural Network (NN) acceleration core.

LeapMind is carrying out research on original chip architectures in order to implement neural networks on a circuit, enabling low-power deep learning.

A crowdfunding effort for Snickerdoodle raised $224,876 and they're currently shipping. If you pre-order one, they'll deliver it by summer. The palm-sized unit uses the Zynq "System on Chip" (SoC) from Xilinx.

NovuMind combines big data, high-performance, and heterogeneous computing to change the Internet of Things (IoT) into the Intelligent Internet of Things (I²oT). Here is a paper about NovuMind from Moor Insights & Strategy, a global technology analyst and research firm.

Reduced Energy Microsystems are developing lower power asynchronous chips to suit CNN inference. REM was Y Combinator's first ASIC venture according to TechCrunch.

TeraDeep is building an AI Appliance using its deep learning FPGA’s acceleration. The company claims image recognition performance on AlexNet to achieve a 2X performance advantage compared with large GPUs, while consuming 5X less power. When compared to Intel’s Xeon processor, TeraDeep’s Accel technology delivers 10X the performance while consuming 5X less power.

Deep Vision is building low-power chips for deep learning. Perhaps one of these papers by the founders has clues: "Convolution Engine: Balancing Efficiency & Flexibility in Specialized Computing" [2013] and "Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing" [2015].

Groq was founded by ex-Googlers who designed the Google TPU.

Groq's website claims that its first chip will run 400 trillion operations per second with a power efficiency of 8 TOPS per watt.
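
Taking those two claims together gives a rough power figure (simple arithmetic on the numbers above, not a Groq specification):

```python
claimed_tops = 400          # trillion operations per second
claimed_tops_per_watt = 8   # stated efficiency

implied_watts = claimed_tops / claimed_tops_per_watt
print(implied_watts)        # ~50 W implied chip power
```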

KAIST DNPU

Face Recognition System “K-Eye” Presented by KAIST

从ISSCC Deep Learning处理器论文到人脸识别产品

Kneron to Accelerate Edge AI Development with more than 10 Million USD Series A Financing

According to this article, "Gyrfalcon offers Automotive AI Chip Technology"

Gyrfalcon Technology Inc. (GTI), has been promoting matrix-based application specific chips for all forms of AI since offering their production versions of AI accelerator chips in September 2017. Through the licensing of its proprietary technology, the company is confident it can help automakers bring highly competitive AI chips to production for use in vehicles within 18 months, along with significant gains in AI performance, improvements in power dissipation and cost advantages.

According to this article, "Esperanto exits stealth mode, aims at AI with a 4,096-core 7nm RISC-V monster"

Although Esperanto will be licensing the cores they have been designing, they do plan on producing their own products. The first product they want to deliver is the highest TeraFLOP per Watt machine learning computing system. Ditzel noted that the overall design is scalable in both performance and power. The chips will be designed in 7nm and will feature a heterogeneous multi-core architecture.

SambaNova Systems raises $250 million for software-defined AI hardware

The infrastructure required to handle AI workloads is often as complex as it is sprawling, but a cottage industry of startups has emerged whose focus is developing solutions for end customers. SambaNova Systems is one such startup — the Palo Alto, California-based firm, which was founded in 2017 by Rodrigo Liang and Stanford Professors Kunle Olukotun and Chris Ré, provides systems that run AI and data-intensive apps from the datacenter to the edge. In a reflection of investors’ ravenous appetite for the market, it today announced that it’s raised $250 million in series C funding.

According to the LinkedIn page of its CEO, a former SPARC developer at Oracle, SambaNova Systems is a computing startup focused on building machine learning and big data analytics platforms. SambaNova's software-defined analytics platform enables optimum performance for any ML training, inference or analytics models.

The red-hot AI hardware space gets even hotter with $56M for a startup called SambaNova Systems

SambaNova is the product of technology from Kunle Olukotun and Chris Ré, two professors at Stanford, and led by former Oracle SVP of development Rodrigo Liang, who was also a VP at Sun for almost 8 years.

GreenWaves Technologies develops IoT Application Processors based on Open Source IP blocks enabling content understanding applications on embedded, battery-operated devices with unmatched energy efficiency. Our first product is GAP8. GAP8 provides an ultra-low power computing solution for edge devices carrying out inference from multiple, content rich sources such as images, sounds and motions. GAP8 can be used in a variety of different applications and industries.

Light-Powered Computers Brighten AI’s Future

Optical computers may have finally found a use—improving artificial intelligence

Lightmatter aims to reinvent AI-specific chips with photonic computing and $11M in funding

It takes an immense amount of processing power to create and operate the “AI” features we all use so often, from playlist generation to voice recognition. Lightmatter is a startup that is looking to change the way all that computation is done — and not in a small way. The company makes photonic chips that essentially perform calculations at the speed of light, leaving transistors in the dust. It just closed an $11 million Series A.

First Low-Power AI-Inference Accelerator Vision Processing Unit From Think Silicon To Debut at Embedded World 2018

TORONTO, Canada/NUREMBERG, Germany – FEB 21st, 2018 – Think Silicon®, a leader in developing ultra-low power graphics IP technology, will demonstrate a prototype of NEMA® xNN, the world’s first low-power ‘Inference Accelerator’ Vision Processing Unit for artificial intelligence, convolutional neural networks at Embedded World 2018.

Startup Puts AI Core in SSDs

Startup InnoGrit debuted a set of three controllers for solid-state drives (SSDs), including one for data centers that embeds a neural-network accelerator. They enter a crowded market with claims of power and performance advantages over rivals.

Innogrit Technologies Incorporated is a startup setting out to solve the data storage and data transport problems in artificial intelligence and other big data applications through innovative integrated circuit (IC) and system solutions: it extracts intelligence from correlated data and unlocks the value in artificial intelligence systems; reduces redundancy in big data and improves system efficiency for artificial intelligence applications; brings networking capability to storage devices and offers unparalleled performance at large scales; and performs data computation within storage devices to boost the performance of large data centers.

Kortiq is a startup providing "FPGA based Neural Network Engine IP Core and The scalable Solution for Low Cost Edge Machine Learning Inference for Embedded Vision". Recently, they revealed some comparison data. You can also find the Preliminary Datasheet of their AIScaleCDP2 IP Core on their website.

Hailo unveils Hailo-8, an edge chip custom-designed for AI workloads

......Hailo-8 is capable of 26 tera operations per second (TOPs) ...... In one preliminary test at an image resolution of 224 x 224, the Hailo-8 processed 672 frames per second compared with the Xavier AGX’s 656 frames and sucked down only 1.67 watts (equating to 2.8 TOPs per watt) versus the Nvidia chip’s 32 watts (0.14 TOPs per watt)......

Tachyum Running Apache is a Key Milestone for Prodigy Universal Processor Software Stack

Semiconductor startup Tachyum Inc. today announced that it has completed another critical stage in software development by successfully achieving an Apache web server port to Prodigy Universal Processor Instruction Set Architecture (ISA). This latest milestone by Tachyum’s software team brings the company’s Prodigy Universal Processor one step closer to being customer-ready in anticipation of its commercial launch in 2021.

Startup AI Chip Passes Road Test

AlphaICs designed an instruction set architecture (ISA) optimized for deep-learning, reinforcement-learning, and other machine-learning tasks. The startup aims to produce a family of chips with 16 to 256 cores, roughly spanning 2 W to 200 W.

Syntiant: Analog Deep Learning Chips

Startup Syntiant Corp. is an Irvine, Calif. semiconductor company led by former top Broadcom engineers with experience in both innovative design and in producing chips designed to be produced in the billions, according to company CEO Kurt Busch.

HABANA LABS Announces Gaudi AI Training Processor

TEL-AVIV, ISRAEL and SAN JOSE, CA–June 17, 2019 – Habana Labs, Ltd. (www.habana.ai), a leading developer of AI processors, today announced the Habana Gaudi™ AI Training Processor. Training systems based on Gaudi processors will deliver an increase in throughput of up to four times over systems built with equivalent number GPUs.

You can also find reports in the media:

Startup’s AI Chip Beats GPU

The Goya chip can process 15,000 ResNet-50 images/second with 1.3-ms latency at a batch size of 10 while running at 100 W. That compares to 2,657 images/second for an Nvidia V100 and 1,225 for a dual-socket Xeon 8180. At a batch size of one, Goya handles 8,500 ResNet-50 images/second with a 0.27-ms latency.

Baidu Backs Neuromorphic IC Developer

MUNICH — Swiss startup aiCTX has closed a $1.5 million pre-A funding round from Baidu Ventures to develop commercial applications for its low-power neuromorphic computing and processor designs and enable what it calls “neuromorphic intelligence.” It is targeting low-power edge-computing embedded sensory processing systems.

AI startup Flex Logix touts vastly higher performance than Nvidia

Four-year-old startup Flex Logix has taken the wraps off its novel chip design for machine learning. CEO Geoff Tate describes how the chip may take advantage of an "explosion" of inferencing activity in "edge computing," and how Nvidia can't compete on performance.

Preferred Networks develops a custom deep learning processor MN-Core for use in MN-3, a new large-scale cluster, in spring 2020

Dec. 12, 2018, Tokyo, Japan – Preferred Networks, Inc. (“PFN”, Head Office: Tokyo, President & CEO: Toru Nishikawa) announces that it is developing MN-Core (TM), a processor dedicated to deep learning, and will exhibit this independently developed hardware, including the MN-Core chip, board, and server, at SEMICON Japan 2018, held at Tokyo Big Sight.

AI Startup Cornami reveals details of neural net chip

Stealth startup Cornami on Thursday revealed some details of its novel approach to chip design to run neural networks. CTO Paul Masters says the chip will finally realize the best aspects of a technology first seen in the 1970s.

AI chip startup offers new edge computing solution

Anaflash Inc. (San Jose, CA) is a startup company that has developed a test chip to demonstrate analog neurocomputing taking place inside logic-compatible embedded flash memory.

Optalysys launches world’s first commercial optical processing system, the FT:X 2000

Optalysys develops optical co-processing technology that enables new levels of processing capability with vastly reduced energy consumption compared with conventional computers. Its first co-processor is based on an established diffractive optical approach that uses the photons of low-power laser light instead of conventional electricity and its electrons. This inherently parallel technology is highly scalable and represents a new paradigm of computing.

Low-Power AI Startup Eta Compute Delivers First Commercial Chips

The firm pivoted away from riskier spiking neural networks, relying on a new power management scheme instead

Eta Compute Debuts Spiking Neural Network Chip for Edge AI

Chip can learn on its own and run inference at the 100-microwatt scale, says the company at Arm TechCon.

Achronix Rolls 7-nm FPGAs for AI

Achronix is back in the game of providing full-fledged FPGAs with a new high-end 7-nm family, joining the Gold Rush of silicon to accelerate deep learning. It aims to leverage novel design of its AI block, a new on-chip network, and use of GDDR6 memory to provide similar performance at a lower cost than larger rivals Intel and Xilinx.

Startup Runs AI in Novel SRAM

Areanna is the latest example of an explosion of new architectures spawned by the rise of deep learning. The debut of a whole new approach to computing has fired the imaginations of engineers around the industry hoping to be the next Hewlett and Packard.

NeuroBlade Preps Inference Chip

Add NeuroBlade to the dozens of startups working on AI silicon. The Israeli company just closed a $23 million Series A, led by the founder of Check Point Software and with participation from Intel Capital.

Bill Gates just backed a chip startup that uses light to turbocharge AI

Luminous Computing has developed an optical microchip that runs AI models much faster than other semiconductors while using less power.

Chip startup Efinix hopes to bootstrap AI efforts in IoT

Six-year-old startup Efinix has created an intriguing twist on the FPGA technology dominated by Intel and Xilinx; the company hopes its energy-efficient chips will bootstrap the market for embedded AI in the Internet of Things.

AIStorm raises $13.2 million for AI edge computing chips

David Schie, a former senior executive at Maxim, Micrel, and Semtech, thinks both markets are ripe for disruption. He — along with WSI, Toshiba, and Arm veterans Robert Barker, Andreas Sibrai, and Cesar Matias — in 2011 cofounded AIStorm, a San Jose-based artificial intelligence (AI) startup that develops chipsets that can directly process data from wearables, handsets, automotive devices, smart speakers, and other internet of things (IoT) devices.

SiMa.ai™ Introduces MLSoC™ – First Machine Learning Platform to Break 1000 FPS/W Barrier with 10-30x Improvement over Alternative Solutions

SiMa.ai, the company enabling high-performance machine learning to go green, today announced its Machine Learning SoC (MLSoC) platform – the industry’s first unified solution to support traditional compute alongside high-performance, low-power, safe and secure machine learning inference. Delivering the highest frames per second per watt, SiMa.ai’s MLSoC is the first machine learning platform to break the 1000 FPS/W barrier for ResNet-50. In customer engagements, the company has demonstrated a 10-30x improvement in FPS/W through its automated software flow across a wide range of embedded edge applications, compared with today’s competing solutions. The platform will provide machine learning solutions ranging from 50 TOPs @ 5 W to 200 TOPs @ 20 W, delivering an industry-first 10 TOPs/W for high-performance inference.

Untether AI raises $20 million to develop machine learning inferencing hardware

Untether AI, a Toronto-based startup that’s developing high-efficiency, high-performance chips for AI inferencing workloads, this morning announced that it has raised a $20 million series A round, following a small seed investment. Radical Ventures joined Intel Capital and other investors in the round, with Radical Ventures partner Tomi Poutanen joining as a board member.

GrAI Matter Labs Reveals NeuronFlow Technology and Announces GrAIFlow SDK

GrAI Matter Labs (aka GML), a neuromorphic computing pioneer, today revealed NeuronFlow – a new programmable processor technology – and announced an early access program for its GrAIFlow software development kit.

Rain Neuromorphics on Crunchbase

We build artificial intelligence processors, inspired by the brain. Our mission is to enable brain-scale intelligence.

Applied Brain Research on Crunchbase

ABR makes the world's most advanced neuromorphic compiler, runtime, and libraries for the emerging space of neuromorphic computing.

XMOS adapts Xcore into AIoT ‘crossover processor’

EE Times exclusive! The new chip targets AI-powered voice interfaces in IoT devices — “the most important AI workload at the endpoint.”

XMOS unveils Xcore.ai, a powerful chip designed for AI processing at the edge

The latest xcore.ai is a crossover chip designed to deliver high-performance AI, digital signal processing, control, and input/output in a single device with prices from $1.

We design and produce AI processors and the software to run them in data centers. Our unique approach optimizes for inference, with a focus on performance, power efficiency, and ease of use, while at the same time enabling cost-effective training.

We build high-performance AI inference coprocessors that can be seamlessly integrated into various computing platforms including data centers, servers, desktops, automobiles and robots.

AI Chip Compilers


1. pytorch/glow
2. TVM: End-to-End Deep Learning Compiler Stack (a minimal compilation sketch follows this list)
3. Google TensorFlow XLA
4. Nvidia TensorRT
5. PlaidML
6. nGraph
7. MIT Tiramisu compiler
8. ONNC (Open Neural Network Compiler)
9. MLIR (Multi-Level Intermediate Representation)
10. The Tensor Algebra Compiler (taco)
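
To make the role of these compiler stacks concrete, below is a minimal sketch of lowering a pretrained ONNX model to native CPU code with TVM (item 2 above). It assumes a recent TVM build and the onnx package are installed; the file name `resnet50.onnx`, the input name `input`, and the 1x3x224x224 shape are placeholders for whatever model you export, not part of any project listed.

```python
# Minimal sketch: compiling an ONNX model with TVM's Relay front end for a generic CPU target.
# Assumptions: a recent TVM build and `onnx` are installed, and "resnet50.onnx" is a local
# export whose input tensor is named "input" with shape 1x3x224x224 (all placeholders).
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("resnet50.onnx")                 # placeholder file name
shape_dict = {"input": (1, 3, 224, 224)}                # placeholder input name/shape
mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)

# Compile for a plain CPU; an accelerator vendor would plug in its own target/codegen here.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run one inference with random data to sanity-check the compiled module.
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
print("output shape:", module.get_output(0).numpy().shape)
```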

AI Chip Benchmarks


  1. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, Image Classification (ImageNet)
  2. Fathom: Reference workloads for modern deep learning methods
  3. MLPerf: A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms. You can find MLPerf v0.5 results here. MLPerf Inference Benchmarks are here. (A toy timing-harness sketch follows this list.)
    • The A100 delivered the fastest commercially available performance across all eight MLPerf benchmark tests.
    • TPU v4 performance improved by an average of 2.7x over TPU v3.
    • Link to the MLPerf training benchmark results.
    • The current MLPerf training benchmark covers eight machine learning models, spanning image classification, image segmentation, object detection, translation, and more. The latest version of MLPerf adds two new tests, BERT and DLRM, and a heavily revised MiniGo test. BERT, a leading conversational AI model, is one of the most complex neural networks in use today and is commonly applied to translation, search, text understanding, and question answering. Recommendation is an increasingly widespread AI task, and the deep learning recommendation model DLRM is commonly used for online shopping recommendations, search, and social media content ranking. The reinforcement learning model MiniGo uses the full-size 19x19 Go board and is the most complex test in this round, covering operations from gameplay to training.
  4. AI Matrix
  5. AI-Benchmark
  6. AIIABenchmark
  7. EEMBC MLMark Benchmark
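
The suites above largely formalize latency and throughput measurement of a model under controlled conditions. As a rough illustration only (not the methodology of MLPerf or any suite listed), here is a tiny Python timing harness; `run_inference` is a hypothetical stand-in for whatever model or runtime is being measured.

```python
# Toy latency/throughput harness, illustrating the kind of measurement the suites above formalize.
# `run_inference` is a hypothetical placeholder; real benchmarks pin accuracy targets, batch sizes,
# warm-up rules, and percentile reporting far more rigorously than this sketch does.
import statistics
import time

def run_inference(batch):
    # Placeholder workload standing in for a real model call (e.g. a compiled ResNet-50).
    time.sleep(0.002 * len(batch))
    return [0] * len(batch)

def benchmark(batch_size=8, iterations=100, warmup=10):
    batch = list(range(batch_size))
    for _ in range(warmup):                  # warm-up runs are excluded from timing
        run_inference(batch)
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference(batch)
        latencies.append(time.perf_counter() - start)
    p99 = sorted(latencies)[int(0.99 * len(latencies)) - 1]
    throughput = batch_size * iterations / sum(latencies)
    print(f"mean latency: {statistics.mean(latencies)*1e3:.2f} ms, "
          f"p99: {p99*1e3:.2f} ms, throughput: {throughput:.1f} samples/s")

benchmark()
```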

Reference


  1. FPGAs and AI processors: DNN and CNN for all
  2. 12 AI Hardware Startups Building New AI Chips
  3. Tutorial on Hardware Architectures for Deep Neural Networks
  4. Neural Network Accelerator Inference
  5. "White Paper on AI Chip Technologies 2018". You can download it from here, or Google drive.