Inference platform for brainshark



Tesla Master Deck

NVIDIA AI INFERENCE PLATFORM

AI INFERENCE IS EXPLODING
Creating a $20 Billion Opportunity in the Next 5 Years
- LIVE VIDEO: billions of videos watched per day (Facebook)
- SPEECH: 1 billion voice searches per day (Google, Bing, etc.)
- RECOMMENDATIONS: trillions of ad/ranking impressions per day

GPU INFERENCE ADOPTION IS ACCELERATING
Inference use cases: image, maps, NLP, search, speech. Tesla P4 and TensorRT are seeing rapid adoption.
- VISUAL SEARCH: 60X latency improvement for real-time search
- VIDEO ANALYSIS: 12X faster inference for live video analysis
- ADVERTISING: 40X higher performance for real-time brand-impact video

A CAMBRIAN EXPLOSION OF DL MODELS
- Convolutional networks: encoder/decoder, ReLU, concat, dropout, BatchNorm, pooling
- Recurrent networks: LSTM, GRU, beam search, WaveNet, CTC, attention
- Generative adversarial networks: 3D-GAN, Coupled GAN, MedGAN, Conditional GAN, Speech Enhancement GAN
- Reinforcement learning: DQN, simulation, DDPG
- New species: Capsule Nets, Mixture of Experts, Neural Collaborative Filtering, Block-Sparse LSTM

NEURAL NETWORK COMPLEXITY IS EXPLODING
Bigger and More Compute-Intensive
[Charts, 2011-2016: model sizes growing to 480 MB-1.9 GB across speech models (DeepSpeech) and image models (GoogLeNet, ResNet-50, MaskRCNN, U-Net) for classification, segmentation, and enhancement]

INEFFICIENCY LIMITS INNOVATION
Difficulties with Deploying Data Center Inference
- Single model only
- Single framework only
- Custom development
- ASR, NLP, recommender: some systems are overused while others are underutilized
- Solutions can only support models from one framework
- Developers need to reinvent the plumbing for every application

ANNOUNCING THE TENSORRT HYPERSCALE INFERENCE PLATFORM
- World's most advanced inference GPU
- Integrated into frameworks, with ONNX support
- TensorRT Inference Server

ANNOUNCING TESLA T4
World's Most Advanced Inference GPU
- Universal inference acceleration
- 320 Turing Tensor Cores, 2,560 CUDA cores
- 65 FP16 TFLOPS | 130 INT8 TOPS | 260 INT4 TOPS
- 16 GB memory | 320 GB/s bandwidth

NEW TURING TENSOR CORE
Multi-Precision for AI Inference: 65 TFLOPS FP16 | 130 TeraOPS INT8 | 260 TeraOPS INT4

ANNOUNCING NVIDIA TensorRT 5
Fastest Deep Learning Inference Platform (developer.nvidia.com/tensorrt)
- Frameworks feed the TensorRT Optimizer; the Runtime targets GPU platforms across data center, embedded, and automotive: Tesla V100, Tesla P4, Jetson TX2, DRIVE PX, NVIDIA DLA
- In-framework support for TensorFlow; support for all other frameworks through ONNX
- Optimizations: layer & tensor fusion, precision calibration, kernel auto-tuning, dynamic tensor memory
- New in TRT 5: containerized inference serving engine with Docker and Kubernetes integration, new layers and APIs, new OS support for Windows and CentOS

ANNOUNCING NVIDIA TENSORRT HYPERSCALE
Containerized Microservice for Data Center Inference
- New inference serving engine: the TensorRT Inference Server
- Runs DNN models with Kubernetes and Docker on NVIDIA GPUs (NV DL SDK, NV Docker)
- Serves multiple model types and frameworks concurrently
- Maximizes data center throughput and utilization

WORLD'S MOST PERFORMANT INFERENCE PLATFORM
Up to 36X Faster Than CPUs | Accelerates All AI Workloads
[Bar charts: peak performance of 65 TFLOPS float / 130 TOPS INT8 / 260 TOPS INT4, and speedup vs. a CPU server for speech, NLP, and video inference on Tesla P4 and Tesla T4]
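The multi-precision figures above double with each halving of precision (65 FP16 TFLOPS, 130 INT8 TOPS, 260 INT4 TOPS), which is why TensorRT's precision calibration step matters: it chooses a clipping range so FP32 activations can be mapped to INT8 with minimal accuracy loss. The sketch below illustrates the symmetric scale-factor idea only; it is not NVIDIA's actual calibrator, and the `amax` value here is a made-up example rather than a calibrated one.

```python
# Illustrative sketch of symmetric INT8 quantization, the idea behind
# TensorRT-style precision calibration (not NVIDIA's implementation).

def quantize_int8(values, amax):
    """Map floats in [-amax, amax] onto integer codes in [-127, 127]."""
    scale = 127.0 / amax
    codes = []
    for v in values:
        clipped = max(-amax, min(amax, v))    # saturate outliers at the clip range
        codes.append(int(round(clipped * scale)))  # uniform symmetric mapping
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the INT8 codes."""
    return [c / scale for c in codes]

activations = [-2.0, -0.5, 0.0, 0.25, 1.0, 3.5]
amax = 2.0  # a real calibrator would pick this to minimize information loss
codes, scale = quantize_int8(activations, amax)
print(codes)   # -> [-127, -32, 0, 16, 64, 127]
approx = dequantize(codes, scale)
```

Note how 3.5 saturates to 127: calibration trades clipping of rare outliers for finer resolution on the values that dominate the distribution.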
[Chart labels: DeepSpeech speech inference 21X faster, ResNet-50 video inference 27X faster (7 ms latency limit), GNMT NLP inference 36X faster; Tesla T4 vs. a CPU-server baseline of 1.0]

DRAMATIC SAVINGS FOR CUSTOMERS
Game-Changing Inference Performance
- Inference workload (speech, NLP, and video): 200 CPU servers, 60 kW
- Same workload on a T4-accelerated server: [kW figure missing from source]

NVIDIA INFERENCE MOMENTUM
Image tagging, video analysis, finding music, sports performance, advertising impact, video captioning, cybersecurity, visual search, customer service, industrial inspection, voice recognition
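The T4-server power figure on the savings slide did not survive extraction, but the CPU-side numbers support a quick sanity check. The per-server wattage below is derived from the slide's 200-server / 60 kW figures, not quoted, and the "CPU servers displaced" estimate simply reuses the speedups from the performance slide.

```python
# Back-of-envelope check of the "Dramatic Savings" slide
# (speech, NLP, and video inference workload).

cpu_servers = 200
cpu_fleet_kw = 60.0

watts_per_cpu_server = cpu_fleet_kw * 1000 / cpu_servers
print(watts_per_cpu_server)  # -> 300.0 (watts per CPU server, derived)

# With the T4 speedups quoted on the performance slide, one GPU server
# can stand in for tens of CPU servers on each workload:
speedups = {"DeepSpeech (speech)": 21, "ResNet-50 (video)": 27, "GNMT (NLP)": 36}
for model, s in speedups.items():
    displaced_kw = s * watts_per_cpu_server / 1000
    print(f"{model}: 1 T4 server ~ {s} CPU servers, ~{displaced_kw:.1f} kW displaced")
```

This is only a consistency check on the deck's own numbers; actual consolidation depends on the T4 server's wall power, which the source omits.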

Posted: 30/08/2022, 07:02
