W&B x NVIDIA live workshop
What to expect?
Automatic Speech Recognition (ASR) is the task of automatically transcribing spoken language, also known as speech-to-text. Traditionally, ASR is accomplished by training several separate models, each tackling a different part of the speech recognition pipeline. An end-to-end ASR architecture instead trains all components toward a single goal: converting speech into text.

In this hands-on lab, you will learn how to use NVIDIA’s Neural Modules (NeMo) toolkit to train an end-to-end ASR system, and Weights & Biases to keep track of your experiments and performance metrics. NeMo is a Python toolkit that simplifies and abstracts building state-of-the-art, GPU-accelerated models for real-time speech (ASR, TTS) and NLP. Weights & Biases stores all of your experiments in one place, including model weights and datasets, which makes comparing experiments straightforward.

In this workshop, we will walk you through setting up the environment, explain the different code blocks and tools as we execute the Jupyter Notebook, and then deploy the model to test its performance on your own voice.