Speech Recognition using Weighted Finite-State Transducers
Abstract
Speech recognition has long been a prominent field of research in NLP due to its numerous applications, such as speech-to-text conversion, voice assistants, and smart-home control. Speech recognition systems use computer algorithms to process, interpret, and transform spoken words into text. Modern statistically-based speech recognition systems employ a variety of techniques, such as Dynamic Time Warping (DTW), neural networks, and end-to-end automatic speech recognition. In this paper, we perform speech recognition using a type of finite automaton called a Weighted Finite-State Transducer. Unlike an acceptor, a finite-state transducer does not simply accept or reject its input; it reads an input sequence and produces a corresponding output sequence. When each transition additionally carries a weight along with its input and output labels, the machine is called a Weighted Finite-State Transducer (WFST). Weights can represent penalties, probabilities, durations, or any other quantity that accumulates along a path to give the overall weight of mapping an input sequence to an output sequence. This property makes weighted transducers a natural choice for representing the probabilistic finite-state models that are widely used in speech processing. Weighted finite-state transducers thus provide a common framework for formulating and applying models in speech recognition, with shared algorithms that yield significant algorithmic and software-engineering benefits.
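As a concrete illustration of the transition structure described above, the following minimal Python sketch models a WFST over the tropical semiring, where weights (e.g. negative log-probabilities) are summed along a path. The states, labels, and weights here are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Arc:
    ilabel: str     # input label consumed on this transition
    olabel: str     # output label emitted on this transition
    weight: float   # transition weight (e.g. a -log probability)
    nextstate: int  # destination state

# Toy transducer: state -> outgoing arcs. It maps lowercase symbols to
# uppercase ones while accumulating a weight for the whole mapping.
arcs = {
    0: [Arc("a", "A", 0.5, 1)],
    1: [Arc("b", "B", 1.0, 2), Arc("c", "C", 0.2, 2)],
}

def transduce(arcs, start, inputs):
    """Follow the (deterministic) path matching `inputs`; return the
    output sequence and its accumulated path weight."""
    state, outputs, weight = start, [], 0.0
    for sym in inputs:
        arc = next(a for a in arcs[state] if a.ilabel == sym)
        outputs.append(arc.olabel)
        weight += arc.weight  # tropical semiring: weights add along a path
        state = arc.nextstate
    return outputs, weight

print(transduce(arcs, 0, ["a", "b"]))  # (['A', 'B'], 1.5)
```

A full WFST toolkit such as OpenFst additionally supports nondeterminism, final-state weights, and operations like composition and shortest-path search, which is where the shared-algorithm benefits mentioned above come from.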
Type
Publication
2022 IEEE 7th International Conference for Convergence in Technology (I2CT)