Application of the algorithm for approximation of the graphic of energy shares for determining pauses in a speech signal

DOI:

https://doi.org/10.17308/sait.2021.3/3740

Keywords:

energy fractions, mixture of radial-basis functions, mixture of Gaussian functions, decision function

Abstract

This paper treats a speech signal as a sequence of fragments that contain speech and fragments of noise that correspond to pauses between words. The task is to construct a decision function that accepts or rejects the hypothesis that a given segment of the signal contains no speech. Using the subband method, the distribution of a segment's energy over frequency is computed. This distribution is then approximated by a mixture: a weighted sum of radial basis functions (Gaussian functions) and a uniformly distributed component. The decision rule is based on the ratio of the maximum values of the mixture components. For the computational experiment, a "dead zone" nonlinearity is introduced; this choice is motivated by the characteristics of the electrical activity of the pathways and centers of the auditory system. The paper presents the results of applying the algorithm to pause detection in a speech signal. The working material was the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, a database of labeled speech fragments from the U.S. Defense Advanced Research Projects Agency. In total, 100 recordings were processed with an analysis segment of 9 milliseconds and a sampling rate of 16,000 Hz. To assess the algorithm, two error types were evaluated: errors of the first kind ("missed target"), where the algorithm did not mark a pause that is present in the manual labeling, and errors of the second kind ("false alarm"), where the algorithm marked a pause erroneously. The results of the computational experiments indicate that the proposed approach detects pauses in a speech signal with sufficiently high efficiency.
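The idea of judging a segment by how its energy is shared among frequency bands can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. Band energies are taken from an FFT rather than from the subband method. The Gaussian mixture approximation is replaced by a much cruder proxy: the ratio of the largest energy share to the uniform level. The names `dead_zone`, `energy_shares`, and `is_pause`, and the parameters `n_bands`, `threshold`, and `delta`, are hypothetical. Only the sampling rate and the segment length come from the abstract.

```python
import numpy as np

FS = 16_000                      # sampling rate stated in the abstract, Hz
SEG_MS = 9                       # analysis segment length stated in the abstract, ms
SEG_LEN = FS * SEG_MS // 1000    # 144 samples per segment


def dead_zone(x, delta):
    """Dead-zone nonlinearity: zero inside [-delta, delta], shrunk by delta outside."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def energy_shares(segment, n_bands=16):
    """Fraction of the segment energy in each of n_bands adjacent frequency bands.

    Assumption: bands are groups of neighbouring FFT bins, not the subband
    method used in the paper.
    """
    power = np.abs(np.fft.rfft(segment)) ** 2
    energies = np.array([band.sum() for band in np.array_split(power, n_bands)])
    total = energies.sum()
    if total == 0.0:
        # A silent segment has no preferred band: report a uniform distribution.
        return np.full(n_bands, 1.0 / n_bands)
    return energies / total


def is_pause(segment, n_bands=16, threshold=4.0, delta=0.0):
    """Accept the 'no speech' hypothesis when no band stands out.

    Assumption: the peak-to-uniform ratio is a stand-in for the ratio of the
    maximum values of the fitted mixture components; threshold is illustrative.
    """
    shares = energy_shares(dead_zone(segment, delta), n_bands)
    peak_to_uniform = shares.max() * n_bands
    return bool(peak_to_uniform < threshold)


if __name__ == "__main__":
    t = np.arange(SEG_LEN) / FS
    tone = np.sin(2 * np.pi * 1000 * t)   # energy concentrated in one band
    silence = np.zeros(SEG_LEN)           # no energy at all
    print(is_pause(tone), is_pause(silence))
```

A segment whose energy is concentrated in a few bands produces a large ratio and is treated as speech, while a segment with a nearly flat energy distribution is treated as a pause. A faithful version would fit the weighted sum of Gaussian functions and the uniform component to the shares before comparing their maxima.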

Author Biographies

  • Tatiana N. Balabanova, Belgorod National Research University

    PhD in Technical Sciences, associate professor of the department of information and telecommunication systems and technologies, Belgorod National Research University

  • Aleksei V. Boldyshev, Belgorod branch of PJSC «Rostelekom»

    PhD in Technical Sciences, Lead engineer of station department, Belgorod branch of PJSC «Rostelekom»

  • Sergei V. Umanets, Belgorod branch of PJSC «Rostelekom»

    Lead engineer of station department, Belgorod branch of PJSC «Rostelekom»

Published

2021-12-02

Section

Computer Linguistics and Natural Language Processing

How to Cite

Application of the algorithm for approximation of the graphic of energy shares for determining pauses in a speech signal. (2021). Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies, 3, 106-114. https://doi.org/10.17308/sait.2021.3/3740
