THE SHORT-TERM FOURIER TRANSFORM EXPLAINED

By U. Vivekananda, M.Tech

NOTE: This article explains only the functioning and operation of the Short-Term Fourier Transform (STFT), more commonly known as the Short-Time Fourier Transform. Mathematical analysis of the STFT is not covered here; for a mathematical overview, refer to the books listed at the end of this article.

In the previous article we discussed the limitations of the Fourier transform in obtaining the frequency content of a time-varying signal as a function of time. The Fourier transform can identify the different frequency components of a signal, but it gives no information about when those components occur. One suggested solution was to break the signal into a number of small segments using a windowing signal and then take the Fourier transform of each segment. This article explains the basic working of the Short-Term Fourier Transform.

The basic methodology of the STFT is to extract N samples of the input speech signal by windowing and to compute the DFT (Discrete Fourier Transform) of those samples without a significant loss of data. The Short-Term Fourier Transform is given by the equation:

$$X(k) = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1$$
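where $x_n$ is the windowed input and $N$ is the size of the window. Writing $s_n$ for the input speech samples that fall under the window, the windowed signal $x_n$ can be represented mathematically as:

$$x_n = s_n \, w_n, \qquad n = 0, 1, \dots, N-1$$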
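where $w_n$ is the windowing signal (a Hamming window in our case). The inverse Short-Term Fourier Transform is given by the following equation:

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \, e^{j 2\pi k n / N}, \qquad n = 0, 1, \dots, N-1$$

which recovers the windowed samples $x_n$ from the DFT coefficients $X(k)$.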
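The forward and inverse transforms described above can be sketched for a single windowed frame in Python with NumPy. This is a minimal illustration under stated assumptions, not the article's own implementation: the function names `dft_frame` and `idft_frame`, the 8 kHz sampling rate, and the 440 Hz test tone standing in for speech are all chosen only for this example.

```python
import numpy as np

def dft_frame(x):
    """Direct N-point DFT of one windowed frame x (hypothetical helper)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft_frame(X):
    """Direct inverse DFT, returning the (complex) windowed frame."""
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    # x[n] = (1/N) * sum over k of X[k] * exp(+j*2*pi*k*n/N)
    return (np.exp(2j * np.pi * n * k / N) @ X) / N

fs = 8000                           # assumed sampling rate in Hz
N = 64                              # window size used in the article
t = np.arange(N) / fs
s = np.sin(2 * np.pi * 440 * t)     # assumed test tone standing in for speech
w = np.hamming(N)                   # Hamming window
x = s * w                           # windowed input frame

X = dft_frame(x)
x_rec = idft_frame(X)

print(np.allclose(X, np.fft.fft(x)))   # direct DFT agrees with NumPy's FFT
print(np.allclose(x_rec.real, x))      # inverse recovers the windowed frame
```

The direct matrix form costs on the order of N squared operations and is shown only because it mirrors the summations; in practice `np.fft.fft` and `np.fft.ifft` compute the same quantities far faster.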
Many window functions are in use, but the Hamming, Hann (Hanning), and Kaiser windows are the most popular, with the Hamming window being the most widely used. The Hamming window is given by

$$w(i) = 0.54 - 0.46 \cos\!\left(\frac{2\pi i}{n-1}\right), \qquad 0 \le i \le n-1$$

where n = 64 is the window size. The Hamming window is shown in the figure below.
FIGURE 1

The process of extracting the samples can be explained with the help of a simple diagram (refer to Figure 2). The input in our case is a speech signal. The Hamming window is used to slice out a part of the signal, and the STFT is computed on that slice. The window is then moved along the signal and the process is repeated for each new slice.
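The slicing process described above can be sketched in Python with NumPy. The following is a minimal example rather than a definitive implementation: the helper names `hamming_window` and `stft`, the hop of 32 samples between slices, the 8 kHz sampling rate, and the two-tone test signal used in place of real speech are all assumptions made for illustration.

```python
import numpy as np

def hamming_window(n=64):
    """Hamming window from the formula w(i) = 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    i = np.arange(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * i / (n - 1))

def stft(signal, n=64, hop=32):
    """Slide a Hamming window along `signal` and take the DFT of each slice.

    `hop` (the step between successive slices) is an assumed parameter.
    Returns an array of shape (number of slices, n).
    """
    w = hamming_window(n)
    num_frames = 1 + (len(signal) - n) // hop
    frames = np.stack([signal[m * hop : m * hop + n] * w
                       for m in range(num_frames)])
    return np.fft.fft(frames, axis=1)

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
# Stand-in for speech: 500 Hz in the first half, 1500 Hz in the second half
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 500 * t),
                  np.sin(2 * np.pi * 1500 * t))

S = stft(signal)
# Strongest positive-frequency bin of each slice, converted to Hz
peak_hz = np.argmax(np.abs(S[:, :33]), axis=1) * fs / 64

print(S.shape)
print(peak_hz[0], peak_hz[-1])
```

Unlike a single Fourier transform of the whole signal, the per-slice peaks show when each tone is present: early slices peak at 500 Hz and late slices at 1500 Hz.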
One simple question that might arise in the reader's mind is why such a complicated window function is used. Is it not possible to use a rectangular window function, which would be much simpler to implement, such as

$$w(n) = \begin{cases} 1, & -25 < n < 25 \\ 0, & \text{otherwise} \end{cases}$$

and then move this window over the input signal, as is done with the Hamming window?

Yes, it is possible, and the Fourier transform of the rectangular window does have a narrow main lobe. However, it also has appreciable side lobes (secondary lobes): the largest is only about 13 dB below the main lobe, and the side lobes fall off at roughly 6 dB per octave. The rectangular window therefore gives the best frequency resolution for a given length, but because of these side lobes it is not used much in applications.

Another issue in using windows to extract signals is the window size. Time resolution and frequency resolution are inversely related, so the choice of window size trades one against the other. Informally, this can be seen from the exponential term $e^{-j\omega t}$ in the Fourier transform equation: $\omega$ and $t$ appear only as a product, so for a fixed value of $\omega t$, increasing $\omega$ means decreasing $t$ and vice versa. More concretely, a window of N samples at sampling rate $f_s$ spans $N/f_s$ seconds and produces DFT bins spaced $f_s/N$ Hz apart, so a longer window gives finer frequency resolution but coarser time resolution.

To maintain good time resolution and frequency resolution for audio, window sizes in the range of 10 ms to 50 ms are usually chosen, depending on the application.

The Short-Term Fourier Transform is used in numerous areas of speech and audio processing. The design of filter banks, a major element of speech and audio compression, makes significant use of the STFT.

References:
1) Douglas O'Shaughnessy, Speech Communications: Human and Machine
2) L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals
3) Robi Polikar, The Wavelet Tutorial
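As a supplementary check on the comparison of rectangular and Hamming windows discussed in the article, the main-lobe width and highest side-lobe level of each can be estimated numerically. This is a rough sketch under stated assumptions: the helper name `lobe_stats`, the zero-padded FFT length of 8192, and the method of locating the first null by walking down the main lobe are choices made only for this illustration.

```python
import numpy as np

def lobe_stats(w, nfft=8192):
    """Return (first-null bin, peak side-lobe level in dB) for window w.

    The spectrum is sampled on a zero-padded grid of nfft points (assumed
    value), and levels are measured relative to the main-lobe peak.
    """
    mag = np.abs(np.fft.rfft(w, nfft))
    mag_db = 20 * np.log10(mag / mag.max() + 1e-12)
    # Walk down the main lobe until the magnitude stops decreasing
    k = 1
    while mag_db[k] < mag_db[k - 1]:
        k += 1
    null_bin = k - 1                 # approximate edge of the main lobe
    return null_bin, mag_db[k:].max()

n = 64                               # window size used in the article
rect_null, rect_db = lobe_stats(np.ones(n))
hamm_null, hamm_db = lobe_stats(np.hamming(n))

print("rectangular: main-lobe edge bin", rect_null, "peak side lobe (dB)", round(rect_db, 1))
print("Hamming:     main-lobe edge bin", hamm_null, "peak side lobe (dB)", round(hamm_db, 1))
```

The rectangular window shows the narrower main lobe, with its highest side lobe near -13 dB, while the Hamming window gives up main-lobe width in exchange for side lobes that sit more than 40 dB below the peak.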