After the New Year festivities, it’s time to get back to my series of Unity3D tutorials. This time, I’ll show you how to extract the fundamental, or strongest, frequency from a mixed-signal input, such as the signal coming from a microphone into Unity3D. Then we’ll look into how you can compare it to the notes of a bass or any other instrument.

Do the Fast Fourier Transform

As mentioned in a previous tutorial, we can use the Fast Fourier Transform (FFT) to get frequency data out of a signal. In Unity3D we don’t have to implement our own FFT, since Unity3D provides the GetSpectrumData function. You pass it a float array whose size is a power of two (e.g. 128, 256, 512), with a minimum of 64 and a maximum of 8192, along with the channel to extract data from and a window function that reduces leakage between bins. Now, if we take the MicrophoneInput script from my previous tutorial and build on that, we’ll add a new function called GetFundamentalFrequency, where we first grab the spectrum data into an array. I’ve also defined a variable for the fundamental frequency we are going to calculate later on.

float GetFundamentalFrequency()
{
  float fundamentalFrequency = 0.0f;
  float[] data = new float[8192];
  audio.GetSpectrumData(data,0,FFTWindow.BlackmanHarris);
  return fundamentalFrequency;
}
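The size rule described above (a power of two between 64 and 8192) can be checked before the array is handed to GetSpectrumData. Here is a minimal sketch in plain C#; SpectrumSize and IsValid are hypothetical helper names of my own, not part of Unity.

```csharp
public static class SpectrumSize
{
    // Limits as quoted in the text: a power of two from 64 up to 8192.
    public const int Min = 64;
    public const int Max = 8192;

    // Returns true when n is a power of two inside [Min, Max].
    public static bool IsValid(int n)
    {
        // A positive power of two has exactly one bit set,
        // so clearing the lowest set bit leaves zero.
        bool isPowerOfTwo = n > 0 && (n & (n - 1)) == 0;
        return isPowerOfTwo && n >= Min && n <= Max;
    }
}
```

Calling SpectrumSize.IsValid(8192) returns true, while 100 (not a power of two) and 32 (too small) are both rejected.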

Find the bin

Now, we are not really calculating the exact frequency that is strongest in the signal; instead, we find the FFT bin that holds the strongest signal. We do that by looping through the data with a couple of temporary variables: s keeps the strength of the strongest signal so far, and i keeps the index of the bin where that signal was found.

float s = 0.0f;
int i = 0;
for (int j = 1; j < 8192; j++)
{
  if ( s < data[j] )
  {
    s = data[j];
    i = j;
  }
}
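The same search can be pulled into a small function that works on any array, which makes it easy to try out without a microphone. This is only a sketch in plain C#; PeakFinder and FindPeakBin are hypothetical names, and like the loop above it starts at index 1 and so never reports bin 0.

```csharp
public static class PeakFinder
{
    // Returns the index of the largest value in the spectrum.
    // The scan starts at bin 1, mirroring the loop in the text,
    // and returns 0 when no bin is stronger than zero.
    public static int FindPeakBin(float[] spectrum)
    {
        float strongest = 0.0f; // strength of the loudest bin so far
        int peakIndex = 0;      // index of that bin
        for (int j = 1; j < spectrum.Length; j++)
        {
            if (spectrum[j] > strongest)
            {
                strongest = spectrum[j];
                peakIndex = j;
            }
        }
        return peakIndex;
    }
}
```

For example, FindPeakBin(new float[] { 9f, 0.1f, 0.5f, 0.2f }) returns 2: the large value in bin 0 is skipped and 0.5 is the strongest of the rest.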

Calculate the frequency

In order to get the frequency, we have to do some maths. Since the precision of the FFT also depends on our sample rate, we must take this into account. Earlier, I wrote a post about the FFT and its precision, so you might want to check that out for the details. The formula we use to calculate the frequency of the strongest bin is as follows:

frequency = binIndex * samplerate / bins

As you can see, the precision is dependent on the sample rate and the number of bins (size of array) used in the FFT. After adding that equation to the function, it looks like this.

float GetFundamentalFrequency()
{
  float fundamentalFrequency = 0.0f;
  float[] data = new float[8192];
  audio.GetSpectrumData(data,0,FFTWindow.BlackmanHarris);
  float s = 0.0f;
  int i = 0;
  for (int j = 1; j < 8192; j++)
  {
    if ( s < data[j] )
    {
      s = data[j];
      i = j;
    }
  }
  fundamentalFrequency = (float)i * samplerate / 8192; // cast first so the division is not done in integers
  return fundamentalFrequency;    
}
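One detail worth watching in the formula: when the bin index and the sample rate are both integers, C# performs integer division and drops the fractional part of the result. The sketch below shows the conversion done in floating point. BinMath and BinToFrequency are hypothetical names, and the values 194, 11025 and 8192 are just example inputs.

```csharp
public static class BinMath
{
    // frequency = binIndex * sampleRate / bins, evaluated in floating point.
    // Casting binIndex to float first keeps the fractional part,
    // which plain int arithmetic would truncate.
    public static float BinToFrequency(int binIndex, int sampleRate, int bins)
    {
        return (float)binIndex * sampleRate / bins;
    }
}
```

With a sample rate of 11025 and 8192 bins, BinToFrequency(194, 11025, 8192) is roughly 261.09, whereas the integer expression 194 * 11025 / 8192 evaluates to 261.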

Putting it together

Now we have a function that gives us the strongest frequency in the signal fed in by our microphone. To integrate it properly into the script, we should add public variables for the sample rate and for the frequency we found, so we can access them from other scripts. With these changes, the full MicrophoneInput script should look something like this:

using UnityEngine;
using System.Collections;

[RequireComponent(typeof(AudioSource))]
public class MicrophoneInput : MonoBehaviour {
  public float sensitivity = 100.0f;
  public float loudness = 0.0f;
  public float frequency = 0.0f;
  public int samplerate = 11025;

  void Start() {
    audio.clip = Microphone.Start(null, true, 10, samplerate);
    audio.loop = true; // Set the AudioClip to loop
    audio.mute = true; // Mute the sound, we don't want the player to hear it
    while (!(Microphone.GetPosition(null) > 0)){} // Wait until the recording has started (null = default device, as in Microphone.Start)
    audio.Play(); // Play the audio source!
  }

  void Update(){
    loudness = GetAveragedVolume() * sensitivity;
    frequency = GetFundamentalFrequency();
  }

  float GetAveragedVolume()
  {
    float[] data = new float[256];
    float a = 0;
    audio.GetOutputData(data,0);
    foreach(float s in data)
    {
      a += Mathf.Abs(s);
    }
    return a/256;
  }

  float GetFundamentalFrequency()
  {
    float fundamentalFrequency = 0.0f;
    float[] data = new float[8192];
    audio.GetSpectrumData(data,0,FFTWindow.BlackmanHarris);
    float s = 0.0f;
    int i = 0;
    for (int j = 1; j < 8192; j++)
    {
      if ( s < data[j] )
      {
        s = data[j];
        i = j;
      }
    }
    fundamentalFrequency = (float)i * samplerate / 8192; // cast first so the division is not done in integers
    return fundamentalFrequency;
  }
}

Now let’s figure out what note that is…

OK, we have the strongest frequency now. If you want to convert that to a note, you need to know the fundamental frequency of that note and compare it to the frequency given by our function. Let’s say we want to know if the note being played is C4, or “middle C”. If we assume that A4 is 440Hz, as it usually is with normal tuning, C4 is 261.63Hz. Now all you need to do is make a simple comparison between that and the frequency you get from the script above. Let’s make that into a script. I’ll call it NoteFinder for now and make it display the note in a GUIText component if it is found. The beginning of the script is pretty much the same as the SpawnByLoudness script from the previous post, except that it also requires a GUIText component.

using UnityEngine;
using System.Collections;

[RequireComponent(typeof(GUIText))] // Require GUIText component so we can display a text
public class NoteFinder : MonoBehaviour {
  public GameObject audioInputObject;
  public float threshold = 1.0f;
  MicrophoneInput micIn;
  // Use this for initialization
  void Start () {
    if (audioInputObject == null)
      audioInputObject = GameObject.Find("MicMonitor");
    micIn = (MicrophoneInput) audioInputObject.GetComponent("MicrophoneInput");
  }
  
  // Update is called once per frame
  void Update () {
    int f = (int)micIn.frequency; // Get the frequency from our MicrophoneInput script
    if (f >= 261 && f <= 262) // Compare the frequency to the known value, taking possible rounding error into account
    {
      this.guiText.text="Middle-C played!";
    }
    else
    {
      this.guiText.text="Play another note...";
    }
  }
}

That’s all folks, or is it?

You should now have a system that can detect frequencies and notes for you. You can go ahead and implement different versions of the frequency detection, such as dividing the spectrum into frequency bands and using their combined loudness to trigger events in your game. If you want to detect more notes, you can refer to a table like http://www.phy.mtu.edu/~suits/notefreqs.html for more frequency values.
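Instead of looking values up in a table, the note frequencies can also be computed. The sketch below assumes twelve-tone equal temperament with A4 at 440Hz, as in the example above, and borrows the MIDI numbering convention (A4 = 69, C4 = 60) purely as a convenient index. NoteMath and its methods are hypothetical names, not part of Unity.

```csharp
using System;

public static class NoteMath
{
    static readonly string[] Names =
        { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

    // Nearest equal-tempered note for a frequency, as a MIDI note number.
    // Each semitone is a factor of 2^(1/12), so the distance from A4
    // in semitones is 12 * log2(frequency / 440).
    public static int NearestMidiNote(double frequency)
    {
        if (frequency <= 0.0)
            throw new ArgumentOutOfRangeException("frequency");
        double semitonesFromA4 = 12.0 * Math.Log(frequency / 440.0) / Math.Log(2.0);
        return 69 + (int)Math.Round(semitonesFromA4);
    }

    // Ideal frequency of a MIDI note number under the same tuning.
    public static double NoteFrequency(int midiNote)
    {
        return 440.0 * Math.Pow(2.0, (midiNote - 69) / 12.0);
    }

    // Name with octave, e.g. 60 -> "C4", 69 -> "A4".
    public static string NoteName(int midiNote)
    {
        int pitchClass = ((midiNote % 12) + 12) % 12;
        int octave = (int)Math.Floor(midiNote / 12.0) - 1;
        return Names[pitchClass] + octave;
    }
}
```

NoteMath.NoteName(NoteMath.NearestMidiNote(261.63)) gives "C4", and NoteMath.NoteFrequency(60) comes out at about 261.63. Because the function rounds to the nearest semitone, a slightly off reading such as 261.09 still maps to C4, which is a more forgiving test than comparing against a fixed integer range.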

There are a couple of considerations, though. Make sure you select an appropriate sample rate for your implementation. For example, since I want good resolution in the low frequencies, I use a sample rate of 11025 and an FFT size of 8192. This gives me a bit over 1Hz resolution (about 1.35Hz per bin) up to around 5500Hz, which is half the sample rate. There is also a way to speed up the frequency calculation. Since the FFT, by nature, does not give any meaningful information about a real-world signal above the Nyquist frequency, we can ignore the upper half of the bins. So when using 8192 bins, we only need to iterate over the first 4096, which speeds up the GetFundamentalFrequency loop quite a bit.
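The numbers in this paragraph are easy to check, and the shortened loop is a small change to the search. The sketch below follows the assumption made in the text that only the lower half of the array carries useful information; it is worth checking that against the Unity documentation for your version. SpectrumLimits and its method names are hypothetical.

```csharp
public static class SpectrumLimits
{
    // Highest frequency a sampled signal can represent: half the sample rate.
    public static float NyquistFrequency(int sampleRate)
    {
        return sampleRate / 2.0f;
    }

    // Width of one bin in Hz, using the article's formula sampleRate / bins.
    public static float BinWidth(int sampleRate, int bins)
    {
        return (float)sampleRate / bins;
    }

    // Peak search that only visits the lower half of the array,
    // as suggested in the text. Starts at bin 1 like the original loop.
    public static int FindPeakBinInLowerHalf(float[] spectrum)
    {
        int limit = spectrum.Length / 2; // e.g. 4096 for 8192 bins
        float strongest = 0.0f;
        int peakIndex = 0;
        for (int j = 1; j < limit; j++)
        {
            if (spectrum[j] > strongest)
            {
                strongest = spectrum[j];
                peakIndex = j;
            }
        }
        return peakIndex;
    }
}
```

With a sample rate of 11025 and 8192 bins, NyquistFrequency gives 5512.5 and BinWidth gives roughly 1.346, matching the figures above. The halved loop does half as many comparisons per frame.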