Pitch Me
Sadly, Unity does not support using a microphone in the WebGL build of a game, so I cannot show my game off here :(
To play, you will have to download the source from GitHub and build it yourself.
Pitch Me was my first game ever publicly released and I'm proud of how it turned out.
Even if it is just squares and circles.
It was created during the Global Game Jam 2018 with the theme for that year being Transmission.
The idea behind the game is that you are a square trying to dodge the circles flying at you.
Now, I know what you're thinking: what's so great about that?
Why are you proud of it?
Well, the unique thing about it is that you control the square with your voice!
Not with spoken commands like "move up" or "move down", but with a constant tone or hum.
If you want the square to go up, you raise the pitch of your voice, and if you want it to go down, you lower the pitch.
If you want to keep it where it is, you just hold the pitch.
There's only one thing you have to do before you play the game, and that's calibrate your voice:
1. Hum a comfortable tone, not too high and not too low.
2. Click the calibrate voice text and wait for the green box to fill.
3. You are now ready to play! The pitch you just hummed is the center position of the box as it flies along through space.
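One way to picture what calibration buys you is the sketch below. It is not the code from the game: the function name, the `octave_range` tuning value, and the choice to measure the offset in octaves are all my assumptions. It only illustrates the idea that the calibrated pitch becomes the center and other pitches land above or below it.

```python
import math

def pitch_to_position(pitch_hz, center_hz, octave_range=1.0):
    """Map a pitch to a vertical position in [-1.0, 1.0].

    0.0 is the calibrated center. octave_range is a hypothetical tuning
    value: how many octaves above or below the center reach the edge.
    """
    if pitch_hz <= 0 or center_hz <= 0:
        return 0.0  # no usable pitch detected, so stay centered
    # Compare pitches as a ratio (in octaves) rather than a raw Hz difference.
    octaves_from_center = math.log2(pitch_hz / center_hz)
    position = octaves_from_center / octave_range
    # Clamp so the square never leaves the playable area.
    return max(-1.0, min(1.0, position))
```

With a calibrated hum of 200 Hz, humming 200 Hz again returns 0.0 (center), and anything an octave or more above returns 1.0 (top edge).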
There are two bugs I know of:
1. If you go too low, the algorithm for determining the pitch of your voice stops working and thinks you are making a higher tone than you actually are.
If you find that you are lowering your voice but the square keeps going up, then you have probably hit the bottom of the range.
The technical details below go somewhat into the cause of this bug, but do not flat out state the root cause. I have some hunches, but I have not explored them enough.
2. On some computers there is a huge latency between speaking into the microphone and the game picking up the audio.
I have no idea why this happens, but I believe it has something to do with the microphone drivers.
With the current build of the game, one computer runs it with no issues at all, while on a different computer I can see the audio input being delayed by almost a full second.
You can tell you have this bug if you raise your voice and the box does not move up instantly.
I will start by saying I built the game with the Unity Engine and let the engine handle all the audio processing.
With that said, I do know that it uses the Fast Fourier Transform to convert the waveform from the time domain into the frequency domain.
The math for the FFT is pretty complex and not something I would be able to do or even explain. Maybe if I had done this back in university I could explain the math,
but right now I had to google "Fourier" to make sure I was spelling it correctly.
There are several flavors of the Fourier Transform, each optimized for a specific use case.
The few I am aware of are the Fourier Transform (FT), Fast Fourier Transform (FFT), and Real Time Fourier Transform (RTFT) but they all get you the same information.
The idea behind these transforms is to figure out the frequency(ies) that make up a tone.
Sadly, it is not some magical algorithm that spits out the exact frequency(ies) being played.
Instead, I like to think of it as giving you "probability buckets" that show you which frequency(ies) are most likely playing.
I know what you are thinking: what is a probability bucket? Well, to explain that I need to talk about the input and output of the FTs.
One of the inputs to the FT is the waveform as an array, where each element is a discrete data point in the wave.
I am making the assumption here that you already know how audio signals are represented in computers.
If not, take a look at Intro to Audio Programming.
The output is another array with the same length as the input. This output array is what I was previously referring to as probability buckets.
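To make the input and output concrete, here is a minimal sketch of a naive Discrete Fourier Transform in plain Python. It is far slower than the FFT Unity uses and it is not the game's code; the helper names `sine_wave` and `dft_magnitudes` and the small test signal are my own. It does show the shape of the data: an array of samples goes in, and an array of the same length comes out.

```python
import cmath
import math

def sine_wave(freq_hz, sample_rate, n):
    """Generate n samples of a sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns one non-negative magnitude per bucket."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        total = 0j
        for t, x in enumerate(samples):
            # Correlate the signal with a complex sinusoid for bucket k.
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(total))
    return magnitudes

samples = sine_wave(freq_hz=1000, sample_rate=8000, n=64)
buckets = dft_magnitudes(samples)
print(len(samples), len(buckets))  # prints "64 64"
```

For real-valued audio like this, the second half of the output mirrors the first half, so only the first half carries distinct information.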
Now that you know where the probability buckets come from, let's define what that means.
The "probability" comes from the fact that each array element holds a value of 0 or more, and the higher the value, the more likely that bucket holds the frequency of the tone.
If you look at the image below you can see that there is a spike in the plot of frequency buckets. So that top point is the bucket the frequency belongs in!
So in order to figure out the frequency of the waveform you just need to iterate over the output array and find the element with the highest value. Easy!
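That loop can be sketched in a few lines. This is a minimal illustration, not the game's code; `peak_bucket` is a name I made up and the example values are invented to have an obvious spike.

```python
def peak_bucket(buckets):
    """Return the index of the largest value in the output array."""
    best_index = 0
    for i, value in enumerate(buckets):
        if value > buckets[best_index]:
            best_index = i
    return best_index

# Made-up output values with a clear spike at index 3.
example = [0.1, 0.3, 0.9, 4.2, 1.1, 0.2]
print(peak_bucket(example))  # prints 3
```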
So what about "bucket"? Why did I use the name probability bucket instead of probability frequency?
Well, that is because each element of the output does not give you an exact frequency, but instead a range of frequencies.
More specifically, each bucket is SampleRate / N wide, and bucket i is centered on:
i / N * SampleRate
Where i is the index of the output array, N is the number of elements in the input array, and SampleRate is the sampling frequency of the input waveform.
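As a small worked example, the formula translates directly into code. The function name is hypothetical, and I am reading the formula as the frequency that bucket i is centered on.

```python
def bucket_frequency(i, n, sample_rate):
    """Frequency in Hz that bucket i corresponds to: i / N * SampleRate."""
    return i / n * sample_rate

# Bucket 100 of an 8192-sample transform at 44.1 kHz.
freq = bucket_frequency(i=100, n=8192, sample_rate=44100)
print(f"{freq:.1f} Hz")  # prints "538.3 Hz"
```

Feeding the index of the tallest bucket into this function gives an estimate of the pitch in Hz.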
If you are super awesome at math and paying close attention to that equation, then you have already noticed the interesting thing about the FT: the more samples you feed into the algorithm, the more precise each bucket gets (since N is getting larger).
The downside is that more samples means more processing time,
which means you have to find a balance between accuracy and performance!
In Pitch Me I use a sample rate of 44.1 kHz (which is CD quality audio) and a sample size of 8192 (the max Unity supports).
Which means each bucket is about 5.4 Hz wide (44,100 / 8,192). Cool, so is this good or bad? My gut says bad because 8192 is a pretty small buffer size, but I want something more than gut feeling.
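To see the trade-off in numbers, here is a quick sketch that prints the bucket width for a few sample sizes. It assumes the sample size is the N from the formula, so the bucket width is SampleRate / N and the transform looks at N / SampleRate seconds of audio; the `resolution` helper is hypothetical.

```python
def resolution(n, sample_rate=44100):
    """Return (bucket width in Hz, length of the audio window in seconds)."""
    return sample_rate / n, n / sample_rate

for n in (1024, 2048, 4096, 8192):
    width_hz, window_s = resolution(n)
    print(f"N={n}: {width_hz:.1f} Hz per bucket, {window_s * 1000:.0f} ms of audio")
```

At N = 8192 this gives about 5.4 Hz per bucket from roughly 186 ms of audio, so finer buckets also mean each result covers a longer stretch of sound.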
According to a very quick google search with absolutely no fact checking, a trained ear can be accurate up to 0.1 Hz. So no, this is not accurate at all!
Lucky for me, that does not matter. My game only needs to detect if a pitch is higher or lower than it was in the last FT, and having a tolerance of 5 Hz is perfectly fine for that.
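A higher-or-lower check with a tolerance might look something like the sketch below. This is an assumption about the general shape, not the actual game code; the function name and the default tolerance of one bucket width (5.4 Hz) are mine.

```python
def pitch_direction(current_hz, previous_hz, tolerance_hz=5.4):
    """Return 1 if the pitch rose, -1 if it fell, 0 if it held steady.

    Differences smaller than tolerance_hz are treated as holding the
    pitch, since the buckets cannot resolve anything finer than that.
    """
    delta = current_hz - previous_hz
    if delta > tolerance_hz:
        return 1
    if delta < -tolerance_hz:
        return -1
    return 0

print(pitch_direction(220.0, 200.0))  # prints 1
print(pitch_direction(202.0, 200.0))  # prints 0
```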
Look at the video below to see a line renderer I used to debug probability buckets.
You can easily see me raising and lowering the pitch of my voice as the peak goes up and down.
All the code can be found on GitHub with absolutely zero documentation :)
Since this was a game jam, I also did not worry about writing "good code", so I am sure there are plenty of shortcuts taken to get stuff working quickly.
Regardless, if you are interested or have any questions feel free to shoot me an email at contact@toasterfuel.com