content/hardware/06.nicla/boards/nicla-voice/tutorials/getting-started-ml/content.md
10 additions & 8 deletions
@@ -21,7 +21,7 @@ software:
The Arduino® Nicla Voice runs audio inputs through the powerful Syntiant NDP120 Neural Decision processor, which mimics human neural pathways to run multiple AI algorithms and automate complex tasks. In other words, it recognizes different events and hears keywords simultaneously. It is capable of understanding and learning its surrounding sounds.
-To make use of these keyword triggers, such as blinking the LED when it recognizes a specific word, a machine learning model is required. It is possible with Edge Impulse®to build, train these machine learning models, and easily deploy the model to the Nicla Voice board. This tutorial will explain how to start with the board, test the default built-in sketch, and create your own models.
+To make use of these keyword triggers, such as blinking the LED when the board recognizes a specific word, a machine learning model is required. With Edge Impulse®, it is possible to build, train and easily deploy the machine learning model to the Nicla Voice. This tutorial will explain how to start with the board, test the default built-in sketch, and create your own models.

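The default built-in sketch mentioned above works through a match callback: the NDP120 runs the model and the sketch only reacts when a keyword is reported. The condensed sketch below illustrates that pattern; it is modeled on the NDP examples shipped with the Nicla Voice board package, and the `.synpkg` firmware and model file names are placeholders, so treat it as an illustrative sketch rather than the exact stock firmware.

```cpp
#include "NDP.h"

// Called by the NDP library whenever the loaded model reports a keyword match.
// The label identifies which class was recognized.
void onKeywordMatch(char* label) {
  nicla::leds.begin();
  nicla::leds.setColor(blue);   // blink the built-in LED on a match
  delay(200);
  nicla::leds.setColor(off);
  nicla::leds.end();
  Serial.println(label);
}

void setup() {
  Serial.begin(115200);
  nicla::begin();
  nicla::leds.begin();

  NDP.onMatch(onKeywordMatch);          // register the keyword callback
  NDP.begin("mcu_fw_120_v91.synpkg");   // NDP120 MCU firmware (placeholder name)
  NDP.load("dsp_firmware_v91.synpkg");  // DSP firmware (placeholder name)
  NDP.load("ei_model.synpkg");          // the Edge Impulse® model package (placeholder name)
  NDP.turnOnMicrophone();
  NDP.interrupts();
}

void loop() {
  // Nothing to do here: all keyword handling happens in the onKeywordMatch() callback.
}
```

When your own Edge Impulse® model is flashed later in this tutorial, the same callback idea applies, only with your custom keyword classes.
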
@@ -54,13 +54,13 @@ To train a machine learning model to classify audio, we first need to feed it wi
### What Is Overfitting and How to Avoid It
-If a machine learning model is overfitting, it means that it is too well geared toward your training data and won't perform well with unseen input data. This is a common pitfall in machine learning. You need some variation in the training dataset and adjust the parameters so that it doesn't just learn all input data by heart, making the classification based on that as you rather want the model to learn the concept of an object or sound.
+If a machine learning model is overfitting, it means that it is too well geared toward your training data and it will not perform well with unseen input data. This is a common pitfall in machine learning. You need some variation in the training dataset and you need to adjust the parameters so that the model does not simply learn all input data by heart and classify based on that, but rather learns the concept that makes up an object or a sound.
Finding the proper configuration for your application often requires trial and error. Edge Impulse® shows in [this article](https://docs.edgeimpulse.com/docs/tips-and-tricks/increasing-model-performance) how to improve poorly performing machine learning models.
### Creating a Custom Edge Impulse® Model
-With the Nicla Voice, it is possible to train your own models for voice recognition and use them with the board. This will allow the Nicla Voice to detect words or phrases based on your recordings. First, if you do not already have an Arduino Cloud account, please go [here and create one](https://cloud.arduino.cc/home/). You can then access Edge Impulse® via the Arduino Cloud, as shown in the image below.
+With the Nicla Voice, it is possible to train your own models for voice recognition and use them with the board. This will allow the Nicla Voice to detect words or phrases based on your recordings. First, if you do not already have an Arduino Cloud account, please go [here and create one](https://cloud.arduino.cc/home/). You can then access Edge Impulse® via the Arduino Cloud, as shown in the image below. Otherwise, if you already have an Edge Impulse® account, you can log in directly from the [Edge Impulse® website](https://studio.edgeimpulse.com/login).

@@ -78,13 +78,13 @@ On the data acquisition page, press the "Let's collect some data" button. Now se

-Scan the QR code with your phone and it will automatically connect. Set the options as shown below and you are ready to start recording audio for the Machine Learning model. On your phone select the option for recording audio and give the appropriate permissions, there should now be a button on the screen that says "Start recording". Before recording, set the label of the recordings to match the phrase you want to have recognized, this will make it easier to sort the data later.
+Scan the QR code with your phone and it will automatically connect. Set the options as shown below and you are ready to start recording audio for the Machine Learning model. On your phone, select the option for recording audio and give the appropriate permissions. There should now be a button on the screen that says "Start recording". Before recording, set the label of the recordings to match the phrase you want to have recognized; this will simplify the data sorting.
-When a recording is made on the phone, it will automatically show up on the webpage. First start by recording around five minutes of the phrase you want to have recognized, for this tutorial "Ciao Nicla" will be used. Try to vary the distance from the microphone, the pronunciation and the inflection when speaking the phrase to give the model a wider definition of the phrase that should be recognized.
+When a recording is made on the phone, it will automatically show up on the webpage. First start by recording around five minutes of the phrase you want to have recognized; for this tutorial, "Ciao Nicla" will be used. Try to vary the distance from the microphone, the pronunciation and the inflection when speaking the phrase to give the model a wider definition of the phrase that should be recognized. If you want to make the model more accurate, have multiple people record the same phrase, going through the same process of varying the distance from the microphone, the pronunciation and the inflection.
-Once this is done, record for another five minutes of random words that are not the desired phrase and set the label for these recordings as "unknown". This will help with the training of the model later. And to give the model a better understanding of what sounds not to recognize as the trigger, also record five minutes of background and ambient noise. Set the label of these recordings as "noise". The more data collected, the better the model can be trained to recognize the phrase required. Feel free to collect as many of these three different categories as needed.
+Once this is done, record another five minutes of random words that are not the desired phrase and set the label for these recordings as "unknown". This will help you with the training of the model later. And, to give the model a better understanding of what sounds not to recognize as the trigger, also record five minutes of background and ambient noise. Set the label of these recordings as "noise". The more data collected, the better the model can be trained to recognize the required sentence. Feel free to collect as many samples of these three different categories as needed.

@@ -94,6 +94,8 @@ Make sure to have a good training/test data split ratio of around 80/20. The tes
### Create an Impulse
+Now that we have acquired the data samples, we can move on to designing the Impulse. In a nutshell, an Impulse is a pipeline that the model will use for training and it consists of an input block, a processing block and a learning block. The input block indicates the type of data being used in the model, which will be audio in this case. The processing block extracts meaningful features from your data. The Audio Syntiant processing block we are using in this tutorial extracts time and frequency features from the audio used in the model. The learning block uses a neural network classifier that takes the input data, the audio that was captured in the previous step, and outputs a probability that indicates how likely it is that the input data belongs to a particular class.
+
Now that we have the data samples, we can move on to designing the Impulse. An Impulse is in a nutshell the pipeline that the model will use for training. Consisting of an input block, processing block and a learning block. The input block indicates the type of data being used in the model, which will be audio in this case. The processing block extracts meaningful features from your data. The Audio Syntiant processing block we are using in this tutorial extracts time and frequency features from the audio used in the model. The learning block uses a neural network classifier that will take the input data, the audio that was captured in the previous step, then give us a probability that indicates how likely it is that the input data belongs to a particular class.
In the menu navigate to "Create Impulse" under "Impulse Design" and add an Audio processing block, which will be "Syntiant" in this case, as well as a Classification block. The page should now look like the image below.
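
To make the learning block's output more concrete: the classifier ultimately produces one probability per class (for example the keyword, "unknown" and "noise"), and the deployed firmware acts on the most likely class, typically only when its probability is above a confidence threshold. The standalone fragment below only illustrates that idea and is not code generated by Edge Impulse®; the class labels and the threshold value are made up.

```cpp
#include <array>
#include <cstdio>

int main() {
  // Hypothetical per-class probabilities produced by the classifier for one audio window.
  const std::array<const char*, 3> labels = {"ciao_nicla", "unknown", "noise"};
  const std::array<float, 3> probabilities = {0.87f, 0.09f, 0.04f};
  const float threshold = 0.60f;  // only act on confident predictions

  // Pick the class with the highest probability (argmax).
  std::size_t best = 0;
  for (std::size_t i = 1; i < probabilities.size(); ++i) {
    if (probabilities[i] > probabilities[best]) best = i;
  }

  if (probabilities[best] >= threshold) {
    std::printf("Detected \"%s\" (p = %.2f)\n", labels[best], probabilities[best]);
  } else {
    std::printf("No confident match\n");
  }
  return 0;
}
```
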
@@ -104,7 +106,7 @@ Under "Impulse Design" go to the "Syntiant" page. In the "Parameters" settings t

-Now select the "Generate features" tab on the "Syntiant" page. On this page press the green "Generate features" button. If you have collected a total of fifteen minutes of data as suggested in the previous step, this will take some time to complete. Now a visualization of the data can be seen on the right. Here you can easily see if the different classes of data collected separate into clear groups in respect to their different classes, this can help you figure out if the desired phrase will be easily differentiated from the noise and random words recorded.
+Now select the "Generate features" tab on the "Syntiant" page. On this page, press the green "Generate features" button. If you have collected a total of fifteen minutes of data as suggested in the previous step, this will take some time to complete. The data can then be visualized on the right. Here you can easily see whether the different classes of collected data separate into clear groups; this can help you figure out if the desired phrase will be easily differentiated from the noise and random words recorded.

@@ -136,7 +138,7 @@ To make it easy to flash any Machine Learning model created with Edge Impulse®
### Uploading the Model
-Now that everything needed for flashing the firmware and the model to the Nicla Voice is installed, we can finally flash the board with our model. Extract the files that were packed into the .zip file received from Edge Impulse® when the model was built into a folder. Now run the "flash" file that corresponds with the OS on the machine you are using. As shown in this list:
+Now that everything needed for flashing the firmware and the model to the Nicla Voice is installed, you can finally flash the board with your model. Extract the contents of the .zip file received from Edge Impulse® when the model was built into a folder. At this point, run the "flash" file that matches the OS of the machine you are using, as shown in this list:
- Use **flash_windows.bat** if you are using a PC
- Use **flash_mac.command** if you are using a MAC