This repository was archived by the owner on Aug 28, 2024. It is now read-only.

Commit f5578a4 (1 parent: 60a26e6)

updated README, Podfile and script to use with torch 1.9.0 and torchaudio 0.9.0 for the wav2vec2 SpeechRecognition app

File tree: 3 files changed, +28 -17 lines

SpeechRecognition/Podfile

Lines changed: 1 addition & 1 deletion
@@ -6,5 +6,5 @@ target 'SpeechRecognition' do
   use_frameworks!
 
   # Pods for SpeechRecognition
-  pod 'LibTorch', '~>1.8.0'
+  pod 'LibTorch', '~>1.9.0'
 end

SpeechRecognition/README.md

Lines changed: 26 additions & 15 deletions
@@ -2,52 +2,63 @@
 
 ## Introduction
 
-Facebook AI's [wav2vec 2.0](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) is one of the leading models in speech recognition. It is also available in the [Huggingface Transformers](https://github.com/huggingface/transformers) library, which is also used in another PyTorch iOS demo app for [Question Answering](https://github.com/pytorch/ios-demo-app/tree/master/QuestionAnswering).
+Facebook AI's [wav2vec 2.0](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) is one of the leading models in speech recognition. It is also available in the [Hugging Face Transformers](https://github.com/huggingface/transformers) library, which is also used in another PyTorch iOS demo app for [Question Answering](https://github.com/pytorch/ios-demo-app/tree/master/QuestionAnswering).
 
-In this demo app, we'll show how to quantize, trace, and optimize the wav2vec2 model for mobile and how to use the converted model on an iOS demo app to perform speech recognition.
+In this demo app, we'll show how to quantize, trace, and optimize the wav2vec2 model, powered by the newly released torchaudio 0.9.0, and how to use the converted model on an iOS demo app to perform speech recognition.
 
 ## Prerequisites
 
-* PyTorch 1.9 (Optional)
+* PyTorch 1.9.0 and torchaudio 0.9.0 (Optional)
 * Python 3.8 or above (Optional)
-* iOS PyTorch pod library 1.9
-* Xcode 12 or later
+* iOS PyTorch Cocoapods library LibTorch 1.9.0
+* Xcode 12.4 or later
 
 ## Quick Start

-### 1. Prepare the Model
+### 1. Get the Repo
+
+Simply run the commands below:
 
-First, run the following commands on a Terminal:
 ```
 git clone https://github.com/pytorch/ios-demo-app
 cd ios-demo-app/SpeechRecognition
 ```
 
-If you don't have PyTorch 1.9 installed or want to have a quick try of the demo app, you can download the quantized scripted wav2vec2 model file [here](https://drive.google.com/file/d/1RcCy3K3gDVN2Nun5IIdDbpIDbrKD-XVw/view?usp=sharing), then drag and drop to the project, and continue to Step 2.
+If you don't have PyTorch 1.9.0 and torchaudio 0.9.0 installed or want to have a quick try of the demo app, you can download the quantized scripted wav2vec2 model file [here](https://drive.google.com/file/d/1RcCy3K3gDVN2Nun5IIdDbpIDbrKD-XVw/view?usp=sharing), then drag and drop it to the project, and continue to Step 3.
+
+Be aware that the downloadable model file was created with PyTorch 1.9.0 and torchaudio 0.9.0, matching the iOS LibTorch library 1.9.0 specified in the `Podfile`. If you use a different version of PyTorch to create your model by following the instructions below, make sure you specify the same iOS LibTorch version in the `Podfile` to avoid possible errors caused by the version mismatch. Furthermore, if you want to use the latest prototype features in the PyTorch master branch to create the model, follow the steps at [Building PyTorch iOS Libraries from Source](https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source) on how to use the model in iOS.
+
 

-Be aware that the downloadable model file was created with PyTorch 1.9 (and torchaudio 0.9), matching the iOS LibTorch library 1.9 specified in the `Podfile`. If you use a different version of PyTorch to create your model by following the instructions below, make sure you specify the same iOS LibTorch version in the `Podfile` to avoid possible errors caused by the version mismatch. Furthermore, if you want to use the latest prototype features in the PyTorch master branch to create the model, follow the steps at [Building PyTorch iOS Libraries from Source](https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source) on how to use the model in iOS.
+### 2. Prepare the Model
+
+To install PyTorch 1.9.0 and torchaudio 0.9.0, you can do something like this:
+
+```
+conda create -n wav2vec2 python=3.8.5
+conda activate wav2vec2
+pip install torch==1.9.0 torchaudio==0.9.0
+```
+
+Now with PyTorch 1.9.0 and torchaudio 0.9.0 installed, run the following commands on a Terminal:
 
-With PyTorch 1.9 and torchaudio 0.9 installed, run the following commands on a Terminal:
 ```
-git clone https://github.com/pytorch/ios-demo-app
-cd ios-demo-app/SpeechRecognition
 python create_wav2vec2.py
 ```
-This will create the model file `wav2vec2.pt`.
+
+This will create the model file `wav2vec2.pt` and save it to the `SpeechRecognition` folder.
 
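As a quick sanity check before exporting the model (a generic check, not part of this commit), you can confirm that the installed release matches the LibTorch 1.9.0 pod pinned in the `Podfile`:

```python
import torch

# The exported .pt file should come from the same PyTorch release
# as the LibTorch pod used by the iOS app (1.9.0 for this demo).
print(torch.__version__)
```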

 ### 2. Use LibTorch
 
 Run the commands below:
 
 ```
-cd SpeechRecognition
 pod install
 open SpeechRecognition.xcworkspace/
 ```
 
 ### 3. Build and run with Xcode
 
-After the app runs, tap the Start button and start saying something; after 6 seconds, the model will infer to recognize your speech. Only basic decoding of the recognition result, in the form of an array of floating numbers of logits, to a list of tokens is provided in this demo app, but it is easy to see, without further post-processing, whether the model can recognize your utterances. Some example results are as follows:
+After the app runs, tap the Start button and start saying something; after 12 seconds (you can change `private let AUDIO_LEN_IN_SECOND = 12` in `ViewController.swift` for the recording length), the model will infer to recognize your speech. Some example results are as follows:
 
 ![](screenshot1.png)
 ![](screenshot2.png)
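As a side note on the 12-second recording length: assuming the standard 16 kHz mono input that wav2vec2 models expect (an assumption, not stated in this diff), the recording maps to a fixed-size sample buffer:

```python
# Hypothetical back-of-the-envelope check of the audio buffer size.
SAMPLE_RATE = 16000          # wav2vec2 expects 16 kHz audio (assumption)
AUDIO_LEN_IN_SECOND = 12     # mirrors the constant in ViewController.swift

num_samples = SAMPLE_RATE * AUDIO_LEN_IN_SECOND
print(num_samples)  # 192000 samples per inference
```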

SpeechRecognition/create_wav2vec2.py

Lines changed: 1 addition & 1 deletion
@@ -62,4 +62,4 @@ def forward(self, waveforms: Tensor) -> str:
 print(waveform.size())
 print('Result:', optimized_model(waveform))
 
-optimized_model.save("wav2vec2.pt")
+optimized_model.save("SpeechRecognition/wav2vec2.pt")
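For readers who want to see the shape of the export flow in `create_wav2vec2.py` without downloading the full wav2vec2 model, here is a minimal sketch using a tiny stand-in module. The class name, layer sizes, and output path are hypothetical; only the quantize → script → optimize-for-mobile → save sequence mirrors the real script:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class TinyRecognizer(torch.nn.Module):
    """Hypothetical stand-in for the wav2vec2 model the real script exports."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        return self.linear(waveforms)


model = TinyRecognizer().eval()

# 1. Dynamic quantization: Linear weights are stored as int8 and
#    dequantized on the fly, shrinking the on-disk model.
quantized = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)

# 2. Convert to TorchScript so the model can run without Python.
scripted = torch.jit.script(quantized)

# 3. Apply mobile optimization passes, then save the .pt file
#    that the iOS app loads via LibTorch.
optimized = optimize_for_mobile(scripted)
optimized.save("wav2vec2_standin.pt")

print(optimized(torch.randn(1, 16)).shape)  # torch.Size([1, 4])
```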

0 commit comments