Android Development: Combining Agora Real-Time Communication with iFlytek Speech Recognition
Agora (聲網) is a company providing real-time voice and video communication services; most of its offering is built on the open-source WebRTC project with its own optimizations and modifications. iFlytek (訊飛) speech recognition hardly needs an introduction; Lao Luo already covered it in plenty of detail at his launch event.
Now to today's topic: using Agora and iFlytek recognition at the same time. Some readers may not have run into this requirement before, so first let me explain what goes wrong and why any changes are needed at all. The reason is simple: real-time communication obviously needs the microphone and the speaker, and speech recognition needs the microphone as well. So the question becomes: when two components want the microphone at once, which one does Android give it to? In testing, Android 5.0 and 5.1 handle this without any problem; they appear to abstract and wrap the microphone as a hardware resource so that every caller actually receives its own copy of the audio stream. On other versions, however, using both at the same time reliably produces an AudioRecord -38 error, and it is always iFlytek that reports it, because Agora grabs the microphone every time a call starts. So we have to find another way.
Since iFlytek supports a custom audio source, we decided to change how iFlytek gets its audio. But joining and leaving an Agora call can happen at any moment, and reconfiguring iFlytek on every switch would couple the two far too tightly; if the audio sources ever go beyond the native AudioRecord and Agora, iFlytek would have to change yet again, which goes against basic software-engineering principles. So we settled on a publish/subscribe design. A manager keeps track of all subscribers and the current publisher; the publisher-to-subscriber relationship is one-to-many, so the subscribers form a list while the publisher is a single member field. The publisher interface covers starting and stopping recording, and the subscriber interface is even simpler: a notification that audio has arrived. Enough talk, here is the code.
import java.util.ArrayList;
import java.util.List;

public class XLAudioRecordManager {
    private static XLAudioRecordManager instance = null;
    private List<XLAudioRecordSubscriberCallback> subscribors = new ArrayList<>();
    private final static String TAG = XLAudioRecordManager.class.getSimpleName();
    private XLAudioRecord internalAudioPublisher; // the internal audio provider
    private XLAudioRecordPublisherCallback curPublisher; // only one publisher at a time

    public void setCurPublisher(XLAudioRecordPublisherCallback curPublisher) {
        this.curPublisher = curPublisher;
    }

    public void initCurPublisher() {
        curPublisher = internalAudioPublisher;
    }

    private XLAudioRecordManager() {
        internalAudioPublisher = new XLAudioRecord();
        initCurPublisher();
    }

    public static XLAudioRecordManager getInstance() {
        if (instance == null) {
            instance = new XLAudioRecordManager();
        }
        return instance;
    }

    // Fan the audio data out to every subscriber.
    public void writeAudio(byte[] audioBuffer, int offset, int length) {
        for (XLAudioRecordSubscriberCallback callback : subscribors) {
            callback.onAudio(audioBuffer, offset, length);
        }
    }

    public void subscribe(XLAudioRecordSubscriberCallback callback) {
        this.subscribors.add(callback);
    }

    public void unSubscribe(XLAudioRecordSubscriberCallback callback) {
        this.subscribors.remove(callback);
    }

    // Subscriber interface
    public interface XLAudioRecordSubscriberCallback {
        void onAudio(byte[] audioData, int offset, int length);
    }

    // Publisher interface
    public interface XLAudioRecordPublisherCallback {
        void onStartRecording();
        void onStopRecording();
    }

    public void startRecording() {
        // Tell the current publisher to start capturing audio
        curPublisher.onStartRecording();
    }

    public void stopRecording() {
        // Tell the current publisher to stop capturing audio
        curPublisher.onStopRecording();
    }
}

As the code above shows, the manager also maintains an internal audio publisher, which is simply the native AudioRecord, so callers never need to know where the audio stream comes from when Agora is not involved. OK, next we can look at XLAudioRecord to see how a publisher is implemented.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.util.Log;

public class XLAudioRecord implements XLAudioRecordManager.XLAudioRecordPublisherCallback {
    private AudioRecord mAudioRecord = null;
    private boolean isRecording = false; // whether the AudioRecord loop should keep running
    public static final int SAMPLE_RATE = 16000;
    private int mMinbuffer = 1000;
    public static final int CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO;
    public static final int AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT;
    private final static String TAG = XLAudioRecord.class.getSimpleName();
    // private AcousticEchoCanceler acousticEchoCanceler;

    public XLAudioRecord() {
        mMinbuffer = AudioRecord.getMinBufferSize(SAMPLE_RATE, CHANNEL_CONFIG, AUDIO_FORMAT);
        if (mMinbuffer != AudioRecord.ERROR_BAD_VALUE) {
            // initAudioRecord();
            // acousticEchoCanceler = AcousticEchoCanceler.create(mAudioRecord.getAudioSessionId());
            // acousticEchoCanceler.setEnabled(true);
        } else {
            Log.e(TAG, "AudioRecord getMinBuffer error");
        }
    }

    private void initAudioRecord() {
        int trytimes = 0;
        while (true) {
            try {
                mAudioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                        CHANNEL_CONFIG, AUDIO_FORMAT, mMinbuffer);
                if (mAudioRecord.getState() == AudioRecord.STATE_INITIALIZED) {
                    break;
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            if (trytimes >= 5) {
                Log.e(TAG, "AudioRecord initialize error");
                break;
            }
            trytimes++;
        }
    }

    @Override
    public void onStartRecording() {
        isRecording = true;
        new Thread(new Runnable() {
            @Override
            public void run() {
                initAudioRecord();
                mAudioRecord.startRecording();
                while (isRecording) {
                    byte[] audioData = new byte[mMinbuffer];
                    int bufferSize = mAudioRecord.read(audioData, 0, mMinbuffer);
                    try {
                        Thread.sleep(40);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    // Push each captured buffer through the manager to all subscribers
                    XLAudioRecordManager.getInstance().writeAudio(audioData, 0, bufferSize);
                }
                if (mAudioRecord != null && mAudioRecord.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                    mAudioRecord.stop();
                    mAudioRecord.release();
                    mAudioRecord = null;
                }
            }
        }).start();
    }

    @Override
    public void onStopRecording() {
        isRecording = false;
    }
}
Note that onStopRecording does not stop the AudioRecord directly; it only clears the flag so that the recording loop exits on its own. That way start, read, stop, and release all happen on the recording thread, and the capture loop behaves as a single atomic operation.
Next, let's look at how the Agora publisher plugs in. We need to set the RtcEngine's recording audio-frame parameters and register an audio frame observer:
mRtcEngine.setRecordingAudioFrameParameters(SAMPLE_RATE, 1, 0, 1024);
mRtcEngine.registerAudioFrameObserver(audioFrameObserver);
The IAudioFrameObserver is defined as follows:

private IAudioFrameObserver audioFrameObserver = new IAudioFrameObserver() {
    @Override
    public boolean onRecordFrame(byte[] bytes, int i, int i1, int i2, int i3) {
        // Forward each captured frame to all subscribers, but only while the
        // Agora publisher is the active audio source.
        if (isListening) {
            XLAudioRecordManager.getInstance().writeAudio(bytes, 0, bytes.length);
        }
        return true;
    }

    @Override
    public boolean onPlaybackFrame(byte[] bytes, int i, int i1, int i2, int i3) {
        return false;
    }
};
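For context, here is a minimal sketch (not from the original post) of where these two calls might sit during engine setup. The app ID, channel name, and the empty anonymous event handler are placeholders of my own; join/leave handling is sketched separately after the publisher implementation below.

// Hedged setup sketch: "YOUR_AGORA_APP_ID" and "demoChannel" are placeholders.
try {
    RtcEngine mRtcEngine = RtcEngine.create(context, "YOUR_AGORA_APP_ID",
            new IRtcEngineEventHandler() { /* join/leave callbacks, see the sketch below */ });
    // 16 kHz mono matches the iFlytek recognizer; mode 0 keeps the frames read-only.
    mRtcEngine.setRecordingAudioFrameParameters(XLAudioRecord.SAMPLE_RATE, 1, 0, 1024);
    mRtcEngine.registerAudioFrameObserver(audioFrameObserver);
    mRtcEngine.joinChannel(null, "demoChannel", null, 0);
} catch (Exception e) {
    throw new RuntimeException("Failed to initialize RtcEngine: " + e.getMessage());
}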
The Agora side implements the publisher interface like this:
@Override
public void onStartRecording() {
    isListening = true;
}

@Override
public void onStopRecording() {
    isListening = false;
}
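The original post never shows where setCurPublisher and initCurPublisher are called, so here is a minimal sketch of one way the switch could be tied to the Agora channel lifecycle. This is my assumption, not code from the post: the field name agoraPublisher (the object implementing the publisher interface shown just above) and the choice of IRtcEngineEventHandler callbacks are illustrative.

private final IRtcEngineEventHandler rtcEventHandler = new IRtcEngineEventHandler() {
    @Override
    public void onJoinChannelSuccess(String channel, int uid, int elapsed) {
        // Agora has taken over the microphone: stop the internal AudioRecord publisher
        // and make the Agora-backed publisher the current audio source.
        XLAudioRecordManager.getInstance().stopRecording();
        XLAudioRecordManager.getInstance().setCurPublisher(agoraPublisher);
    }

    @Override
    public void onLeaveChannel(RtcStats stats) {
        // The call has ended: stop the Agora-backed publisher and fall back to the
        // internal AudioRecord publisher.
        agoraPublisher.onStopRecording();
        XLAudioRecordManager.getInstance().initCurPublisher();
    }
};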
Finally, here is the implementation of the iFlytek subscriber:
import android.content.Context;

import com.iflytek.cloud.SpeechConstant;
import com.iflytek.cloud.SpeechRecognizer;

public class IFlyRecognizer extends RecognizerAdapter implements XLAudioRecordManager.XLAudioRecordSubscriberCallback {
    private com.iflytek.cloud.RecognizerListener recognizerListener;
    private SpeechRecognizer speechRecognizer;
    private String userAudioPath = null;
    private static Context mContext;

    public IFlyRecognizer(Context context) {
        mContext = context;
        XLAudioRecordManager.getInstance().subscribe(this);
        speechRecognizer = SpeechRecognizer.createRecognizer(context, null);
        // 2. Set the dictation parameters
        speechRecognizer.setParameter(SpeechConstant.DOMAIN, "iat");
        speechRecognizer.setParameter(SpeechConstant.LANGUAGE, "zh_cn");
        speechRecognizer.setParameter(SpeechConstant.ACCENT, "mandarin");
        speechRecognizer.setParameter(SpeechConstant.PARAMS, null);
        speechRecognizer.setParameter(SpeechConstant.SAMPLE_RATE, "16000");
        // Return multiple candidate results
        speechRecognizer.setParameter(SpeechConstant.ASR_NBEST, "5");
        // Front endpoint (VAD_BOS): how long the user may stay silent before it is treated as a timeout
        speechRecognizer.setParameter(SpeechConstant.VAD_BOS, "8000");
        // Rear endpoint (VAD_EOS): how long after the user stops speaking the input is considered
        // finished and recording stops automatically
        speechRecognizer.setParameter(SpeechConstant.VAD_EOS, "1000");
        speechRecognizer.setParameter(SpeechConstant.ASR_PTT, "0");
        // -1 means an external (custom) audio source: we feed audio in via writeAudio
        speechRecognizer.setParameter(SpeechConstant.AUDIO_SOURCE, "-1");
    }

    @Override
    public void setRecognizerListener(RecognizerListener listener) {
        this.recognizerListener = new IFlyRecognizerListener(listener);
    }

    @Override
    public void startRecognize() {
        // If fetching the user ID failed, don't save the audio file
        // if (!Config.USER_ID.equals("")) {
        //     speechRecognizer.setParameter(SpeechConstant.AUDIO_FORMAT, "wav");
        //     speechRecognizer.setParameter(SpeechConstant.ASR_AUDIO_PATH, getAudioPathName());
        // }
        XLAudioRecordManager.getInstance().startRecording();
        speechRecognizer.startListening(recognizerListener);
        if (recognizerListener != null) {
            recognizerListener.onBeginOfSpeech();
        }
    }

    @Override
    public void stopRecognize() {
        speechRecognizer.stopListening();
        XLAudioRecordManager.getInstance().stopRecording();
    }

    @Override
    public void cancelRecognize() {
        speechRecognizer.cancel();
        XLAudioRecordManager.getInstance().stopRecording();
    }

    @Override
    public void onAudio(byte[] audioData, int offset, int length) {
        int res = speechRecognizer.writeAudio(audioData, offset, length);
        // if (res == ErrorCode.SUCCESS) {
        //     Log.e("IFlyRecognizer", "write succeeded");
        // } else {
        //     Log.e("IFlyRecognizer", "write failed");
        // }
    }
}
As you can see, AUDIO_SOURCE must be set to -1 during initialization; only then can onAudio push audio into the iFlytek recognizer via writeAudio.
That about covers combining Agora with iFlytek. I genuinely feel the design patterns I studied have had a subtle but real influence on how I write code today; I hope this helps someone.