AISHELL-DMASH Mandarin Chinese Microphone-Array Speech Database for Home Scenarios

Distributed Microphone Arrays in Smart Home (DMASH) Dataset

Open-source release: April 2022


The AISHELL-DMASH dataset was recorded in two different rooms of a real home and contains 30,000 hours of speech. The recording devices include one close-talking microphone and seven groups of devices placed at seven different positions in the room. Each group consists of one iPhone, one Android phone, one iPad, one microphone, and one circular microphone array with a radius of 5 cm. The dataset covers 511 speakers, each recorded in three sessions held 7 to 15 days apart. All recordings were transcribed by professional speech annotators and passed a strict quality-assurance process, reaching a word accuracy of 98%. The dataset is suited to research and applications in speaker (voiceprint) recognition, speech recognition, wake-up word detection, and related areas.
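
The device layout described above can be tallied with a short Python sketch. It relies only on the counts stated in the description; the `RecordingSetup` class and the device labels (`close_talk_0`, `pos1/iPhone`, and so on) are hypothetical illustrations, not the dataset's actual file or channel naming. It counts devices, not audio channels: the number of microphones in each circular array is not stated here, and the sketch does not model which of the two rooms a session took place in.

```python
from dataclasses import dataclass

# Device types at each far-field position, as listed in the dataset description.
FAR_FIELD_DEVICES = ("iPhone", "Android phone", "iPad", "microphone", "circular array (r = 5 cm)")


@dataclass(frozen=True)
class RecordingSetup:
    """Hypothetical model of the recording setup, using counts from the description."""
    close_talk_mics: int = 1
    far_field_positions: int = 7
    sessions_per_speaker: int = 3
    speakers: int = 511

    def devices(self) -> list[str]:
        """Enumerate every recording device with an illustrative (made-up) label."""
        labels = [f"close_talk_{i}" for i in range(self.close_talk_mics)]
        for pos in range(1, self.far_field_positions + 1):
            labels += [f"pos{pos}/{dev}" for dev in FAR_FIELD_DEVICES]
        return labels


setup = RecordingSetup()
n_devices = len(setup.devices())
print(n_devices)                                      # 1 + 7 * 5 = 36 devices
print(n_devices * setup.sessions_per_speaker)         # 108 device-sessions per speaker
print(setup.speakers * setup.sessions_per_speaker)    # 1533 sessions in total
```

Modeling the layout as a small frozen dataclass keeps the stated counts in one place, so a per-position or per-session tally can be derived from them rather than hard-coded.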

The FFSVC 2020 challenge is designed to advance speaker verification research, with a special focus on far-field distributed microphone arrays under noisy conditions in real scenes. The objectives of the challenge are to: 1) benchmark current speaker verification technology under these challenging conditions, 2) promote the development of new ideas and technologies in speaker verification, and 3) provide the community with an open, free, and large-scale speech database that exhibits the far-field characteristics of real scenes.


The FFSVC20 challenge dataset is a subset of the DMASH dataset. It includes recordings from the close-talking microphone, the iPhone at a distance of 25 cm, and three randomly selected circular microphone arrays. In FFSVC20, the training partition includes 120 speakers and the development partition includes 35 speakers. For each task, the evaluation data includes 80 speakers.
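
The FFSVC20 subset can be sketched as a device filter over the full recording setup. This is an illustration under stated assumptions: the position labels (`array_pos1` to `array_pos7`), the `ffsvc20_channels` helper, and the seeded `random.Random` are hypothetical, and the description does not say how the three arrays were drawn, or whether the draw was made once for the whole challenge or per speaker or session. It also assumes the three arrays come from the seven far-field positions.

```python
import random

# Hypothetical labels for the seven far-field array positions; the actual
# naming and selection procedure used for FFSVC20 are not given in the source.
ARRAY_POSITIONS = [f"array_pos{i}" for i in range(1, 8)]


def ffsvc20_channels(rng: random.Random) -> list[str]:
    """Return an FFSVC20-style device subset: the close-talking mic,
    the iPhone at 25 cm, and three circular arrays chosen at random."""
    arrays = sorted(rng.sample(ARRAY_POSITIONS, k=3))  # draw 3 distinct positions
    return ["close_talk", "iphone_25cm", *arrays]


channels = ffsvc20_channels(random.Random(2020))
print(channels)

# Speaker partitions reported for FFSVC20.
PARTITIONS = {"train": 120, "dev": 35}
EVAL_SPEAKERS_PER_TASK = 80
print(sum(PARTITIONS.values()))  # 155 speakers in train + dev
```

With seven array positions there are `math.comb(7, 3) == 35` possible three-array selections; passing an explicitly seeded generator makes any one selection reproducible.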


To download the full challenge data and trial files, please email aishell.foundation@gmail.com with the subject line "Apply for the FFSVC2020 Challenge data".


Figure: The setup of the recording environment.


Sample


Readme

File Structure


Train/Dev Sets Sample


Test Set Sample


Paper

Non-Open Source


Apply for Data Access

Company: bd@aishelldata.com

Academic Institution:

Click to apply