AISHELL-DMASH
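Open-source data, advancing the development of artificial intelligence
AISHELL-DMASH: Mandarin Chinese Microphone Array Speech Database for Smart Home Scenarios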
Distributed Microphone Arrays in Smart Home (DMASH) Dataset
The AISHELL-DMASH dataset was recorded in real smart home scenarios in two different rooms. The dataset contains 30,000 hours of speech data. The recording devices include one close-talking microphone and seven groups of devices placed at seven different positions in the room. Each group consists of one iPhone, one Android phone, one iPad, one microphone, and one circular microphone array with a radius of 5 cm. The dataset includes 511 speakers, and each speaker visited three times, with a gap of 7 to 15 days between visits. The AISHELL-DMASH dataset was transcribed by professional speech annotators under a strict quality assurance process, with a word accuracy of 98%. It can be used for research on voiceprint recognition, speech recognition, wake-up word recognition, and related tasks.
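The device layout and visit schedule described above imply a fixed number of recording devices per room and of recording sessions overall. The following is a minimal Python sketch of that arithmetic. The device labels (`close_talking_mic`, `pos1_iphone`, and so on) are hypothetical names chosen for illustration and are not the naming scheme used in the released files.

```python
from itertools import product

# Device types in each position group, as listed in the dataset description.
# The label strings themselves are hypothetical, chosen only for illustration.
GROUP_DEVICES = ["iphone", "android_phone", "ipad", "microphone", "circular_array"]
NUM_GROUPS = 7            # seven positions in the room
NUM_SPEAKERS = 511
VISITS_PER_SPEAKER = 3


def list_recording_devices():
    """Return one hypothetical label per recording device in a room."""
    devices = ["close_talking_mic"]
    for group, device in product(range(1, NUM_GROUPS + 1), GROUP_DEVICES):
        devices.append(f"pos{group}_{device}")
    return devices


devices = list_recording_devices()
total_sessions = NUM_SPEAKERS * VISITS_PER_SPEAKER

print(len(devices))      # 1 close-talking mic + 7 groups x 5 devices = 36
print(total_sessions)    # 511 speakers x 3 visits = 1533
```

The count is of devices, not audio channels: the description does not state how many channels each circular array records, so the sketch leaves that out.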
The FFSVC 2020 challenge is designed to boost speaker verification research, with a special focus on far-field distributed microphone arrays under noisy conditions in real scenes. The objectives of this challenge are to: 1) benchmark current speaker verification technology under this challenging condition, 2) promote the development of new ideas and technologies in speaker verification, and 3) provide the community with an open, free, and large-scale speech database that exhibits far-field characteristics in real scenes.
The FFSVC20 challenge dataset is part of the DMASH dataset. It includes the recordings from the close-talking microphone, the iPhone at a distance of 25 cm, and three randomly selected circular microphone arrays. In FFSVC20, the training partition includes 120 speakers and the development partition includes 35 speakers. For each task, the evaluation data includes 80 speakers.
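As an illustration of how the FFSVC20 subset relates to the full device layout, the sketch below builds a channel list for one recording and records the speaker counts per partition. It assumes the three circular arrays are drawn from the seven DMASH positions, which the description implies but does not state outright. The function name `ffsvc20_channels` and all labels are hypothetical.

```python
import random

NUM_POSITIONS = 7  # assumption: arrays are drawn from the seven DMASH positions

# Speaker counts per partition, as stated in the description.
SPEAKERS = {"train": 120, "dev": 35, "eval_per_task": 80}


def ffsvc20_channels(seed=None):
    """Return hypothetical labels for the devices kept in the FFSVC20 subset."""
    rng = random.Random(seed)
    # Pick three distinct array positions at random.
    arrays = sorted(rng.sample(range(1, NUM_POSITIONS + 1), 3))
    return ["close_talking_mic", "iphone_25cm"] + [
        f"pos{i}_circular_array" for i in arrays
    ]


channels = ffsvc20_channels(seed=0)
print(channels)
print(SPEAKERS["train"] + SPEAKERS["dev"])  # speakers available for training and development
```

The evaluation count is kept per task rather than summed, because the description does not say whether the evaluation speakers differ between tasks.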
To download the full challenge data and trial files, please email aishell.foundation@gmail.com and put "Apply for the FFSVC2020 Challenge data" in the subject line.
The setup of the recording environment.
Data Samples
Data Description
File Structure
Training Set Sample Download
Test Set Sample Download
Paper
Non-Open Source
Data Usage Application (Service Application)
Company: bd@aishelldata.com
Academic Institution: aishell.foundation@gmail.com
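WeChat Official Account
Contact Us
Business cooperation: bd@aishelldata.com
Technical support: tech@aishelldata.com
Phone: +86-010-80225006
Address:
Room 4008, 4th Floor, Zhongguancun Intelligent Manufacturing Innovation Center, No. 32 Zhongguancun Street, Haidian District, Beijing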