AISHELL-6-B
Open-source data, advancing the development of artificial intelligence
Open-sourced: April 2022
MDSC contains 18,630 recordings totaling 17 hours. Of these, 10,125 recordings (7.6 hours) come from non-dysarthric speakers (Control) and 8,505 recordings (9.4 hours) come from dysarthric speakers (Dysarthria). The utterances were recorded from 21 dysarthric speakers (12 female, 9 male) and 25 non-dysarthric speakers (13 female, 12 male). The dysarthric participants have the following characteristics:
• Native Mandarin speakers;
• Broad age distribution (18 to 48 years) and gender balance;
• Diverse etiologies of dysarthria, including cerebral palsy and hepatolenticular degeneration.
The recordings consist of 10 wake-up words, each repeated five times at varying speeds, and 355 non-wake-up words, encompassing fixed command words, free command words, household instructions, and other phrases. The text list for a single speaker contains 295 unique sentences. All recordings were made in a quiet indoor environment at a sampling rate of 16 kHz, with participants positioned approximately 20 cm in front of the mobile microphone.
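The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that every speaker records each non-wake-up item exactly once (the description does not state this explicitly), and the names `per_speaker` and `summarize` are hypothetical. Under that assumption, the per-speaker count multiplied by the number of speakers reproduces the reported totals. Mean durations are approximate because the hour figures are rounded.

```python
# Figures quoted in the MDSC description.
SPEAKERS = {"dysarthric": 21, "control": 25}
RECORDINGS = {"dysarthric": 8_505, "control": 10_125}
HOURS = {"dysarthric": 9.4, "control": 7.6}

WAKE_WORDS = 10       # wake-up words per speaker
WAKE_REPEATS = 5      # repetitions of each wake-up word
NON_WAKE_ITEMS = 355  # non-wake-up items

# Assumption (not stated in the description): each speaker reads
# every non-wake-up item exactly once.
per_speaker = WAKE_WORDS * WAKE_REPEATS + NON_WAKE_ITEMS


def summarize():
    """Compare expected and reported recording counts per group."""
    rows = {}
    for group, n_speakers in SPEAKERS.items():
        rows[group] = {
            "expected_recordings": n_speakers * per_speaker,
            "reported_recordings": RECORDINGS[group],
            # Approximate, since the reported hours are rounded.
            "mean_seconds": HOURS[group] * 3600 / RECORDINGS[group],
        }
    return rows


if __name__ == "__main__":
    for group, row in summarize().items():
        print(group, row)
```

If the assumption holds, the dysarthric recordings are on average noticeably longer than the control recordings, which is consistent with the dysarthric group contributing fewer recordings but more hours.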
MDSC: A Mandarin Dysarthria Speech Corpus
Data download

Baseline system

Paper

License: CC BY-NC 4.0

The LRDWWS Challenge addresses wake-up word spotting for individuals with dysarthria, with the goal of promoting the adoption of this technology in real-world applications.
The challenge uses the MDSC database as the training and development sets, and adds a newly recorded test set, MDSC-Eval, from 20 dysarthric speakers. MDSC-Eval contains 8,760 recordings totaling 9 hours. It was recorded in the same way as MDSC, except that each speaker additionally recorded 11 negative words, each read 3 times. For details, see: https://lrdwws.org/
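The MDSC-Eval total can be reconstructed in the same spirit. This is a minimal sketch under a labeled assumption: each MDSC-Eval speaker records the same 10 wake-up words (5 repetitions each) and 355 non-wake-up items as in MDSC, plus the 11 negative words read 3 times. The function name `eval_recordings_per_speaker` is hypothetical.

```python
# Figures quoted in the MDSC-Eval description.
EVAL_SPEAKERS = 20
EVAL_RECORDINGS = 8_760
EVAL_HOURS = 9

NEGATIVE_WORDS = 11
NEGATIVE_REPEATS = 3


def eval_recordings_per_speaker(wake_words=10, wake_repeats=5, non_wake_items=355):
    """Per-speaker count, assuming the MDSC protocol plus the negative words.

    The default arguments are the MDSC figures; reusing them for MDSC-Eval
    is an assumption based on the statement that the recording method matches.
    """
    base = wake_words * wake_repeats + non_wake_items
    return base + NEGATIVE_WORDS * NEGATIVE_REPEATS


expected_total = EVAL_SPEAKERS * eval_recordings_per_speaker()
# Approximate, since the reported duration is rounded to whole hours.
mean_seconds = EVAL_HOURS * 3600 / EVAL_RECORDINGS

if __name__ == "__main__":
    print("expected:", expected_total, "reported:", EVAL_RECORDINGS)
    print("mean recording length (s): %.2f" % mean_seconds)
```

Under this assumption the expected total matches the reported 8,760 recordings, which suggests the per-speaker protocol is as described.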
Test set download

Training set download

Development set download

Contact us
Business cooperation: bd@aishelldata.com
Technical support: tech@aishelldata.com
Phone: +86-010-80225006
Address: D5-A501, Haidian Dayue Information Technology Park, Haidian District, Beijing
