RealMAN

A Real-Recorded and Annotated Microphone Array Dataset

 

Release date: June 2024

 

 

RealMAN is a multi-channel microphone array dataset designed for dynamic speech enhancement and sound source localization. Recordings were made with a 32-channel array of high-fidelity microphones, with a loudspeaker playing the source speech signals. In total, 83 hours of speech (48 hours with a static speaker and 35 hours with a moving speaker) were recorded in 32 different scenes, and 144 hours of background noise were recorded in 31 different scenes. The speech and noise recording scenes cover a wide range of common indoor, outdoor, semi-outdoor, and transportation environments. See Figure 1 for the recording devices.

The dataset provides two key annotations to support model training:

Azimuth annotation: the azimuth angle of the loudspeaker is obtained with an omnidirectional fisheye camera and is used to train sound source localization networks.

Direct-path signal: obtained by filtering the played speech signal with an estimated direct-path propagation filter, and used to train speech enhancement networks.
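As a rough illustration of how a direct-path target can be formed by filtering the played speech, the sketch below applies a toy FIR filter consisting of a single delayed, attenuated tap. It is not the dataset's actual estimation procedure: the function names, the single-tap filter shape, and the delay and gain values are all assumptions chosen for illustration.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering: y[n] = sum_k taps[k] * signal[n - k].

    The output is truncated to the length of the input signal.
    """
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if tap != 0.0 and n - k >= 0:
                acc += tap * signal[n - k]
        out[n] = acc
    return out


def toy_direct_path_filter(delay_samples, gain, length=32):
    """Hypothetical direct-path filter: one tap modelling delay and attenuation.

    A real estimated filter would generally have more than one non-zero tap.
    """
    if not 0 <= delay_samples < length:
        raise ValueError("delay_samples must lie inside the filter length")
    taps = [0.0] * length
    taps[delay_samples] = gain
    return taps


# Toy "played speech" samples (assumed values, not taken from the dataset).
played = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0]
taps = toy_direct_path_filter(delay_samples=2, gain=0.5, length=4)
direct_path = fir_filter(played, taps)
print(direct_path)
```

With a single-tap filter the result is simply the played signal delayed and scaled; for long recordings a vectorized convolution routine would be a more practical choice than the explicit loops used here.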

Data Download

 

License: CC BY-NC 4.0

Dataset

Paper

 

arXiv

Baseline System

 

Recipe