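I am building an audio player with XAudio2. We stream data in 640-byte packets at a sample rate of 8000 Hz and a sample depth of 16 bits. We access XAudio2 through SlimDX.
When playing sound, however, we notice that the audio quality is bad. As an example, here is a 3 kHz sine curve captured with Audacity.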

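I have condensed the audio player to the bare basics, but the audio quality is still bad. Is this a bug in XAudio2, in SlimDX, or in my code, or is it simply an artifact of going from 8 kHz to 44.1 kHz? The last option seems unlikely, because we also generate PCM WAV files that Windows Media Player plays back perfectly.
Below is the basic implementation that generates the broken sine.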
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows;
using SlimDX;
using SlimDX.Multimedia;
using SlimDX.XAudio2;

public partial class MainWindow : Window
{
    private XAudio2 device = new XAudio2();
    private WaveFormatExtensible format = new WaveFormatExtensible();
    private SourceVoice sourceVoice = null;
    private MasteringVoice masteringVoice = null;
    private Guid KSDATAFORMAT_SUBTYPE_PCM = new Guid("00000001-0000-0010-8000-00aa00389b71");
    private AutoResetEvent BufferReady = new AutoResetEvent(false);
    private PlayBufferPool PlayBuffers = new PlayBufferPool();

    public MainWindow()
    {
        InitializeComponent();
        Closing += OnClosing;

        // 8 kHz, 16-bit, mono PCM
        format.Channels = 1;
        format.BitsPerSample = 16;
        format.FormatTag = WaveFormatTag.Extensible;
        format.BlockAlignment = (short)(format.Channels * (format.BitsPerSample / 8));
        format.SamplesPerSecond = 8000;
        format.AverageBytesPerSecond = format.SamplesPerSecond * format.BlockAlignment;
        format.SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
    }

    private void OnClosing(object sender, CancelEventArgs cancelEventArgs)
    {
        // The voices only exist once the button has been clicked.
        if (sourceVoice != null)
        {
            sourceVoice.Stop();
            sourceVoice.Dispose();
        }
        if (masteringVoice != null)
            masteringVoice.Dispose();
        PlayBuffers.Dispose();
    }

    private void button_Click(object sender, RoutedEventArgs e)
    {
        masteringVoice = new MasteringVoice(device);
        PlayBuffer buffer = PlayBuffers.NextBuffer();
        GenerateSine(buffer.Buffer);
        buffer.AudioBuffer.AudioBytes = 640;
        sourceVoice = new SourceVoice(device, format, VoiceFlags.None, 8);
        sourceVoice.BufferStart += new EventHandler<ContextEventArgs>(sourceVoice_BufferStart);
        sourceVoice.BufferEnd += new EventHandler<ContextEventArgs>(sourceVoice_BufferEnd);
        sourceVoice.SubmitSourceBuffer(buffer.AudioBuffer);
        sourceVoice.Start();
    }

    private void sourceVoice_BufferEnd(object sender, ContextEventArgs e)
    {
        BufferReady.Set();
    }

    private void sourceVoice_BufferStart(object sender, ContextEventArgs e)
    {
        // Wait (up to 1 s) for the previous buffer to finish, then queue the next one.
        BufferReady.WaitOne(1000);
        PlayBuffer nextBuffer = PlayBuffers.NextBuffer();
        nextBuffer.DataStream.Position = 0;
        nextBuffer.AudioBuffer.AudioBytes = 640;
        GenerateSine(nextBuffer.Buffer);
        Result r = sourceVoice.SubmitSourceBuffer(nextBuffer.AudioBuffer);
    }

    // Fills the whole byte array with 16-bit samples of a 3 kHz sine;
    // only the first 640 bytes of it are submitted per buffer.
    private void GenerateSine(byte[] buffer)
    {
        double sampleRate = 8000.0;
        double amplitude = 0.25 * short.MaxValue;
        double frequency = 3000.0;
        for (int n = 0; n < buffer.Length / 2; n++)
        {
            short[] s = { (short)(amplitude * Math.Sin((2 * Math.PI * n * frequency) / sampleRate)) };
            Buffer.BlockCopy(s, 0, buffer, n * 2, 2);
        }
    }
}
public class PlayBuffer : IDisposable
{
    #region Private variables
    private IntPtr BufferPtr;
    private GCHandle BufferHandle;
    #endregion

    #region Constructors
    public PlayBuffer()
    {
        Index = 0;
        Buffer = new byte[640 * 4]; // 640 bytes = 320 16-bit samples = 40 ms at 8 kHz
        BufferHandle = GCHandle.Alloc(this.Buffer, GCHandleType.Pinned);
        // Use the pinned address directly; ToInt32() can overflow in a 64-bit process.
        BufferPtr = BufferHandle.AddrOfPinnedObject();
        DataStream = new DataStream(BufferPtr, 640 * 4, true, false);
        AudioBuffer = new AudioBuffer();
        AudioBuffer.AudioData = DataStream;
    }

    public PlayBuffer(int index)
        : this()
    {
        Index = index;
    }
    #endregion

    #region Destructor
    ~PlayBuffer()
    {
        Dispose();
    }
    #endregion

    #region Properties
    protected int Index { get; private set; }
    public byte[] Buffer { get; private set; }
    public DataStream DataStream { get; private set; }
    public AudioBuffer AudioBuffer { get; private set; }
    #endregion

    #region Public functions
    public void Dispose()
    {
        if (AudioBuffer != null)
        {
            AudioBuffer.Dispose();
            AudioBuffer = null;
        }
        if (DataStream != null)
        {
            DataStream.Dispose();
            DataStream = null;
        }
        // Release the pin so the byte array can be collected.
        if (BufferHandle.IsAllocated)
            BufferHandle.Free();
    }
    #endregion
}
public class PlayBufferPool : IDisposable
{
    #region Private variables
    private int _currentIndex = -1;
    private PlayBuffer[] _buffers = new PlayBuffer[2];
    #endregion

    #region Constructors
    public PlayBufferPool()
    {
        for (int i = 0; i < 2; i++)
            Buffers[i] = new PlayBuffer(i);
    }
    #endregion

    #region Destructor
    ~PlayBufferPool()
    {
        Dispose();
    }
    #endregion

    #region Properties
    protected int CurrentIndex
    {
        get { return _currentIndex; }
        set { _currentIndex = value; }
    }

    protected PlayBuffer[] Buffers
    {
        get { return _buffers; }
        set { _buffers = value; }
    }
    #endregion

    #region Public functions
    public void Dispose()
    {
        for (int i = 0; i < Buffers.Length; i++)
        {
            if (Buffers[i] == null)
                continue;
            Buffers[i].Dispose();
            Buffers[i] = null;
        }
    }

    // Hands out the two buffers alternately (simple double buffering).
    public PlayBuffer NextBuffer()
    {
        CurrentIndex = (CurrentIndex + 1) % Buffers.Length;
        return Buffers[CurrentIndex];
    }
    #endregion
}
Some additional details:
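The player is used to replay recorded voice that was compressed with codecs such as ALAW, µLAW, or TrueSpeech. The data arrives in small packets, is decoded, and is then passed to this player. That is why we use such a low sample rate and such small buffers. There is nothing wrong with our data, however, because writing the same data to a WAV file gives perfect playback in WMP or VLC.
Edit: We have now "solved" this by rewriting the player in NAudio. I am still interested in any input on what is happening here. Is it our approach with the PlayBuffers, is it simply a bug or limitation in DirectX, or is it the wrappers? I tried SharpDX instead of SlimDX, but that did not change the result.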
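Answered 2012-09-26 21:29:22
It looks as if the upsampling is being done without a proper anti-aliasing (reconstruction) filter. The cutoff frequency is too high (above the original Nyquist frequency), so a lot of the aliases are preserved, and the output resembles piecewise-linear interpolation between the samples taken at 8000 Hz.
Although all of your different options upsample from 8 kHz to 44.1 kHz, the way in which they do it matters, and the fact that one library does it well is no proof that the upsampling is not the source of the error in the other.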
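To make the effect concrete, here is a minimal sketch that upsamples a 3 kHz tone from 8 kHz to 44.1 kHz by plain linear interpolation and then measures how much signal remains at 3 kHz and at the first image frequency, 8000 - 3000 = 5000 Hz. This is not XAudio2's actual resampler, whose internals are not shown here; linear interpolation is only assumed as a stand-in for "upsampling with a poor reconstruction filter". The class and method names (`UpsampleImagingDemo`, `LinearUpsample`, `AmplitudeAt`) are made up for this illustration.

```csharp
using System;

public static class UpsampleImagingDemo
{
    // Sine with amplitude 1.0, sampled at the given rate.
    public static double[] MakeSine(int count, double frequency, double sampleRate)
    {
        var x = new double[count];
        for (int n = 0; n < count; n++)
            x[n] = Math.Sin(2 * Math.PI * frequency * n / sampleRate);
        return x;
    }

    // Naive upsampling: every output sample is a linear blend of the two
    // nearest input samples. No low-pass (reconstruction) filter is applied.
    public static double[] LinearUpsample(double[] src, double srcRate, double dstRate)
    {
        int outLength = (int)((src.Length - 1) * dstRate / srcRate);
        var dst = new double[outLength];
        for (int i = 0; i < outLength; i++)
        {
            double pos = i * srcRate / dstRate; // position in input samples
            int k = (int)pos;
            double frac = pos - k;
            dst[i] = src[k] * (1.0 - frac) + src[k + 1] * frac;
        }
        return dst;
    }

    // Amplitude of one frequency component, estimated with a single-bin DFT.
    public static double AmplitudeAt(double[] x, double frequency, double sampleRate)
    {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < x.Length; n++)
        {
            double phase = 2 * Math.PI * frequency * n / sampleRate;
            re += x[n] * Math.Cos(phase);
            im -= x[n] * Math.Sin(phase);
        }
        return 2.0 * Math.Sqrt(re * re + im * im) / x.Length;
    }

    public static void Main()
    {
        double[] src = MakeSine(8000, 3000.0, 8000.0); // one second at 8 kHz
        double[] dst = LinearUpsample(src, 8000.0, 44100.0);

        Console.WriteLine("3 kHz tone:  " + AmplitudeAt(dst, 3000.0, 44100.0));
        Console.WriteLine("5 kHz image: " + AmplitudeAt(dst, 5000.0, 44100.0));
    }
}
```

With a well-designed resampler the 5 kHz component should be close to zero. Under the linear-interpolation assumption a clearly audible image remains next to an attenuated tone, which would be consistent with the distorted waveform in the capture.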
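Answered 2012-09-26 18:05:01
It has been a while since I worked with sound and frequencies, but here is what I remember: you have a sample rate of 8000 Hz and want a sine frequency of 3000 Hz. So in 1 second you have 8000 samples, and in that second you want your sine to oscillate 3000 times. That is below the Nyquist frequency (half the sample rate, here 4000 Hz), but only barely (see the Nyquist–Shannon sampling theorem). So I would not expect good quality here.
In fact, if you step through the GenerateSine method you will see that s[0] takes the values 0, 5792, -8191, 5792, 0, -5792, 8191, -5792, 0, 5792, ...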
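The sequence can be reproduced without a debugger. The sketch below repeats the arithmetic of GenerateSine from the question (amplitude 0.25 * short.MaxValue, 3000 Hz tone, 8000 Hz sample rate) and prints the first ten samples; the class and method names are made up for this illustration. Because 3000/8000 = 3/8, the pattern repeats every 8 samples and contains only three distinct magnitudes.

```csharp
using System;

public static class SineSamples
{
    // Same formula as GenerateSine in the question, returning the samples
    // as an array instead of copying them into a byte buffer.
    public static short[] FirstSamples(int count, double frequency, double sampleRate)
    {
        double amplitude = 0.25 * short.MaxValue; // 8191.75
        var samples = new short[count];
        for (int n = 0; n < count; n++)
        {
            // The cast to short truncates toward zero, so -8191.75 becomes -8191.
            samples[n] = (short)(amplitude * Math.Sin((2 * Math.PI * n * frequency) / sampleRate));
        }
        return samples;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstSamples(10, 3000.0, 8000.0)));
    }
}
```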
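However, that does not explain the odd sine wave you recorded, and I am not sure how many samples the human ear needs in order to hear a "good" sine wave.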
https://stackoverflow.com/questions/12258495