
Q: Using the Accelerate framework, no noticeable speedup

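Stack Overflow user

Asked 2015-02-26 01:07:55

Answers: 1 | Views: 172 | Followers: 0 | Votes: 1

I have the following audio code, which I thought would be a good candidate for vDSP in the Accelerate framework.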

// --- get pointers for buffer lists
float* left = (float*)audio->mBuffers[0].mData;
float* right = numChans == 2 ? (float*)audio->mBuffers[1].mData : NULL;

float dLeftAccum = 0.0;
float dRightAccum = 0.0;

float fMix = 0.25; // -12dB HR per note

// --- the frame processing loop
for(UInt32 frame=0; frame<inNumberFrames; ++frame)
{
    // --- zero out for each trip through loop
    dLeftAccum = 0.0;
    dRightAccum = 0.0;
    float dLeft = 0.0;
    float dRight = 0.0;

    // --- synthesize and accumulate each note's sample
    for(int i=0; i<MAX_VOICES; i++)
    {
        // --- render
        if(m_pVoiceArray[i]) 
            m_pVoiceArray[i]->doVoice(dLeft, dRight);

        // --- accumulate and scale
        dLeftAccum += fMix*(float)dLeft;
        dRightAccum += fMix*(float)dRight;

    }

    // --- accumulate in output buffers
    // --- mono
    left[frame] = (float)dLeftAccum;

    // --- stereo
    if(right) right[frame] = (float)dRightAccum;
}

// needed???
//  mAbsoluteSampleFrame += inNumberFrames;

return noErr;

So I changed it to use vDSP, multiplying by fMix once at the end of the block of frames.

// --- the frame processing loop
for(UInt32 frame=0; frame<inNumberFrames; ++frame)
{
    // --- zero out for each trip through loop
    dLeftAccum = 0.0;
    dRightAccum = 0.0;
    float dLeft = 0.0;
    float dRight = 0.0;

    // --- synthesize and accumulate each note's sample
    for(int i=0; i<MAX_VOICES; i++)
    {
        // --- render
        if(m_pVoiceArray[i]) 
            m_pVoiceArray[i]->doVoice(dLeft, dRight);

        // --- accumulate and scale
        dLeftAccum += (float)dLeft;
        dRightAccum += (float)dRight;

    }

    // --- accumulate in output buffers
    // --- mono
    left[frame] = (float)dLeftAccum;

    // --- stereo
    if(right) right[frame] = (float)dRightAccum;
}
// --- scale the whole block once; right is NULL for mono, so guard it
vDSP_vsmul(left, 1, &fMix, left, 1, inNumberFrames);
if(right) vDSP_vsmul(right, 1, &fMix, right, 1, inNumberFrames);
// needed???
//  mAbsoluteSampleFrame += inNumberFrames;

return noErr;

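However, my CPU usage stays the same. I can't see any benefit from using vDSP here. Am I doing this right? Many thanks.

I'm still new to vector operations, so please go easy on me :)

If there are any obvious optimizations I should make (outside the Accelerate framework), feel free to point them out. Thanks!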

audio

accelerate-framework

vdsp

1 Answer

Stack Overflow user

Accepted answer

Posted 2015-02-26 04:10:49

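Your vector calls perform two multiplies per sample frame (one per channel) at the audio sample rate. Even at a 192 kHz sample rate, that is only 384,000 multiplies per second, which is not enough to register on a modern CPU. Moreover, you are only moving an existing multiply to another place. If you look at the generated assembly, I'd guess the compiler already optimized your original code fairly well, and any speedup from the vDSP calls is offset by the fact that you now need a second loop.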
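To make that point concrete, here is a minimal, portable sketch that compares the two approaches from the question. It makes several assumptions that are not in the original code: `scaleBuffer` is a hypothetical stand-in for `vDSP_vsmul` with unit strides (so the example builds without Accelerate), and the pre-rendered `voices` arrays stand in for the output of `doVoice()`. It is an illustration of the arithmetic, not a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vDSP_vsmul with unit strides: out[i] = in[i] * scalar.
void scaleBuffer(const float* in, float scalar, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * scalar;
}

// Original approach: scale every voice's sample inside the inner loop.
// voices[v][frame] holds pre-rendered samples (assumed stand-in for doVoice()).
std::vector<float> mixScalePerVoice(const std::vector<std::vector<float>>& voices,
                                    std::size_t numFrames, float mix) {
    std::vector<float> out(numFrames, 0.0f);
    for (std::size_t frame = 0; frame < numFrames; ++frame) {
        float accum = 0.0f;
        for (const auto& voice : voices) {
            assert(voice.size() >= numFrames);
            accum += mix * voice[frame];
        }
        out[frame] = accum;
    }
    return out;
}

// Modified approach: sum the voices first, then scale the block in a second pass.
std::vector<float> mixScaleAtEnd(const std::vector<std::vector<float>>& voices,
                                 std::size_t numFrames, float mix) {
    std::vector<float> out(numFrames, 0.0f);
    for (std::size_t frame = 0; frame < numFrames; ++frame) {
        float accum = 0.0f;
        for (const auto& voice : voices) {
            assert(voice.size() >= numFrames);
            accum += voice[frame];
        }
        out[frame] = accum;
    }
    scaleBuffer(out.data(), mix, out.data(), numFrames);
    return out;
}

// Multiplies per second spent on the final scaling pass alone.
long scalingMultipliesPerSecond(long sampleRateHz, int channels) {
    return sampleRateHz * channels;
}
```

Both functions produce the same mix. The only work the second version hands to the vectorized call is one multiply per output sample, so the per-voice rendering inside the loop is unchanged in both versions.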

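Another thing to note is that all vDSP functions work better when the vector data is aligned on a 16-byte boundary. If you look at the SSE2 instruction set (which I believe vDSP uses heavily), you'll see that many instructions have one version for aligned data and another for unaligned data.

In gcc, you align data something like this (the attribute goes before the initializer):

float inVector[8] __attribute__ ((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};

Or, if you are allocating on the heap, see whether aligned_malloc (or an equivalent such as posix_memalign) is available.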
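As a rough illustration of both options, the sketch below aligns a stack array with the standard C++11 `alignas` specifier and a heap buffer with POSIX `posix_memalign`. The helper names `isAligned16`, `allocAlignedFloats`, and `stackBufferIsAligned` are hypothetical and exist only for this example. It assumes a POSIX system; other platforms would need a different allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX, not standard C++)

// True if ptr sits on a 16-byte boundary.
bool isAligned16(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
}

// Allocate `count` floats on a 16-byte boundary.
// Returns nullptr on failure; release the buffer with free().
float* allocAlignedFloats(std::size_t count) {
    void* raw = nullptr;
    if (posix_memalign(&raw, 16, count * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(raw);
}

// Stack equivalent of the gcc attribute, using standard alignas.
bool stackBufferIsAligned() {
    alignas(16) float inVector[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    return isAligned16(inVector);
}
```

A caller would typically request the buffer once, outside the audio render callback, check it with `isAligned16` in a debug build, and call `free()` on it during teardown.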

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/28732771
