I'm using IFilter to index some MS Office documents. Loading from a file works fine, and everything behaves exactly as in all the manuals and samples:
HRESULT hr_f = LoadIFilter(filename, 0, (void **)&pFilter);
However, binding through BindIFilterFromStream fails, and I can't work out how to use it correctly:
HRESULT hr_ss = BindIFilterFromStream(spStream/*my IStream* impl*/, 0, (void **)&pFilter);
I implemented the IStream interface myself. Apart from the IUnknown methods, the only method called during initialization is Stat:
HRESULT StreamFilter::Stat(STATSTG *pstatstg, DWORD grfStatFlag)
{
    // Microsoft Office IFilter CLSID, taken from the Windows Registry:
    // {f07f3920-7b8c-11cf-9be8-00aa004b9986}
    const CLSID CLSID_IFilter = {
        0xf07f3920,
        0x7b8c,
        0x11cf,
        { 0x9b, 0xe8, 0x00, 0xaa, 0x00, 0x4b, 0x99, 0x86 }
    };
    if (pstatstg == nullptr)
        return STG_E_INVALIDPOINTER;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(_hFile, &size))   // GetFileSizeEx returns BOOL
        return HRESULT_FROM_WIN32(GetLastError());
    memset(pstatstg, 0, sizeof(STATSTG));
    pstatstg->clsid = CLSID_IFilter;
    pstatstg->type = STGTY_STREAM;
    pstatstg->cbSize.QuadPart = size.QuadPart;
    return S_OK;
}
After that, hr_ss is E_FAIL and pFilter is NULL.
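The initializer in Stat and the registry string in its comment describe the same CLSID: Data1, Data2 and Data3 are the first three hex groups, and the eight Data4 bytes split into a group of two and a group of six. Copying a CLSID out of the registry by hand makes it easy to misplace a byte, so a small portable check like the sketch below may help. `Guid` and `GuidToString` are hypothetical names that only mirror the layout of the Windows GUID struct; on Windows itself, StringFromGUID2 does this formatting (in upper case).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical portable stand-in for the Windows GUID/CLSID struct (same
// field layout), declared here only so the sketch builds without <windows.h>.
struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Format a GUID in registry form: the three leading fields print as
// 8-4-4 hex digits, Data4[0..1] form the fourth group, Data4[2..7] the fifth.
std::string GuidToString(const Guid& g)
{
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
                  "{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
                  static_cast<unsigned>(g.Data1),
                  static_cast<unsigned>(g.Data2),
                  static_cast<unsigned>(g.Data3),
                  static_cast<unsigned>(g.Data4[0]), static_cast<unsigned>(g.Data4[1]),
                  static_cast<unsigned>(g.Data4[2]), static_cast<unsigned>(g.Data4[3]),
                  static_cast<unsigned>(g.Data4[4]), static_cast<unsigned>(g.Data4[5]),
                  static_cast<unsigned>(g.Data4[6]), static_cast<unsigned>(g.Data4[7]));
    return std::string(buf);
}
```

Feeding it the bytes from the Stat initializer should reproduce the lower-case string in the comment, which confirms the initializer was transcribed correctly (though not that it is the right CLSID to report).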
There is a related case, Using IFilter in C#, but those approaches also work in C++ only for *.pdf files, not for MS Office documents...
Posted on 2016-06-05 19:13:29
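I figured out how to initialize the IFilter correctly. Here is the code:
HRESULT hr = LoadIFilter(L".doc", 0, (void **)&pFilter);   // filter chosen by extension only
IPersistStream *stream = nullptr;
if (SUCCEEDED(hr))
    hr = pFilter->QueryInterface(&stream);
std::ifstream ifs(filename, std::ios::binary);              // read the whole document into memory
std::string content((std::istreambuf_iterator<char>(ifs)),
    (std::istreambuf_iterator<char>()));
IStream *comStream = nullptr;
if (SUCCEEDED(hr))
{
    HGLOBAL hMem = ::GlobalAlloc(GMEM_MOVEABLE, content.size());
    LPVOID pDoc = hMem ? ::GlobalLock(hMem) : nullptr;
    if (pDoc == nullptr)   // also the case for an empty (zero-byte) file
        hr = E_OUTOFMEMORY;
    else
    {
        memcpy(pDoc, content.data(), content.size());
        ::GlobalUnlock(hMem);
        hr = ::CreateStreamOnHGlobal(hMem, TRUE, &comStream); // TRUE: stream frees hMem on release
    }
}
if (SUCCEEDED(hr))
    hr = stream->Load(comStream);   // hand the in-memory stream to the filter
if (stream)
    stream->Release();
Getting the text out of the document then works just like the usual MSDN sample: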
if (SUCCEEDED(hr))
{
    DWORD flags = 0;
    hr = pFilter->Init(IFILTER_INIT_INDEXING_ONLY |
                       IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                       IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES |
                       IFILTER_INIT_FILTER_OWNED_VALUE_OK |
                       IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                       0, 0, &flags);
    if (FAILED(hr))
    {
        pFilter->Release();
        throw std::runtime_error("IFilter::Init() failed");
    }
    Start();
    STAT_CHUNK stat;
    for (;;)
    {
        hr = pFilter->GetChunk(&stat);
        // No filter is available for this embedded object or link: skip it
        if (hr == FILTER_E_EMBEDDING_UNAVAILABLE || hr == FILTER_E_LINK_UNAVAILABLE)
            continue;
        if (FAILED(hr))   // FILTER_E_END_OF_CHUNKS or a real error
            break;
        if ((stat.flags & CHUNK_TEXT) != 0)
            ProcessTextChunk(pFilter, stat);
        if ((stat.flags & CHUNK_VALUE) != 0)
            ProcessValueChunk(pFilter, stat);
    }
    Finish();
    pFilter->Release();
}
else
{
    throw std::runtime_error("LoadIFilter() failed");
}
You don't need to implement your own IStream; just create one from your buffer.
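The "create it from your buffer" part is ordinary standard C++ up to the point where the bytes are copied into the HGLOBAL. A minimal portable sketch of that step, assuming a hypothetical helper name `ReadFileBytes` (not part of any Windows API): it reads the file in binary mode and rejects an empty file up front, because GlobalLock returns NULL for a zero-byte block and an Office filter has nothing to parse anyway.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into a byte buffer, using the
// same std::ifstream + istreambuf_iterator idiom as the answer.
std::vector<char> ReadFileBytes(const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary);   // binary: no newline translation
    if (!ifs)
        throw std::runtime_error("cannot open " + path);

    // Extra parentheses around the first argument avoid the "most vexing parse".
    std::vector<char> bytes((std::istreambuf_iterator<char>(ifs)),
                            std::istreambuf_iterator<char>());

    // A zero-byte buffer would leave nothing to copy into the global block.
    if (bytes.empty())
        throw std::runtime_error(path + " is empty");
    return bytes;
}
```

On Windows, `bytes.data()` and `bytes.size()` would then take the place of `content.c_str()` and `content.size()` in the GlobalAlloc/memcpy step; a `std::vector<char>` only makes the binary intent more explicit than `std::string`.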
https://stackoverflow.com/questions/37630598