
<a href="http://docs.python.org/library/shelve.html" rel="noreferrer">shelve</a>
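A minimal sketch of how shelve might hold the index, assuming the nested `{word: {doc: [locations]}}` layout from the question. The sample data, the file name `inverted_index`, and the temporary directory are illustrative assumptions, not part of the original answer. Each word is stored as its own entry, so a lookup unpickles only that word's postings rather than the whole index.

```python
import os
import shelve
import tempfile

# Hypothetical sample index in the question's {word: {doc: [locations]}} shape.
index = {
    "hadoop": {"doc1": [3, 17], "doc2": [5]},
    "python": {"doc2": [1, 9]},
}

# Illustrative location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "inverted_index")

# Write each word's postings as a separate shelf entry.
with shelve.open(path) as shelf:
    for word, postings in index.items():
        shelf[word] = postings

# Later: reopen read-only and fetch a single word.
# Only the requested entry is read from disk and unpickled.
with shelve.open(path, flag="r") as shelf:
    hadoop_docs = sorted(shelf["hadoop"].keys())
    has_missing = "lucene" in shelf

print(hadoop_docs)
print(has_missing)
```

Shelf keys must be strings, which fits an index keyed by word.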

<blockquote>
 At present I am storing the dictionary using python pickle module and loading from it but it brings the whole of index into memory at once (or does it?).
</blockquote>

Yes, it does bring it all in.

Is that a problem? If it's not an actual problem, then stick with it.

If it's a problem, what kind of problem do you have? Too slow? Too fast? Too colorful? Too much memory used? What problem do you have?


I would use <a href="http://lucene.apache.org/pylucene/" rel="nofollow noreferrer">Lucene</a>. Why reinvent the wheel?


Just store it in a string like this:

<pre><code>&lt;entry1&gt;,&lt;entry2&gt;,&lt;entry3&gt;,...,&lt;entryN&gt;
</code></pre>

If <code>&lt;entry*&gt;</code> contains a ',' character, use some other delimiter such as '\t'.
This is smaller in size than an equivalent pickled string.

If you want to load it, just do:

<pre><code>L = s.split(delimiter)
</code></pre>
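As a rough round-trip illustration of the delimiter approach, assuming the entries are plain strings that never contain the delimiter (the document names here are made up for the example):

```python
# Hypothetical entries; each must be a string that does not contain the delimiter.
delimiter = "\t"
entries = ["doc1", "doc2", "doc3"]

# Store: join the entries into one delimited string.
s = delimiter.join(entries)

# Load: split the string back into a list.
L = s.split(delimiter)

print(L)
```

Note that this flattens one list of strings; nested data such as per-document location lists would need an additional encoding on top.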


You could store the repr() of the dictionary and use that to re-create it.
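A small sketch of that idea, under a few assumptions: the sample index and file name are invented for illustration, and `ast.literal_eval` is used here to rebuild the dictionary instead of plain `eval`, since it only accepts Python literals. This still reads the whole index back into memory at once.

```python
import ast
import os
import tempfile

# Hypothetical sample index in the question's nested-dictionary shape.
index = {
    "hadoop": {"doc1": [3, 17], "doc2": [5]},
    "python": {"doc2": [1, 9]},
}

# Illustrative file location.
path = os.path.join(tempfile.mkdtemp(), "index.txt")

# Store the textual representation of the dictionary.
with open(path, "w", encoding="utf-8") as f:
    f.write(repr(index))

# Re-create the dictionary; literal_eval parses literals only and runs no code.
with open(path, encoding="utf-8") as f:
    restored = ast.literal_eval(f.read())

print(restored == index)
```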


If it's taking a long time to load or using too much memory, you might need a database. There are many you might use; I would probably start with <a href="http://docs.python.org/library/sqlite3.html" rel="nofollow noreferrer">SQLite</a>. Then your problem is "reduced" ;-) to simply formulating the right query to get what you need out of the database. This way you will only load what you need.
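One possible way to lay this out with the standard `sqlite3` module is sketched below. The table name `postings`, its columns, the helper `documents_for`, and the sample data are all assumptions made for illustration; an in-memory database is used so the example is self-contained, whereas a real index would pass a file path to `sqlite3.connect`.

```python
import sqlite3

# Hypothetical sample index in the question's {word: {doc: [locations]}} shape.
index = {
    "hadoop": {"doc1": [3, 17], "doc2": [5]},
    "python": {"doc2": [1, 9]},
}

# ":memory:" keeps the example self-contained; use a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (word TEXT, doc TEXT, location INTEGER)")
# An index on the word column keeps per-word lookups fast.
conn.execute("CREATE INDEX idx_postings_word ON postings (word)")

# Flatten the nested dictionary into one row per (word, doc, location).
rows = [
    (word, doc, location)
    for word, docs in index.items()
    for doc, locations in docs.items()
    for location in locations
]
conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)
conn.commit()


def documents_for(word):
    """Return the sorted document ids that contain the given word."""
    cursor = conn.execute(
        "SELECT DISTINCT doc FROM postings WHERE word = ? ORDER BY doc",
        (word,),
    )
    return [row[0] for row in cursor]


docs = documents_for("hadoop")
print(docs)
```

Only the rows matching the queried word are fetched, which is the point of the answer above.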


I am using anydbm for that purpose. Anydbm provides the same dictionary-like interface, except that it allows only strings as keys and values. But this is not a constraint, since you can use cPickle's loads/dumps to store more complex structures in the index.
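A sketch of this approach in Python 3, where `anydbm` is available as the `dbm` module and `cPickle` has been folded into `pickle`. The sample index and the file name are assumptions for illustration. Keys are encoded to bytes and each word's postings are pickled individually, so a lookup loads just one entry.

```python
import dbm
import os
import pickle
import tempfile

# Hypothetical sample index in the question's {word: {doc: [locations]}} shape.
index = {
    "hadoop": {"doc1": [3, 17], "doc2": [5]},
    "python": {"doc2": [1, 9]},
}

# Illustrative location; the backend may add its own file extensions.
path = os.path.join(tempfile.mkdtemp(), "inverted_index")

# "c" creates the database if it does not exist yet.
with dbm.open(path, "c") as db:
    for word, postings in index.items():
        # dbm stores bytes, so pickle the nested structure per word.
        db[word.encode("utf-8")] = pickle.dumps(postings)

# Reopen read-only and unpickle a single word's postings.
with dbm.open(path, "r") as db:
    python_postings = pickle.loads(db[b"python"])

print(python_postings)
```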

I am working on an information retrieval project.
I have built a full inverted index using Hadoop/Python.
Hadoop outputs the index as (word, documentlist) pairs, which are written to a file.
For quick access, I have created a dictionary (hash table) from that file.
My question is: how do I store such an index on disk so that it still has a quick access time?
At present I am storing the dictionary using the Python pickle module and loading from it,
but it brings the whole index into memory at once (or does it?).
Please suggest an efficient way of storing and searching through the index.

My dictionary structure is as follows (using nested dictionaries):

{word : {doc1:[locations], doc2:[locations], ....}}

so that I can get the documents containing a word by
dictionary[word].keys() ... and so on.

Storing an inverted index
