Unable to use the WAL and free the OS buffer cache during continuous operation

Stack Overflow user
Asked 2018-07-05 22:32:58
1 answer · 357 views · 0 followers · score 0

I have an Ignite server with 128 GB of memory, and I have enabled persistence to keep the data safe.

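From the official documentation, my understanding is this: when persistence is enabled, Ignite first saves data changes to the OS buffer (which I monitor as buff/cache with the Linux command free -mh), then writes them to the WAL; a checkpoint process periodically processes the WAL, frees the disk space of the processed WAL segments, and releases the OS buffer memory that was used. Please correct me if I'm wrong.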

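But in my test, once Ignite starts handling traffic, the OS buffer grows quickly. When I checked the WAL directory, a large number of WAL segments were being generated one after another, with a total size almost equal to the buff/cache size.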

[root@Redis1 apache-ignite]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        109G        995M        1.7G        109G
Swap:          127G          0B        127G

Within just a few minutes, the free column drops quickly while buff/cache grows:

[root@Redis1 apache-ignite]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         15G         85G        995M         25G        108G
Swap:          127G          0B        127G

Meanwhile, the WAL size and the number of WAL segments keep growing too, staying almost the same size as buff/cache.

I checked the Ignite log; the checkpoint process runs every 3 minutes:

[05:30:05,818][INFO][db-checkpoint-thread-#107][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=9428aebc-f2b0-4d33-bed6-fb9a1ad49848, startPtr=FileWALPointer [idx=341, fileOff=50223036, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=860ms, walCpRecordFsyncDuration=245ms, pages=89627, reason='timeout']
[05:30:22,429][INFO][db-checkpoint-thread-#107][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=9428aebc-f2b0-4d33-bed6-fb9a1ad49848, pages=89627, markPos=FileWALPointer [idx=341, fileOff=50223036, len=420491], walSegmentsCleared=0, markDuration=1288ms, pagesWrite=844ms, fsync=15767ms, total=17899ms]
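The checkpoint lines above are dense. The following Java sketch (standard library only; it is not an Ignite API) pulls the numeric fields out of a "Checkpoint finished" line so the telling ones stand out: walSegmentsCleared is 0, and fsync takes most of the checkpoint time (15767 of 17899 ms). It also estimates how much WAL the node has written, assuming that the idx in FileWALPointer is the absolute segment number and that segments use Ignite's default 64 MiB size (neither config sets walSegmentSize). Both are assumptions, not something the log states.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class CheckpointLogParser {
    // One "Checkpoint finished" line copied from the log above.
    static final String SAMPLE_FINISHED =
            "[05:30:22,429][INFO][db-checkpoint-thread-#107][GridCacheDatabaseSharedManager] "
            + "Checkpoint finished [cpId=9428aebc-f2b0-4d33-bed6-fb9a1ad49848, pages=89627, "
            + "markPos=FileWALPointer [idx=341, fileOff=50223036, len=420491], walSegmentsCleared=0, "
            + "markDuration=1288ms, pagesWrite=844ms, fsync=15767ms, total=17899ms]";

    // Assumption: default 64 MiB WAL segment size (walSegmentSize is not set in the configs above).
    static final long ASSUMED_SEGMENT_BYTES = 64L * 1024 * 1024;

    // key=number with an optional "ms" suffix, followed by ',' or ']', so hex IDs
    // such as cpId=9428aebc... are not mistaken for numbers.
    private static final Pattern NUMERIC = Pattern.compile("(\\w+)=(\\d+)(?:ms)?(?=[,\\]])");
    private static final Pattern WAL_PTR = Pattern.compile("FileWALPointer \\[idx=(\\d+), fileOff=(\\d+)");

    /** Collects every numeric key=value field of a checkpoint log line (first occurrence wins). */
    static Map<String, Long> numericFields(String line) {
        Map<String, Long> fields = new LinkedHashMap<>();
        Matcher m = NUMERIC.matcher(line);
        while (m.find()) {
            fields.putIfAbsent(m.group(1), Long.parseLong(m.group(2)));
        }
        return fields;
    }

    /** Fraction of the checkpoint's total duration spent in fsync. */
    static double fsyncShare(Map<String, Long> fields) {
        return fields.get("fsync").doubleValue() / fields.get("total").doubleValue();
    }

    /** Rough WAL bytes written so far: idx * segmentSize + fileOff (see assumption above). */
    static long approxWalBytes(String line) {
        Matcher m = WAL_PTR.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no FileWALPointer in line");
        }
        return Long.parseLong(m.group(1)) * ASSUMED_SEGMENT_BYTES + Long.parseLong(m.group(2));
    }

    public static void main(String[] args) {
        Map<String, Long> f = numericFields(SAMPLE_FINISHED);
        System.out.println("walSegmentsCleared = " + f.get("walSegmentsCleared"));
        System.out.printf("fsync share of total = %.2f%n", fsyncShare(f));
        System.out.printf("approx. WAL written = %.1f GiB%n",
                approxWalBytes(SAMPLE_FINISHED) / (1024.0 * 1024 * 1024));
    }
}
```

Feeding every "Checkpoint finished" line through numericFields makes it easy to see whether walSegmentsCleared ever becomes non-zero while the WAL directory keeps growing.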

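However, in the free -mh output this memory is never released: buff/cache keeps growing with the traffic and does not shrink even after I stop the traffic. If I keep sending traffic, free memory keeps decreasing until only about 100 MB is left: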

[root@Redis1 apache-ignite]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         16G        370M        971M        108G        107G
Swap:          127G          0B        127G

When this happens (free memory exhausted?), all my Ignite-based services stop handling new requests, and Ignite hangs.
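In the last free -mh sample, free is 370M but available is still 107G, so the kernel is reporting most of buff/cache as reclaimable rather than as memory in use. The Java sketch below reads the same counters straight from /proc/meminfo: MemAvailable minus MemFree roughly approximates what the kernel estimates it can reclaim, and Dirty/Writeback show data still waiting to reach disk. It is only a diagnostic aid under the assumption of a Linux host; by itself it does not explain the hang.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class MemInfoCheck {
    /** Parses lines such as "MemFree:  378880 kB" into field name -> numeric value (kB for most fields). */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> kb = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts[0].isEmpty()) {
                continue; // no value after the colon
            }
            try {
                kb.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // skip malformed lines
            }
        }
        return kb;
    }

    /** Memory the kernel estimates it could hand out beyond what is strictly free. */
    static long reclaimableKb(Map<String, Long> kb) {
        return kb.getOrDefault("MemAvailable", 0L) - kb.getOrDefault("MemFree", 0L);
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo not readable (not Linux?)");
            return;
        }
        Map<String, Long> kb = parse(Files.readAllLines(meminfo));
        for (String key : new String[] {"MemFree", "MemAvailable", "Cached", "Dirty", "Writeback"}) {
            System.out.printf("%-13s %,14d kB%n", key, kb.getOrDefault(key, -1L));
        }
        System.out.printf("reclaimable beyond MemFree: %,d kB%n", reclaimableKb(kb));
    }
}
```

If MemAvailable stays high while the service hangs, the page cache alone is unlikely to be the limit, and the investigation can move to disk I/O (for example, the long fsync times in the checkpoint log).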

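I also noticed that the checkpoint log contains reason='timeout'. I don't know whether this means the WAL is being processed correctly and the OS cache buffers are being freed. Is there a way to make checkpointing work properly so that this memory is released?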

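My question is: what can I do to prevent Ignite from exhausting free memory, so that my services stay available? I found that if I turn persistence off, Ignite handles requests quickly and the cache uses less than 1 GB under the same traffic. With the persistence flag enabled, however, the OS cache grows quickly until it has used up all free memory, and Ignite then cannot recover from this state and hangs.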

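I have tried many parameters: WAL mode LOG_ONLY or BACKGROUND, -DIGNITE_WAL_MMAP=false in the JVM options, and checkpointPageBufferSize. None of them saved my Ignite service; it still fills up the OS cache until it is exhausted.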

https://apacheignite.readme.io/docs/write-ahead-log
https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size
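Before concluding that -DIGNITE_WAL_MMAP=false has no effect, it may be worth confirming that the flag actually reached the JVM that runs Ignite. The Java sketch below uses only standard JDK calls (System.getProperty and RuntimeMXBean.getInputArguments); it has to be called from inside the Ignite server process (for example, from the service's startup code), because run on its own it only reports its own JVM's flags. The class name JvmFlagCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public final class JvmFlagCheck {
    /** Value of the IGNITE_WAL_MMAP system property in this JVM, or null if it was never set. */
    static String walMmapSetting() {
        return System.getProperty("IGNITE_WAL_MMAP");
    }

    /** JVM command-line arguments that mention IGNITE_ (e.g. -DIGNITE_WAL_MMAP=false). */
    static List<String> igniteJvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.contains("IGNITE_"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("IGNITE_WAL_MMAP = " + walMmapSetting());
        System.out.println("IGNITE_* JVM args: " + igniteJvmArgs());
    }
}
```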

    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <!-- 10 GB initial size. -->
                    <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
                    <!-- 50 GB maximum size. -->
                    <property name="maxSize" value="#{50L * 1024 * 1024 * 1024}"/>
                    <property name="persistenceEnabled" value="true"/>

                    <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
                </bean>
            </property>
          <property name="writeThrottlingEnabled" value="true"/>
          <property name="walMode" value="LOG_ONLY"/>
          <property name="walPath" value="/wal/ebc"/>
          <property name="walArchivePath" value="/wal/ebc"/>
        </bean>
    </property>

Below is my cache configuration:

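import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

import static org.apache.ignite.cache.CacheAtomicityMode.ATOMIC;

// Method of a class with fields "Ignite ignite" and "IgniteCache<String, OrderInfo> ebcLvOneTxCache".
public void createLvOneTxCache() {

    CacheConfiguration<String, OrderInfo> cacheCfg =
            new CacheConfiguration<>("LvOneTxCache");

    // Every server node keeps a full copy of this cache.
    cacheCfg.setCacheMode(CacheMode.REPLICATED);
    //cacheCfg.setStoreKeepBinary(true);
    cacheCfg.setAtomicityMode(ATOMIC);
    ebcLvOneTxCache = ignite.getOrCreateCache(cacheCfg);
}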

I tried changing the parameters, but the OS cache still keeps growing:

    <!-- Enabling Apache Ignite native persistence. -->
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <!-- 4 GB initial size. -->
                    <property name="initialSize" value="#{4L * 1024 * 1024 * 1024}"/>
                    <!-- 4 GB maximum size. -->
                    <property name="maxSize" value="#{4L * 1024 * 1024 * 1024}"/>
                    <property name="persistenceEnabled" value="true"/>

                    <property name="checkpointPageBufferSize" value="#{4L * 1024 * 1024 * 1024}"/>
                </bean>
            </property>
          <property name="checkpointFrequency" value="6000"/>
          <property name="checkpointThreads" value="32"/>
          <property name="writeThrottlingEnabled" value="true"/>
          <property name="walMode" value="LOG_ONLY"/>
          <property name="walPath" value="/wal/ebc"/>
          <property name="walArchivePath" value="/wal/ebc"/>
        </bean>
    </property>

The log now shows checkpoints starting frequently, but the cache is still not released.

[07:51:20,165][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=fd0c7e68-564a-4b40-9516-bb2a451869e7, startPtr=FileWALPointer [idx=23, fileOff=47849256, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=77ms, walCpRecordFsyncDuration=233ms, pages=7744, reason='timeout']
[07:51:20,219][INFO][sys-stripe-0-#1][PageMemoryImpl] Throttling is applied to page modifications [percentOfPartTime=0.36, markDirty=16378 pages/sec, checkpointWrite=3322 pages/sec, estIdealMarkDirty=673642 pages/sec, curDirty=0.00, maxDirty=0.40, avgParkTime=21501 ns, pages: (total=7744, evicted=0, written=7744, synced=229, cpBufUsed=0, cpBufTotal=1036430)]
[07:51:22,303][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=fd0c7e68-564a-4b40-9516-bb2a451869e7, pages=7744, markPos=FileWALPointer [idx=23, fileOff=47849256, len=420491], walSegmentsCleared=0, markDuration=317ms, pagesWrite=24ms, fsync=2114ms, total=2456ms]
[07:51:26,117][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=d64991bc-3d2f-4f2c-8175-d7e92f46f0bf, startPtr=FileWALPointer [idx=25, fileOff=35951286, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=49ms, walCpRecordFsyncDuration=200ms, pages=7605, reason='timeout']
[07:51:28,612][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=d64991bc-3d2f-4f2c-8175-d7e92f46f0bf, pages=7605, markPos=FileWALPointer [idx=25, fileOff=35951286, len=420491], walSegmentsCleared=0, markDuration=266ms, pagesWrite=23ms, fsync=2472ms, total=2761ms]
[07:51:32,118][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=07246861-57ae-4ef5-8419-cb7710d2f72d, startPtr=FileWALPointer [idx=27, fileOff=38042090, len=420491], checkpointLockWait=6ms, checkpointLockHoldTime=60ms, walCpRecordFsyncDuration=185ms, pages=7186, reason='timeout']
[07:51:32,121][INFO][service-#232][PageMemoryImpl] Throttling is applied to page modifications [percentOfPartTime=0.24, markDirty=10738 pages/sec, checkpointWrite=2757 pages/sec, estIdealMarkDirty=310976 pages/sec, curDirty=0.00, maxDirty=0.07, avgParkTime=358945 ns, pages: (total=7186, evicted=0, written=896, synced=0, cpBufUsed=565, cpBufTotal=1036430)]
[07:51:34,534][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=07246861-57ae-4ef5-8419-cb7710d2f72d, pages=7186, markPos=FileWALPointer [idx=27, fileOff=38042090, len=420491], walSegmentsCleared=0, markDuration=257ms, pagesWrite=29ms, fsync=2387ms, total=2679ms]
[07:51:38,169][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=44e6870a-e370-4bd3-8ad9-8252abb0acd3, startPtr=FileWALPointer [idx=29, fileOff=44462293, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=76ms, walCpRecordFsyncDuration=210ms, pages=7529, reason='timeout']
[07:51:40,668][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=44e6870a-e370-4bd3-8ad9-8252abb0acd3, pages=7529, markPos=FileWALPointer [idx=29, fileOff=44462293, len=420491], walSegmentsCleared=0, markDuration=303ms, pagesWrite=24ms, fsync=2475ms, total=2802ms]


[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        107G        995M        3.5G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        107G        995M        3.5G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        107G        995M        3.5G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        105G        995M        5.6G        109G
Swap:          127G          0B        127G
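With checkpointFrequency set to 6000 (ms), every checkpoint in the log above reports reason='timeout', which appears to mean it was started by that timer, and every one still reports walSegmentsCleared=0. The Java sketch below (standard library only) extracts the reason and measures the gap between consecutive "Checkpoint started" lines, which should come out close to the configured frequency. The sample lines are shortened copies of the log above, and the sketch does not handle timestamps that wrap past midnight.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class CheckpointCadence {
    // "[HH:mm:ss,SSS]... Checkpoint started [... reason='...']" as printed in the log above.
    private static final Pattern STARTED = Pattern.compile(
            "^\\[(\\d{2}:\\d{2}:\\d{2},\\d{3})\\].*Checkpoint started .*reason='([^']*)'");
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds between consecutive "Checkpoint started" lines; other lines are ignored. */
    static List<Long> gapsMillis(List<String> lines) {
        List<Long> gaps = new ArrayList<>();
        LocalTime previous = null;
        for (String line : lines) {
            Matcher m = STARTED.matcher(line);
            if (!m.find()) {
                continue; // skips "Checkpoint finished" and throttling lines
            }
            LocalTime t = LocalTime.parse(m.group(1), TS);
            if (previous != null) {
                gaps.add(Duration.between(previous, t).toMillis());
            }
            previous = t;
        }
        return gaps;
    }

    /** The reason='...' value of each "Checkpoint started" line, in order. */
    static List<String> reasons(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STARTED.matcher(line);
            if (m.find()) {
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String p = "[INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] ";
        List<String> log = Arrays.asList(
                "[07:51:20,165]" + p + "Checkpoint started [pages=7744, reason='timeout']",
                "[07:51:22,303]" + p + "Checkpoint finished [pages=7744, walSegmentsCleared=0, total=2456ms]",
                "[07:51:26,117]" + p + "Checkpoint started [pages=7605, reason='timeout']",
                "[07:51:32,118]" + p + "Checkpoint started [pages=7186, reason='timeout']",
                "[07:51:38,169]" + p + "Checkpoint started [pages=7529, reason='timeout']");
        System.out.println("reasons:   " + reasons(log));
        System.out.println("gaps (ms): " + gapsMillis(log));
    }
}
```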

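When I stop the traffic that updates the cache, the OS cache does shrink again, but very slowly: it takes a long time to be released, even with checkpointFrequency set to 6 s. How can I get it released quickly?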

[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        104G        995M        6.5G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        104G        995M        6.3G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        104G        995M        6.3G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        106G        995M        4.6G        109G
Swap:          127G          0B        127G
[root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         14G        106G        995M        4.4G        109G

1 answer

Stack Overflow user

Accepted answer

Posted 2018-07-06 07:27:11

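It is perfectly fine for the OS to cache disk data; this is explained very well here: Linux ate my RAM. If your kernel supports it, you can always set an amount of free memory for the kernel to keep, which can reduce pauses when Ignite allocates new memory chunks.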
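The answer does not name the setting. On Linux the usual knob for a free-memory reserve is vm.min_free_kbytes; some vendor kernels also carry vm.extra_free_kbytes, and newer mainline kernels have vm.watermark_scale_factor. Which of these the answer meant is an assumption here. The Java sketch below only checks which of these files exist under /proc/sys/vm on a given host and prints their current values; changing them requires root (for example via sysctl) and should be tested carefully.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public final class VmKnobs {
    // Assumed candidate knobs; which ones exist depends on the kernel build.
    static final String[] KNOBS = {"min_free_kbytes", "extra_free_kbytes", "watermark_scale_factor"};

    /** Reads each knob file under vmDir (normally /proc/sys/vm); missing knobs are skipped. */
    static Map<String, String> readKnobs(Path vmDir) throws IOException {
        Map<String, String> found = new LinkedHashMap<>();
        for (String knob : KNOBS) {
            Path file = vmDir.resolve(knob);
            if (Files.isReadable(file)) {
                found.put(knob, new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> knobs = readKnobs(Paths.get("/proc/sys/vm"));
        if (knobs.isEmpty()) {
            System.out.println("none of the known knobs found (not Linux?)");
        }
        knobs.forEach((k, v) -> System.out.println("vm." + k + " = " + v));
    }
}
```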

Score: 1
Page content provided by Stack Overflow. Translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/51200730
