自从我们从ColdFusion 8企业升级到ColdFusion 9企业之后,我们一直在处理事件网关的问题。
我们有一个事件网关设置来建立到第三方的连接。他们至少每10秒给我们更新一次,有时每秒钟更新很多次。
我们有一个配置为事件网关侦听器的Java类,它将事件和数据推送给CFC函数。在这个函数中,我们实际上使用了一个命名的<cflock>来确保按顺序处理请求,并且请求将在此时排队以获得对命名锁的独占访问权。这个锁有30秒的超时时间。
我还在这个函数中进行了大量调试,并且注意到了以下几点:
<cflock>标记前排队,等待锁,此队列可能超过40个事件。所以问题是,有时,我会有一个锁定超时,而且是在30秒之后。首先,如果请求正在等待锁,则进行日志记录。看起来是这样的:
"Information","Thread-23","06/23/10","15:45:18","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304318606) : PRE LOCK"然后在日志中,我看到了同样的请求:
"Error","Thread-23","06/23/10","15:45:48","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304348607) : LOCK ERROR: A timeout occurred while attempting to lock lock_ResponseDispatcher."他们之间还有30秒的时间。此时将丢失与其相关的请求和任何事件数据。对我不好。
所以我想我应该看看排队的速度是否足够快。我不确定<cflock>是如何排队的。有严格的限制吗?
不管怎么说,在这次比赛中,我看到了:
我不敢相信我所看到的!这就好像<cflock>队列不是以先入先出(FIFO)的方式处理的,而是以先入先出(FILO)来处理的!
这样的事情有可能吗?还有其他人见过这种行为吗?
事先向任何有想法的人表示感谢。
恰兰
发布于 2010-06-25 11:13:40
我认为这里的关键是,我使用的是异步的事件网关。事实上,在进行了一个实验之后,我的问题的原因似乎很明显:)
我已经指定了用于在CF管理中处理事件网关请求的线程数量(参见事件网关->设置下的内容)。此设置在上停留在1,但在Enterprise中可以增加。为了这个实验,我把它增加到了5。只有当它增加时,你才会看到这种奇怪的行为。
所以我的事件网关代码非常简单,它只创建一个UUID (这样我就可以在日志中跟踪请求),然后锁定线程5秒来模拟实际处理。这个sleep发生在一个cflock调用中,所以只有一个线程可以同时进行处理。我们需要这样做,以避免在实际代码中处理重复项。
以下是CFC的全部内容:
component {
public void function onIncomingMessage (required struct msg) {
var sys = createObject("java", "java.lang.System");
var tag = createUUID();
var logFile = "test\gatewaytest";
writelog (file=logFile, text="#tag# - about to queue");
try {
lock name="myTestLock" timeout="120" {
writelog (file=logFile, text="#tag# - got lock");
thread action="sleep" duration="5000"; //ms
}
writelog (file=logFile, text="#tag# - released lock");
} catch (any e) {
writelog (file=logFile, text="#tag# - ERROR - #e.message#");
}
}
}注意锁上很长的超时值(2分钟)。这是为了处理事件网关异步处理所带来的问题。
事件网关是一个简单的内置CFML类型,ID为'TestGW‘,我将其链接到上面的CFC。
我设置了一个简单的脚本将事件发送到事件网关,这里是完整的:
<cfset msg = {mymessage = "hello gateway"} />
<cfset sendGatewayMessage("TestGW", msg) />场景1-单线程:
如果事件网关处理线程的数量设置为1,并且我敲击网关,则会看到以下日志输出:
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - about to queue"
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - got lock"
"Information","Thread-17","06/25/10","10:32:14",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - released lock"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - about to queue"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - got lock"
"Information","Thread-17","06/25/10","10:32:19",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - released lock"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - about to queue"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - got lock"
"Information","Thread-17","06/25/10","10:32:24",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - released lock"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - about to queue"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - got lock"
"Information","Thread-17","06/25/10","10:32:29",,"5082A5E1-1C23-9073-674E9467F395686F - released lock"这里要注意的关键是它是单线程的。这都是一次一个事件排队的问题,每件事都是按顺序发生的。
场景2-更多线程:
如果事件网关处理线程的数量增加到5,并且我敲击网关,则会看到以下日志输出:
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - about to queue"
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - got lock"
"Information","Thread-27","06/25/10","11:26:01",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - about to queue"
"Information","Thread-21","06/25/10","11:26:02",,"526CDEED-C2B3-3C92-0F57CFA317AC02F8 - about to queue"
"Information","Thread-20","06/25/10","11:26:02",,"526CEE25-F25B-890C-F7501B5489C6BB21 - about to queue"
"Information","Thread-25","06/25/10","11:26:02",,"526CFD3C-EAFD-40E7-EBA2BE59B87D5936 - about to queue"
"Information","Thread-24","06/25/10","11:26:03",,"526D0FC5-E5E2-642E-452636C8838ADE33 - about to queue"
"Information","Thread-26","06/25/10","11:26:03",,"526D1096-C82E-535B-36D57D3A431D1436 - about to queue"
"Information","Thread-23","06/25/10","11:26:03",,"526D1F9C-9A9C-FA84-E153A944123E77BE - about to queue"
"Information","Thread-19","06/25/10","11:26:04",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - about to queue"
"Information","Thread-22","06/25/10","11:26:04",,"526D3F09-073F-2B0C-E94652D1C95B09CB - about to queue"
"Information","Thread-18","06/25/10","11:26:06",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - released lock"
"Information","Thread-22","06/25/10","11:26:06",,"526D3F09-073F-2B0C-E94652D1C95B09CB - got lock"
"Information","Thread-22","06/25/10","11:26:11",,"526D3F09-073F-2B0C-E94652D1C95B09CB - released lock"
"Information","Thread-27","06/25/10","11:26:11",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - got lock"
"Information","Thread-27","06/25/10","11:26:16",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - released lock"
"Information","Thread-19","06/25/10","11:26:16",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - got lock"
"Information","Thread-19","06/25/10","11:26:21",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - released lock"特别注意UUID 526D3F09-073F-2B0C-E94652D1C95B09CB的请求。这是最后记录的请求,队列末尾也是如此。Hoever,一旦锁可用,就会跳转并抓住锁--而不是第一个在那里的526CD0EB-049E-D382-2C3A7E3C0DBF8ED3请求。
结论:
因为当我们使用多个线程时,我们不能保证在使用事件网关时等待cflock时线程的处理顺序。我们需要确保锁的超时值足够高,以便在繁忙时间,在任何一个请求超过锁定超时之前,都可以完整地处理事件队列。
我想这可能为我们在多线程事件网关中使用cflock提供了正确的服务!
我希望这能对其他经历过这个问题的人有所帮助。
干杯,恰兰。
发布于 2010-06-23 18:18:04
我不知道该如何看待你们的FIFO与LIFO问题,但我可以对此提出建议:
他们之间还有30秒的时间。此时将丢失与其相关的请求和任何事件数据。对我不好。
CFLock标记有一个名为throwOnTimeout的属性,该属性默认为true。如果将其设置为false,而不是在超时时抛出异常,处理过程将跳过锁定的代码块并继续正常运行。你可以利用这个来发挥你的优势,就像这样:
<cfset made_it_through_lock = false />
<cflock name="single_threaded_lock_name" throwOnTimeout="false">
<!--- ... do stuff ... --->
<cfset made_it_through_lock = true />
</cflock>
<cfif made_it_through_lock eq false>
<!---
log the event data that you don't want to lose, then abort,
setting the necessary http status code & headers
--->
</cfif>https://stackoverflow.com/questions/3103572
复制相似问题