Kong is known as the world’s most popular API gateway. It’s lightweight, fast, and flexible.

I chose Kong over other gateways such as Tyk and KrakenD, and I have been running it in production for a long time, because it is easy to scale, easy to extend, and has a nice RESTful Admin API.

Everything seems fine on the surface, but in practice I have run into plenty of challenges that cost me a lot of time to debug and fix.

Last year, after upgrading Kong to version 2.5.0, I received reports that active health checks stopped working properly whenever users updated an upstream or added/removed targets.

I reported the issue to the Kong team in #7652 and then started debugging it myself.

After a lot of debugging, I traced the problem to healthcheck.lua in lua-resty-healthcheck, the library Kong uses for health checks:

...

-- checker objects (weak) table
local hcs = setmetatable({}, {
  __mode = "v",
})

...

expire = function()
  self:renew_periodic_lock()
  local cur_time = ngx_now()
  for _, checker_obj in ipairs(hcs) do
    if checker_obj.checks.active.healthy.active and
      (checker_obj.checks.active.healthy.last_run +
      checker_obj.checks.active.healthy.interval <= cur_time)
    then
      checker_obj.checks.active.healthy.last_run = cur_time
      checker_callback(checker_obj, "healthy")
    end

    if checker_obj.checks.active.unhealthy.active and
      (checker_obj.checks.active.unhealthy.last_run +
      checker_obj.checks.active.unhealthy.interval <= cur_time)
    then
      checker_obj.checks.active.unhealthy.last_run = cur_time
      checker_callback(checker_obj, "unhealthy")
    end
  end
end,

...

table.insert(hcs, self)

...

The health check core stores its checker objects in hcs, a table with weak values (__mode = "v"). When a user updates an upstream or adds/removes targets, Kong creates a new checker and appends it to hcs with table.insert, while the old checker is no longer referenced anywhere else. Because hcs only holds weak references, the Lua garbage collector is free to collect the old checker and clear its slot. If that slot was not the last one, hcs is left with a gap (a nil value) in the middle of the array.
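The gap can be reproduced in plain Lua, without Kong. This is a minimal sketch that assumes plain tables as stand-ins for checker objects; `owners` is a hypothetical table representing whatever keeps a checker strongly referenced in the real code.

```lua
-- Weak-valued table, declared the same way as hcs in healthcheck.lua.
local hcs = setmetatable({}, { __mode = "v" })

-- Hypothetical stand-in for whatever holds strong references to live checkers.
local owners = {}

for i = 1, 3 do
  local checker = { name = "checker-" .. i }  -- placeholder, not a real checker
  owners[i] = checker
  table.insert(hcs, checker)
end

-- Simulate an upstream update: the middle checker is replaced elsewhere,
-- so its last strong reference disappears.
owners[2] = nil
collectgarbage("collect")  -- full GC cycle; the weak entry hcs[2] is cleared

-- hcs[1] and hcs[3] survive, hcs[2] is now a hole in the array.
print(hcs[1] ~= nil, hcs[2] == nil, hcs[3] ~= nil)
```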

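The expire callback above iterates over this weak table with ipairs. ipairs is meant for array-like tables: it walks the integer keys 1, 2, 3, … in order, ignores non-integer keys, and stops at the first index whose value is nil. So once a checker in the middle of hcs has been collected, every checker after the gap is silently skipped, and its active health checks never run.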

The alternative is pairs, which visits every entry present in the table, including non-integer keys and integer keys that come after a gap, although the traversal order is not specified.
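The difference between the two iterators shows up on any ordinary table with a hole, which is what hcs looks like after a checker has been collected. This is a standalone sketch; the string values are illustrative placeholders, not real checker objects.

```lua
-- An array with a hole at index 2, like hcs after a checker is collected.
local t = {}
t[1] = "checker-1"
t[3] = "checker-3"

-- ipairs visits t[1], finds t[2] == nil, and stops: "checker-3" is never seen.
local seen_ipairs = {}
for _, v in ipairs(t) do
  seen_ipairs[#seen_ipairs + 1] = v
end

-- pairs visits every non-nil entry, in no guaranteed order.
local seen_pairs = {}
for _, v in pairs(t) do
  seen_pairs[#seen_pairs + 1] = v
end
table.sort(seen_pairs)  -- sorted only so the result is easy to compare

print(#seen_ipairs, #seen_pairs)  -- ipairs saw 1 entry, pairs saw 2
```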

To fix the issue, simply replace ipairs with pairs in that loop. Each checker is handled independently, so the unspecified order of pairs does no harm here:

for _, checker_obj in pairs(hcs) do
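To see the effect of the change on the loop itself, here is a simplified, hypothetical version of the expire callback's body. `run_due_checks`, `new_checker`, `make_hcs`, and `count_calls` are illustrative names, not part of lua-resty-healthcheck; the real code uses ngx_now() and checker_callback() where this sketch takes `now` and `on_due` as parameters.

```lua
-- Hypothetical simplification of the periodic loop in the expire callback.
-- `iter` is either ipairs (buggy version) or pairs (fixed version).
local function run_due_checks(hcs, iter, now, on_due)
  for _, checker_obj in iter(hcs) do
    for _, kind in ipairs({ "healthy", "unhealthy" }) do
      local check = checker_obj.checks.active[kind]
      if check.active and check.last_run + check.interval <= now then
        check.last_run = now
        on_due(checker_obj, kind)  -- stands in for checker_callback()
      end
    end
  end
end

-- Hypothetical checker: only the fields the loop reads.
local function new_checker(name)
  return {
    name = name,
    checks = { active = {
      healthy   = { active = true, last_run = 0, interval = 5 },
      unhealthy = { active = true, last_run = 0, interval = 5 },
    } },
  }
end

-- hcs with a hole at index 2, as the garbage collector would leave it.
local function make_hcs()
  local t = {}
  t[1] = new_checker("a")
  t[3] = new_checker("c")
  return t
end

-- Count how many checks fire at time 10 with the given iterator.
local function count_calls(iter)
  local calls = 0
  run_due_checks(make_hcs(), iter, 10, function() calls = calls + 1 end)
  return calls
end

print(count_calls(ipairs), count_calls(pairs))  -- 2 (checker "c" skipped), 4
```

The hole is created explicitly here rather than by the garbage collector, which keeps the sketch deterministic while reproducing the same array shape.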

After we applied this fix to our Kong clusters, active health checks started working properly again.

I opened a pull request (#78) against lua-resty-healthcheck, but none of the maintainers seem to have reviewed it. :(

Update 2022-03-30: Finally, the issue has been fixed (#93).

References: