Using monit to monitor totalmem for delayed_job processes in a Rails app
I have monit set up to keep an eye on delayed_job processes to ensure
they are not using too much memory. Initially it was set up to restart a
worker if its process went over 700 MB, so the relevant part of the
monitrc file looked like this:
check process delayed_job_worker
  with pidfile /path/to/pidfile.pid
  start program = "/usr/bin/dj start" with timeout 60 seconds
  stop program = "/usr/bin/dj stop" with timeout 60 seconds
  if totalmem is greater than 700 MB then restart
This was working as expected, but the delayed_job record was not
removed, so after the worker was restarted another worker would come
along and pick up the same job. delayed_job handles this, but the job
would be worked several times, using up resources each time, so I want
to remove the delayed job in this case. My solution looks something like
this:
check process delayed_job_worker
  with pidfile /path/to/pidfile.pid
  start program = "/usr/bin/dj start" with timeout 60 seconds
  stop program = "/usr/bin/dj stop" with timeout 60 seconds
  if totalmem is greater than 700 MB then
    exec "/path/to/script/kill_and_remove_dj /path/to/pidfile.pid"
Now the script is executed: it gets the pid from the pidfile, kills that
process, and runs a Ruby script through rails runner to find the
delayed_job record belonging to that pid and remove it from the
database. This all seems to be working as expected. However, what I
notice when I run ps aux | grep kill_and_remove_dj after creating a job
I know will hit the memory limit is that the command is being executed
over and over again.
My understanding is that totalmem in this case will be the memory of the
process from the pidfile plus all of its child processes, and that when
exec runs the script it should be a completely separate process. Just
looking for any pointers or info from anyone who has an idea of what
could be causing this. Happy to provide more info if required.