Pgpool-II: empty parameters in failover.sh

Recently I was asked to implement an HA solution with automatic fail-over for a PostgreSQL 9.3 database. I decided to use repmgr to simplify creating, monitoring and controlling replication between two PostgreSQL instances, and Pgpool-II to trigger the automatic fail-over when needed.

Pgpool-II was set up to call the default fail-over script when a node fails:

#!/bin/sh -x
failed_node=$1
new_master=$2
(
date
echo "Failed node: $failed_node "
  /usr/bin/ssh -T -l postgres $new_master "/opt/postgres/9.3/bin/repmgr -f /opt/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-"
) 2>&1 | tee -a /opt/postgres/pgpool/log/pgpool_failover.log

Pgpool-II passes the failed node id and the new master’s hostname to the script as parameters, and the script uses the hostname to promote the slave database to new master.

While testing the setup I noticed in the logs that the new master’s hostname was empty, and in a later run the failed node id was empty as well, as shown below.

Thu Sep 21 11:36:18 EEST 2017
Failed node: 1
+ /usr/bin/ssh -T postgres@ '/app/postgres/9.3/bin/repmgr -f /app/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-'
ssh: Could not resolve hostname : Name or service not known^M
+ exit 0
Thu Sep 21 11:39:09 EEST 2017
Failed node:
+ /usr/bin/ssh -T postgres@ '/app/postgres/9.3/bin/repmgr -f /app/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-'
ssh: Could not resolve hostname : Name or service not known^M
+ exit 0
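
The log above shows ssh being invoked even though the hostname argument was empty. As a minimal sketch, a guard like the one below would stop the script early with an explicit message. The function name `require_args` is hypothetical and not part of Pgpool-II or repmgr.

```shell
#!/bin/sh
# Hypothetical guard (not part of Pgpool-II or repmgr): succeeds only when
# both the failed node id ($1) and the new master hostname ($2) are non-empty.
require_args() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "failover: empty parameter (failed_node='${1:-}', new_master='${2:-}')" >&2
        return 1
    fi
}
```

Near the top of the fail-over script this could be called as `require_args "$failed_node" "$new_master" || exit 1`, so that an empty hostname shows up as a clear error rather than as an ssh name resolution failure.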

While troubleshooting this, I came across Bug #133, where someone mentions that installing the debug libraries solved the issue. I tried this workaround, but it didn’t work in my case. I had to come up with a solution, as automatic fail-over was a key requirement in this setup. I configured Pgpool-II to pass only the node id as a parameter, which did get passed through, and moved the hostname handling for the fail-over into the script.

So my fail-over script now looks like this:

#!/bin/bash -x
# bash is required for the [[ ]] tests below.
# Pgpool-II passes only the failed node id; hostnames are handled here.
failed_node=$1
WRONG_NODE_ID_EXIT_STATUS=99
(
date
echo "Failed node: $failed_node "
if [[ $failed_node = 1 ]];
then /usr/bin/ssh -T -l postgres dbrepltst1 "/opt/postgres/9.3/bin/repmgr -f /opt/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-"
elif [[ $failed_node = 0 ]];
then /usr/bin/ssh -T -l postgres dbrepltst2 "/opt/postgres/9.3/bin/repmgr -f /opt/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-"
else
echo "Unknown failed_node id"
# Note: this exits only the subshell; the script's status is that of tee.
exit $WRONG_NODE_ID_EXIT_STATUS
fi
) 2>&1 | tee -a /opt/postgres/pgpool/log/pgpool_failover.log

And now fail-over works as expected:

Thu Sep 21 14:24:56 EEST 2017
+ echo 'Failed node: 0 '
Failed node: 0
+ [[ 0 = 1 ]]
+ [[ 0 = 0 ]]
+ /usr/bin/ssh -T -l postgres jiradbrepltst2 '/app/postgres/9.3/bin/repmgr -f /app/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-'
+ date
Thu Sep 21 16:18:38 EEST 2017
+ echo 'Failed node: 1 '
Failed node: 1
+ [[ 1 = 1 ]]
+ /usr/bin/ssh -T -l postgres jiradbrepltst1 '/app/postgres/9.3/bin/repmgr -f /app/postgres/repmgr/repmgr.conf standby promote 2>/dev/null 1>/dev/null <&-'
+ date
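
The node id to hostname handling in the fail-over script could also live in a small lookup function, which keeps the mapping in one place and avoids repeating the ssh command. This is only a sketch: the function name `promote_target_for` is hypothetical, and the hostnames and the exit status 99 are taken from the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the host on which to run "repmgr standby promote"
# for a given failed node id. The mapping mirrors the fail-over script.
promote_target_for() {
    case "$1" in
        1) echo "dbrepltst1" ;;
        0) echo "dbrepltst2" ;;
        *) echo "Unknown failed_node id: '$1'" >&2
           return 99 ;;
    esac
}
```

The script would then call `target=$(promote_target_for "$failed_node") || exit $?` and run a single ssh command against `"$target"`.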
