pg_receivewal实践 2023-10-18 22:19 280阅读 0赞 ## [pg\_receivewal实践][pg_receivewal] ## [https://www.cnblogs.com/kuang17/p/10868556.html][pg_receivewal] **测试从****pg\_receivewal****的日志中恢复从库为主库:** 主从配置async模式,配置pg\_receivewal接收日志pg\_receivewal -D /dbaas/pg/data/pg\_receivewal\_data -v -h 10.9.10.202 主插入1000万数据,当插入一半时,停止从库 主库插完数据,停止 将pg\_receivewal的日志拷贝到从库/dbaas/pg/data/pg\_receivewal\_data下 修改从库recovery.conf文件,添加内容: restore\_command = 'cp /dbaas/pg/data/pg\_receivewal\_data/%f "%p"' 启动数据库,数据库成功恢复到10000000行。 另外一种用法: restore\_command = 'cp /dbaas/pg/data/pg\_receivewal\_data/%f "%p"' standby\_mode = on primary\_conninfo = 'host=10.9.10.203 port=5432 user=postgres connect\_timeout=60' recovery\_target\_timeline = 'latest' trigger\_file = '/dbaas/pg/data/.tfile' **这种把恢复和备库设置放一起,会先去从归档恢复,当恢复完成后,直接建立流复制关系,这样就相当方便了,当有归档时,不用重做从库。** **测试****pg\_receivewal****的同步机制:** \-S ***slotname*** \--slot=***slotname*** Require pg\_receivewal to use an existing replication slot (see [Section 26.2.6][]). When this option is used, pg\_receivewal will report a flush position to the server, indicating when each segment has been synchronized to disk so that the server can remove that segment if it is not otherwise needed. When the replication client of pg\_receivewal is configured on the server as a synchronous standby, then using a replication slot will report the flush position to the server, but only when a WAL file is closed. **Therefore, that configuration will cause transactions on the primary to wait for a long time and effectively not work satisfactorily.** The option --synchronous (see below) must be specified in addition to make this work correctly. \--synchronous Flush the WAL data to disk immediately after it has been received. Also send a status packet back to the server immediately after flushing, regardless of --status-interval. This option should be specified if the replication client of pg\_receivewal is configured on the server as a synchronous standby, to ensure that timely feedback is sent to the server. 在这个说明中,使用了--slot=for\_pgreceivewal --synchronous后,理论上是有点像sync同步一样,数据库会等待接收端进行回放,回放结束,再进行commit。 **1****)使****slot****和****synchronous****参数:** 数据库端: 修改postgresql.conf参数synchronous\_standby\_name=‘pg\_receivewal’ 重启数据库 postgres=\# select pg\_create\_physical\_replication\_slot('for\_pgreceivewal'); pg\_create\_physical\_replication\_slot \------------------------------------- (for\_pgreceivewal,) (1 row) postgres=\# select pg\_get\_replication\_slots(); pg\_get\_replication\_slots \---------------------------------------- (for\_pgreceivewal,,physical,,f,f,,,,,) (1 row) postgres=\# select pg\_get\_replication\_slots(); pg\_get\_replication\_slots \-------------------------------------------------------- (for\_pgreceivewal,,physical,,f,t,629488,,,0/9DD77B00,) (1 row) 接收端: pg\_receivewal -D /dbaas/pg/data/pg\_receivewal\_data --slot=for\_pgreceivewal --synchronous -v -h 10.9.10.202 数据库端: postgres=\# select \* from pg\_stat\_replication ; pid | usesysid | usename | application\_name | client\_addr | client\_hostname | client\_port | backend\_start | backend\_xmin | state | sent\_lsn | write\_lsn | flush\_lsn | replay\_lsn | write\_lag | flush\_lag | replay\_lag | sync\_priority | sync\_state \--------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+-----------------+---------------+------------ 912703 | 10 | postgres | pg\_receivewal | 10.9.10.203 | | 63103 | 2019-05-15 02:52:30.029814+00 | | streaming | 0/D9E44528 | 0/D9E44528 | 0/D9E44528 | | 00:00:00.010972 | 00:00:00.010972 | 00:06:49.964729 | 1 | sync (1 row) **在数据库端,使用****pgbench****测试:** \[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf /dbaas/pg/data\]$ pgbench -c 10 -j 5 -r -T 100 pgbench starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 100 query mode: simple number of clients: 10 number of threads: 5 duration: 100 s number of transactions actually processed: 15290 latency average = 65.426 ms tps = 152.845128 (including connections establishing) tps = 152.855112 (excluding connections establishing) statement latencies in milliseconds: 0.002 \\set aid random(1, 100000 \* :scale) 0.001 \\set bid random(1, 1 \* :scale) 0.001 \\set tid random(1, 10 \* :scale) 0.000 \\set delta random(-5000, 5000) 0.046 BEGIN; 0.300 UPDATE pgbench\_accounts SET abalance = abalance + :delta WHERE aid = :aid; 0.111 SELECT abalance FROM pgbench\_accounts WHERE aid = :aid; 0.417 UPDATE pgbench\_tellers SET tbalance = tbalance + :delta WHERE tid = :tid; 2.878 UPDATE pgbench\_branches SET bbalance = bbalance + :delta WHERE bid = :bid; 0.092 INSERT INTO pgbench\_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT\_TIMESTAMP); 61.539 END; **2****)不使用****slot****和****synchronous****参数** **数据库端,删除****slot** postgres=\# select pg\_drop\_replication\_slot('for\_pgreceivewal'); pg\_drop\_replication\_slot \-------------------------- (1 row) postgres=\# select pg\_get\_replication\_slots(); pg\_get\_replication\_slots \-------------------------- (0 rows) **接收端:** pg\_receivewal -D /dbaas/pg/data/pg\_receivewal\_data -v -h 10.9.10.202 **压测:** \[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf ~\]$ pgbench -c 10 -j 5 -r -T 100 pgbench starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 100 query mode: simple number of clients: 10 number of threads: 5 duration: 100 s number of transactions actually processed: 27015 latency average = 37.021 ms tps = 270.120226 (including connections establishing) tps = 270.138206 (excluding connections establishing) statement latencies in milliseconds: 0.002 \\set aid random(1, 100000 \* :scale) 0.001 \\set bid random(1, 1 \* :scale) 0.001 \\set tid random(1, 10 \* :scale) 0.000 \\set delta random(-5000, 5000) 0.042 BEGIN; 0.242 UPDATE pgbench\_accounts SET abalance = abalance + :delta WHERE aid = :aid; 0.110 SELECT abalance FROM pgbench\_accounts WHERE aid = :aid; 0.291 UPDATE pgbench\_tellers SET tbalance = tbalance + :delta WHERE tid = :tid; 1.778 UPDATE pgbench\_branches SET bbalance = bbalance + :delta WHERE bid = :bid; 0.091 INSERT INTO pgbench\_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT\_TIMESTAMP); 34.396 END; **同步模式下,停止****pg\_receivewal****,会导致主****hang****住:** postgres=\# create table test(id int, info varchar(8), crt\_time timestamp); ^CCancel request sent WARNING: canceling wait for synchronous replication due to user request DETAIL: The transaction has already committed locally, but might not have been replicated to the standby. CREATE TABLE **pg\_receivewal****的使用场景思考:** 1)当配置高可用集群时,一个备库很危险,容易出现单点故障,当同步节点死掉后,主节点会hang住(这是没有专业高可用工具的情况下,ecox这种高可用工具,会自动处理为单节点可写)。而部署两个同步节点又会浪费资源(不做读写分离的情况下),这个时候,可以创建一个pg\_receivewal节点,使用同步模式,当一个同步节点停止后,它可以充当sync节点,给主节点报告LSN。 为了减少网络开销,还可以将pg\_receivewal节点设置为主库本地,即起到了归档作用,又起到了同步节点的功能。唯一注意的是,要定期清理日志空间。 2)第二种情况,数据库集群不使用sync模式,想使用更加高性能的async模式。这时,使用pg\_receivewal来做日志归档,至少比aync节点同步的日志要多。这样基于他的恢复,也将丢更少的数据。当数据库更新插入极端的大时,pg\_receivewal归档的日志,可能还没有归档命令归档的多。归档是到本地,如果服务器挂了,那么归档也没有了,可以使用pgbackrest或者rync工具将归档同步到其他服务器。 用更加精炼的话说: 1)避免单点故障。这点比较鸡肋,某些高级的高可用工具本身就能解决单点故障。 2)用来做日志归档,归档到块级别,且能归档到远端。比archive\_command要更加多(默认归档是完成一个wal文件才归档一个,还未写满的wal是不会归档的)。主从之间保持sync模式,那没必要用pg\_receivewal来归档;如果是async模式追求性能,pg\_receivewal也不能配成同步来影响性能啊(让步:将pg\_receivewal放到本地,这样减少网络传输开销,保持同步,能否接受,**答案是同步下性能还是很差,而且差距很大,**下面是测试数据)。 <table> <tbody> <tr> <td> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p> </td> <td> <p><code>postgres=# </code><code>select</code> <code>* </code><code>from</code> <code>pg_stat_replication ;</code></p> <p><code>pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state</code></p> <p><code>--------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+-----------------+---------------+------------</code></p> <p><code> </code><code>745643 | 10 | <strong>postgres | pg_receivewal | ::1 </strong> | | 58465 | 2019-05-16 03:53:06.628954+00 | | streaming | 1/CB059FC8 | 1/CB059FC8 | 1/CB059FC8 | | 00:00:00.191089 | 00:00:00.191089 | 00:00:32.187898 | 1 | <strong>sync</strong></code></p> <p><code> </code><code>668818 | 10 | postgres | walreceiver | 10.9.10.203 | | 42007 | 2019-05-16 03:37:21.983067+00 | | streaming | 1/CB059FC8 | 1/CB059FC8 | 1/CB059FC8 | 1/CB059FC8 | | | | 0 | async</code></p> <p><code>(2 rows)</code></p> <p> </p> <p><code>(END)</code></p> </td> </tr> </tbody> </table> 在本地同步性能: <table> <tbody> <tr> <td> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p> <p>9</p> <p>10</p> <p>11</p> <p>12</p> <p>13</p> <p>14</p> <p>15</p> <p>16</p> <p>17</p> <p>18</p> <p>19</p> <p>20</p> <p>21</p> <p>22</p> <p>23</p> <p>24</p> </td> <td> <p><code>[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf /dbaas/pg]$ pgbench -c 10 -j 5 -r -T 100 pgbench</code></p> <p><code>starting vacuum...end.</code></p> <p><code>transaction type: <builtin: TPC-B (sort of)></code></p> <p><code>scaling factor: 100</code></p> <p><code>query mode: simple</code></p> <p><code>number of clients: 10</code></p> <p><code>number of threads: 5</code></p> <p><code>duration: 100 s</code></p> <p><code>number of transactions actually processed: 17313</code></p> <p><code>latency average = 57.785 ms</code></p> <p><code>tps = 173.054591 (including connections establishing)</code></p> <p><code>tps = 173.066373 (excluding connections establishing)</code></p> <p><code>statement latencies </code><code>in</code> <code>milliseconds:</code></p> <p><code> </code><code>0.002 \</code><code>set</code> <code>aid random(1, 100000 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>bid random(1, 1 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>tid random(1, 10 * :scale)</code></p> <p><code> </code><code>0.000 \</code><code>set</code> <code>delta random(-5000, 5000)</code></p> <p><code> </code><code>0.048 BEGIN;</code></p> <p><code> </code><code>0.268 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;</code></p> <p><code> </code><code>0.111 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;</code></p> <p><code> </code><code>0.457 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</code></p> <p><code> </code><code>2.694 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;</code></p> <p><code> </code><code>0.092 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</code></p> <p><code> </code><code>53.995 END;</code></p> </td> </tr> </tbody> </table> 异地同步性能更低: <table> <tbody> <tr> <td> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p> <p>9</p> <p>10</p> <p>11</p> <p>12</p> <p>13</p> <p>14</p> <p>15</p> <p>16</p> <p>17</p> <p>18</p> <p>19</p> <p>20</p> <p>21</p> <p>22</p> <p>23</p> <p>24</p> </td> <td> <p><code>[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf /dbaas/pg]$ pgbench -c 10 -j 5 -r -T 100 pgbench</code></p> <p><code>starting vacuum...end.</code></p> <p><code>transaction type: <builtin: TPC-B (sort of)></code></p> <p><code>scaling factor: 100</code></p> <p><code>query mode: simple</code></p> <p><code>number of clients: 10</code></p> <p><code>number of threads: 5</code></p> <p><code>duration: 100 s</code></p> <p><code>number of transactions actually processed: 14289</code></p> <p><code>latency average = 70.130 ms</code></p> <p><code>tps = 142.591560 (including connections establishing)</code></p> <p><code>tps = 142.601721 (excluding connections establishing)</code></p> <p><code>statement latencies </code><code>in</code> <code>milliseconds:</code></p> <p><code> </code><code>0.002 \</code><code>set</code> <code>aid random(1, 100000 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>bid random(1, 1 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>tid random(1, 10 * :scale)</code></p> <p><code> </code><code>0.000 \</code><code>set</code> <code>delta random(-5000, 5000)</code></p> <p><code> </code><code>0.045 BEGIN;</code></p> <p><code> </code><code>0.244 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;</code></p> <p><code> </code><code>0.111 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;</code></p> <p><code> </code><code>0.388 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</code></p> <p><code> </code><code>2.975 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;</code></p> <p><code> </code><code>0.091 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</code></p> <p><code> </code><code>65.978 END;</code></p> </td> </tr> </tbody> </table> 异地异步,性能高很多: <table> <tbody> <tr> <td> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p> <p>9</p> <p>10</p> <p>11</p> <p>12</p> <p>13</p> <p>14</p> <p>15</p> <p>16</p> <p>17</p> <p>18</p> <p>19</p> <p>20</p> <p>21</p> <p>22</p> <p>23</p> <p>24</p> </td> <td> <p><code>[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf /dbaas/pg]$ pgbench -c 10 -j 5 -r -T 100 pgbench</code></p> <p><code>starting vacuum...end.</code></p> <p><code>transaction type: <builtin: TPC-B (sort of)></code></p> <p><code>scaling factor: 100</code></p> <p><code>query mode: simple</code></p> <p><code>number of clients: 10</code></p> <p><code>number of threads: 5</code></p> <p><code>duration: 100 s</code></p> <p><code>number of transactions actually processed: 26333</code></p> <p><code>latency average = 37.987 ms</code></p> <p><code>tps = 263.246388 (including connections establishing)</code></p> <p><code>tps = 263.264004 (excluding connections establishing)</code></p> <p><code>statement latencies </code><code>in</code> <code>milliseconds:</code></p> <p><code> </code><code>0.002 \</code><code>set</code> <code>aid random(1, 100000 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>bid random(1, 1 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>tid random(1, 10 * :scale)</code></p> <p><code> </code><code>0.000 \</code><code>set</code> <code>delta random(-5000, 5000)</code></p> <p><code> </code><code>0.042 BEGIN;</code></p> <p><code> </code><code>0.228 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;</code></p> <p><code> </code><code>0.109 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;</code></p> <p><code> </code><code>0.310 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</code></p> <p><code> </code><code>1.764 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;</code></p> <p><code> </code><code>0.090 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</code></p> <p><code> </code><code>35.349 END;</code></p> </td> </tr> </tbody> </table> 本地异步呢,更快: <table> <tbody> <tr> <td> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p> <p>9</p> <p>10</p> <p>11</p> <p>12</p> <p>13</p> <p>14</p> <p>15</p> <p>16</p> <p>17</p> <p>18</p> <p>19</p> <p>20</p> <p>21</p> <p>22</p> <p>23</p> <p>24</p> </td> <td> <p><code>[postgres@90027282-dd74-4642-b5dd-359b6cc67aaf /dbaas/pg]$ pgbench -c 10 -j 5 -r -T 100 pgbench</code></p> <p><code>starting vacuum...end.</code></p> <p><code>transaction type: <builtin: TPC-B (sort of)></code></p> <p><code>scaling factor: 100</code></p> <p><code>query mode: simple</code></p> <p><code>number of clients: 10</code></p> <p><code>number of threads: 5</code></p> <p><code>duration: 100 s</code></p> <p><code>number of transactions actually processed: 29572</code></p> <p><code>latency average = 33.821 ms</code></p> <p><code>tps = 295.670234 (including connections establishing)</code></p> <p><code>tps = 295.691390 (excluding connections establishing)</code></p> <p><code>statement latencies </code><code>in</code> <code>milliseconds:</code></p> <p><code> </code><code>0.002 \</code><code>set</code> <code>aid random(1, 100000 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>bid random(1, 1 * :scale)</code></p> <p><code> </code><code>0.001 \</code><code>set</code> <code>tid random(1, 10 * :scale)</code></p> <p><code> </code><code>0.000 \</code><code>set</code> <code>delta random(-5000, 5000)</code></p> <p><code> </code><code>0.042 BEGIN;</code></p> <p><code> </code><code>0.219 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;</code></p> <p><code> </code><code>0.108 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;</code></p> <p><code> </code><code>0.253 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</code></p> <p><code> </code><code>1.592 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;</code></p> <p><code> </code><code>0.091 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</code></p> <p><code> </code><code>31.412 END;</code></p> </td> </tr> </tbody> </table> **接下来要思考的场景:** 那如果遇到一个落后很久的备库,想不重建的情况下恢复。有归档日志,我们应该怎么做? 1)先从归档恢复,再建立同步?不行,恢复后,日志线就+1了,不能再和主建立同步了。 2)将归档日志,从库缺少的地方开始,拷贝到主库pg\_wal下,再进行启动从库。待验证。 **3)直接在从库的recovery.conf文件配置如下,先从归档恢复,再恢复流复制关系,这样最靠谱:** restore\_command = 'cp /dbaas/pg/data/pg\_receivewal\_data/%f "%p"' standby\_mode = on primary\_conninfo = 'host=10.9.10.203 port=5432 user=postgres connect\_timeout=60' recovery\_target\_timeline = 'latest' trigger\_file = '/dbaas/pg/data/.tfile' 严以律己、宽以待人 [pg_receivewal]: https://www.cnblogs.com/kuang17/p/10868556.html [Section 26.2.6]: https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS
还没有评论,来说两句吧...