[Neo-report] [Bug #1853] Verification step loops forever with a huge amount of data

Grégory Wisniewski gregory at nexedi.com
Fri Oct 1 17:05:47 CEST 2010


  Bug     : Verification step loops forever with a huge amount of data
  Status  : Resolved
  Date    : 2010/06/28
  Link    : https://www.tiolive.com/nexedi/bug_module/1853/view
  Reporter   : Grégory Wisniewski
  Request Project  : NEO R&D
  Assigned Project : NEO R&D

  Description:

With a large amount of data (500,000 documents, 20 million rows in a single object table), the verification process cannot succeed and the cluster never switches to the running state.
The 'AskOIDs' packet triggers a MySQL query with 'ORDER BY' and 'LIMIT' clauses on the 'obj' table. This query can take too long to complete, which triggers a timeout on the replicating storage and causes the replication to restart from scratch.
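
For illustration only, here is a minimal sketch of one common way to keep such a paginated query cheap: seek (keyset) pagination, where each page starts after the last OID already returned so that the index on 'oid' can be used directly. This is not the actual change made in NEO. The schema (an 'obj' table with 'oid' and 'serial' columns), the function names and the use of sqlite3 in place of MySQL are all assumptions made to keep the example self-contained.

```python
import sqlite3


def make_db(n_oids):
    """Build a small in-memory stand-in for the 'obj' table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obj ("
        " oid INTEGER NOT NULL,"
        " serial INTEGER NOT NULL,"
        " PRIMARY KEY (oid, serial))"
    )
    # Two revisions per object, so the same oid appears on several rows.
    conn.executemany(
        "INSERT INTO obj VALUES (?, ?)",
        ((oid, serial) for oid in range(n_oids) for serial in (1, 2)),
    )
    return conn


def oids_after(conn, last_oid, limit):
    """Return up to `limit` distinct OIDs strictly greater than `last_oid`."""
    rows = conn.execute(
        "SELECT DISTINCT oid FROM obj WHERE oid > ? ORDER BY oid LIMIT ?",
        (last_oid, limit),
    )
    return [row[0] for row in rows]


def iter_all_oids(conn, page_size):
    """Walk the whole table page by page, resuming after the last OID seen."""
    last_oid = -1  # assumes OIDs are non-negative in this sketch
    while True:
        page = oids_after(conn, last_oid, page_size)
        if not page:
            return
        yield from page
        last_oid = page[-1]
```

With this shape, every request is bounded by the page size rather than by the position of the page in the table, under the assumption that 'oid' is the leading column of an index.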

Here is the log from the out-of-date storage side:
2010-06-28 08:32:16,778 DEBUG     STORAGE_02 [        logger: 41] #0x002c0309 AskOIDs                         to  53d6564c27c0468a3585aa42ad0deab1 (212.85.154.248:3004)
2010-06-28 08:32:22,782 DEBUG     STORAGE_02 [        logger: 41] #0x002c030a Ping                            to  53d6564c27c0468a3585aa42ad0deab1 (212.85.154.248:3004)
2010-06-28 08:32:27,786 INFO      STORAGE_02 [    connection:282] timeout with <ClientConnection(uuid=53d6564c27c0468a3585aa42ad0deab1, address=212.85.154.248:3004, closed=False) at 7f2974fddb90>
2010-06-28 08:32:27,786 DEBUG     STORAGE_02 [        logger: 41] #0x002c030b Notify                          to  53d6564c27c0468a3585aa42ad0deab1 (212.85.154.248:3004)
2010-06-28 08:32:27,786 DEBUG     STORAGE_02 [    connection:454] aborting a connector for <ClientConnection(uuid=53d6564c27c0468a3585aa42ad0deab1, address=212.85.154.248:3004, closed=False) at 7f2974fddb90>
2010-06-28 08:32:27,787 DEBUG     STORAGE_02 [       handler:112] timeout expired for <ClientConnection(uuid=53d6564c27c0468a3585aa42ad0deab1, address=212.85.154.248:3004, closed=False) at 7f2974fddb90>
2010-06-28 08:32:27,787 ERROR     STORAGE_02 [   replication: 32] replication is stopped due to a connection lost
2010-06-28 08:32:27,787 DEBUG     STORAGE_02 [    replicator:224] asking unfinished tids
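
To make the timing in the log above easier to read, the following small sketch computes the delay between the three key events. The timestamps are copied from the log; the event labels and the helper name `gaps` are illustrative only. Whether these intervals match the configured ping delay and timeout is an assumption, since the log only shows when each event occurred.

```python
from datetime import datetime

# Timestamp layout used by the log lines (milliseconds after the comma).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# (timestamp, event) pairs copied from the log; labels are illustrative.
EVENTS = [
    ("2010-06-28 08:32:16,778", "AskOIDs sent"),
    ("2010-06-28 08:32:22,782", "Ping sent"),
    ("2010-06-28 08:32:27,786", "timeout"),
]


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair of events."""
    times = [datetime.strptime(ts, TS_FORMAT) for ts, _ in events]
    labels = [label for _, label in events]
    return [
        (labels[i], labels[i + 1], (times[i + 1] - times[i]).total_seconds())
        for i in range(len(times) - 1)
    ]


for start, end, seconds in gaps(EVENTS):
    print(f"{start} -> {end}: {seconds:.3f} s")
# prints:
# AskOIDs sent -> Ping sent: 6.004 s
# Ping sent -> timeout: 5.004 s
```

In other words, the storage gave up about 11 seconds after sending AskOIDs, so any query on the peer that takes longer than that leads to the restart described above.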

 Messages :

++++++ Message #2 submitted by Grégory Wisniewski on 2010/10/01 15:05:46 UTC ++++++
Solved in r2221 (askOIDs improved for replication).

++++++ Message #1 submitted by Grégory Wisniewski on 2010/06/28 11:49:03 UTC ++++++

