[Erp5-dev] performance of reindex object

Jean-Paul Smets jp at nexedi.com
Fri Feb 12 10:40:04 CET 2010


Hi,

Some hints:
    - we have sites with more than 10,000 K rows in various tables, and
      this does not happen there
    - reindexing speed is covered by unit tests; it fluctuates, but stays
      under control

Questions:
    - which documents are those for which "sometimes reindexing four of
      five docs takes more than a minute"?
    - are there any extensions to the catalog (e.g. many columns, or
      scripts in the catalog which parse objects recursively)?
    - are you using MySQL?

Reindexing speed should be between 10 and 30 simple documents / second / 
core. If your document is complex, made for example of 100 subdocuments, 
it will take 3 to 10 seconds for reindexing the root document, which is 
normal, since you are actually reindexing 100 documents. If your root 
document is made of 1000 subdocuments, changing the way to recursively 
reindex subdocuments could be considered. If your root document is made 
of 10,000 subdocuments, changing the way to recursively reindex 
subdocuments is required.
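
The arithmetic behind those figures can be sketched in a few lines of
Python. This is only a back-of-the-envelope estimate: it assumes that
reindexing time grows linearly with the number of subdocuments and that
a single core does the work. The function name is made up for the
illustration and is not part of ERP5.

```python
# Throughput range quoted above: 10 to 30 simple documents / second / core.
SLOW_DOCS_PER_SECOND = 10
FAST_DOCS_PER_SECOND = 30


def estimate_reindex_seconds(subdocument_count):
    """Return (best_case, worst_case) seconds to reindex a root document.

    Assumption: cost is linear in the number of subdocuments and all of
    them are reindexed on one core.
    """
    if subdocument_count < 0:
        raise ValueError("subdocument_count must not be negative")
    best = subdocument_count / FAST_DOCS_PER_SECOND
    worst = subdocument_count / SLOW_DOCS_PER_SECOND
    return best, worst


for count in (100, 1000, 10000):
    best, worst = estimate_reindex_seconds(count)
    print("%6d subdocuments: %.0f to %.0f seconds" % (count, best, worst))
```

Under these assumptions, 1000 subdocuments already cost somewhere
between half a minute and well over a minute, which is why the
recursive approach stops being reasonable at that scale.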

Another possible cause of slow reindexing is overuse of indices in MySQL 
(or any other DB). The more indices you add, the slower each INSERT 
becomes. In large sites, we usually remove some indices and add others, 
but this really depends on the application and the nature of the data, 
so there are no universal rules here besides "optimize your indices in 
MySQL based on your data".
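
To see the effect of indices on INSERT speed for yourself, here is a
small self-contained experiment. It uses Python's built-in sqlite3
module as a stand-in, not MySQL, so the absolute numbers say nothing
about an ERP5 catalog; the table, its columns and the index names are
invented for the demonstration. Timings vary from run to run.

```python
import sqlite3
import time


def time_inserts(index_count, row_count=20000):
    """Insert row_count rows into a table that carries index_count
    secondary indices and return the elapsed time in seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_catalog "
        "(uid INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
    )
    columns = ("a", "b", "c", "d")
    for i in range(index_count):
        # Cycle through the columns; index names are arbitrary.
        conn.execute(
            "CREATE INDEX idx_%d ON demo_catalog (%s)"
            % (i, columns[i % len(columns)])
        )
    rows = [
        (uid, uid * 7 % 1009, uid * 13 % 997, uid * 17 % 991, uid * 19 % 983)
        for uid in range(row_count)
    ]
    start = time.perf_counter()
    with conn:  # one transaction, committed on exit
        conn.executemany(
            "INSERT INTO demo_catalog VALUES (?, ?, ?, ?, ?)", rows
        )
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


for index_count in (0, 4, 8):
    print("%d indices: %.3f s" % (index_count, time_inserts(index_count)))
```

On a real site, the same kind of comparison would have to be made
against the actual MySQL tables and the actual data.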

Another possibility is locking problems: one indexing process is 
waiting for another to finish. You must study what happens in MySQL to 
track that (there are many tools for that purpose).
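
One simple starting point is the output of MySQL's SHOW FULL
PROCESSLIST statement, which reports Id, User, Host, db, Command, Time,
State and Info for every connection. The sketch below filters such rows
for threads whose State mentions a lock. It assumes the rows have
already been fetched as dictionaries by whatever MySQL client you use;
the sample rows are invented, and the helper name is hypothetical. It
is a crude filter: waits that do not show the word "lock" in State
(InnoDB row lock waits, for instance) will not be caught.

```python
def find_lock_waits(processlist_rows, min_seconds=5):
    """Return processlist rows that look like lock waits, longest first.

    processlist_rows: iterable of dicts keyed by the SHOW PROCESSLIST
    column names. min_seconds: ignore waits shorter than this.
    """
    waiting = []
    for row in processlist_rows:
        state = (row.get("State") or "").lower()
        seconds = int(row.get("Time") or 0)
        if "lock" in state and seconds >= min_seconds:
            waiting.append(row)
    return sorted(
        waiting, key=lambda row: int(row.get("Time") or 0), reverse=True
    )


# Invented sample data, for illustration only.
sample = [
    {"Id": 11, "User": "erp5", "Host": "localhost", "db": "erp5",
     "Command": "Query", "Time": 42, "State": "Locked",
     "Info": "INSERT INTO catalog ..."},
    {"Id": 12, "User": "erp5", "Host": "localhost", "db": "erp5",
     "Command": "Query", "Time": 3, "State": "Sending data",
     "Info": "SELECT ..."},
    {"Id": 13, "User": "erp5", "Host": "localhost", "db": "erp5",
     "Command": "Query", "Time": 17,
     "State": "Waiting for table level lock",
     "Info": "INSERT INTO catalog ..."},
]

for row in find_lock_waits(sample):
    print(row["Id"], row["Time"], row["State"])
```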

Anyway, optimizing "pure" reindexing speed is not so easy, because it 
is very often a matter of optimizing Python method calls and the way 
data is accessed. For example, we are currently improving the speed of 
the catalog by caching some values related to the filters. This will 
provide an improvement of a few percent.

Regards,

JPS.



Bartek Gorny wrote:
> Hello,
>
> I'm running a production instance of ERP5, and I have a performance
> problem - reindexing some documents consumes a lot of CPU power.
> Sometimes reindexing four of five docs takes more than a minute, with
> mysql consuming up to 200% CPU and python processes eating up another
> 50% (this is a virtual machine running on three CPU cores, using ZEO,
> with three processing nodes). Something is definitely wrong - my
> question is, where should I begin to look for a problem. I read
> "performance crimes", and I don't seem to have committed any of those
> (at least not outright). Any advice, how to trace and where the
> problem may arise, would be most welcome.
>
> The database is not very big - the object counts per table are:
>
> catalog: 380K
> category: 950K
> delivery: 4K
> movement: 130K
> predicate: 160K
> predicate_category: 160K
> roles_and_users: 2K
> stock: 40K
>
> So, is there a problem, have I done something wrong, or is it just too much?
>
> Bartek
>
>
>   


-- 
Jean-Paul Smets-Solanes, Nexedi CEO - Tel. +33(0)6 29 02 44 25
ERP5 Enterprise: Open Source ERP/CRM for Mission Critical Applications
http://www.erp5.com
TioLive SaaS: run your business online, with more freedom
http://www.tiolive.com
Nexedi: Consulting and Development of Free / Open Source Software 
http://www.nexedi.com



