[Erp5-dev] how indexing works

Yoshinori Okuji yo at nexedi.com
Wed Sep 21 22:17:45 CEST 2005


At Sébastien's request, here is a brief description of how indexing works in 
the current implementation.

In the past, ERP5 cataloged objects one by one. For each object, 
portal_catalog called Z SQL Methods to insert rows into tables. This was 
slow for two reasons: MySQL had to invoke its SQL query interpreter and 
rebuild its indices for every object, and the cache efficiency in ZODB was 
bad.

Now ERP5 groups multiple objects for indexing, using new functionality in 
CMFActivity. The activity SQLDict implements support for group methods and 
expand methods. I will explain group methods first.

Normally, making an active object looks like this:

obj.activate().immediateReindexObject()

CMFActivity can be extended arbitrarily by passing optional parameters to 
activate:

obj.activate(group_method_id='portal_catalog/catalogObjectList').immediateReindexObject()

The "group_method_id" parameter is simply ignored by an activity that does 
not recognize it. SQLDict does recognize it, and handles such an active 
object specially. In this example, SQLDict tries to gather the active objects 
that have the same group method id. With the current setting, SQLDict 
collects up to 100 objects at a time and validates each active object (e.g. 
by checking its after method id). Then SQLDict obtains the objects from ZODB 
and calls the group method with the list of those objects. So, in SQLDict, 
immediateReindexObject is no longer called at all, while compatibility is 
preserved.
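
To make the grouping step concrete, here is a minimal sketch in plain Python. 
It is not the SQLDict code: the Message class, process_queue, the is_valid 
hook and MAX_GROUP_SIZE are names made up for illustration, and splitting a 
large group into slices of 100 is only my reading of "up to 100 objects at a 
time".

```python
from collections import defaultdict

MAX_GROUP_SIZE = 100  # the limit mentioned above; the constant name is made up


class Message:
    """Hypothetical stand-in for a queued activity message."""

    def __init__(self, obj, method_id, group_method_id=None):
        self.obj = obj
        self.method_id = method_id
        self.group_method_id = group_method_id


def process_queue(messages, group_methods, is_valid=lambda message: True):
    """Run queued messages, batching those that share a group method id.

    group_methods maps a group method id to a callable taking a list of
    objects (an assumption; the real lookup goes through Zope).
    """
    grouped = defaultdict(list)
    for message in messages:
        if not is_valid(message):
            continue  # e.g. an after method id check failed; skipped here
        if message.group_method_id is None:
            # No group method: call the requested method on the object itself.
            getattr(message.obj, message.method_id)()
        else:
            # Grouped: remember the object, never call method_id on it.
            grouped[message.group_method_id].append(message.obj)
    for group_method_id, object_list in grouped.items():
        group_method = group_methods[group_method_id]
        # Hand over at most MAX_GROUP_SIZE objects per call.
        for start in range(0, len(object_list), MAX_GROUP_SIZE):
            group_method(object_list[start:start + MAX_GROUP_SIZE])
```

In this sketch, 250 queued messages with the same group method id lead to 
three calls of the group method, and the per-object method is never called 
for them.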

The method "catalogObjectList" in portal_catalog calls Z SQL Methods with the 
list of objects (after filtering). This significantly reduces the number of 
SQL queries sent to MySQL, and so performs better. Also, if the objects are 
related (which is usually the case), the ZODB cache is more likely to hit the 
same objects, so this reduces the load on Zope as well.

Now, about expand methods. There are several ways to implement 
recursiveReindexObject. In the previous implementation, 
recursiveReindexObject called immediateReindexObject on each recursively 
traversed object. So, one way was to call catalogObjectList with a list of 
the traversed objects. However, this does not allow grouping a 
recursiveReindexObject call with another one or with reindexObject. So I 
decided to add a new parameter to SQLDict: expand_method_id.

As you can see in ERP5Type/Document/Folder.py, recursiveReindexObject is like 
this:

obj.activate(group_method_id='portal_catalog/catalogObjectList', 
             expand_method_id='getIndexableChildValueList').recursiveImmediateReindexObject()

As explained above, when an activity does not recognize group_method_id or 
expand_method_id, it just calls recursiveImmediateReindexObject as before. 
SQLDict, however, handles this differently. Because this call uses the same 
group method as reindexObject, it is grouped with reindexObject calls. 
SQLDict then finds the expand method "getIndexableChildValueList" and calls 
it on the object. The result is a list of all indexable child objects, 
including the object itself. This list is passed on to the group method, and 
the rest is the same as for reindexObject.
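
A minimal sketch of the expand step, again in plain Python and not taken from 
the ERP5 sources. The Node class and expand_for_group are made up for 
illustration; only the method name getIndexableChildValueList comes from the 
description above, and its body here (including the choice to still descend 
into a non-indexable object) is an assumption.

```python
class Node:
    """Hypothetical document with children; not the ERP5 Folder class."""

    def __init__(self, name, children=(), indexable=True):
        self.name = name
        self.children = list(children)
        self.indexable = indexable

    def getIndexableChildValueList(self):
        """Return this object and all its descendants that are indexable."""
        value_list = [self] if self.indexable else []
        for child in self.children:
            value_list.extend(child.getIndexableChildValueList())
        return value_list


def expand_for_group(queued):
    """Flatten queued (object, expand_method_id) pairs into one object list.

    The returned list is what would be handed to the group method.
    """
    object_list = []
    for obj, expand_method_id in queued:
        if expand_method_id is None:
            # Plain reindexObject: the object stands for itself.
            object_list.append(obj)
        else:
            # recursiveReindexObject: replace the object by its expansion.
            object_list.extend(getattr(obj, expand_method_id)())
    return object_list
```

With this, a reindexObject on one document and a recursiveReindexObject on a 
folder end up in the same list, so a single group method call can catalog 
all of them.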

Due to this change, portal_catalog no longer uses Z SQL Methods that handle 
one object, such as z_catalog_category. Instead, it uses methods that handle 
multiple objects, such as z_catalog_object_list. These methods make use of 
the extended inserts specific to MySQL, which can insert multiple rows with a 
single query. Although this is specific to MySQL, we can do a similar 
optimization for PostgreSQL as well (e.g. dropping indices, inserting rows, 
and rebuilding indices).
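
To illustrate the difference between one statement per object and one 
multi-row statement, here is a small sketch. It uses Python's sqlite3 module 
as a stand-in for MySQL, because SQLite accepts the same multi-row VALUES 
syntax. The table layout (uid, path) and the function names are hypothetical 
and are not the real catalog schema or Z SQL Methods.

```python
import sqlite3


def make_catalog():
    """Create an in-memory database with a toy catalog table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE catalog (uid INTEGER PRIMARY KEY, path TEXT)")
    return connection


def catalog_one_by_one(connection, rows):
    """Old style: one INSERT statement per object. Returns the statement count."""
    for row in rows:
        connection.execute("INSERT INTO catalog (uid, path) VALUES (?, ?)", row)
    return len(rows)


def catalog_object_list(connection, rows):
    """New style: one multi-row INSERT for the whole list. Returns the statement count."""
    if not rows:
        return 0
    # Build "VALUES (?, ?), (?, ?), ..." with one group per row.
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    parameters = [value for row in rows for value in row]
    connection.execute(
        "INSERT INTO catalog (uid, path) VALUES " + placeholders, parameters
    )
    return 1
```

For a group of 100 objects, the first function sends 100 statements and the 
second sends one, which is the saving described above.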

Is this enough?

YO
-- 
Yoshinori Okuji, Nexedi Research Director
Nexedi: Consulting and Development of Free / Open Source Software
http://www.nexedi.com
ERP5: Free / Open Source ERP Software for small and medium companies
http://www.erp5.org
Storever: OpenBrick, WiFi infrastructure, notebooks and servers
http://www.storever.com


