[Erp5-dev] how indexing works

Thu Sep 22 09:09:37 CEST 2005

Thank you very much Yoshinori.

Also, this new way of reindexing objects sometimes requires changing the
MySQL configuration. By default, the value of "max_allowed_packet" in
my.cnf can be too small, because reindexing many objects at once generates
big packets. For example, 1 MB is too small. Personally, I use:

[mysqld]
max_allowed_packet = 64M

This value is already set correctly on the live CD.
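To illustrate why big batches run into this limit, here is a small sketch that estimates the size of a multi-row INSERT and splits rows into chunks whose statement stays under a given max_allowed_packet. The table name, row layout and naive quoting are assumptions for illustration only; real code should let the database driver do the escaping.

```python
def render_row(row):
    # Naive quoting, used only to estimate sizes; never build real SQL this way.
    return "(" + ", ".join("'" + str(v).replace("'", "''") + "'" for v in row) + ")"


def build_statement(table, rows):
    """One extended INSERT for all rows (illustration only)."""
    return "INSERT INTO %s VALUES " % table + ", ".join(render_row(r) for r in rows)


def chunk_rows(table, rows, max_packet):
    """Split rows so that each extended INSERT fits in max_packet bytes.
    A single row larger than max_packet still ends up alone in its chunk."""
    prefix_size = len(("INSERT INTO %s VALUES " % table).encode("utf-8"))
    chunks, current, size = [], [], prefix_size
    for row in rows:
        row_size = len(render_row(row).encode("utf-8"))
        extra = row_size + (2 if current else 0)  # ", " separator between rows
        if current and size + extra > max_packet:
            chunks.append(current)
            current, size, extra = [], prefix_size, row_size
        current.append(row)
        size += extra
    if current:
        chunks.append(current)
    return chunks
```

With a 64 MB limit, a batch of 100 small catalog rows fits in one statement, while a 1 KB limit forces several statements.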

  Seb.

On Wednesday 21 September 2005 22:17, Yoshinori Okuji wrote:
> At Sébastien's request, here is a brief description of how indexing
> works in the current implementation.
>
> In the past, ERP5 cataloged objects one by one. For each object,
> portal_catalog called Z SQL Methods to insert rows into tables. This
> was slow because MySQL invoked its SQL query interpreter and rebuilt
> indices for every object, and also because ZODB cache efficiency was
> poor.
>
> Now ERP5 groups multiple objects for indexing, using new functionality
> in CMFActivity. The SQLDict activity implements support for group
> methods and expand methods. First, I will explain group methods.
>
> Creating an active object looks like this:
>
> obj.activate().immediateReindexObject()
>
> CMFActivity can be extended arbitrarily by passing optional parameters
> to activate:
>
> obj.activate(group_method_id='portal_catalog/catalogObjectList'
>              ).immediateReindexObject()
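A toy model of the activate() pattern may help: activate(**options) returns a proxy, and calling a method on that proxy only records a message holding the target, the method name and the options. An activity that does not understand group_method_id simply ignores it and calls the method later. All class names here (Message, ActiveWrapper, IgnorantActivity, Document) are hypothetical and are not CMFActivity's real classes.

```python
class Message:
    """A recorded deferred call: target, method name, arguments and the
    activity options passed to activate()."""

    def __init__(self, obj, method_id, args, kw, activity_kw):
        self.obj = obj
        self.method_id = method_id
        self.args = args
        self.kw = kw
        self.activity_kw = activity_kw


class ActiveWrapper:
    """Returned by activate(); calling a method on it queues a Message."""

    def __init__(self, obj, queue, activity_kw):
        self._obj = obj
        self._queue = queue
        self._activity_kw = activity_kw

    def __getattr__(self, method_id):
        def deferred(*args, **kw):
            self._queue.append(Message(self._obj, method_id, args, kw,
                                       self._activity_kw))
        return deferred


class IgnorantActivity:
    """Does not know group_method_id: ignores it and runs each message alone."""

    def __init__(self):
        self.messages = []

    def process(self):
        while self.messages:
            m = self.messages.pop(0)
            getattr(m.obj, m.method_id)(*m.args, **m.kw)


class Document:
    def __init__(self, activity):
        self._activity = activity
        self.indexed = False

    def activate(self, **activity_kw):
        return ActiveWrapper(self, self._activity.messages, activity_kw)

    def immediateReindexObject(self):
        self.indexed = True  # stands in for the real catalog update


activity = IgnorantActivity()
doc = Document(activity)
doc.activate(group_method_id='portal_catalog/catalogObjectList').immediateReindexObject()
# At this point the call is only queued; process() actually runs it.
activity.process()
```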
>
> This parameter "group_method_id" is simply ignored when an activity
> does not recognize it. But SQLDict recognizes it and handles this
> active object specially. In this example, SQLDict tries to gather
> active objects which have the same group method id. In the current
> setting, SQLDict collects up to 100 objects at a time and validates
> each active object (e.g. by checking an after method id). Then, SQLDict
> retrieves the objects from ZODB and calls the group method with the
> list of those objects. So SQLDict no longer calls
> immediateReindexObject at all, while compatibility is preserved.
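The grouping step can be sketched roughly as follows. This is not SQLDict's actual code: messages are plain dicts, load_object stands in for fetching objects from ZODB, and the is_valid hook stands in for checks such as after_method_id. All of these names are hypothetical.

```python
MAX_GROUP_SIZE = 100  # "up to 100 objects at a time", as described above


def dispatch(messages, load_object, resolve_group_method,
             is_valid=lambda m: True, max_group_size=MAX_GROUP_SIZE):
    """Toy grouping step. Each message is a dict with 'path', 'method_id'
    and optionally 'group_method_id'. Returns the messages that failed
    validation so they can be retried later."""
    deferred = []
    groups = {}
    for m in messages:
        if not is_valid(m):
            # e.g. an after_method_id dependency is still pending
            deferred.append(m)
            continue
        group_method_id = m.get("group_method_id")
        if group_method_id is None:
            # No group method: call the method on the single object, as before.
            getattr(load_object(m["path"]), m["method_id"])()
        else:
            groups.setdefault(group_method_id, []).append(m)
    for group_method_id, group in groups.items():
        group_method = resolve_group_method(group_method_id)
        for start in range(0, len(group), max_group_size):
            batch = group[start:start + max_group_size]
            # One group-method call for the whole batch; the per-object
            # method (e.g. immediateReindexObject) is never called here.
            group_method([load_object(m["path"]) for m in batch])
    return deferred
```

With 250 grouped messages, the group method is called three times, with 100, 100 and 50 objects.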
>
> The method "catalogObjectList" in portal_catalog calls Z SQL Methods
> with the list of objects (after filtering). This significantly reduces
> the number of SQL queries sent to MySQL, and so improves performance.
> Also, if objects are related (in most cases, they are), the same
> objects are more likely to be found in the ZODB cache, which also
> reduces the load on Zope.
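The query-count difference can be shown with a counting stub. This sketch assumes two multi-object SQL methods per batch: z_catalog_object_list is named in the mail, but z_catalog_category_list and the indexable flag used for filtering are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical list of SQL methods; only z_catalog_object_list comes from
# the mail, the second name is made up for illustration.
SQL_METHOD_IDS = ["z_catalog_object_list", "z_catalog_category_list"]


class CountingConnection:
    """Stands in for the MySQL connection and only counts queries."""

    def __init__(self):
        self.query_count = 0

    def query(self, sql_method_id, objects):
        self.query_count += 1


def catalog_object(conn, obj):
    """Old style: every SQL method is called once per object."""
    for sql_method_id in SQL_METHOD_IDS:
        conn.query(sql_method_id, [obj])


def catalog_object_list(conn, objects):
    """Grouped style: filter first, then call every SQL method once."""
    objects = [o for o in objects if o.indexable]  # hypothetical filter
    if objects:
        for sql_method_id in SQL_METHOD_IDS:
            conn.query(sql_method_id, objects)


objects = [SimpleNamespace(uid=i, indexable=True) for i in range(100)]

old = CountingConnection()
for obj in objects:
    catalog_object(old, obj)

new = CountingConnection()
catalog_object_list(new, objects)
# old.query_count == 200, new.query_count == 2
```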
>
> Now, about expand methods. There are several ways to implement
> recursiveReindexObject. In the past implementation,
> recursiveReindexObject called immediateReindexObject on each
> recursively traversed object. So, one option was to call
> catalogObjectList with the list of traversed objects. However, this
> does not allow grouping a recursiveReindexObject call with another
> recursiveReindexObject call or with reindexObject. So I decided to add
> a new parameter to SQLDict: expand_method_id.
>
> As you can see in ERP5Type/Document/Folder.py, recursiveReindexObject
> looks like this:
>
> obj.activate(group_method_id='portal_catalog/catalogObjectList',
>              expand_method_id='getIndexableChildValueList'
>              ).recursiveImmediateReindexObject()
>
> As explained above, when an activity does not recognize
> group_method_id or expand_method_id, it simply calls
> recursiveImmediateReindexObject as before. But SQLDict handles this
> differently. Because this call uses the same group method as
> reindexObject, it is grouped with reindexObject calls. Then, SQLDict
> finds the expand method "getIndexableChildValueList" and calls it on
> the object. The result is a list of all indexable child objects,
> including the object itself. This list is added to the objects passed
> to the group method, and the rest works the same way as for
> reindexObject.
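The expand step can be sketched like this: a message that carries an expand method id contributes the expanded list of objects to its group, while a plain reindexObject message contributes only its own object. The Node class and collect_group helper are hypothetical. In this sketch the children of a non-indexable node are still visited, which is an assumption, not necessarily what ERP5 does.

```python
class Node:
    """Hypothetical stand-in for a folder-like ERP5 object."""

    def __init__(self, path, indexable=True, children=()):
        self.path = path
        self.indexable = indexable
        self.children = list(children)

    def getIndexableChildValueList(self):
        # All indexable objects in the subtree, including the object itself.
        result = [self] if self.indexable else []
        for child in self.children:
            result.extend(child.getIndexableChildValueList())
        return result


def collect_group(messages):
    """Build the object list handed to the group method for one group.
    Each message is (obj, expand_method_id or None)."""
    objects = []
    for obj, expand_method_id in messages:
        if expand_method_id:
            objects.extend(getattr(obj, expand_method_id)())
        else:
            objects.append(obj)
    return objects


root = Node("/root", children=[
    Node("/root/a"),
    Node("/root/b", indexable=False, children=[Node("/root/b/c")]),
])
other = Node("/other")
group = collect_group([(root, "getIndexableChildValueList"), (other, None)])
# [n.path for n in group] == ["/root", "/root/a", "/root/b/c", "/other"]
```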
>
> Due to this change, portal_catalog no longer uses Z SQL Methods that
> handle a single object, such as z_catalog_category. Instead, it uses
> methods for multiple objects, such as z_catalog_object_list. These
> methods use MySQL's extended inserts, which insert multiple rows with a
> single query. Although this is specific to MySQL, a similar
> optimization is possible for PostgreSQL as well (e.g. dropping indices,
> inserting rows, and rebuilding indices).
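An extended insert is a single INSERT with several value tuples, e.g. INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'). The sketch below builds such a statement with placeholders. It uses Python's bundled sqlite3 only because that module is always available and SQLite accepts the same multi-row VALUES form; a MySQL driver would typically use %s placeholders instead of ?. The table, columns and paths are hypothetical.

```python
import sqlite3


def extended_insert(conn, table, columns, rows):
    """Insert all rows with one multi-row INSERT statement.
    table and columns are trusted identifiers; values go through placeholders."""
    if not rows:
        return
    placeholders = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO %s (%s) VALUES %s" % (
        table, ", ".join(columns), ", ".join(placeholders for _ in rows))
    params = [value for row in rows for value in row]  # flatten row by row
    conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (uid INTEGER PRIMARY KEY, path TEXT)")
rows = [(i, "/erp5/person_module/%d" % i) for i in range(1, 101)]
extended_insert(conn, "catalog", ["uid", "path"], rows)
```

Note that databases cap the size of one statement (max_allowed_packet in MySQL, a host-parameter limit in SQLite), so very large batches still need to be split.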
>
> Is this enough?
>
> YO

-- 
Sebastien Robin, Nexedi Technical Director
Nexedi: Consulting and Development of Free / Open Source Software
http://www.nexedi.com
ERP5: Free / Open Source ERP Software for small and medium companies
http://www.erp5.org
Storever: OpenBrick, WiFi infrastructure, notebooks and servers
http://www.storever.com