[Erp5-report] r41979 nicolas - /erp5/trunk/products/PortalTransforms/transforms/safe_html.py

nobody at svn.erp5.org
Tue Jan 4 11:32:55 CET 2011


Author: nicolas
Date: Tue Jan  4 11:32:54 2011
New Revision: 41979

URL: http://svn.erp5.org?rev=41979&view=rev
Log:
Work around a bug in HTMLParser (Python 2.5 <= v <= 2.7) that cannot be fixed
in the parser itself, because the HTMLParser API does not accept an encoding
parameter. Decoding strings on the fly therefore cannot be guaranteed in all cases.
Python 3 solves the problem by accepting only unicode strings.

The fix consists of passing unicode content to the parser.


Modified:
    erp5/trunk/products/PortalTransforms/transforms/safe_html.py

Modified: erp5/trunk/products/PortalTransforms/transforms/safe_html.py
URL: http://svn.erp5.org/erp5/trunk/products/PortalTransforms/transforms/safe_html.py?rev=41979&r1=41978&r2=41979&view=diff
==============================================================================
--- erp5/trunk/products/PortalTransforms/transforms/safe_html.py [utf8] (original)
+++ erp5/trunk/products/PortalTransforms/transforms/safe_html.py [utf8] Tue Jan  4 11:32:54 2011
@@ -279,6 +279,16 @@ def scrubHTML(html, valid=VALID_TAGS, na
                              remove_javascript=remove_javascript,
                              raise_error=raise_error,
                              default_encoding=default_encoding)
+    # HTMLParser is affected by a known bug, tracked at
+    # http://bugs.python.org/issue3932
+    # As suggested by the Python developers:
+    # "Python 3.0 implicitly rejects non-unicode strings"
+    # so we first try to decode byte strings with the provided codec.
+    if isinstance(html, str):
+      try:
+        html = html.decode(default_encoding)
+      except UnicodeDecodeError:
+        pass
     parser.feed(html)
     parser.close()
     result = parser.getResult()
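
For readers on Python 3, the decode-before-feed idea in the hunk above can be
sketched roughly as follows. This is an illustration only, not the code from
safe_html.py: the names to_text, TextCollector and extract_text are
hypothetical, byte strings are assumed to be of type bytes (the Python 2 str
of the original), and the "utf-8" default is an assumption.

```python
from html.parser import HTMLParser


def to_text(html, default_encoding="utf-8"):
    """Decode a byte string with the given codec; return other input unchanged.

    Hypothetical helper mirroring the commit: if decoding fails, the
    input is passed through untouched instead of raising.
    """
    if isinstance(html, bytes):
        try:
            return html.decode(default_encoding)
        except UnicodeDecodeError:
            pass
    return html


class TextCollector(HTMLParser):
    """Toy parser (not the safe_html parser) that collects text nodes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html, default_encoding="utf-8"):
    """Feed decoded content to the parser and return the collected text."""
    parser = TextCollector()
    parser.feed(to_text(html, default_encoding))
    parser.close()
    return "".join(parser.chunks)
```

Note that, unlike Python 2, the Python 3 parser does not accept undecodable
byte input at all, so in this sketch the pass-through branch only defers the
failure to parser.feed().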


