[Erp5-report] r42362 kazuhiko - in /erp5/trunk/products: ERP5/tests/ ERP5Type/

nobody at svn.erp5.org
Sun Jan 16 22:37:57 CET 2011


Author: kazuhiko
Date: Sun Jan 16 22:37:56 2011
New Revision: 42362

URL: http://svn.erp5.org?rev=42362&view=rev
Log:
Reapply r42160 "try to parse latin-1 encoded url (even though that is invalid according to RFC 3986)." with a modification to testWebCrawler.py. Tested on both Zope-2.8 and Zope-2.12.
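
The fallback described in this log can be illustrated with a minimal standalone sketch. It is not the ERP5 code: it assumes Python 3 and the standard library only (no urlnorm, no Zope), and the helper names decode_percent_encoded and normalise_query_encoding are hypothetical. It percent-decodes a URL component, tries UTF-8 first, and falls back to Latin-1 when the bytes are not valid UTF-8. Unlike the test expectation in this commit, which keeps raw UTF-8 bytes in the URL, the sketch re-encodes the result as percent-encoded UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit, urlunsplit


def decode_percent_encoded(component):
    """Percent-decode a URL component, trying UTF-8 first, then Latin-1.

    Hypothetical helper for illustration only.
    """
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1")


def normalise_query_encoding(url):
    """Rewrite the query string of `url` as percent-encoded UTF-8.

    Simplification: '=', '&' and '+' are left unescaped, so a query that
    contains percent-encoded delimiters (e.g. %26) would change meaning.
    """
    parts = urlsplit(url)
    text = decode_percent_encoded(parts.query)
    return urlunsplit(parts._replace(query=quote(text, safe="=&+")))


if __name__ == "__main__":
    # %E9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.
    print(normalise_query_encoding("http://www.example.com/?title=%E9crit"))
```

Trying UTF-8 first matters because almost any byte sequence decodes under Latin-1, so it is only useful as a last resort.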

Modified:
    erp5/trunk/products/ERP5/tests/testWebCrawler.py
    erp5/trunk/products/ERP5Type/Utils.py

Modified: erp5/trunk/products/ERP5/tests/testWebCrawler.py
URL: http://svn.erp5.org/erp5/trunk/products/ERP5/tests/testWebCrawler.py?rev=42362&r1=42361&r2=42362&view=diff
==============================================================================
--- erp5/trunk/products/ERP5/tests/testWebCrawler.py [utf8] (original)
+++ erp5/trunk/products/ERP5/tests/testWebCrawler.py [utf8] Sun Jan 16 22:37:56 2011
@@ -205,7 +205,7 @@ class TestWebCrawler(ERP5TypeTestCase):
           Funny link</a></p>
       <p><a href="http://www.example.com/section">Internal link</a></p>
       <p><a href="section2">Relative Internal link</a></p>
-      <p><a href="http://www.example.com/?title=%E9+crit">With Encoding issue
+      <p><a href="http://www.example.com/?title=%E9crit">With Encoding issue
       This link will be discarded</a></p>
       <img src="my_image_link"/>
       <script src="should_not_be_followed.js"/>
@@ -217,7 +217,8 @@ class TestWebCrawler(ERP5TypeTestCase):
     self.assertEquals(web_page.getContentNormalisedURLList(),
                     ["http://www.example.com/I%20don't%20care%20I%20put%20what/%20I%20want/",
                      'http://www.example.com/section',
-                     'http://www.example.com/section2',])
+                     'http://www.example.com/section2',
+                     'http://www.example.com/?title=\xc3\xa9crit',])
     # relative links without base tag
     text_content = """<html>
     <head>

Modified: erp5/trunk/products/ERP5Type/Utils.py
URL: http://svn.erp5.org/erp5/trunk/products/ERP5Type/Utils.py?rev=42362&r1=42361&r2=42362&view=diff
==============================================================================
--- erp5/trunk/products/ERP5Type/Utils.py [utf8] (original)
+++ erp5/trunk/products/ERP5Type/Utils.py [utf8] Sun Jan 16 22:37:56 2011
@@ -3187,6 +3187,7 @@ class ScalarMaxConflictResolver(persiste
 ###################
 #  URL Normaliser #
 ###################
+from Products.PythonScripts.standard import url_unquote
 try:
   import urlnorm
 except ImportError:
@@ -3258,6 +3259,11 @@ def urlnormNormaliseUrl(url, base_url=No
   """
   try:
     url = urlnorm.norm(url)
+  except UnicodeDecodeError:
+    try:
+      url = urlnorm.norm(url_unquote(url).decode('latin1'))
+    except UnicodeDecodeError:
+      raise urlnorm.InvalidUrl
   except (AttributeError, urlnorm.InvalidUrl):
     # This url is not valid, a better Exception will
     # be raised


