[Erp5-report] r42362 kazuhiko - in /erp5/trunk/products: ERP5/tests/ ERP5Type/
nobody at svn.erp5.org
Sun Jan 16 22:37:57 CET 2011
Author: kazuhiko
Date: Sun Jan 16 22:37:56 2011
New Revision: 42362
URL: http://svn.erp5.org?rev=42362&view=rev
Log:
Reapply r42160 "try to parse latin-1 encoded url (even though that is invalid according to RFC 3986)." with a modification to testWebCrawler.py. Tested on both Zope-2.8 and Zope-2.12.
Modified:
erp5/trunk/products/ERP5/tests/testWebCrawler.py
erp5/trunk/products/ERP5Type/Utils.py
Modified: erp5/trunk/products/ERP5/tests/testWebCrawler.py
URL: http://svn.erp5.org/erp5/trunk/products/ERP5/tests/testWebCrawler.py?rev=42362&r1=42361&r2=42362&view=diff
==============================================================================
--- erp5/trunk/products/ERP5/tests/testWebCrawler.py [utf8] (original)
+++ erp5/trunk/products/ERP5/tests/testWebCrawler.py [utf8] Sun Jan 16 22:37:56 2011
@@ -205,7 +205,7 @@ class TestWebCrawler(ERP5TypeTestCase):
Funny link</a></p>
<p><a href="http://www.example.com/section">Internal link</a></p>
<p><a href="section2">Relative Internal link</a></p>
- <p><a href="http://www.example.com/?title=%E9+crit">With Encoding issue
+ <p><a href="http://www.example.com/?title=%E9crit">With Encoding issue
This link will be discarded</a></p>
<img src="my_image_link"/>
<script src="should_not_be_followed.js"/>
@@ -217,7 +217,8 @@ class TestWebCrawler(ERP5TypeTestCase):
self.assertEquals(web_page.getContentNormalisedURLList(),
["http://www.example.com/I%20don't%20care%20I%20put%20what/%20I%20want/",
'http://www.example.com/section',
- 'http://www.example.com/section2',])
+ 'http://www.example.com/section2',
+ 'http://www.example.com/?title=\xc3\xa9crit',])
# relative links without base tag
text_content = """<html>
<head>
Modified: erp5/trunk/products/ERP5Type/Utils.py
URL: http://svn.erp5.org/erp5/trunk/products/ERP5Type/Utils.py?rev=42362&r1=42361&r2=42362&view=diff
==============================================================================
--- erp5/trunk/products/ERP5Type/Utils.py [utf8] (original)
+++ erp5/trunk/products/ERP5Type/Utils.py [utf8] Sun Jan 16 22:37:56 2011
@@ -3187,6 +3187,7 @@ class ScalarMaxConflictResolver(persiste
###################
# URL Normaliser #
###################
+from Products.PythonScripts.standard import url_unquote
try:
import urlnorm
except ImportError:
@@ -3258,6 +3259,11 @@ def urlnormNormaliseUrl(url, base_url=No
"""
try:
url = urlnorm.norm(url)
+ except UnicodeDecodeError:
+ try:
+ url = urlnorm.norm(url_unquote(url).decode('latin1'))
+ except UnicodeDecodeError:
+ raise urlnorm.InvalidUrl
except (AttributeError, urlnorm.InvalidUrl):
# This url is not valid, a better Exception will
# be raised
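
For readers who want to see the fallback in isolation, here is a minimal sketch of the same idea in Python 3, using only the standard library. It is not the ERP5 code: the helper name decode_percent_escapes is hypothetical, urllib.parse.unquote_to_bytes stands in for url_unquote, and the explicit UTF-8 attempt stands in for the UnicodeDecodeError that the patch catches around urlnorm.norm.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_escapes(url):
    """Hypothetical helper: decode percent-escapes, falling back to Latin-1.

    RFC 3986 expects escaped non-ASCII characters to be UTF-8, but some
    pages emit Latin-1 escapes such as %E9 for 'é'.
    """
    raw = unquote_to_bytes(url)
    try:
        # Preferred interpretation: the escapes are UTF-8.
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode('latin-1')


# The link used in testWebCrawler.py after this revision.
fixed = decode_percent_escapes('http://www.example.com/?title=%E9crit')
print(fixed.encode('utf-8'))
```

Re-encoding the result as UTF-8 gives the byte string that the updated test expects for this link. Note that a Latin-1 decode accepts any byte sequence, so in a sketch like this the fallback branch never raises.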