Here's an interesting bug, which may be related to some DDoS-prevention tool on "bitcointalk.org". Our SiteTruth site rating system keeps reporting that "bitcointalk.org" has no web site. This is because, if you make certain HTTP requests to "bitcointalk.org" more than twice, the site blocks you for a minute. At the bottom of this post is a Python 2.7 program you can use to demonstrate this. The output of the program looks like this:
>\python27\python timeoutbugtest2.py
Try 0:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 1:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 2:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Open FAILED: ('http://bitcointalk.org', u'HTTP error - timed out.')
Waiting 60 seconds before retry.
Try 3:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 4:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 5:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Open FAILED: ('http://bitcointalk.org', u'HTTP error - timed out.')
Waiting 60 seconds before retry.
Try 6:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 7:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Opened OK.
Try 8:
MyURLOpener opening request: http://bitcointalk.org [('Accept', '*/*'), ('User-agent', 'SiteTruth.com site rating system')]
Open FAILED: ('http://bitcointalk.org', u'HTTP error - timed out.')
Waiting 60 seconds before retry.
This continues indefinitely - two successful opens, then a timeout, wait 1 minute, repeat.
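One possible client-side workaround is to pace requests so that the crawler never sends a third request inside the window. The following is only a sketch: `HostThrottle` is a hypothetical helper, and the limit of two requests per 60 seconds is a guess read off the output above, not a documented rule of the site. Whether staying under that limit actually avoids the block is untested. It should run under Python 2.7 or 3.

```python
import time
from collections import deque


class HostThrottle(object):
    """Allow at most max_requests per window seconds (hypothetical helper)."""

    def __init__(self, max_requests=2, window=60.0,
                 clock=time.time, sleep=time.sleep):
        self.max_requests = max_requests    # assumed limit, inferred from the output
        self.window = window                # assumed window length in seconds
        self.clock = clock                  # injectable for testing
        self.sleep = sleep                  # injectable for testing
        self.stamps = deque()               # times of requests still inside the window

    def _expire(self, now):
        # Forget requests that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()

    def wait_turn(self):
        """Block until one more request fits in the window, then record it."""
        now = self.clock()
        self._expire(now)
        if len(self.stamps) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self.sleep(self.window - (now - self.stamps[0]))
            now = self.clock()
            self._expire(now)
        self.stamps.append(now)
```

The caller would invoke `wait_turn()` immediately before each request to the host in question.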
It's not clear what sets this off. Browsers don't seem to trigger it. Our site rating system does, though. When it starts rating a site, it makes a few requests ("example.com", "www.example.com", an HTTPS request, etc.), checking for redirects and trying to find the front door to the site. That's enough to trigger this.
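To make the request count concrete, here is a rough sketch of that kind of front-door probing. `candidate_urls` is a hypothetical helper written for this post, not the actual SiteTruth code, and the exact list and order of URLs are assumptions. The point is only that even a minimal probe produces more than two requests.

```python
def candidate_urls(domain):
    """Return front-door URLs to probe for a domain (hypothetical helper)."""
    # Normalize to the bare domain so "www.example.com" and "example.com"
    # yield the same candidates.
    bare = domain[4:] if domain.startswith("www.") else domain
    urls = ["http://" + bare, "http://www." + bare]   # plain HTTP, with and without "www."
    urls.append("https://" + bare)                    # one HTTPS attempt
    return urls


if __name__ == "__main__":
    for url in candidate_urls("example.com"):
        print(url)
```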
Anyone associated with the site know what's going on, and what's in the path to the site? This could be some load-balancer or firewall problem.
How do you contact the people behind "bitcointalk", anyway?
The code:
#
# Test for SiteTruth URL timeout bug.
#
import urlparse
import urllib2
import socket                                       # needed for socket.gaierror below
import time
import encodings
kuseragent = "SiteTruth.com site rating system"     # USER-AGENT sent when crawling
kdefaultsockettimeout = 15.0                        # socket timeout, in seconds
# Class InfoException -- used for exceptions related to a page or URL
#
# Usage: InfoException(url, message)
#
class InfoException(Exception) :
    "Information from external website was not as expected"
    def __init__(self, *args) :                     # Initializer
        self.url = args[0]                          # save troubled URL
        self.errmsg = unicode(args[1])              # save problem
        Exception.__init__(self,args)               # initialize parent
    def __unicode__(self) :                         # convert to string
        msg = u'Problem with page "%s": %s.' % (self.url, self.errmsg)
        return(msg)
def open(purl) :                                    # note: shadows the built-in open() in this script
    try:                                            # catch only "Unicode error" in URL
        headers = { "User-agent" : kuseragent }     # set our user agent
        req = urllib2.Request(purl, None, headers)  # build request
        # Workaround for Coyote Point load-balancer bug.
        # If the last field is User-agent, and it ends with "m" but doesn't otherwise contain "m",
        # a Coyote Point load balancer will drop the packet. So we add an extra header
        # that really isn't necessary.
        req.add_header('Accept', '*/*')             # add unnecessary header
        print("MyURLOpener opening request: %s %s" % (purl, repr(req.header_items()))) ## ***TEMP***
        result = urllib2.urlopen(req, None, kdefaultsockettimeout) # do the open
    except UnicodeError:                            # bad domain name syntax in Unicode format
        raise socket.gaierror("Syntax error in domain name") # treat as get-address-error error
    except urllib2.HTTPError as message :
        raise InfoException(purl, u'HTTP error - %s.' % (unicode(message.code)))
    except urllib2.URLError as message :
        message = getattr(message,'reason',message) # use "reason" if available
        raise InfoException(purl, u'HTTP error - %s.' % (unicode(message)))
    return(result)                                  # return result of open
#
# Main program
#
def main() :
    retrydelay = 60                                 # wait 60 seconds before retry
    for tries in range(100) :
        print("Try %d:" % (tries,))
        try :
            fd = open("http://bitcointalk.org")     # URL causing problem
            print(" Opened OK.")
            fd.close()
        except (InfoException,EnvironmentError,) as message:
            print(" Open FAILED: %s " % (message,))
            print("Waiting %d seconds before retry." % (retrydelay,))
            time.sleep(retrydelay)

main()
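One thing that might help narrow this down is timing each attempt. A failure that takes the full socket timeout (15 seconds in the script above) would suggest the packets are being silently dropped somewhere on the path, while a failure that comes back quickly would point to an active refusal. The sketch below is an assumption about how to measure that, not part of the original test program. `timed_attempt` is a hypothetical helper that takes any opener function, such as the `open(purl)` defined above, and it should run under Python 2.7 or 3.

```python
import time


def timed_attempt(opener, url, clock=time.time):
    """Call opener(url); return (succeeded, elapsed_seconds, error_or_None)."""
    start = clock()
    try:
        handle = opener(url)            # e.g. the open(purl) function from the script
        handle.close()
        return (True, clock() - start, None)
    except Exception as err:            # broad on purpose: this is only a diagnostic
        return (False, clock() - start, err)
```

In the test program, the call would be something like `ok, elapsed, err = timed_attempt(open, "http://bitcointalk.org")`, with `elapsed` printed next to each "Opened OK" or "Open FAILED" line.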