[IPOL discuss] Time out

Jean-Michel Morel morel at cmla.ens-cachan.fr
Mon Feb 7 11:40:44 CET 2011


This makes it, thanks! I suggest that this instruction to authors about 
time out be added in the  IPOL "instructions to authors".
All the best,
Jean-Michel


Nicolas Limare a écrit :
>> We frequently habe "time out" on our demos. I suppose that many will
>> be fixed with the more powerful server, but however, I think that
>> users are not properly informed.
> 
> No, the new server will only improve the performance of parallel
> algorithms. There will be more CPUs, but each CPU won't be faster
> than a CPU on the current server, so a program using only 1 CPU won't
> have better performances on the new server.
> 
> And the situation won't improve. The processors on the next
> generations of machines won't go faster, there will only be more
> processors. This has been true for a few years now, and will probably
> last for a while.
> 
> For the record, only ASIFT really does real (OpenMP) parallel
> processing for the moment. Some demos use "quick and dirty" parallel
> processing by splitting the image, but I'd like to see these hacks
> gradually disappear.
> 
>> Once a demo is launched, in addition to the small spinning top, there
>> should de a message such as: "In case this execution takes more than
>> 60 seconds, it will be interrupted"
> 
> I just added this sentence.
> 
>> And the window that appears in case of time out with "Error 504: Gateway
>> Timeout. This part of IPOL Is Temporary unavailable ... " is
>> inaccurate since this is generally interrupted by time out and not a
>> server error. 
> 
> This page should never appear. The shortest answer is: «demos should
> properly use the timeout system». The detailed answer starts with a
> little explanation about the demo system network and timeout. See the
> end of this message for an answer in 3 sentences.
> 
> There are 2 layers between the network users and the demo:
> 
> users --- www.ipol.im --- demo.ipol.im --- demo 
>           proxy           proxy
> 
> These proxy are some transparent service redirections:
> * The proxy on www.ipol.im redirects all the requests to
>   http://www.ipol.im/pub/demo to the demo.ipol.im server. With this
>   redirection, the demos and other pages seem to be on the same
>   machine (ie on http://www.ipol.im/) even if they are hosted on
>   different servers, in different places: one in a datacenter in
>   Paris, one in the ENS server room
> * The proxy on demo.ipol.im connects the demo program to the external
>   network environment. It handles all the network load (and possible
>   abuses) while the demo programs run quietly.
> 
> Error 504 is a standard status code[1], sent by a proxy then it did
> not receive an answer from the backend server. In our case, the first
> proxy on www.ipol.im sends this page when it received no answer from
> demo.ipol.im. It is supposed to only happen when there is something
> gone wrong on the demo server (server off, rebooting, network
> disturbances, ...), and that's why the message says "temporarily
> unavailable".
> 
> [1]http://tools.ietf.org/html/rfc2616#section-10.5.5
> 
> Of course, we don't want to see this message when the demo are running
> but slow. Easy! The "Error 504" appears when the demo server didn't
> answer within 90 seconds (the proxy timeout), so we must ensure the
> demos reply before, and this is possible with the current code.
> 
> Most of the processing time comes from running the algorithm programs
> as subprocesses, so we control the execution time with the timeout
> option parameter of the wait_proc() function. If the process takes
> more than the timeout delay, it will be interrupted and a TimeoutError
> exception is raised:
> 
>     timeout = 60
>     p = self.run_proc([foo, bar])
>     try:
>         self.wait_proc(p, timeout)
>     except TimeoutError:
>         return self.error(errcode='timeout',
>                           errmsg="The program took too much time.")
> 
> A better way to do it is (see app/my_affine_sift/app.py):
> * use the self.timeout attribute defined for each app class
> * catch the timeout error from the self.run() function
> * pass the timeout parameter to the self.run_algo() function
> 
> 
>     def run(self):
>         try:
>             self.run_algo(timeout=self.timeout)
>         except TimeoutError:
>             return self.error(errcode='timeout',
>                               errmsg="The program took too much time.")
>         except RuntimeError:
>             return self.error(errcode='runtime',
> 			      errmsg="The program ended in an execution error")     
>         return self.tmpl_out("run.html")
> 
>     def run_algo(self, timeout=False)
>         p = self.run_proc([foo, bar])
>         self.wait_proc(p, timeout)
>         return
> 
> So, to avoid the infamous "Error 504" page, demos should always
> specify a non-null timeout when waiting for a process. And be careful
> if you run 3 processes sequentially, you should divide the timeout to
> ensure a global timeout, and the demo editor expertise is required to
> know which process is expected to need more time.
> 
>         p1 = self.run_proc([foo, bar])
>         self.wait_proc(p1, timeout/2)
>         p2 = self.run_proc([foo, bar])
>         self.wait_proc(p2, timeout/4)
>         p3 = self.run_proc([foo, bar])
>         self.wait_proc(p3, timeout/4)
> 
> Finally, to avoid this process timeout for most of the images, you
> should ensure the processed data size is within a reasonable limit,
> either via the input restrictions (max_size attributes) or via a
> custom control if you manipulate the input (crop, ...)
> 
> Another timeout is configurable at the Cherrypy level, but I thought
> it would be overkill.
> 
> And as usual, this is just the way I spontaneously implemented it; it
> can be modified, your contributions are welcome.
> 
>                                 -o-0-o-
> 
> In brief:
> * there is a hard external timeout of 90 seconds on all the demo pages,
>   always active : Error 504 
> * we avoid the hard timeout with a soft internal timeout on the
>   process executions; activated on demand in wait_proc(), the
>   suggested timeout configuration is 60 seconds
> * we avoid the soft timeout by controling and reducing the input size
> 
> Todolist:
> * notify by email the demo authors of any timeout triggered
> * gather statistics about these timeouts with other operation
>   infos[2]
> 
> [2]http://tools.ipol.im/munin/ipol/fuchsia.html#Ipol
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> discuss mailing list
> discuss at list.ipol.im
> http://tools.ipol.im/mailman/listinfo/discuss



More information about the discuss mailing list