Uploads temporarily disabled

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204788
RAC: 13300
Topic 220051

The O2MDF search on GPU(s) is producing results at a rate that we didn't expect. As a result, the file system of our upload server ("einstein4") is completely overloaded, and a number of processes (including the validators) are stepping on each other's toes to get their share, further slowing things down. To make things worse, the file system is filling up, because file deletion isn't working as fast as it should either. For now I have disabled further uploads (and some other background processes) until things have settled and cooled down a bit.

BM

10esseeTony
Joined: 8 Feb 15
Posts: 4
Credit: 783486485
RAC: 0

My case may be isolated, but I find I receive ONLY O2MDF tasks which my preferences do not request, but worse, my client is set to NOT RECEIVE any Einstein at all, yet I'm still getting tasks for GPU.

If this is a widespread bug, it may explain the flood of incoming tasks.

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1433006846
RAC: 599808

It must really be borked to be down for over 3 hrs. 

Jan Vaclavik
Joined: 1 Sep 05
Posts: 10
Credit: 1877246
RAC: 264

Seems like it's working again.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109948293377
RAC: 31480079

10esseeTony wrote:
My case may be isolated, but I find I receive ONLY O2MDF tasks which my preferences do not request, but worse, my client is set to NOT RECEIVE any Einstein at all, yet I'm still getting tasks for GPU.

It would be really helpful if you used the available resources to check what the scheduler is actually doing when it decides what work to send to you.  You could then more properly describe the problem rather than making wild accusations.  The truth of the matter is that the scheduler is not defying your preferences and sending you GPU tasks that you did not request.

I've checked the scheduler log for the only currently active host in your list of hosts.  The scheduler is not sending you new work at all.  It is simply trying to deal with work previously allocated to you that has somehow become 'lost'.  If you don't want this work then just abort it and return it, and it will never, never, never come back to you again!!

At some point in the past, work has been allocated to you that has somehow become 'lost'.  There are two common ways this can happen.  I'll describe each one.  Firstly, a legitimate request arrived at the scheduler which then allocated some new tasks based on that request.  The scheduler sent the return message with all the details but because of some random network issue somewhere, your client didn't receive the response.  So there is a discrepancy between what the scheduler thinks and what your client can see.  No problem.  On the next contact with the scheduler, your client will receive the lost tasks.  It will be noted in the event log as "resending lost results" and, once your client has them, everything will be as it should.

The other common way is if the tasks were properly received in the first place but some subsequent event (e.g. a computer crash, or the user deliberately or inadvertently deleting stuff the client needs) resulted in the discrepancy.  In other words, events at the client end have resulted in some sort of damage to the client information stored in the state file (client_state.xml).  Once again, the scheduler will notice and will try to rectify the problem by resending lost information if possible.

Irrespective of the actual reason why stuff became 'lost', if you really don't want these tasks (or any tasks for that matter) then you just need to abort and return what you don't want.

If a large number of tasks is involved, the scheduler will try to resend them in batches of 12 per scheduler contact.  So make sure you check the event log to see if there are any further messages about "resending lost tasks".  If there are, keep clicking 'update' until those messages stop.  Then click the first task as listed on the tasks page and 'shift-click' the very last task so that you highlight everything you wish to abort.  A single click of the 'abort' button will get rid of the lot.
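To illustrate why several clicks of 'update' may be needed, here is a rough sketch of the reconciliation described above. It is not the actual scheduler code: the function names and the plain lists of task names are hypothetical, and the only figure taken from this post is the batch size of 12 per scheduler contact.

```python
# Illustrative sketch only; NOT the real BOINC scheduler implementation.
# All function names here are hypothetical.

RESEND_BATCH_SIZE = 12  # per-contact limit mentioned in the post above


def find_lost_tasks(server_assigned, client_reported):
    """Return tasks the server has on record for a host that the client did not report."""
    reported = set(client_reported)
    return [task for task in server_assigned if task not in reported]


def tasks_to_resend(server_assigned, client_reported, batch_size=RESEND_BATCH_SIZE):
    """Return at most one batch of lost tasks for a single scheduler contact."""
    return find_lost_tasks(server_assigned, client_reported)[:batch_size]


def contacts_needed(server_assigned, client_reported, batch_size=RESEND_BATCH_SIZE):
    """Count how many scheduler contacts it takes until nothing is left to resend."""
    client = list(client_reported)
    contacts = 0
    while True:
        batch = tasks_to_resend(server_assigned, client, batch_size)
        if not batch:
            return contacts
        client.extend(batch)  # the client now holds the resent tasks
        contacts += 1
```

Under these assumptions, a host with 30 lost tasks and nothing on the client would need three contacts (12, 12 and 6 tasks) before the resend messages stop, which is why it is worth checking the event log before aborting.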

10esseeTony wrote:
If this is a widespread bug, it may explain the flood of incoming tasks.

There is no "widespread bug" and your 'problem' would better have been directed to the Problems board rather than polluting Technical News.

Cheers,
Gary.
