No work

AT Hiker
Joined: 30 Aug 12
Posts: 9
Credit: 1634069
RAC: 0
Topic 218498

Is Einstein sending work?

I haven't run any work in a while, and now that I have returned I can't get any.

What's up?

AT Hiker

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I see you have two computers listed, but is it actually the same machine behind both computer IDs? They seem to have identical hardware specs.

One of those hosts has been in contact with the project server during the last two days. It has received and crunched 10 tasks from the 'Gamma-ray pulsar binary search #1 on GPUs v1.20 () windows_x86_64' app. Some of the results have already been validated successfully; others are waiting for validation.

Which applications have you enabled at https://einsteinathome.org/account/prefs/project? Which apps (tasks) do you want your computer to run?

Currently the project server status page https://einsteinathome.org/server_status.html shows that there should be thousands of those FGRPB1G tasks available, for example.

Michael
Joined: 7 Jul 17
Posts: 2
Credit: 54704929
RAC: 0

I seem to have no work, the only task I have is a GPU one.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119863679243
RAC: 26005018

Michael wrote:
I seem to have no work, the only task I have is a GPU one.

You actually have 64 GPU tasks and none are being returned.

The three oldest were sent to your computer on 4th April and after the deadline passed on 18th April they were automatically canceled.  You can see all your GPU tasks through this link.  Is there any reason why you aren't crunching and returning GPU tasks?

If you use the drop down menu at the top right of the page and select FGRP #5, you can see all your CPU tasks.  You have 65 in total, all of which have been crunched and returned.  So why aren't you getting any more?

I believe it may be due to the fact that you aren't crunching and returning GPU work. When your first 3 GPU tasks timed out, the system probably treated this as if those tasks took 14 days, so you probably have an enormous duration correction factor (DCF) as a result. This would likely cause the server to suddenly think that not only can you not complete GPU tasks within the deadline, you also wouldn't be able to complete new CPU tasks. For evidence of this, you just need to look at the scheduler logs, which record the decisions the scheduler made when it worked out what new tasks (if any) to send to you. There is a pinned post in the "Getting Started" forum that tells you how to see the scheduler logs. This link takes you to the most recent log entry for your computer and I've extracted the relevant bit that gives you the answer.


2019-04-24 05:06:38.4247 [PID=19367] [version] Best version of app hsgamma_FGRP5 is 1.08 ID 1010 FGRPSSE (4.41 GFLOPS)
2019-04-24 05:06:38.4247 [PID=19367] [send] est. duration for WU 401687419: unscaled 23821.99 scaled 24349.27
2019-04-24 05:06:38.4247 [PID=19367] [send] [WU#401687419] deadline miss 4520019 > 1209600
2019-04-24 05:06:38.4247 [PID=19367] [send] [HOST#12545592] [WU#401687419 LATeah0056F_1224.0_498411_0.0] using delay bound 1209600 (opt: 1209600 pess: 1209600)
2019-04-24 05:06:38.4247 [PID=19367] [send] [HOST#12545592] [WU#401687419 LATeah0056F_1224.0_498411_0.0] WU is infeasible: CPU too slow

There is a lot more in the full log entry but the above bit tells the story.

The first line shows that the scheduler has selected exactly the app you're after (FGRP5 CPU tasks). The 2nd line shows that the scheduler is going to check whether a task for WU ID 401687419 has an estimated duration small enough for your computer to complete it before the deadline. On the 3rd line, the deadline is 14 days (1209600 secs) and the estimate is 4520019 secs (52.3 days). The scheduler sees a deadline miss, so there is no way it will send that task to you. On the last line, the scheduler ultimately tells you the WU is infeasible because it thinks the CPU is too slow.
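For anyone who wants to check the arithmetic on their own log, here is a minimal Python sketch that pulls the two numbers out of a "deadline miss" line and converts them to days. The regular expression is only an assumption based on the log excerpt quoted above, and `parse_deadline_miss` is a made-up helper name, not part of BOINC.

```python
import re

SECONDS_PER_DAY = 86400

# Pattern assumed from the quoted log line:
#   [send] [WU#<id>] deadline miss <estimate secs> > <delay bound secs>
DEADLINE_MISS = re.compile(r"\[WU#(\d+)\] deadline miss (\d+) > (\d+)")


def parse_deadline_miss(line):
    """Return the WU id and both durations in days, or None if no match."""
    m = DEADLINE_MISS.search(line)
    if m is None:
        return None
    wu_id, estimate, bound = (int(g) for g in m.groups())
    return {
        "wu_id": wu_id,
        "estimate_days": estimate / SECONDS_PER_DAY,
        "bound_days": bound / SECONDS_PER_DAY,
        "misses_deadline": estimate > bound,
    }


line = ("2019-04-24 05:06:38.4247 [PID=19367] [send] "
        "[WU#401687419] deadline miss 4520019 > 1209600")
info = parse_deadline_miss(line)
print(f"WU {info['wu_id']}: estimate {info['estimate_days']:.1f} days, "
      f"deadline {info['bound_days']:.1f} days")
# prints: WU 401687419: estimate 52.3 days, deadline 14.0 days
```

If the estimate in days is far larger than any task has ever really taken on the machine, that points to a bad estimate rather than a genuinely slow CPU.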

So how do you solve this problem?  Quite easily.  Get your GPU to crunch and return the GPU tasks.  The estimates for both CPU tasks and GPU tasks are linked through a single DCF so if one estimate has become enormous (I'm guessing we can blame the GPU estimate) then the other will be dragged way out of kilter as well.  If you crunch GPU tasks, the estimates will return to more sane values, all on their own.  You can hasten the process by stopping BOINC and manually correcting the bad DCF value in the state file (client_state.xml) - if you know how to properly edit xml files and you have a good plain text editor to do the job.  The safest way is to let the system do it automatically over time.
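Before editing anything by hand, it may help just to look at the current DCF. The following is a rough read-only Python sketch. It assumes the state file stores the value in a `<duration_correction_factor>` element inside each `<project>` block, next to a `<project_name>` element, so check that against your own client_state.xml. The sample XML and the value 185.0 are invented purely for illustration, and `read_dcf` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def read_dcf(state_xml):
    """Map project name -> DCF for every <project> block that has one.

    Element names are assumptions about the client_state.xml layout.
    """
    root = ET.fromstring(state_xml)
    result = {}
    for project in root.iter("project"):
        name = project.findtext("project_name", default="(unnamed)")
        dcf = project.findtext("duration_correction_factor")
        if dcf is not None:
            result[name] = float(dcf)
    return result


# Invented sample, not taken from any real host.
SAMPLE = """<client_state>
  <project>
    <project_name>Einstein@Home</project_name>
    <duration_correction_factor>185.000000</duration_correction_factor>
  </project>
</client_state>"""

print(read_dcf(SAMPLE))
```

To try it on a real file, read the file's text and pass it to `read_dcf`. The sketch only reads the value; it does not change anything, which keeps it safe to run while deciding whether a manual edit is worth the risk.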

Have you installed a proper driver with the proper OpenCL compute libs so the GPU can crunch? I'm guessing that this might be the real cause of the problem. You need to get the driver from the NVIDIA website. I don't use Windows so I can't give you specific details, but there are many others who can. You should be able to find lots of previous sets of instructions if you search the forums, and a Windows user might point you in the right direction.

EDIT: This whole thread should be on the "Problems ..." board. I'll shift it there in a day or two.

Cheers,
Gary.

Michael
Joined: 7 Jul 17
Posts: 2
Credit: 54704929
RAC: 0

I think the heart of the problem was a WU, due back in 2018, that was stuck on the GPU. I reset the project and I am crunching again.
