Too many jobs

ohat
Joined: 6 Oct 05
Posts: 5
Credit: 27501687
RAC: 0
Topic 194738

In the last 10 days or so, I have received more than 500 jobs that are now ready to start. Jobs are still pouring into my computer (30 or 40 a day).
Am I meant to crunch every job that is still present in E@H's database?
What is wrong?

I have been running E@H for several years now without any errors.

ohat

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 763069819
RAC: 1058006

Hi!

There are two different searches now, and therefore two different types of workunits, on E@H these days. One search (ABP2) has very short WUs (around half an hour to one hour of runtime), while the other one (S5R6) has WUs of around 5 hours of runtime.

Your computer (dual core with a CUDA-enabled NVIDIA card) will most likely run exactly one short and one long WU in parallel, most of the time. The short ones take ~33 minutes on your PC, so if your PC runs 24/7, you can crunch more than 40 of them per day (!). Add about 4 or 5 WUs of the longer type and you get close to 50 results per day. If your preferences are set to maintain a 10-day cache, the 500 results would be exactly what you asked for :-).
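As a rough check of that arithmetic, here is a minimal sketch. The runtimes (33 minutes and about 5 hours), the one-short-plus-one-long split and the 10-day cache are taken from the post; the function name and the assumption of uninterrupted 24/7 running are only for illustration.

```python
def results_per_day(runtime_minutes: float) -> float:
    """Tasks one execution slot finishes in 24 hours of uninterrupted running."""
    return 24 * 60 / runtime_minutes

# One slot runs short ABP2 tasks, the other runs long S5R6 tasks (assumed split).
short_per_day = results_per_day(33)       # ~33 minutes per short WU
long_per_day = results_per_day(5 * 60)    # ~5 hours per long WU
total_per_day = short_per_day + long_per_day

cache_days = 10                           # "maintain a 10-day cache"
cache_results = total_per_day * cache_days

print(f"short: {short_per_day:.1f}/day, long: {long_per_day:.1f}/day")
print(f"{cache_days}-day cache: about {cache_results:.0f} results")
```

This gives a bit under 50 results per day, so a 10-day cache lands in the neighbourhood of the 500 jobs reported.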

Having said that, the mix of short and long results within the same project seems to be quite a challenge for the BOINC scheduling logic (remember: it's the BOINC client software that ASKS for more work, it's not the server forcing downloads). Sometimes BOINC indeed seems to overcommit a PC in this situation.

So if you still feel that the client is downloading too many E@H WUs and that this interferes with other projects' deadlines, you can either set E@H to "No new work" for a while until the cache has emptied somewhat, or try reducing E@H's resource share and see if that helps. You might also want to update the BOINC client to the latest version; the scheduling and the handling of CUDA tasks are improving from version to version (IMHO).

I hope this explains what you're seeing
Happy crunching
H-B

ohat
Joined: 6 Oct 05
Posts: 5
Credit: 27501687
RAC: 0

Message 96673 in response to message 96672

Thanks for your excellent comment.
I will set E@H to "No new work" for a few days to see how it works out.
By the way, the fastest WU requires about two hours to complete.

ohat

Olaf
Joined: 16 Sep 06
Posts: 26
Credit: 190763630
RAC: 0

Message 96674 in response to message 96673

My observation with two different computers with GPUs is that 'too many
CPU WUs' happens sooner or later if the additional work buffer is not
set to exactly zero. In that case BOINC requests jobs for both CPU and GPU
until it has enough for both, and the server sends them at random. Neither the
server nor the client takes the number of CPUs and GPUs into account to get
a proper fraction of GPU tasks.
Because E@H currently sends more work for CPUs than for GPUs on average,
one gets many more CPU jobs than the work buffer calls for. One has the
choice either to manage work fetch manually (no new tasks/allow new tasks)
or to set the additional work buffer to zero (in this case BOINC tries to fetch
work for CPU and GPU separately, depending on what is currently needed).
Even if one sets the additional work buffer to 0.1 days, overnight one gets
work for several days for an i7 (doing 7 CPU jobs and one CPU+GPU job at once),
and enough work for a week or two for a dual core + GPU.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 763069819
RAC: 1058006

Message 96675 in response to message 96673

Quote:


By the way, the fastest WU requires about two hours to complete.

ohat

No, it's much faster than that, see for yourself:

http://einsteinathome.org/task/156848921

A bit over 2000 seconds (about 33 minutes)!

@Olaf: Yes, that makes a lot of sense.

CU
H-B

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

This just hit me too.

BOINC (6.10.18) was asking for work all the time. It didn't seem to follow the scheduler's request to back off for 1 min, and it was asking for the same amount of CPU and GPU work.

I think the reason it was asking for work was to top up the cache for the GPU, but Einstein was out of work (*) for the GPU, and as BOINC also requested the same amount for the CPU, it got one new CPU task for every request.

* Bernd reported in another thread that the ABP2 workunit generators can't keep up with demand.

A short example of my log: (Times are UTC +1)

2010-01-23 10:32:07 Einstein@Home [sched_op_debug] Starting scheduler request
2010-01-23 10:32:07 Einstein@Home Sending scheduler request: To fetch work.
2010-01-23 10:32:07 Einstein@Home Requesting new tasks for CPU and GPU
2010-01-23 10:32:07 Einstein@Home [sched_op_debug] CPU work request: 1745.36 seconds; 0 idle CPUs
2010-01-23 10:32:07 Einstein@Home [sched_op_debug] NVIDIA GPU work request: 1745.36 seconds; 0 idle GPUs
2010-01-23 10:32:12 Einstein@Home Scheduler request completed: got 1 new tasks
2010-01-23 10:32:12 Einstein@Home [sched_op_debug] Server version 611
2010-01-23 10:32:12 Einstein@Home Project requested delay of 60 seconds
2010-01-23 10:32:12 Einstein@Home [sched_op_debug] estimated total CPU job duration: 15679 seconds
2010-01-23 10:32:12 Einstein@Home [sched_op_debug] estimated total NVIDIA CPU job duration: 0 seconds
2010-01-23 10:32:12 Einstein@Home [sched_op_debug] Deferring communication for 1 min 0 sec
2010-01-23 10:32:12 Einstein@Home [sched_op_debug] Reason: requested by project
2010-01-23 10:33:12 Einstein@Home [sched_op_debug] Starting scheduler request
2010-01-23 10:33:12 Einstein@Home Sending scheduler request: To fetch work.
2010-01-23 10:33:12 Einstein@Home Requesting new tasks for CPU and GPU
2010-01-23 10:33:12 Einstein@Home [sched_op_debug] CPU work request: 1823.26 seconds; 0 idle CPUs
2010-01-23 10:33:12 Einstein@Home [sched_op_debug] NVIDIA GPU work request: 1823.26 seconds; 0 idle GPUs
2010-01-23 10:33:22 Einstein@Home Scheduler request completed: got 1 new tasks
2010-01-23 10:33:22 Einstein@Home [sched_op_debug] Server version 611
2010-01-23 10:33:22 Einstein@Home Project requested delay of 60 seconds
2010-01-23 10:33:22 Einstein@Home [sched_op_debug] estimated total CPU job duration: 15689 seconds
2010-01-23 10:33:22 Einstein@Home [sched_op_debug] estimated total NVIDIA CPU job duration: 0 seconds
2010-01-23 10:33:22 Einstein@Home [sched_op_debug] Deferring communication for 1 min 0 sec
2010-01-23 10:33:22 Einstein@Home [sched_op_debug] Reason: requested by project
2010-01-23 10:34:22 Einstein@Home [sched_op_debug] Starting scheduler request
2010-01-23 10:34:22 Einstein@Home Sending scheduler request: To fetch work.
and so on...

Three things strike me as odd:
1. Why is it asking for the same amount for both the CPU and the GPU? The cache for the CPU is overfilled. My settings are "connect every" 0 (zero) days and "maintain enough work for" 0.5 days. I currently have at least 5 days' worth of CPU tasks.

2. Why doesn't it wait 1 min between attempts as requested by the project?

3. Why doesn't the project refuse to give work when the client hasn't backed off for the requested time?

Are there any log flags I could set to help debug this?

[Edit] It's now set to no new tasks! =)

/Holmis

Mad_Max
Joined: 2 Jan 10
Posts: 156
Credit: 2231772952
RAC: 621525

As regards points 2 and 3, I can say that there was no error. The client waits exactly one minute, as the server requests.

Look:
2010-01-23 10:32:12 Einstein@Home Project requested delay of 60 seconds
...
2010-01-23 10:33:12 Einstein@Home Sending scheduler request: To fetch work.
and
2010-01-23 10:33:22 Einstein@Home Project requested delay of 60 seconds
...
2010-01-23 10:34:22 Einstein@Home Sending scheduler request: To fetch work.
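The same check can be done mechanically. The sketch below is only an illustration: the four log lines are copied from the log posted earlier in the thread, and the helper name `backoff_gaps` is made up for this example, not part of BOINC.

```python
from datetime import datetime

# Log lines copied from the posted client log.
LOG = """\
2010-01-23 10:32:12 Einstein@Home Project requested delay of 60 seconds
2010-01-23 10:33:12 Einstein@Home Sending scheduler request: To fetch work.
2010-01-23 10:33:22 Einstein@Home Project requested delay of 60 seconds
2010-01-23 10:34:22 Einstein@Home Sending scheduler request: To fetch work.
"""

def backoff_gaps(log_text):
    """Seconds between each 'requested delay' line and the next scheduler request."""
    gaps = []
    delay_seen_at = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The first 19 characters hold the timestamp, e.g. "2010-01-23 10:32:12".
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        if "Project requested delay" in line:
            delay_seen_at = stamp
        elif "Sending scheduler request" in line and delay_seen_at is not None:
            gaps.append((stamp - delay_seen_at).total_seconds())
            delay_seen_at = None
    return gaps

gaps = backoff_gaps(LOG)
print(gaps)  # [60.0, 60.0]
```

Both gaps come out at 60 seconds, which matches the delay the project requested.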

As regards point 1, I don't think there is an error in the mechanism that requests tasks; the problem lies in the estimate of how quickly they will be completed. The scheduler thinks the CPU tasks in the cache will take about 15500 seconds in total, which is less than the 0.5 days you specified in the settings, so it keeps requesting new tasks. The problem is that in reality they will need 5 days (in your words) ...
As a temporary workaround, you can simply reduce "maintain enough work for" several-fold (try 0.1-0.2, for example). The problem can only be solved completely in newer versions of the BOINC client, once speed and time are estimated separately for the CPU and the GPU.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Thanks Mad_Max!

My bad on no. 2.

As to the 15500, if that comes from my posted log, then that's how much it got from the request: 1 unit estimated to take ~15500 sec, or just over 4 hours.

I did overestimate the amount in my cache, but it was at least 1.5-2 days' worth of work.
By mistake I left it on its own during the night here, and now I have about 130 S5R6 tasks in my cache, estimated to take from 3h 20min to 4h 30min per unit. Running 4 at a time, it will last about 5 days.
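A quick sketch of that estimate, using only the figures in the post (130 tasks, 3h 20min to 4h 30min each, 4 running at a time); the function name is just for illustration and it assumes the machine runs without interruption.

```python
def days_to_clear(task_count: int, hours_per_task: float, concurrent_tasks: int) -> float:
    """Days needed to finish a cache, assuming uninterrupted 24/7 running."""
    return task_count * hours_per_task / concurrent_tasks / 24

fastest = days_to_clear(130, 3 + 20 / 60, 4)  # every task takes 3h 20min
slowest = days_to_clear(130, 4.5, 4)          # every task takes 4h 30min

print(f"between {fastest:.1f} and {slowest:.1f} days")
```

The range works out to roughly 4.5 to 6 days, consistent with "about 5 days".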

It kept asking for the same amount of CPU and GPU work every minute right up until I reached the daily quota.

It's set to no new tasks again and will be unless I'm there to watch it.

/Holmis

CoM
Joined: 19 Feb 05
Posts: 4
Credit: 40860433
RAC: 8499

Every time I connect to the Einstein server I get new WUs, even though the project is set to "No new tasks". In the Messages tab one can see that my client is not requesting any new tasks, but it's getting WUs anyway.

26/01/2010 08:47:08 Einstein@Home Sending scheduler request: Requested by user.
26/01/2010 08:47:08 Einstein@Home Reporting 5 completed tasks, not requesting new tasks
26/01/2010 08:47:18 Einstein@Home Scheduler request completed
26/01/2010 08:47:20 Einstein@Home Started download of h1_1053.35_S5R4
26/01/2010 08:47:20 Einstein@Home Started download of l1_1053.35_S5R4

Is this a problem with my client version (it's 6.10.18) or with the Einstein server?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 96680 in response to message 96679

Quote:

26/01/2010 08:47:20 Einstein@Home Started download of h1_1053.35_S5R4
26/01/2010 08:47:20 Einstein@Home Started download of l1_1053.35_S5R4

Is this a problem with my client version (it's 6.10.18) or with the Einstein server?


Neither one. Those aren't newly downloaded tasks but data files needed to compute tasks that were already downloaded (assigned).

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Quote:
Neither one. Those aren't newly downloaded tasks but data files needed to compute tasks that were already downloaded (assigned).

Yes, but he definitely got a lot of tasks at this time, see his list of tasks.

Michael
