Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4862288136
RAC: 104948

I have PMed Richard and I don't know who else to tell.

The Perseus distribution has gone horribly wrong.

Because I cannot tell which of my computers may NOT have contacted the servers asking for work, I cannot be sure of what I am seeing.

However, it looks like computers with multiple NVIDIA rigs (mine, anyway) are downloading ridiculous amounts of work (as I posted in Cruncher's). Some have in excess of 500 work units, and one has in excess of 700.

Obviously, with 3-5 hour tasks, that's a week's worth of tasks or more, yet my caches are set to 0.5 days or 1 day.

I've invoked NNT on all of those machines that I've caught *and* have access to.

I'm getting BOINC scheduler back-offs of something like 21 hours, as well.

I can think of a reason someone might do this on purpose, but from this end I can't tell if it is done on purpose or if the scheduler has gone haywire.

If anyone has an explanation or a theory, I'd be happy to hear it.

EDIT: By the way, I have *other* Einstein application tasks in progress, and those numbers seem to be behaving themselves. But since we were running out of BRP4 Arecibo tasks, I have un-checked the option to receive them.

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

The credit required to maintain the status quo seems to be around 5000 per BRP5 task across my 7 active hosts (a mix of slow and fast Intel and AMD machines, mostly NVIDIA). Having processed the Einstein job log file (as per this message and the next), I find the results quite variable across hosts, and so far I can't see much of a pattern: some slow hosts improve, some fast hosts drop back. At 5k credit per task, there's still a small hit to my fleet.

N.B. Using the job log file in this way assumes all tasks will validate, so if there are problems the credit per day will be lower.

Here's a representative example for host 6564477:

The X axis is in days, and the Y axis is in points per day. This host has been averaging about 63k/day (as shown by the blue line, which also represents the results if BRP5 tasks are awarded 5k). The red line is the effect of awarding 4k/task, and the brown line is the effect of 6k/task. Although the data are limited, this does give a rough idea (other hosts are similar).

The source code (Java) for the program is available here. To use it, compile it with "javac jl.java", copy in "job_log_einstein.phys.uwm.edu.txt" from the project directory, and run it with "java jl -p [$FILE]". The output is one line per day with three columns: day number, points for that day, and average points per day. Redirect it into a file and plot it using 'gnuplot' (or any spreadsheet). Points per task are set around line 120.
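For anyone who wants to try the same kind of estimate without the original program, here is a minimal sketch of the idea. It is not the jl.java linked above. It assumes each job log line looks like "<unix_time> ue <est> ct <cpu> fe <flops> nm <task_name> et <elapsed> es <exit>", that BRP5 task names start with "PA", and that every listed task validates. The class name JobLogCredit, the method dailyCredit and the command-line arguments are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: estimates credit per day from a BOINC job log,
// assuming a fixed credit per task and that all tasks validate.
final class JobLogCredit {
    static final long SECONDS_PER_DAY = 86_400L;

    // Returns one row per day on which matching tasks finished:
    // {day number (first such day = 1), points that day, average points per elapsed day}.
    static List<double[]> dailyCredit(List<String> lines, String namePrefix, double pointsPerTask) {
        TreeMap<Long, Integer> tasksPerDay = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2) continue;               // blank or malformed line
            long timestamp;
            try {
                timestamp = Long.parseLong(f[0]);     // assumed: completion time, Unix seconds
            } catch (NumberFormatException e) {
                continue;
            }
            String name = null;
            for (int i = 1; i + 1 < f.length; i += 2) {
                if (f[i].equals("nm")) name = f[i + 1];   // assumed: "nm" tags the task name
            }
            if (name == null || !name.startsWith(namePrefix)) continue;
            tasksPerDay.merge(timestamp / SECONDS_PER_DAY, 1, Integer::sum);
        }

        List<double[]> rows = new ArrayList<>();
        if (tasksPerDay.isEmpty()) return rows;
        long firstDay = tasksPerDay.firstKey();
        double total = 0.0;
        for (Map.Entry<Long, Integer> e : tasksPerDay.entrySet()) {
            long dayNumber = e.getKey() - firstDay + 1;
            double points = e.getValue() * pointsPerTask;
            total += points;
            rows.add(new double[] {dayNumber, points, total / dayNumber});
        }
        return rows;
    }

    // Usage (hypothetical): java JobLogCredit <job_log_file> [points_per_task]
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        double pointsPerTask = args.length > 1 ? Double.parseDouble(args[1]) : 5000.0;
        for (double[] r : dailyCredit(lines, "PA", pointsPerTask)) {
            System.out.printf("%d %.0f %.1f%n", (long) r[0], r[1], r[2]);
        }
    }
}
```

Running it three times with 4000, 5000 and 6000 as the second argument gives three series that can be plotted together. Days with no finished tasks produce no row, but they still count toward the running average.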

I aim to prepare an aggregate graph across all my hosts in the next day or so. Of course, combining job log files from a large number of users would be best, but I assume these files are only available on the host (and not on the servers).

The Xorcist
Joined: 16 Aug 11
Posts: 16
Credit: 464281554
RAC: 0

Quote:
The Perseus distribution has gone horribly wrong.


"Horribly wrong" is a SimCity/EA distribution term, isn't it?

Quote:
I can think of a reason someone might do this on purpose, but from this end I can't tell if it is done on purpose or if the scheduler has gone haywire.

What? "On purpose"? I'm really not sure what you mean there, my friend. I haven't experienced this issue, or read anything from others who have. My additional work buffer, set at 0.75 days, was adjusted to the new computation times accordingly.

For me personally, the PA series of work units has only affected my GPU usage, in that my GPU is no longer being fully utilised.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954603288
RAC: 714538

Quote:
I have PMed Richard and I don't know who else to tell.


Replied in NC. From here, it looks like a client problem rather than a server problem, but I've noted that Bernd might like to consider capping the daily quota per host for a while.

Sid
Joined: 17 Oct 10
Posts: 164
Credit: 968977975
RAC: 404468

I have 2 BRP5 WUs that failed validation. Same with a "wingman". Waiting for more.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

After a few problems we finally got the validator running. For a start we'll grant 4000 per (valid) task.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954603288
RAC: 714538

Your WU 165131651 has status 'unknown' - that can't be right.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

Quote:
Your WU 165131651 has status 'unknown' - that can't be right.

There were 430 tasks that ended up as validate errors at first. I reset these to be re-validated. This is most likely one of those.

Honestly I don't know what "status" the web interface shows as "unknown" - in the DB these look ok.

I am confident that the status will show up correctly when the new validator has touched this WU again.

BM

S@NL - John van Gorsel
Joined: 19 Feb 05
Posts: 5
Credit: 39664585
RAC: 29361

Quote:
Your WU 165131651 has status 'unknown' - that can't be right.

Here's another one. It has already been sent to two new hosts.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954603288
RAC: 714538

Quote:
Quote:
Your WU 165131651 has status 'unknown' - that can't be right.

There were 430 tasks that ended up as validate errors at first. I reset these to be re-validated. This is most likely one of those.

Honestly I don't know what "status" the web interface shows as "unknown" - in the DB these look ok.

I am confident that the status will show up correctly when the new validator has touched this WU again.

BM


And hopefully the unsent fifth replication will get changed to 'don't need' before it goes out?

(edit - assuming they validate)
