S6BucketLVE validation

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1707
Credit: 1075495573
RAC: 1208762

Yeah, I got a few of those on one of my AMDs, and I guess I'm glad I had switched back over to GRPs before I got more of them.

Don't see any of that on my Intels but haven't checked them all yet.

http://einsteinathome.org/host/4519028/tasks&offset=0&show_names=1&state=4&appid=0

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 500598371
RAC: 33844

Quote:

The SSE implementation is not different between them, but OS system calls sometimes behave differently with respect to the XMM registers.
For example, I have a (fortunately only graphics) algorithm which uses the XMM5, XMM6 and XMM7 registers to store predefined constants for a calculation. During this lengthy calculation it calls the Windows MsgWaitForMultipleObjects system call to get the next piece of source data (produced by another thread). On AMD machines this system call clears the XMM0-XMM5 registers but leaves XMM6 and XMM7 untouched (on XP, XP x64 and Windows 7 x64), so the resulting image is corrupted. The solution is that the program must reload the constant into the XMM5 register after every call. This behaviour has been seen on K7, K8 and K10 CPUs, but not on any Intel machines (P3, P4, Core2 and newer).

I don't know whether you use hand-written asm code, but maybe this helps.
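
The reload-after-every-call fix described in the quote can be sketched in C with SSE intrinsics. This is only an illustration under assumptions, not code from the project or from the quoted poster: next_chunk_ready and scale_all are hypothetical names, and next_chunk_ready merely stands in for a blocking call such as MsgWaitForMultipleObjects. For reference, the Windows x64 calling convention treats XMM0-XMM5 as caller-saved (any callee may overwrite them) and XMM6-XMM15 as callee-saved, which would be consistent with the registers named above. With intrinsics the compiler handles this automatically; hand-written asm has to do the reload explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */

/* Hypothetical stand-in for a blocking OS call such as
   MsgWaitForMultipleObjects. After any such call, caller-saved
   registers must be assumed to have been overwritten. */
static int next_chunk_ready(size_t chunk, size_t total_chunks) {
    return chunk < total_chunks;
}

/* Multiply n_chunks chunks of 4 floats each by the lanes of scale4.
   The constant's home is memory, not a register. */
static void scale_all(float *data, size_t n_chunks, const float scale4[4]) {
    assert(data != NULL && scale4 != NULL);
    size_t chunk = 0;
    while (next_chunk_ready(chunk, n_chunks)) {
        /* Reload the constant after the call instead of assuming
           a register still holds the value from the last iteration. */
        __m128 k = _mm_loadu_ps(scale4);
        __m128 v = _mm_loadu_ps(data + 4 * chunk);
        _mm_storeu_ps(data + 4 * chunk, _mm_mul_ps(v, k));
        chunk++;
    }
}
```

Calling scale_all(buffer, 2, factors) on eight floats scales each group of four by the four values in factors; the extra load per iteration is cheap compared with a blocking system call.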

This raises another question: which of these results is correct?
Is there a need to redo all WUs validated on two AMD processors?

BTW, it's not only AMD against Intel,
http://einsteinathome.org/workunit/146229594
this one is Intel against Intel.

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.

ph2000
Joined: 17 Mar 05
Posts: 7
Credit: 936499973
RAC: 0

There have always been invalid tasks, and there always will be, for example due to a failed CPU overclock; I think they can recognize and filter them correctly (it's a joke to crunch a CPU task in 242 seconds :) ).

I think we should wait for the result of the investigation (the validator has been stopped again) and for the solution, and we shouldn't stress the project developers with this.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110023252188
RAC: 22560066

Quote:
My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.


The first link is an example of bad data that has slipped through the screening process. Unless pulled manually by the Devs, that quorum will grow to 20 before it is terminated automatically.

The second link is probably an example of what Bernd is trying to solve at the moment. I may have missed it but I don't think Bernd has said that it's necessarily to do with AMD vs Intel. BTW, the E3200 (a Wolfdale Celeron dual core) is one of my hosts and it's been crunching 24/7 for the last 4+ years. Currently it has no errors or invalids in its tasks list. It is overclocked and it's currently running in an environment where the ambient is around 35 C. It runs reliably and the crunch times aren't too shabby for a relatively old architecture either :-).

Cheers,
Gary.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 500598371
RAC: 33844

Quote:

There have always been invalid tasks, and there always will be, for example due to a failed CPU overclock; I think they can recognize and filter them correctly (it's a joke to crunch a CPU task in 242 seconds :) ).


You are right, 242 seconds is abnormal, but this particular PC is standalone (no keyboard, mouse or monitor attached, only power and network) and it is definitely not overclocked. It's a live backup for a critical system and does nothing else.
It just happened.

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

Quote:
Quote:
My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.

The second link is probably an example of what Bernd is trying to solve at the moment. I may have missed it but I don't think Bernd has said that it's necessarily to do with AMD vs Intel. BTW, the E3200 (a Wolfdale Celeron dual core) is one of my hosts and it's been crunching 24/7 for the last 4+ years. Currently it has no errors or invalids in its tasks list.

Yes, I cited the 2nd one as a counter-example to the suggestion that it's a problem with AMD/Intel cross-validation; here your Intel host and another AMD host agreed, and cast out my AMD host - thanks for looking at it.

Having said that, I've looked at all my hosts now, and FWIW the only two displaying errors like the 2nd link are also the only AMD units I have online at present (the other 6 are mixed Intel). They are both quad-Opteron Supermicro servers with ECC RAM, well cooled, known to be reliable, no overclock, yadda yadda. These are the 4 validation errors I found (including the one cited above):

Host 6123309 Error 1
Host 6123309 Error 2
Host 6119246 Error 1
Host 6119246 Error 2

Happy to make these hosts available if the devs want to see whether the failure can be reproduced.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245272446
RAC: 12219

The problem didn't have anything to do with platforms or vendors; it was essentially a bug in the validator (for the curious: it had already been there since S6LV1, but went unnoticed there).

It has been found and fixed, and the new validator is crunching through the backlog. All should be back to normal in a few days.

BM

astro-marwil
Joined: 28 May 05
Posts: 518
Credit: 417680156
RAC: 622260

Hello BM!
Congrats!
During this time I received very few S6LVE tasks. I assume you throttled the output of these tasks to avoid overflowing the database, and are now bringing it back to normal. What will the mean ratio of FGRP2 to S6LVE tasks be, and what determines it?

Kind regards and happy crunching
Martin

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

Great to hear, thanks for letting us know.

Onwards and upwards!
