Can you tell me something about error rates in the various Einstein tasks?

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

I've tried about everything

I've tried about everything and I can't get anything to work. One machine has zero errors, the other 6%. Is there a way to tell from the data available which device, 0 or 1 or both, is the culprit on that machine? I could sit there and stare at the screen all day, but that is hard to do with old eyes like mine.

I haven't tried removing the virus protection; that kind of puts the fear of God in me.

merle

What is freedom of expression? Without the freedom to offend, it ceases to exist.

— Salman Rushdie

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

RE: Is there a way to tell

Quote:
Is there a way to tell from the data available which device, 0 or 1 or both, is the culprit on that machine?


Read the stderr.txt:

Quote:

7.2.42

Activated exception handling...
[20:45:49][3264][INFO ] Starting data processing...
[20:45:49][3264][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:45:49][3264][INFO ] Using OpenCL device "Pitcairn" by: Advanced Micro Devices, Inc.
[20:45:50][3264][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...

Or:

Quote:

7.2.42

Activated exception handling...
[16:37:06][3080][INFO ] Starting data processing...
[16:37:07][3080][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[16:37:07][3080][INFO ] Using OpenCL device "Tahiti" by: Advanced Micro Devices, Inc.
[16:37:07][3080][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
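
If you don't want to read each stderr by eye, the device line can be pulled out with a few lines of Python. This is only a rough sketch: it assumes you have copied the stderr text of a task into a string, and the name `opencl_devices` is made up for illustration. The pattern is based solely on the log lines quoted above.

```python
import re

# Matches lines such as:
# [20:45:49][3264][INFO ] Using OpenCL device "Pitcairn" by: Advanced Micro Devices, Inc.
DEVICE_LINE = re.compile(r'Using OpenCL device "([^"]+)"')


def opencl_devices(stderr_text):
    """Return the device names reported in a task's stderr, in order of appearance."""
    return DEVICE_LINE.findall(stderr_text)


# Two lines copied from the first quoted stderr above.
sample = (
    '[20:45:49][3264][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.\n'
    '[20:45:49][3264][INFO ] Using OpenCL device "Pitcairn" by: Advanced Micro Devices, Inc.'
)
print(opencl_devices(sample))
```

A task that reports "Pitcairn" ran on one card and a task that reports "Tahiti" ran on the other, so tallying device names against valid/invalid status should show which card is at fault.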

Claggy

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

RE: RE: Is there a way to

Quote:
Quote:
Is there a way to tell from the data available which device, 0 or 1 or both, is the culprit on that machine?

Read the stderr.txt:

Quote:

7.2.42

Activated exception handling...
[20:45:49][3264][INFO ] Starting data processing...
[20:45:49][3264][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:45:49][3264][INFO ] Using OpenCL device "Pitcairn" by: Advanced Micro Devices, Inc.
[20:45:50][3264][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...

Or:

Quote:

7.2.42

Activated exception handling...
[16:37:06][3080][INFO ] Starting data processing...
[16:37:07][3080][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[16:37:07][3080][INFO ] Using OpenCL device "Tahiti" by: Advanced Micro Devices, Inc.
[16:37:07][3080][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...

Claggy

Yes, but both of mine are Tahiti, a 270x and a 280x. :-(

Wait, I just looked it up and the 270x says it's a Curacao!
I need to check. Thanks, Claggy.

--edit
It seems that both of them are being reported to Einstein as Tahiti.
I guess I have to watch as they clear thru the system.
Thanks though.

merle

scole of TSBT
Joined: 2 Mar 05
Posts: 10
Credit: 632347682
RAC: 866828

Check more of your valid

Check more of your valid tasks. One GPU is reported as Pitcairn: http://einsteinathome.org/task/464681263

I think that is your 270.

See if there are any invalids on that one, or just on the Tahiti/280.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

RE: --edit It seems that

Quote:
--edit
It seems that both of them are being reported to Einstein as Tahiti.
I guess I have to watch as they clear thru the system.
Thanks though.


The server only shows the most capable GPU (for each vendor), so it'll only display that your host has Tahiti GPUs, even though the 2nd AMD/ATI GPU is a Pitcairn.

Claggy

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

RE: Check more of your

Quote:

Check more of your valid tasks. One GPU is reported as Pitcairn: http://einsteinathome.org/task/464681263

I think that is your 270.

See if there are any invalids on that one, or just on the Tahiti/280.


I don't know why they would report a Pitcairn on that computer. I have Pitcairns on my other computer??

merle

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

thanks Claggy, that's my

thanks Claggy,
that's my answer.

merle

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

I gave up with the watching

I gave up on the watching game after watching 8 valid in a row. I analyzed the times of the invalid ones and found that they all have to be from my fastest card, the 280x. I increased the fan speed on the 280x alone to 75% (it was at 56%) to see if this helps or not. The fan speed is the only thing I can control. I can increase the core clock and the memory clock, but it won't let me decrease them. It's a Sapphire.
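
For anyone who wants to repeat the run-time analysis, here is a rough sketch of the idea in Python. The run times and outcomes below are made up purely for illustration, and the threshold is an assumption you would pick by eye between the two clusters of run times. It also assumes the two cards really do finish tasks in clearly different times.

```python
def split_by_runtime(tasks, threshold_s):
    """Count task outcomes per card, guessing the card from the run time.

    tasks is a list of (run_time_seconds, status) pairs; anything quicker
    than threshold_s is attributed to the faster card.
    """
    counts = {"fast": {}, "slow": {}}
    for run_time, status in tasks:
        card = "fast" if run_time < threshold_s else "slow"
        counts[card][status] = counts[card].get(status, 0) + 1
    return counts


# Hypothetical values only, not taken from any real task list.
tasks = [
    (2900, "valid"), (2895, "invalid"), (4100, "valid"),
    (4120, "valid"), (2910, "valid"), (2890, "invalid"),
]
print(split_by_runtime(tasks, threshold_s=3500))
```

If every invalid lands in the "fast" group, as in this invented example, that points at the faster card.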

merle

merle van osdol
Joined: 1 Mar 05
Posts: 513
Credit: 60724446
RAC: 0

This is the stderr from an

This is the stderr from an inconclusive that just passed thru. Can you tell if it's probably bad or not? Or can you tell me any info about it that might lead me to a reason for it being invalid? I know it's not yet invalid but it's from my 280x and I suspect it's invalid since I've had 8 in a row come thru as valid.

Stderr output
7.2.42

Activated exception handling...
[07:54:00][3912][INFO ] Starting data processing...
[07:54:00][3912][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[07:54:00][3912][INFO ] Using OpenCL device "Tahiti" by: Advanced Micro Devices, Inc.
[07:54:01][3912][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[07:54:01][3912][INFO ] Header contents:
------> Original WAPP file: ./PB0057_010B1_DM252.00
------> Sample time in microseconds: 1000
------> Observation time in seconds: 2097.152
------> Time stamp (MJD): 53843.193587979025
------> Number of samples/record: 0
------> Center freq in MHz: 1231.5
------> Channel band in MHz: 3
------> Number of channels/record: 96
------> Nifs: 1
------> RA (J2000): 63740.9799004
------> DEC (J2000): -51902.684
------> Galactic l: 0
------> Galactic b: 0
------> Name: G4383473
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 2097152
------> Trial dispersion measure: 252 cm^-3 pc
------> Scale factor: 1.82206
[07:54:02][3912][INFO ] Seed for random number generator is 1082949649.
[07:54:03][3912][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-008
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[07:59:28][3912][INFO ] Checkpoint committed!
[08:04:55][3912][INFO ] Checkpoint committed!
[08:10:23][3912][INFO ] Checkpoint committed!
[08:15:51][3912][INFO ] Checkpoint committed!
[08:21:18][3912][INFO ] Checkpoint committed!
[08:26:46][3912][INFO ] Checkpoint committed!
[08:32:13][3912][INFO ] Checkpoint committed!
[08:37:41][3912][INFO ] Checkpoint committed!
[08:42:17][3912][INFO ] OpenCL shutdown complete!
[08:42:18][3912][INFO ] Statistics: count dirty SumSpec pages 42974 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1100505
[08:42:18][3912][INFO ] Data processing finished successfully!
[08:42:18][3912][INFO ] Starting data processing...
[08:42:18][3912][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[08:42:18][3912][INFO ] Using OpenCL device "Tahiti" by: Advanced Micro Devices, Inc.
[08:42:18][3912][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[08:42:18][3912][INFO ] Header contents:
------> Original WAPP file: ./PB0057_010B1_DM254.00
------> Sample time in microseconds: 1000
------> Observation time in seconds: 2097.152
------> Time stamp (MJD): 53843.193587937269
------> Number of samples/record: 0
------> Center freq in MHz: 1231.5
------> Channel band in MHz: 3
------> Number of channels/record: 96
------> Nifs: 1
------> RA (J2000): 63740.9799004
------> DEC (J2000): -51902.684
------> Galactic l: 0
------> Galactic b: 0
------> Name: G4383473
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 2097152
------> Trial dispersion measure: 254 cm^-3 pc
------> Scale factor: 1.819
[08:42:19][3912][INFO ] Seed for random number generator is 1084118061.
[08:42:20][3912][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-008
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[08:43:09][3912][INFO ] Checkpoint committed!
[08:48:36][3912][INFO ] Checkpoint committed!
[08:54:04][3912][INFO ] Checkpoint committed!
[08:59:32][3912][INFO ] Checkpoint committed!
[09:04:59][3912][INFO ] Checkpoint committed!
[09:10:27][3912][INFO ] Checkpoint committed!
[09:15:54][3912][INFO ] Checkpoint committed!
[09:21:22][3912][INFO ] Checkpoint committed!
[09:26:49][3912][INFO ] Checkpoint committed!
[09:30:35][3912][INFO ] OpenCL shutdown complete!
[09:30:35][3912][INFO ] Statistics: count dirty SumSpec pages 47746 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1100505
[09:30:35][3912][INFO ] Data processing finished successfully!
09:30:35 (3912): called boinc_finish
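
The timestamps in this stderr are enough to work out how long each of the two runs took, which is one way to match a task to a card by speed. A minimal sketch, assuming the log format shown above and that no run crosses midnight; the function name `run_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as: [07:54:00][3912][INFO ] Starting data processing...
STAMP = re.compile(r'^\[(\d\d:\d\d:\d\d)\]\[\d+\]\[INFO \] (.*)$')


def run_durations(stderr_text):
    """Seconds from each 'Starting data processing...' line to the next
    'Data processing finished successfully!' line."""
    durations = []
    start = None
    for line in stderr_text.splitlines():
        match = STAMP.match(line.strip())
        if not match:
            continue  # header lines ("------> ...") carry no timestamp
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Starting data processing"):
            start = stamp
        elif message.startswith("Data processing finished") and start is not None:
            durations.append(int((stamp - start).total_seconds()))
            start = None
    return durations


# Start and finish lines copied from the stderr above.
log = """[07:54:00][3912][INFO ] Starting data processing...
[08:42:18][3912][INFO ] Data processing finished successfully!
[08:42:18][3912][INFO ] Starting data processing...
[09:30:35][3912][INFO ] Data processing finished successfully!"""
print(run_durations(log))  # [2898, 2897]
```

Both runs took about 48 minutes, so nothing in the timing itself looks unusual. The log also shows no error lines, so the stderr alone probably cannot say whether the result will validate.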

merle

astro-marwil
Joined: 28 May 05
Posts: 516
Credit: 414850319
RAC: 746609

Hallo Merle! If I remember

Hello Merle!
If I remember correctly, this looks very much the same as the error messages I got about a month ago, when I averaged about a 10% error rate over 3 weeks and 200 tasks, before I changed the GPU driver. See my thread. Since then I have crunched more than 300 tasks without any invalids or inconclusives. But I had to use the beta version of the driver!
For applications running on CPU only I get very, very few errors, far less than 1 in 1000 over the last years. I believe all these applications are best optimized for Windows, as this is by far the most used OS. Other OSes like Mac and Linux have higher error rates, and the rate you can see or derive from the Server Status Page is the average over all of them, and so higher than that for Windows. But BM could tell you more about all this.
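
To put rough numbers on streaks like these, here is a small sketch. It assumes every task fails independently with the same probability, which is a simplification, and the function name `prob_all_valid` is made up for illustration.

```python
def prob_all_valid(error_rate, n_tasks):
    """Chance that n_tasks in a row all validate, assuming each task fails
    independently with probability error_rate (a simplifying assumption)."""
    return (1.0 - error_rate) ** n_tasks


# 6% is the error rate mentioned at the start of the thread.
print(f"8 valid in a row at a 6% error rate: {prob_all_valid(0.06, 8):.2f}")
# 10% and 300 tasks are the figures from this post.
print(f"300 valid in a row at a 10% error rate: {prob_all_valid(0.10, 300):.1e}")
```

Under that assumption, 8 valid results in a row at a 6% error rate happens about 61% of the time, so a short clean streak says little. 300 clean tasks at a 10% error rate would be vanishingly unlikely (on the order of 1e-14), which suggests the driver change really did make a difference.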

Kind regards and happy crunching
Martin
