This computer HATES the new work units

chablis
Joined: 18 Jan 05
Posts: 10
Credit: 1716171
RAC: 0
Topic 191426

This PC, http://einsteinathome.org/host/5596, has been attached since the beginning of the project. As can be seen, it has lately refused to run the app, basically since S5. It is a dual Athlon server with no recent changes. There is a dedicated HD for BOINC. It runs 5 other projects cleanly alongside Einstein. I have tried a detach/reattach, defragged with BOINC off, and run chkdsk, all with no luck.
Any ideas?

Bob

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250176678
RAC: 34797

This computer HATES the new work units

It looks like the CPU detection isn't properly working on this machine. Try to create a file named "CPU_TYPE_0" in your BOINC directory.
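
For anyone who prefers to script it, here is a minimal sketch of creating that file. It assumes an empty file with exactly this name is enough (nothing above says the file needs any content), and `create_cpu_override` and its `boinc_dir` argument are names made up for this example, not part of BOINC.

```python
from pathlib import Path

# Name of the override file mentioned above.
OVERRIDE_NAME = "CPU_TYPE_0"


def create_cpu_override(boinc_dir: str) -> Path:
    """Create an empty file named CPU_TYPE_0 in the given BOINC directory.

    Assumption: an empty file is sufficient. The caller must pass the
    directory of their own BOINC installation.
    """
    directory = Path(boinc_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"not a directory: {directory}")
    marker = directory / OVERRIDE_NAME
    marker.touch(exist_ok=True)  # creates the file, leaves an existing one alone
    return marker
```

Call it with the path of your own BOINC directory, e.g. `create_cpu_override(r"C:\Program Files\BOINC")`; that path is only a guess at a typical install location, so substitute your own.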

BM


Pepperammi
Joined: 20 Feb 05
Posts: 131
Credit: 437943
RAC: 0

RE: It looks like the CPU

Message 40125 in response to message 40124

Quote:

It looks like the CPU detection isn't properly working on this machine. Try to create a file named "CPU_TYPE_0" in your BOINC directory.

BM

Would getting the latest BOINC client help with this? Or forcing it to redownload the app?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250176678
RAC: 34797

RE: Would getting the

Message 40126 in response to message 40125

Quote:
Would getting the latest BOINC client help with this? Or forcing it to redownload the app?


I don't see how it should.

I don't see any sign of an incorrectly downloaded App. The signature must have been successfully verified once, so the App has been downloaded correctly. There is a small possibility of a recent HD failure, but that would much more likely lead to a hang, an immediate crash or something similar than to an "illegal instruction" after successfully starting the App, reading the data files and detecting the CPU.

The CPU detection is built into the App, I don't see what the Client has to do with it.

The App detects this CPU as being capable of SSE ("Detected CPU type 1"), but gets an "illegal instruction" exception when executing the SSE code.

I have only seen very few machines where this went wrong. Some old, typically distro-specific Linux kernels failed to enable SSE though the CPU was capable of it, and I have a PIII that refuses to execute SSE instructions after it got too hot once, but you can count those cases on the fingers of one hand, compared to the hundreds of thousands of machines where it works.
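
On a Linux host, one rough way to see which features the kernel reports is to read the `flags` line of `/proc/cpuinfo`. The sketch below only parses that text; it is not how the App detects the CPU, it does not apply to Windows hosts, and a reported flag does not prove the instructions will actually execute. The function names are made up for this example.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def reports_sse(cpuinfo_text: str) -> bool:
    """True if the exact flag 'sse' is listed (not just 'sse2' etc.)."""
    return "sse" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Linux only: /proc/cpuinfo does not exist on other systems.
    with open("/proc/cpuinfo") as f:
        print("sse reported:", reports_sse(f.read()))
```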

BM


Ulrich Metzner
Joined: 22 Jan 05
Posts: 113
Credit: 963370
RAC: 0

I don't know, if that's the

I don't know if that's the case here, but there are some older Athlon C CPUs (Thunderbird cores; some Mobile Athlon 4s, for example) that are, because of their CPU ID, improperly detected as Athlon XP derivatives by some CPU-detection routines, although they can't execute SSE instructions. Maybe this is a pair of CPUs of this kind?

Aloha, Uli

Pepperammi
Joined: 20 Feb 05
Posts: 131
Credit: 437943
RAC: 0

RE: RE: Would getting the

Message 40128 in response to message 40126

Quote:
Quote:
Would getting the latest BOINC client help with this? Or forcing it to redownload the app?

I don't see how it should.

I don't see any sign of an incorrectly downloaded App. The signature must have been successfully verified once, so the App has been downloaded correctly. There is a small possibility of a recent HD failure, but that would much more likely lead to a hang, an immediate crash or something similar than to an "illegal instruction" after successfully starting the App, reading the data files and detecting the CPU.

The CPU detection is built into the App, I don't see what the Client has to do with it.

The App detects this CPU as being capable of SSE ("Detected CPU type 1"), but gets an "illegal instruction" exception when executing the SSE code.

I have only seen very few machines where this went wrong. Some old, typically distro-specific Linux kernels failed to enable SSE though the CPU was capable of it, and I have a PIII that refuses to execute SSE instructions after it got too hot once, but you can count those cases on the fingers of one hand, compared to the hundreds of thousands of machines where it works.

BM


Oh right, sorry. I was just wondering if there was any way to avoid that happening. Is there, or could there be, a way for the app to recognise that something like that is happening and fall back to a safer instruction set? The server already restricts the daily quota of failing machines, doesn't it? So is there any way it could send the "CPU_TYPE_0" advice you gave? But that's already sounding like a lot of new work. I realise only an extreme few would have this, so it's probably negligible; I was surprised that one machine has already returned two pages of errors. Lucky he keeps an eye open and noticed it.

I forgot it checks the download, sorry. Anyway, I hope the good advice worked for you, Chablis.

Pepperammi
Joined: 20 Feb 05
Posts: 131
Credit: 437943
RAC: 0

RE: I don't know, if that's

Message 40129 in response to message 40127

Quote:
I don't know if that's the case here, but there are some older Athlon C CPUs (Thunderbird cores; some Mobile Athlon 4s, for example) that are, because of their CPU ID, improperly detected as Athlon XP derivatives by some CPU-detection routines, although they can't execute SSE instructions. Maybe this is a pair of CPUs of this kind?


Maybe sometime in the future the detection could be changed to a quick test that checks the CPU can actually run the code correctly (in case of overclocking and overheating too, as you say) instead of relying on the CPU ID. There are probably so few cases that there'd be no point.
CPU ID checks would have to be updated to properly detect new cores, wouldn't they?
This has all probably been considered and rejected for some reason before, so ignore me if so.
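
To make the "quick test" idea concrete, here is a rough sketch, assuming the optimised code path could be exercised in a throwaway child process before the real work starts. It is not the App's code. The probe here is a stand-in that simulates the crash by sending itself SIGILL; a real probe would be a tiny native program that executes one SSE instruction. All names (`probe_survives`, `choose_cpu_type`, the probe strings) are invented for this example.

```python
import subprocess
import sys

GENERIC, SSE = 0, 1  # CPU types as described in this thread for the x86 Apps


def probe_survives(probe_code: str) -> bool:
    """Run probe_code in a separate Python process; True if it exits cleanly."""
    result = subprocess.run([sys.executable, "-c", probe_code])
    return result.returncode == 0


def choose_cpu_type(detected: int, probe_code: str) -> int:
    """Fall back to generic code if the probe for the detected type dies."""
    if detected != GENERIC and not probe_survives(probe_code):
        return GENERIC
    return detected


# Stand-in probes: the first kills itself with SIGILL, the second does nothing.
FAILING_PROBE = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"
OK_PROBE = "pass"

if __name__ == "__main__":
    print(choose_cpu_type(SSE, FAILING_PROBE))
    print(choose_cpu_type(SSE, OK_PROBE))
```

Running the probe in a child process means a crash there cannot take the main program down with it, at the cost of one extra process start.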

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250176678
RAC: 34797

The CPU detection has been in

The CPU detection has been in the Linux Apps for quite a while now; with S5R1 we're using it in the Windows Apps, too. So I'd consider this to be well tested.

There are ways to improve it a bit further, but I doubt that it will work without problems on all CPUs that have been overheated, relabeled or whatever. Modern CPUs shouldn't be a problem, as the mechanisms and instructions for detecting and reporting CPU capabilities are still the same.

I'll have another look at the detection code when I make the next generation of x86-Apps, but for now we have implemented this manual override capability in case of any problems.

BM


chablis
Joined: 18 Jan 05
Posts: 10
Credit: 1716171
RAC: 0

The processors are old Athlon

The processors are old Athlon 1600MPs. That is not a typo: MP (multiprocessor). I just added the file, so I will see what happens while I am at work today. I also have an Athlon 900 of about the same age that runs just fine, so I guess it was detected properly.

Bob

[B^S] Paul@home
Joined: 9 Feb 05
Posts: 62
Credit: 1734615
RAC: 0

Is there any info available

Is there any info available on what should be detected as the CPU type?

For example "CPU Type 1" does not mean a whole lot to me, but I guess from this thread that it means the app has detected an SSE capable CPU.

Should an SSE2 capable CPU report a different type?

In the future, if you use SSE3 or any other CPU instruction sets, do we know what type should be expected?

I guess I am asking if it would be possible to get a list of what CPUs / Instruction Sets come under each of the CPU Type 'umbrellas' as reported in the result.

cheers,

Paul.

PS: a little request! In future versions, would it be possible for the CPU type to also be output to the BOINC log at the start / restart of the work unit?



Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250176678
RAC: 34797

The "CPU type" is defined

The "CPU type" is defined w.r.t. the build, i.e. what features of the CPU the one who builds the App finds useful to distinguish in that build. The "CPU type 0" will always mean "generic code", i.e. code that runs on all CPUs without requiring special features. Apart from that, the meaning of the CPU types may vary between versions.

In the current x86 Apps there is only "CPU type 1" (other than 0), which corresponds to SSE.

In previous PPC Mac Apps "CPU type 1" meant "has AltiVec". However, in the latest version, "CPU type 1" means a G4 CPU, and for the first time there is a new "CPU type 2" which means a G5 CPU.
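
Put as a lookup table, the meanings described above look roughly like this. The build labels used as keys are just names chosen for this sketch, and since the meaning of the types may vary between versions, the table only reflects the builds mentioned in this post.

```python
# CPU type meanings per build, as described in this post.
# The build labels (dictionary keys) are invented for this example.
CPU_TYPE_MEANING = {
    "x86 (current)": {0: "generic code", 1: "SSE"},
    "PPC Mac (previous)": {0: "generic code", 1: "has AltiVec"},
    "PPC Mac (latest)": {0: "generic code", 1: "G4 CPU", 2: "G5 CPU"},
}


def describe_cpu_type(build: str, cpu_type: int) -> str:
    """Look up what a CPU type number means for a given build label."""
    return CPU_TYPE_MEANING.get(build, {}).get(cpu_type, "unknown for this build")


if __name__ == "__main__":
    for build, types in CPU_TYPE_MEANING.items():
        for number in sorted(types):
            print(f"{build}: CPU type {number} = {describe_cpu_type(build, number)}")
```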

I don't see a reason why messages from the Apps should appear among the messages of the BOINC Client, e.g. in what you see in the "Messages" tab of the BOINC Manager, and off the top of my head I don't see a way of coding this in the App. You could write or modify a BOINC Client to also log the stderr messages from the Apps it is running, but no Client I'm aware of at the moment does this.
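
Until some Client does that, the CPU type can be pulled out of the stderr text of a result by hand or with a small script. A minimal sketch, assuming the stderr output contains the phrase "Detected CPU type N" as quoted earlier in this thread; the rest of the stderr format is not known here, and `detected_cpu_type` is a name made up for this example.

```python
import re
from typing import Optional

# Matches the phrase quoted earlier in this thread, e.g. "Detected CPU type 1".
_CPU_TYPE_PATTERN = re.compile(r"Detected CPU type (\d+)")


def detected_cpu_type(stderr_text: str) -> Optional[int]:
    """Return the first CPU type number found in stderr text, or None."""
    match = _CPU_TYPE_PATTERN.search(stderr_text)
    return int(match.group(1)) if match else None
```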

BM

