Just added Einstein, but only getting CPU tasks?

Steve
Joined: 15 Jan 13
Posts: 6
Credit: 68239
RAC: 0
Topic 196742

Hi. Apologies if this is answered elsewhere, but I can't find the answer and may be missing something. I have an old ATI Radeon 2400 that I'd like to get going and read that I could add Einstein as a project in addition to my Seti@home efforts and it would use the GPU. Seti is down right now so I added Einstein, but it looks like I'm running CPU only... The log file seems to imply that I have no ATI tasks.
How do I know when a task is running CPU vs GPU?
How do I get the GPU working please?

Thanks in advance
Steve

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
How do I know when a task is running CPU vs GPU?


The status column of BOINC manager's Tasks tab (advanced view) shows that.

Quote:
How do I get the GPU working please?


You can't here at Einstein@home. From your last-contact output:
2013-01-15 04:05:47.0966 [PID=28232] [version] ATI device (or driver) doesn't support OpenCL
The following is from the sticky thread for BRP CUDA requirements, but I think it also applies to your GPU:
- ca 300 MB of free RAM required on graphics card
Yours only has 256 MB total RAM.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
I have an old ATI Radeon 2400 that I'd like to get going and read that I could add Einstein as a project in addition to my Seti@home efforts and it would use the GPU.

I think you need an HD4000 or better card for it to be OpenCL capable, so a Radeon 2400 won't work here at Einstein.

Steve
Joined: 15 Jan 13
Posts: 6
Credit: 68239
RAC: 0

Thanks to both. Multiple things learned...

The wiki and Google said the 2400 was OpenCL capable. I updated drivers and checked it with the benchmark tool. NOPE! Not OpenCL capable, so I appreciate the info.
Also, I was unaware of the last-contact log. It is most helpful, with entries that are not in my event log.
I have a GT630 on order now; that should fix it!

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3845342
RAC: 6728

Hi

I have just re-joined Einstein with my new machine which has an Intel GPU. I am not getting any GPU units but am getting this message: 01/01/2014 03:02:41 | Einstein@Home | see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/7591/7591557

I do not understand what it is saying in there. I am only an old pensioner, so am not too familiar with modern computing.

Can someone help please?

Mike

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 117990464803
RAC: 21145515

Hi Mike, welcome back!

Quote:
... I am not getting any GPU units but am getting this message: 01/01/2014 03:02:41 | Einstein@Home | see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/7591/7591557


This thread was posted some time ago to announce the availability of an Intel GPU app. If you read the opening message very carefully you will see that a certain driver version is required for the app to work. If you don't have that driver version, your work requests will be denied. From the scheduler log message that you refer to above, here is the relevant bit:-

2014-01-01 05:50:20.8804 [PID=17358]    [version] Checking plan class 'opencl-intel_gpu'
2014-01-01 05:50:20.8804 [PID=17358]    [version] parsed project prefs setting 'gpu_util_brp': 0.000000
2014-01-01 05:50:20.8804 [PID=17358]    [version] OpenCL device version required min: 102, supplied: 101


This message is telling you that your integrated GPU is being detected as OpenCL 1.1 capable but the requirement is for an OpenCL 1.2 capability.
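
For anyone who wants to check this line in their own scheduler log, here is a minimal Python sketch. It assumes the numbers are encoded as major*100 + minor (so 102 means 1.2 and 101 means 1.1). That matches the reading above, but it is an assumption about the log format rather than documented behaviour, and the function names are made up for illustration.

```python
import re

# The relevant line, copied from the scheduler log quoted above.
LINE = ("2014-01-01 05:50:20.8804 [PID=17358]    [version] "
        "OpenCL device version required min: 102, supplied: 101")


def decode_version(number):
    """Turn 102 into '1.2' (assumed encoding: major*100 + minor)."""
    major, minor = divmod(number, 100)
    return f"{major}.{minor}"


def check_opencl(line):
    """Return (required, supplied, ok) for a version-check line, or None."""
    match = re.search(r"required min: (\d+), supplied: (\d+)", line)
    if match is None:
        return None  # not an OpenCL version-check line
    required, supplied = int(match.group(1)), int(match.group(2))
    return decode_version(required), decode_version(supplied), supplied >= required


print(check_opencl(LINE))
```

With the line above, it reports a requirement of 1.2, a supplied version of 1.1, and a failed check, which is why no Intel GPU work is being sent.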

The opening message in the thread I linked to above explains this and gives a link to the Intel website where you can get a 1.2 capable driver. That message also mentions a 'feedback' thread. You should read both the announcement thread and the feedback thread, where you will find the experiences of others in getting a suitable driver so that the integrated GPU can be used for crunching. Unfortunately you have to do a bit of work (and reading) in order to get all the information you need to get things working.

Quote:
I do not understand what it is saying in there.
Can someone help please?


It's saying quite a lot, isn't it? :-).
It looks like you have managed to 'lose' a bunch of CasA tasks and the server wants to resend them to you but it's not prepared to do so because it says you have no suitable app with which to crunch them. Here is a snippet:-

....
2014-01-01 05:50:20.4587 [PID=17358]    [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#9 (windows_x86_64) min_version 0
2014-01-01 05:50:20.4587 [PID=17358]    [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#2 (windows_intelx86) min_version 0
2014-01-01 05:50:20.4587 [PID=17358] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447198]: no app version for einstein_S6CasA
2014-01-01 05:50:20.4595 [PID=17358] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447199]: no app version for einstein_S6CasA
2014-01-01 05:50:20.4603 [PID=17358] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447200]: no app version for einstein_S6CasA
2014-01-01 05:50:20.4610 [PID=17358] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447201]: no app version for einstein_S6CasA
....


It looks like there are about 40 of these 'lost' tasks that can't be resent.

This is quite a separate issue and nothing to do with proper detection of the Intel GPU. It looks like your BOINC client has a corrupted state file (client_state.xml) which lives in the BOINC Data directory and whose contents are communicated to the server each time your client makes a request. Something must have gone wrong at your end and the server is complaining about the discrepancy. Maybe you can tell us if you had any crashes which may have corrupted your state file?

When you open BOINC manager (tasks tab) can you see the same number of tasks there as are listed for your computer on the website? If there are about 40 missing, you must have local corruption that isn't going to fix itself. You could get the server to resend missing tasks if you could edit your state file to correct the 'missing app version' that the server is complaining about. This is probably way beyond your comfort zone, so the simplest thing for you is to crunch and report any tasks you do have. Is your machine actually doing any crunching at the moment? If nothing is working, you should select the Einstein project on the 'projects' tab and then click 'reset project'.

Just realise that a project reset throws everything away and starts afresh so you shouldn't do this if things actually seem to be working. If in doubt, give us more details about what is actually happening.

Cheers,
Gary.

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3845342
RAC: 6728

Hi Gary

Thanks for the response.

It looks like it is going to take me a while to get my head round what you have said about my GPU problem, so I'll try to sort out the other problem first.

I have just checked the list of my 'In Progress' jobs on the website and it lists 69. I have only 43 showing on BOINC Manager, so I appear to be 26 adrift.

I did a restart 44 hours ago because my machine was taking forever to do other work. Any crash must have occurred at that time but I don't have a log.

I currently have work from Cosmology, Rosetta & WCG as well as Einstein. I am also subscribed to CPDN, LHC, MilkyWay & Seti. I suspect that it might be better to crunch what I have got and bring my BOINC Manager list to zero. Could I then be supplied with a new version of whatever file has been corrupted to overwrite what I do have, or do the missing jobs need to be deleted from the websites?

I have 3 jobs missing from Rosetta (34/37) & none missing from WCG (122/122). All 40 are missing from Seti, but they only arrived after the restart.

I can't tell what the situation with Cosmology is because they don't have their jobs listed on their website, so far as I can see. The same applies to CPDN, LHC & MilkyWay.

Happy New Year.

Mike

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 117990464803
RAC: 21145515

Quote:
... I have just checked the list of my 'In Progress' jobs on the website and it lists 69. I have only 43 showing on BOINC Manager, so I appear to be 26 adrift.


The 69 tasks showing on the website cover all categories. There are 5 already returned and 1 error, so only 63 are in progress. The difference is actually 20, which is now the precise number being complained about in the latest scheduler contact log. Here is the full list:-

2014-01-01 15:17:54.3209 [PID=31886]   Request: [USER#xxxxx] [HOST#7591557] [IP xxx.xxx.xxx.139] client 7.2.33
2014-01-01 15:17:54.3354 [PID=31886]    [send] effective_ncpus 8 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2014-01-01 15:17:54.3355 [PID=31886]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2014-01-01 15:17:54.3355 [PID=31886]    [send] Not using matchmaker scheduling; Not using EDF sim
2014-01-01 15:17:54.3355 [PID=31886]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2014-01-01 15:17:54.3355 [PID=31886]    [send] Intel GPU: req 864180.00 sec, 1.00 instances; est delay 0.00
2014-01-01 15:17:54.3355 [PID=31886]    [send] work_req_seconds: 0.00 secs
2014-01-01 15:17:54.3355 [PID=31886]    [send] available disk 96.33 GB, work_buf_min 0
2014-01-01 15:17:54.3355 [PID=31886]    [send] active_frac 0.999916 on_frac 0.977969 DCF 1.689416
2014-01-01 15:17:54.3397 [PID=31886]    [version] Checking plan class 'SSE2'
2014-01-01 15:17:54.3406 [PID=31886]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2014-01-01 15:17:54.3406 [PID=31886]    [version] plan class ok
2014-01-01 15:17:54.3406 [PID=31886]    [version] Don't need CPU jobs, skipping version 105 for einstein_S6CasA (SSE2)
2014-01-01 15:17:54.3406 [PID=31886]    [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#9 (windows_x86_64) min_version 0
2014-01-01 15:17:54.3406 [PID=31886]    [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#2 (windows_intelx86) min_version 0
2014-01-01 15:17:54.3406 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447229]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3416 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447265]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3424 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448579]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3434 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448580]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3441 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448581]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3449 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448592]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3456 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448593]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3464 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448594]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3471 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448595]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3479 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448606]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3486 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448607]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3494 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448608]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3501 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448609]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3509 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448610]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3516 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448611]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3524 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448612]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3531 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448613]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3539 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448614]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3546 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448615]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3554 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448628]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3554 [PID=31886]    [send] [HOST#7591557] is reliable
....
....

Each of those 20 has a 'RESULT#' (ie a result (task) ID) which you can correlate with your tasks list on the website. This link actually shows task IDs rather than task names so if you compare the task IDs from the website to the RESULT#s from the scheduler log you will see the ones being complained about are exactly the first page of 20 tasks in your tasks list.
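
If comparing the IDs by eye is tedious, the RESULT#s can be pulled out of the log text with a few lines of Python. This is only a sketch: the three sample lines are copied from the log above, the function names are invented for illustration, and in practice you would paste in the whole log. It also repeats the arithmetic from earlier (69 listed, 5 returned, 1 error, 43 on the client).

```python
import re

# Three lines copied from the scheduler log above; paste the full log in practice.
LOG = """\
2014-01-01 15:17:54.3406 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447229]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3416 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416447265]: no app version for einstein_S6CasA
2014-01-01 15:17:54.3424 [PID=31886] [CRITICAL]   [HOST#7591557] can't resend [RESULT#416448579]: no app version for einstein_S6CasA
"""

RESEND_RE = re.compile(r"can't resend \[RESULT#(\d+)\]")


def lost_result_ids(log_text):
    """Return the RESULT# IDs the scheduler says it cannot resend, in log order."""
    return [int(found) for found in RESEND_RE.findall(log_text)]


def expected_missing(listed, returned, errored, on_client):
    """Tasks the website counts as in progress that the client does not hold."""
    return listed - returned - errored - on_client


print(lost_result_ids(LOG))
print(expected_missing(listed=69, returned=5, errored=1, on_client=43))
```

Run over the full log, the length of the first list should match the second number if the 'lost' tasks really are the whole discrepancy.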

I've had more of a think about the scheduler log messages. Where it says "can't resend [RESULT#nnnnnnnn]: no app version for einstein_S6CasA", I'm now thinking I was wrong in assuming this meant the app wasn't recorded in your state file. It can't be that otherwise all your CasA tasks would have been trashed, not just some of them. I'm thinking that perhaps you may have changed your preferences to exclude S6CasA from your list of allowed applications. This shouldn't affect tasks 'on board' but would preclude the scheduler from sending any new ones and would probably preclude the scheduler from resending any 'lost' ones.

Quote:
I did a restart 44 hours ago because my machine was taking forever to do other work. Any crash must have occurred at that time but I don't have a log.


Stopping and restarting BOINC shouldn't cause any problems. You actually do have a log which will have recorded the stopping and restarting. It's stored in a file named 'stdoutdae.txt' in your BOINC Data directory and can be browsed by any text viewer such as Windows Notepad. It might be worthwhile browsing that file to see if anything unusual is recorded there.
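
If the file is too long to browse comfortably, a small Python sketch like the one below can list only the lines that mention one project. It makes no assumption about the layout of each log line beyond the project name appearing in it; the function name and the default project string are just illustrative, and you need to supply the path to your own 'stdoutdae.txt'.

```python
import sys
from pathlib import Path


def project_log_lines(log_path, project="Einstein@Home"):
    """Return the lines of a text log that mention the given project name."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    wanted = project.lower()
    return [line for line in text.splitlines() if wanted in line.lower()]


# Usage: python this_script.py <path to stdoutdae.txt>
if __name__ == "__main__" and len(sys.argv) > 1:
    for line in project_log_lines(sys.argv[1]):
        print(line)
```

Anything unusual around the time of the restart should then be much easier to spot.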

Can you tell us if you have made any recent preference changes and can you also look at your project specific preferences and tell us exactly what applications you have 'allowed'?

What tasks (for whatever project) are currently running and are all of them making normal progress? Are there any completed (100% progress) ones showing? If so can you 'report' them? To do that manually, select the project on the 'projects' tab and click 'update'.

Cheers,
Gary.

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3845342
RAC: 6728

Sorry, Gary

I forgot to allow for the jobs on my old machine so I have now accounted for the other 6 Einstein and all the other projects.

I have made no changes to anything recently, since I re-started the other 7 projects.

I checked the log for activity. There hadn't been any for Einstein. There was activity on other projects so I suspended all jobs other than Einstein and 8 started or re-started normally. After about 40 minutes I resumed all jobs and all 8 are now listed as waiting to run with time elapsed and progress retained.

Regards

Mike

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 117990464803
RAC: 21145515

Quote:
I forgot to allow for the jobs on my old machine so I have now accounted for the other 6 Einstein and all the other projects.


Are you still crunching with the old machine? If so, fine, but if not, could you please fire it up one last time, set 'no new tasks' and then abort and report the ones you still have there? That way they can be reissued immediately rather than having to wait two weeks to time out.

Quote:
I have made no changes to anything recently, since I re-started the other 7 projects.


I had noticed you were supporting 8 projects in total but at least half of them seemed to be inactive (RAC of zero). If they are all active, you may have a problem with too much work in your cache. Do all projects have equal resource shares and are all supplying work? WCG has by far the largest RAC which suggests it has a large resource share compared to the others. If so, BOINC is going to have real trouble in getting all the Einstein work done by the deadline.

In the time I've been looking (>1 day), only 1 extra Einstein task has been returned. There are still 62 showing as 'in progress' and there is less than 10 days to deadline. I think your cache settings may be too large for the number of projects you are supporting. I notice you are using the latest BOINC so you must have upgraded that relatively recently. Were you previously using a version 6 (or earlier) BOINC on your old machine? Are you aware of the changes to cache settings between V6 and V7? If you haven't allowed for this it could cause some 'unfortunate' behaviour. Until you get things settled, I would strongly suggest you check under 'computing preferences>network settings' and set the first setting to a maximum of 1 day and the second setting to 0.01 days. For V6 BOINC you would have used something very low (probably zero) for the first setting and the actual 'days' you required in the second setting. For V7 it needs to be the other way around. The first setting is now your cache 'low water mark' and the second is the increment to create a 'high water mark'. This is why you need a low value for the second setting - unless you really do want a work cache quantum that oscillates up and down over time.
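
To make the low and high water marks concrete, here is a tiny Python sketch of the arithmetic as described above. It only models the description in this post (first setting is the low mark, second setting is added on top); the real client's work-fetch logic is more involved, and the function name and the 0/10 example pair are purely illustrative.

```python
def cache_marks(first_setting_days, second_setting_days):
    """Return (low_water, high_water) in days for a V7-style work cache."""
    low_water = first_setting_days                          # ask for work below this
    high_water = first_setting_days + second_setting_days   # fill up to this
    return low_water, high_water


examples = [
    ("suggested values", 1.0, 0.01),
    ("V6-style values left unchanged", 0.0, 10.0),
]
for label, first, second in examples:
    low, high = cache_marks(first, second)
    print(f"{label}: fetch below {low:.2f} days, fill to {high:.2f} days")
```

The second example shows the oscillation: the cache runs right down to empty before being refilled with up to 10 days of work in one go.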

Quote:
I checked the log for activity. There hadn't been any for Einstein. There was activity on other projects so I suspended all jobs other than Einstein and 8 started or re-started normally. After about 40 minutes I resumed all jobs and all 8 are now listed as waiting to run with time elapsed and progress retained.


What you describe seems normal for a situation where other projects are preferred over Einstein. Do you have a low resource share setting for Einstein? If all 8 threads have gone to other projects, that suggests BOINC has been set up to give more resources to projects other than Einstein. With the WCG RAC being so high relative to others, it looks like you are doing lots of WCG work. If this is what you want, after you set your cache settings as suggested, you should consider aborting quite a number of the excess Einstein tasks before BOINC decides to go into 'Panic mode' in order to get them done before deadline.

Cheers,
Gary.

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3845342
RAC: 6728

Quote:
Quote:
I forgot to allow for the jobs on my old machine so I have now accounted for the other 6 Einstein and all the other projects.

Are you still crunching with the old machine? If so, fine, but if not, could you please fire it up one last time, set 'no new tasks' and then abort and report the ones you still have there? That way they can be reissued immediately rather than having to wait two weeks to time out.

I am continuing to crunch with my old machine for a limited time. I abort anything that I don't expect to finish before the deadline at the earliest opportunity, usually about 1 or 2 days before the deadline, based on the time remaining listed. I will probably abort all but 24 hours per core 1 day before switch-off.

Quote:
Quote:
I have made no changes to anything recently, since I re-started the other 7 projects.

I had noticed you were supporting 8 projects in total but at least half of them seemed to be inactive (RAC of zero). If they are all active, you may have a problem with too much work in your cache. Do all projects have equal resource shares and are all supplying work? WCG has by far the largest RAC which suggests it has a large resource share compared to the others. If so, BOINC is going to have real trouble in getting all the Einstein work done by the deadline.

In the time I've been looking (>1 day), only 1 extra Einstein task has been returned. There are still 62 showing as 'in progress' and there is less than 10 days to deadline. I think your cache settings may be too large for the number of projects you are supporting. I notice you are using the latest BOINC so you must have upgraded that relatively recently. Were you previously using a version 6 (or earlier) BOINC on your old machine? Are you aware of the changes to cache settings between V6 and V7? If you haven't allowed for this it could cause some 'unfortunate' behaviour. Until you get things settled, I would strongly suggest you check under 'computing preferences>network settings' and set the first setting to a maximum of 1 day and the second setting to 0.01 days. For V6 BOINC you would have used something very low (probably zero) for the first setting and the actual 'days' you required in the second setting. For V7 it needs to be the other way around. The first setting is now your cache 'low water mark' and the second is the increment to create a 'high water mark'. This is why you need a low value for the second setting - unless you really do want a work cache quantum that oscillates up and down over time.

Quote:
I checked the log for activity. There hadn't been any for Einstein. There was activity on other projects so I suspended all jobs other than Einstein and 8 started or re-started normally. After about 40 minutes I resumed all jobs and all 8 are now listed as waiting to run with time elapsed and progress retained.

What you describe seems normal for a situation where other projects are preferred over Einstein. Do you have a low resource share setting for Einstein? If all 8 threads have gone to other projects, that suggests BOINC has been set up to give more resources to projects other than Einstein. With the WCG RAC being so high relative to others, it looks like you are doing lots of WCG work. If this is what you want, after you set your cache settings as suggested, you should consider aborting quite a number of the excess Einstein tasks before BOINC decides to go into 'Panic mode' in order to get them done before deadline.

I updated the version of BOINC as soon as I noticed it was available which was when I was crunching WCG only. From March until the end of December I was just crunching WCG by setting 'No New Tasks' on the others, hence the high relative RAC. The resource shares have remained at CPDN 35% (but nothing available at present), Cosmology 10%, Einstein 5%, LHC 10%, MilkyWay 10%, Rosetta 7.5%, Seti 10% & WCG 12.5%.
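
As a rough cross-check of what those shares mean on an 8-thread machine, here is a small Python sketch. It assumes CPU time is divided in proportion to resource share over the long run and ignores deadlines, GPUs and everything else the client takes into account, so treat the numbers as a ballpark only; the function name is made up for illustration.

```python
# Resource shares as listed above (percent).
SHARES = {
    "CPDN": 35.0, "Cosmology": 10.0, "Einstein": 5.0, "LHC": 10.0,
    "MilkyWay": 10.0, "Rosetta": 7.5, "Seti": 10.0, "WCG": 12.5,
}
THREADS = 8  # effective_ncpus reported in the scheduler log


def average_threads(shares, project, threads, without_work=()):
    """Long-run average threads for a project under simple proportional sharing."""
    active = {name: share for name, share in shares.items()
              if name not in without_work}
    return threads * active[project] / sum(active.values())


print(sum(SHARES.values()))
print(round(average_threads(SHARES, "Einstein", THREADS), 2))
print(round(average_threads(SHARES, "Einstein", THREADS, without_work=("CPDN",)), 2))
```

The last line drops CPDN from the split because it has nothing available at present. Either way, Einstein works out at well under one thread on average.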

The earliest Einstein units are not due for another 9+ days (8+ days for the first Rosetta). They should be finished in time, provided that the system estimate is not too far out. I am currently crunching the last of my WCG units, from before I released the other projects, which are due in 48 hours. The system estimate would give a completion 10 minutes late, but I am finding that the units are being finished faster than the official estimate, so I don't expect any problems. I presume that the earlier due date is why they are currently being given preference. When I restarted the other projects, they were given preference until the deadline for WCG units kicked in.

I hadn't noticed any advice of a need to adjust the cache settings but have now changed them to 5/1 instead of 0/10. I will see how that pans out.

From message 12848:

Quote:
I currently have work from Cosmology, Rosetta & WCG as well as Einstein. I am also subscribed to CPDN, LHC, MilkyWay & Seti. I suspect that it might be better to crunch what I have got and bring my BOINC Manager list to zero. Could I then be supplied with a new version of whatever file has been corrupted to overwrite what I do have, or do the missing jobs need to be deleted from the websites?

The 2 records need to be reconciled, so have you any suggestions, please?

Regards

Mike
