Greetings All,
Hey, thank you, those workarounds are just what was needed. I have the problem machine crunching now.
Regards
NICE!! thanks kindly Den77 & GUL, this worked a treat. You guys rock :)
What a pity users were forced to take matters into their own hands and hack the client state in order to fix a server-side problem. :(
Thanks to all of those who posted how to "fix" the jpleph.405 checksum issue.
It seems strange that this has been going on for a week and hasn't been resolved.
Does anyone from the project read the message boards?
Thank you [AF>EDLS]GUL, I did precisely that and am still getting computation errors.
You're welcome.
I am also getting some computation errors, but I suspected my hardware. Is anyone else using this workaround and having computation errors?
Glad I found this. I usually don't even look around, since I figure I can sort out any problems myself.
But even though I have had my 660Ti SC running 24/7 for years, I had all my other GPUs taking a break this year to help move the CERN vLHC/T4T work over to LHC.
Last night I decided to get my slow laptop with the old 610M running here again, since Einstein GPU tasks do not depend on a 24/7 internet connection like the dreaded VB (VirtualBox) tasks do.
On the first try I got download errors, so I updated the GeForce driver to the newest version and tried again, but since I got almost 200 download errors I had to wait for the server to let me have another try.
Well, as some of you know, you can download everything except that JPLEPH file.
Everything else had no problem, but that one sat there at 0% even though it looked like it was doing something... it finally finished:
10/17/2017 12:10:37 PM | Einstein@Home | Finished download of JPLEPH.405
10/17/2017 12:10:37 PM | Einstein@Home | [error] MD5 check failed for JPLEPH.405
10/17/2017 12:10:37 PM | Einstein@Home | [error] expected d41d8cd98f00b204e9800998ecf8427e, got d6ce12bacd2a81a56423f5f238ba84eb
10/17/2017 12:10:37 PM | Einstein@Home | [error] Checksum or signature error for JPLEPH.405
And another 16 "download failed" errors.
This laptop never was the fastest, but it was about to hit the 12 million credit mark. And just a few days ago I went through the long job of doing another clean install of Windows 10, BOINC, VB and everything else I needed. Must admit I never expected a laptop to run 24/7 on its way to the 6-year mark here, but it did.
Did anyone do the manual fix on x64 Windows 10? For some reason, when I tried doing things like that, it just wasn't what I was used to doing with Win7 (for the CERN projects). And I have been thinking lately about taking 3 of my GeForce cards out of the older quad-cores and putting them in the three new 8-core PCs I put together (they have been running and testing all the LHC projects since I fired them up). It sure would be nice to just add the GPU project, download, and get them to work without figuring out the manual fix... but then, if you do it once, the next several times are no problem. (Oh, and I see this site would not let me use a simple *img* BBCode style.)
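As an aside, you can check a downloaded file's checksum yourself with a few lines of Python (a generic sketch, not part of BOINC). Notably, the "expected" value in the log above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of zero bytes, so it looks like the server-side checksum was computed over an empty file:

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Compute a file's MD5 in chunks, so large files don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# The "expected" value from the log is the MD5 of zero bytes (an empty file):
print(hashlib.md5(b"").hexdigest())   # d41d8cd98f00b204e9800998ecf8427e
```

Running `file_md5("JPLEPH.405")` against the downloaded file should reproduce the "got" value from the log, d6ce12bacd2a81a56423f5f238ba84eb, if the download itself was fine.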
You don't need to re-download JPLEPH.405, because the file itself is not the problem. The relevant bit of the error message, the MD5 lines quoted in the post above, tells you what needs to be fixed.
All you need to do is: stop BOINC, open client_state.xml in a plain text editor like Notepad, find the <file> ... </file> block that refers to JPLEPH.405, and change the 32-character string that was expected (i.e. the one that is actually wrong) into the one that was found, which is the correct one. Just be careful to replace 32 characters with exactly 32 characters; copy and paste is the easiest, most reliable way. Save the file, making sure the name is correct, and restart BOINC.
Cheers,
Gary.
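Gary's manual edit can also be scripted. Below is a minimal, hypothetical Python sketch (the helper name and the file location are illustrative; where client_state.xml lives depends on your BOINC install). The two checksum values are the ones from the log earlier in this thread. Stop BOINC before running anything like this, and restart it afterwards.

```python
import shutil

BAD_MD5  = "d41d8cd98f00b204e9800998ecf8427e"   # the "expected" (wrong) value from the log
GOOD_MD5 = "d6ce12bacd2a81a56423f5f238ba84eb"   # the "got" (correct) value from the log

def fix_md5(state_file, bad, good):
    """Swap a wrong md5 for the right one in client_state.xml, keeping a .bak copy.

    Both strings are 32 hex characters, so the replacement preserves file length,
    exactly as Gary's instructions require. Returns True if a replacement was
    made, False if the bad value wasn't found.
    """
    assert len(bad) == len(good) == 32
    with open(state_file, "r", encoding="utf-8") as f:
        text = f.read()
    if bad not in text:
        return False
    shutil.copy(state_file, state_file + ".bak")   # backup before touching anything
    with open(state_file, "w", encoding="utf-8") as f:
        f.write(text.replace(bad, good))
    return True
```

With BOINC stopped, something like `fix_md5("client_state.xml", BAD_MD5, GOOD_MD5)` (adjust the path to your BOINC data directory) makes the same edit Gary describes by hand.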
Hi guys,
Sorry that it took us so long to notice this thread/problem; we were just too busy with the latest "events".
Anyhow, we've removed the offending md5/file, and new workunits should get the correct md5 value again. We're also analyzing the problem to find out how this happened, to prevent it from happening again.
Cheers,
Oliver
Einstein@Home Project
Oliver,
Still getting computation errors.
Einstein@Home 1.20 Gamma-ray pulsar binary search #1 on GPUs (FGRPopencl1K-nvidia)
LATeah0042L_1164.0_0_0.0_5901010_1, 00:00:19 (00:00:04), 23.27, 100.0000000, -, 11/1/2017 3:26:39 PM
1 CPU + 1 NVIDIA GPU, Computation error, Jakes-DT, 0.00 MB, 0.00 MB, 10/18/2017 3:26:40 PM
Cheers,
- Jake
Hi Jake,
I'm sorry you are having problems like this, and it would be really good to get this sorted as quickly as possible, so let me give you some tips on how best to do that.
If you are going to use an existing thread, be really sure your problem is a carbon copy of what is already being discussed. The best way to check is to go to your account page on the website, find the computer that's having the problem, and click on its tasks link. You can filter the big list that shows up by application and by the 'state' (e.g. error) of the tasks you are interested in. In your case this is still a big list of errors, but if you go to the very bottom of the list (there is a link to 'last') you can quickly find the most recent problem task.
If you click on the task ID link for a problem task, you will be able to see exactly what was returned to the project. Even if you think it's mumbo jumbo, you should scroll through it looking for clues like the word ERROR (or similar); when you find that, you will be able to tell whether it's the same problem already being discussed. For your problem, here it is:
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:1176: clFinish failed. status=-36
error in opencl_qsort
15:33:41 (320948): [CRITICAL]: ERROR: MAIN() returned with error '1'
There is no reference to a bad MD5 checksum (in the above, or anywhere else that I saw), so you really should NOT be using the current thread. If you had started a new thread with just the above excerpt and a link to the host having the problem, anyone willing to help could immediately see whether they have enough expertise to know exactly what the problem is. In my case I don't, so all I can do is wish you luck in finding someone who does. The important point is that by doing something like what I have suggested, you will maximise your chances of getting help quickly.
My guess is that you will need someone quite familiar with the code of the GPU app to interpret this. Such people are always busy, so it will pay dividends if you do as much of the grunt work as possible for them. While you wait for someone to respond, here is a thought for you to consider. From the messages above, it seems the operation in progress at the time of failure was loading data into memory, and a routine called clFinish was the bit of code that hit the problem. It sounds like something went wrong whilst trying to complete that load, perhaps a memory write error of some sort. My immediate reaction is to ask: does your card have a memory overclock?
Please don't take offense at anything I've written. I have zero intention of chastising or insulting you in any way. My main aim is to encourage all the other people who might read this to think about how best to ask for help.
Cheers,
Gary.
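For anyone digging into the clFinish failure above: the numeric status in the log can be looked up against the constants defined in OpenCL's CL/cl.h header. A small sketch (the table below is a hand-transcribed subset; verify against your own headers) shows that -36 maps to CL_INVALID_COMMAND_QUEUE, which typically means the command queue was invalidated by an earlier failure, consistent with the unstable-memory theory:

```python
# A few OpenCL status codes, transcribed from CL/cl.h
# (a subset, not the full list; double-check against your own headers).
CL_ERRORS = {
      0: "CL_SUCCESS",
     -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
     -5: "CL_OUT_OF_RESOURCES",
     -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# The status from the failing clFinish call in the stderr excerpt:
print(CL_ERRORS.get(-36, "unknown status"))   # CL_INVALID_COMMAND_QUEUE
```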