Main Hard Drive Failure and Data Lost...

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0
Topic 196759

Ok, earlier today I had some hard drive issues with Linux and Windows. Linux would not show the desktop, although I could still right-click on it. I switched over to Windows and got a blue screen of death. At first I thought the Windows problem was caused by a driver update from Windows Update, but after looking through the updates I couldn't find the driver. I think it was something to do with OpenCL (I don't remember exactly, as I could not find it), and Windows Update didn't show it again.

Reinstalled Windows and got the blue screen again. I started digging through the many disks of software I have collected over the years, starting with Hiren's Boot Disk. It has many, many, many useful tools. I found out that the hard drive's overall health is 12%. I have tried all the tools I have (which is almost every hard drive tool there is) and nothing can access the drive; Hiren's shows it as locked and frozen. The Linux disk I have can't do anything with it either. The drive can't be formatted, accessed, read or written to... nothing...

The hard drive has had a fairly long life of about 5 years, and it was powered on a good 75% or more of that time. I had hoped it would last another month or two, by which point I was (and still am) going to get a new drive. It was not new but refurbished. I am now using the backup hard drive that came out of my laptop when I bought it. Since this drive is smaller, I am only going to run Windows 7 for now, until I upgrade to a bigger drive.

Ok, so what's the point, you are asking? I lost all my game data :( but more importantly all the data for BOINC was lost, on both the Linux and Windows sides :-S Is there a way to recover that work (not from the toasted hard drive, as I have already tried that) by redownloading it instead of getting new work, so it does not have to time out? If there is, how do I do it? It will be about a day before I get BOINC set up on here, as I am working on getting everything else done first; it takes a lot to reinstall all my apps and games, with many, many restarts.

If anyone has problems with a hard drive or other things, I do recommend Hiren's Boot Disk.

Happy Crunching

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

soft spirit
Joined: 27 Oct 10
Posts: 113
Credit: 5880079
RAC: 0

All those problems and you are STILL recommending Hiren's?

You might want to re-think that, and be sure it is not the cause of your failures first.

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

Quote:

All those problems and you are STILL recommending Hiren's?

You might want to re-think that, and be sure it is not the cause of your failures first.

I guess you didn't really read this...

Quote:
I started digging through the many disks of software I have collected over the years, starting with Hiren's Boot Disk.

or this...

Quote:
The hard drive has had a fairly long life of about 5 years, and it was powered on a good 75% or more of that time.

or even the first few lines I wrote...

Quote:

Ok, earlier today I had some hard drive issues with Linux and Windows. Linux would not show the desktop, although I could still right-click on it. I switched over to Windows and got a blue screen of death. At first I thought the Windows problem was caused by a driver update from Windows Update, but after looking through the updates I couldn't find the driver. I think it was something to do with OpenCL (I don't remember exactly, as I could not find it), and Windows Update didn't show it again.

Reinstalled Windows and got the blue screen again.

There, I made it bold, so it should be easier for you to read...

Janus
Joined: 10 Nov 04
Posts: 27
Credit: 23862534
RAC: 5

Now now, no reason to get upset. A lot of the time it is a good idea to have something monitoring the drives, like smartmontools (Bruce?). This way you may or may not get a heads-up warning about the imminent failure of a drive. Of course this is easy to say in hindsight, but maybe it will save some data in the future.

Also, backups.

If the data is absolutely crucial to you, you can typically get it back with the low-level tools from the manufacturer. It just takes forever, and a good deal of the data may have "holes" if you (or the drive) decide to give up entirely before it has been fully recovered.
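
As a rough illustration of the monitoring idea above, here is a minimal Python sketch that asks smartmontools for a drive's overall health verdict. It assumes `smartctl` is installed and on the PATH, that the script runs with enough privileges to query the drive, and that `/dev/sda` is the device of interest. The helper names are made up for this example.

```python
import subprocess
from typing import Optional


def parse_health(smartctl_output: str) -> Optional[bool]:
    """Return True/False for a passing/failing verdict, None if no verdict is found."""
    for line in smartctl_output.splitlines():
        # ATA drives report "SMART overall-health self-assessment test result: PASSED"
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        # SCSI/SAS drives report "SMART Health Status: OK"
        if "SMART Health Status" in line:
            return line.rsplit(":", 1)[1].strip() == "OK"
    return None


def check_drive(device: str = "/dev/sda") -> Optional[bool]:
    """Run `smartctl -H` on the device (hypothetical default) and parse the verdict."""
    # check=False: smartctl uses a non-zero exit status to flag drive problems.
    proc = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return parse_health(proc.stdout)
```

A scheduled job could call `check_drive()` once a day and send a warning when it returns False. A passing verdict is no guarantee, so this complements backups rather than replacing them.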

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

Game data does not matter. BOINC data does matter, but the turnaround on that data is 30 minutes or so, so it would mean a lot of backing up all the time...

Quote:
Now now, no reason to get upset.

I am not upset. I have a hard time dealing with people, and more so with things like this (someone who didn't really read it or jumped the gun).

I also have such tools, but didn't have them installed yet as I was still looking for the disks they are on.

Edit:
Nonetheless, if I can't get the data back soon to finish the work, it will have to time out. Getting it back is what I am trying to do. The hard drive can't be accessed, so I need to get the work from the server again; otherwise it will time out, which means it will have to be sent to someone else, and if there is someone waiting on it they will have to wait even longer... I have more than enough software for just about everything, but it takes time to find it, or to remember its name and try to download it... I have a lot of CDs and DVDs but not a lot of time to search each one.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 324198817
RAC: 165069

There are also companies that deal with data recovery. For a fee of course, and that then depends upon what it's worth to you. There's even hope if the drive controller has gone belly up. I had a RAID go sour - for want of a nail, a shoe was lost etc - but valuable photos are exactly that.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

Quote:

There are also companies that deal with data recovery. For a fee of course, and that then depends upon what it's worth to you. There's even hope if the drive controller has gone belly up. I had a RAID go sour - for want of a nail, a shoe was lost etc - but valuable photos are exactly that.

Cheers, Mike.

Yeah Mike, but by the time I could get that to happen the work will have timed out on here. Money is tight until next month :(

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

There is absolutely no need to worry about the BOINC data and work units, as the system is built to handle the loss of work on a host. The work will time out and then get sent to some other host to be crunched. Every work unit can be sent up to 20 times if need be.
The work we do takes months and years to process, and if a few of the millions of work units get delayed a couple of weeks it will hardly be noticed.

Everyone loses a few units from time to time, be it because of hardware failure or user mistake. When it happens to me and it's my fault, I tend to shake my head, say a few choice words and then move on, no real harm done.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2725074
RAC: 1227

You can recover your BOINC data fairly easily, as long as your projects have 'resend lost tasks' enabled.

Once you've got your host running a fresh OS, install BOINC, set each project's default 'Use CPU' & 'Use GPU' preferences to No, then attach to each project as before. Since your preferences are set to 'No' you shouldn't get any work.
If you do, set NNT (No New Tasks) quickly, crunch it and report it. Now shut down BOINC and follow this FAQ:

How to revert to an older HostID number?

Re-enable 'Use CPU' & 'Use GPU' and ask for work. Lost tasks will be resent on the projects where the feature is enabled.

Claggy

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118337539061
RAC: 25412059

E@H uses the BOINC feature to 'resend lost tasks'. You can take advantage of this to recover an entire cache of work by making a few preparations before reinstalling BOINC on a replacement hard drive. If you only have one computer attached and you haven't made any backup copies of certain files, it's a bit more involved but still doable. You have to make the decision as to whether it's worth your time to attempt a recovery. I usually say it is because you tend to learn a lot by doing it.

If you want to recover your work cache, you need to create a BOINC Data directory in the correct location on your new hard drive and populate it with certain files so that when you get to actually reinstall BOINC these files will be found. Essentially, you are recreating the ID of your previous host which is found and used by the installer. That way when the newly reinstalled BOINC communicates with the scheduler, it will be recognised as having the former host's ID and all the missing tasks will be noticed and resent - in batches of 12 at a time for each scheduler contact.
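
Since the tasks come back 12 at a time, it can help to know roughly how many scheduler contacts to expect. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the batch size of 12 is simply the figure quoted above, so treat it as an assumption that may change.

```python
import math

BATCH_SIZE = 12  # lost tasks resent per scheduler contact, per the description above


def contacts_needed(lost_tasks: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of scheduler contacts needed to get every lost task resent."""
    if lost_tasks < 0:
        raise ValueError("lost_tasks must be non-negative")
    # The final batch may be partial, so round up.
    return math.ceil(lost_tasks / batch_size)
```

For a hypothetical cache of 50 lost tasks this gives 5 contacts: four full batches of 12 and a final batch of 2.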

Firstly you need your account authenticator. The easiest thing to do is grab a copy of the 'account_einstein.phys.uwm.edu.xml' file from the data directory of your other machine. You will use this file exactly as is.

Secondly, you need to create a very much stripped down state file 'client_state.xml'. Once again, grab a copy from your other machine as a starting point if you don't have a backup copy from your dead machine. It doesn't much matter as you will be removing most of the content. I'll post here a copy of a stripped down version of one of mine to give you an idea of how brutal you can be.
[pre]

0.999702
0.999891
0.999940
1358945604.007374

http://einstein.phys.uwm.edu/
Einstein@Home[/pre][pre] NNNNN+1
HHHHHH[/pre]
[pre] 0
0
100.000000
1.000000


http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi

1024
b03dfdccb9526079a9570304624a107059fc6abbb8b50c737102a9ef729543d8
0b1496aa6f36ac9d1f63bed351abb565637c3cede505d18878e93377787e4391
a0842b5605748fa6950e1556076d245178a9c50251986f3c7c293048ddc60318
329356bdfbc42f49006f65e742c7ead7e25f57f0ae2757e17c682a018a2b9e9f
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000010001
.

einstein_S6LV1
Gravitational Wave S6 LineVeto search

hsgamma_FGRP2
Gamma-ray pulsar search #2

JPLEPH.405
9319680.000000
0.000000
d6ce12bacd2a81a56423f5f238ba84eb
1

http://einstein6.aei.uni-hannover.de/EinsteinAtHome/download/29/JPLEPH.405

earth_09_11
1498566.000000
0.000000
e511a840ca97c90694949d8c6a301d26
1

http://einstein.ligo.caltech.edu/download/166/earth_09_11
http://einstein-dl4.phys.uwm.edu/download/166/earth_09_11
http://einstein-dl2.phys.uwm.edu/download/166/earth_09_11
http://einstein2.aei.uni-hannover.de/download/166/earth_09_11

sun_09_11
1467637.000000
0.000000
2dba4750c67426cffd1a267b6de3e904
1

http://einstein.ligo.caltech.edu/download/111/sun_09_11
http://einstein-dl4.phys.uwm.edu/download/111/sun_09_11
http://einstein-dl2.phys.uwm.edu/download/111/sun_09_11
http://einstein2.aei.uni-hannover.de/download/111/sun_09_11

S6GC1_T60h_v1_Segments.seg
3115.000000
0.000000
9701c844d6e9009f792e6532fbc81b80
1

http://einstein.ligo.caltech.edu/download/28a/S6GC1_T60h_v1_Segments.seg
http://einstein-dl4.phys.uwm.edu/download/28a/S6GC1_T60h_v1_Segments.seg
http://einstein-dl2.phys.uwm.edu/download/28a/S6GC1_T60h_v1_Segments.seg
http://einstein2.aei.uni-hannover.de/download/28a/S6GC1_T60h_v1_Segments.seg


einstein_icon.png
stat_icon


eah_slide_11.png
slideshow_einsteinbinary_BRP4_00


eah_slide_13.png
slideshow_einsteinbinary_BRP4_01


eah_slide_05.png
slideshow_einsteinbinary_BRP4_02


eah_slide_07.png
slideshow_einsteinbinary_BRP4_03


eah_slide_12.png
slideshow_einsteinbinary_BRP4_04


eah_slide_11.png
slideshow_einstein_S6Bucket_00


eah_slide_02.png
slideshow_einstein_S6Bucket_01


eah_slide_05.png
slideshow_einstein_S6Bucket_02


eah_slide_07.png
slideshow_einstein_S6Bucket_03


eah_slide_08.png
slideshow_einstein_S6Bucket_04

[/pre]

The above could be trimmed even harder, but since you have a second machine to copy from, I suggest you leave in anything you judge to be relevant. You can save downloading of things like the app(s) you will need and any fixed files, like the ones I've left the blocks for, in the above example. If you put copies of the files in the correct places, they will be found and not downloaded unnecessarily. You won't have copies of transient data files, workunits, results, etc, so these all need to be trimmed from your ultimate state file template.
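
If you would rather not do the trimming by hand, something like the following Python sketch could strip the per-task blocks from a copy of the state file. It assumes those blocks are written as `<workunit>...</workunit>` and `<result>...</result>` elements, which is my understanding of the BOINC state file format rather than something shown in the example above, so check your own file before relying on it. The function name is made up for illustration.

```python
import re

# Element names assumed to hold per-task data in client_state.xml.
TRANSIENT_BLOCKS = ("workunit", "result")


def strip_transient(state_xml: str) -> str:
    """Remove whole <workunit>...</workunit> and <result>...</result> blocks."""
    for tag in TRANSIENT_BLOCKS:
        # Non-greedy match so each block is removed on its own.
        pattern = re.compile(rf"[ \t]*<{tag}>.*?</{tag}>[ \t]*\n?", re.DOTALL)
        state_xml = pattern.sub("", state_xml)
    return state_xml
```

Always work on a copy and read through the result afterwards. Other transient entries may remain and still need removing by hand.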

I've highlighted in red the two critical things the scheduler needs to accept the ID of your host. If you look at your computer list on the website, you can see the hostID of the host which has the tasks you want to recover. Click on the details link for that host and on the next page find the value for 'Number of times client has contacted server'. You will be adding 1 to this number and inserting it in place of 'NNNNN+1' above. You will insert the hostID in place of 'HHHHHH'. Those two things are what is required for the scheduler to accept your host's identity.
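
To make the substitution concrete, here is a minimal Python sketch that fills in the two placeholders exactly as they are spelled in the template above. The function name is made up, and the numbers in the usage note are invented purely for illustration.

```python
def fill_identity(template: str, contacts_so_far: int, host_id: int) -> str:
    """Substitute the two placeholders used in the example template."""
    for placeholder in ("NNNNN+1", "HHHHHH"):
        if placeholder not in template:
            raise ValueError(f"placeholder {placeholder!r} not found in template")
    # The sequence number must be one more than the count shown on the website.
    filled = template.replace("NNNNN+1", str(contacts_so_far + 1))
    return filled.replace("HHHHHH", str(host_id))
```

For example, if the website showed 4711 contacts for a host with ID 1234567 (both made-up values), the placeholders would become 4712 and 1234567.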

You will notice, a few lines below the lines in red, the line <dont_request_more_work/>. I strongly advise that you add this line into your template because it sets BOINC for 'No new tasks' on startup. This allows you to start BOINC and then browse the event log in BOINC Manager at your leisure to make sure everything looks normal before BOINC tries to send a work request to the scheduler. It also allows you time to review and set any local prefs (if you need to). When ready you can click the button to allow more work.

Your next step is to place copies of your account file and state file in the BOINC Data directory. You should also create a 'projects' subdirectory there and in that directory a further subdirectory called 'einstein.phys.uwm.edu'. This last one is your Einstein project directory and in it you can place copies of every fixed file you can lay your hands on if you wish to prevent unnecessary downloading of files that already exist (on your other machine for example). Because I have many hosts, I keep project directory templates filled with all needed files - saves time and bandwidth when setting up new hosts.
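
The layout described above can be sketched in Python as follows. The location of the BOINC Data directory depends on your OS and installer, so it is passed in by the caller rather than guessed here. `prepared_files` is a hypothetical folder holding your two prepared files, and the function name is made up.

```python
import shutil
from pathlib import Path

ACCOUNT_FILE = "account_einstein.phys.uwm.edu.xml"
STATE_FILE = "client_state.xml"
PROJECT_DIR = "einstein.phys.uwm.edu"


def prepare_data_dir(data_dir: Path, prepared_files: Path) -> Path:
    """Create the directory skeleton and copy in the two prepared files.

    Returns the Einstein project directory, where fixed files can be placed.
    """
    project_dir = data_dir / "projects" / PROJECT_DIR
    project_dir.mkdir(parents=True, exist_ok=True)
    for name in (ACCOUNT_FILE, STATE_FILE):
        shutil.copy2(prepared_files / name, data_dir / name)
    return project_dir
```

Any fixed files you have copies of would then go into the returned project directory before you run the installer.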

You won't have transient data like the 'p*.bin4' files for BRP4 or the 'h1_*' and 'l1_*' files for the GW searches. That doesn't matter because the scheduler should notice they are missing and resend them at the time it is resending all your lost tasks. Don't confuse tasks (which are sets of parameters that come in a sched_reply) with the data files on which those tasks depend.

Once you have created and populated the BOINC Data directory and the various subdirs, you should be ready to reinstall BOINC. It doesn't need to be the same version but if you have the previous installer, just use it. During the installation be very careful to make sure the installer finds and uses the Data directory you have prepared for it. Once the installation completes and BOINC fires up it will find the files you have prepared for it and it should not contact the server until you allow it to do so. With 'No new tasks' (NNT) still set, click the update button and you should get a 'master file download' (which is a very good sign :-)) followed by a batch of 12 'lost tasks' accompanied by data downloads. When that lot finishes, click 'update' again and get a further batch of lost tasks. Repeat until you have got all your tasks back. You will get all your lost tasks even though NNT is still set.
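
If you prefer the command line to the Manager buttons, the same update requests can be made with `boinccmd`. The Python sketch below only builds and runs the command. It assumes `boinccmd` is on the PATH and is run from somewhere it can authenticate with the running client (typically the BOINC Data directory). The helper names are made up for illustration.

```python
import subprocess
from typing import List

PROJECT_URL = "http://einstein.phys.uwm.edu/"  # master URL from the template above


def boinccmd_args(operation: str, url: str = PROJECT_URL) -> List[str]:
    """Build a `boinccmd --project URL OPERATION` command line."""
    allowed = {"update", "nomorework", "allowmorework"}
    if operation not in allowed:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", url, operation]


def request_update() -> None:
    """Ask the running client to contact the scheduler, like the update button."""
    subprocess.run(boinccmd_args("update"), check=True)
```

Here `nomorework` and `allowmorework` correspond to setting and clearing NNT, so `allowmorework` would only be used at the very end, once all the lost tasks are back.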

When you have the lot, check that your work cache settings are appropriate and finally you can unset NNT. If you get this far unscathed, congratulations are in order. There are lots of opportunities to overlook little things or make typos and there are details I've probably forgotten in writing this. If you have questions, please ask. It's quite daunting the first time you do something like this but once you've done it, it all seems quite easy and natural.

Cheers,
Gary.
