Excessive Disk Usage

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

RE: Jord's Trac ticket was

Message 97290 in response to message 97289

Quote:
Jord's Trac ticket was closed. :D


I see, but I also see from the code that it will delete everything in the directory. Good... but for if you run an anonymous platform from there... then you do not want to delete everything on a project reset. ;-)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2952260189
RAC: 698660

Not for the first time in

Not for the first time in BOINC's history, I don't think this one has been thoroughly thought through.

I'm not worried about app_info: anyone who uses the anonymous platform mechanism should expect it to be deleted, and have a copy of their installation to hand if it's a long-term setup. App_info and the associated files would be deleted on detach/re-attach too. Here at Einstein, the main need for a project reset under AP conditions is when a Beta run has completed, and you want to revert to production status for the next few months: you'd be deleting app_info anyway, so this way works even better.

No, the problem for Einstein is that the data files are marked <sticky/>. I've just tested it on a machine running v6.10.45, which has the new "empty project folder" code. It works: immediately after reset, the cupboard is bare. But the data file references are not removed from client_state.xml.

I haven't been crunching Einstein for a while on this box - I set 'No New Tasks', and it remains that way. But at the next BOINC restart after the project reset, the missing files were detected and downloaded all over again: I already have 20 data files, totalling 72.4MB, back in the project directory - but no program files.

I'll write a bug report for David after lunch.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

RE: App_info and the

Message 97292 in response to message 97291

Quote:
App_info and the associated files would be deleted on detach/re-attach too.


Yes, but there it's normal. By detaching you're deleting the whole Projects/ sub-directory complete with all its contents, all project files in the main BOINC directory (account*, job_log*, master*, sched*, statistics*), and all entries for that project in client_state.xml

On pre-6.10.45 the directories with an app_info.xml file in them would be left alone on a project reset, nothing in them would be deleted. Which may have been a bug all along. Although I thought of it as a nice feature. ;-)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117359251251
RAC: 35738730

RE: I'll write a bug report

Message 97293 in response to message 97291

Quote:
I'll write a bug report for David after lunch.


Richard, I've really got to hand it to you! You have a superb understanding of lots of problems and a marvelous way of describing them in an entirely clear and concise way. Your bug report to DA was first class! Thanks very much for taking the time to do it. I really don't know how you hold down a day job as well as give the support to BOINC that you do.

You captured the nub of the matter in your last sentence, since the alternatives really are about whether or not E@H participants want economy of disk space or economy of bandwidth and that's not an easy or even clear cut decision for a lot of people.

For a long time I've had a (probably unreasonable) fixation about resetting projects unnecessarily. Abhorrence of wasted bandwidth is my excuse for that. I've actually used it a few times of late and feel I understand it better now, particularly with the help of Jord's always useful comments to prod me into considering alternative viewpoints.

From an E@H perspective, it seems to me that perhaps three alternative reset/detach type commands might be useful, modeled something along the lines below. I'm not claiming to have given this heavy thought so what I'm suggesting might well be impractical/inappropriate/completely missing the point or just plain wrong. I'm just interested in what others think.

* Detach - what it currently does, ie completely remove all traces of a project from your computer. This includes the entire project subdirectory and all references to the project in the BOINC data directory, eg clean up the state file and various other files. I'm not sure if it does this already but perhaps it should also auto-report any unreported results and auto-abort/auto-report any unstarted tasks. It should warn about any partly completed work with the default option being to abort and auto-report those if the user doesn't select a different option.

* Retire - a new option - there might be a better name for it or it might just be part of an existing option. This option would be like a temporary detach so that you could minimise your resource exposure to a particular project without permanently severing all ties. Part of the process could be to offer you the choice of what particular files to delete, perhaps categorised as programs, data, configuration items and stats/logs. If you opted to retain only your configuration, you could retain your identity and history but still achieve almost as much reclaiming of resources as a full detach.

* Reset - what it currently does (or should do - or perhaps already does) with the proviso that some attempt is made to minimise gratuitous waste of bandwidth. I think that people treat resetting the way one would treat rebooting Windows a few years ago - there's a problem I don't understand - I bet rebooting will 'fix' it.

I'm wondering if 'resetting a project' could actually become 'consider deleting files but don't actually do so if the files are needed and their MD5 sums are OK', or something along those lines. Warn people that resetting is not likely to recover much disk space and will not do much if critical files are not damaged. I know that a good use of resetting is to revert from AP back to normal operation but in a lot of cases the app being used under AP is the same as the stock app you wish to revert to and perhaps it's possible to interrogate the project for the checksum before deleting the AP app, just in case they are the same. Maybe this is getting too complicated.
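A minimal sketch of the "check the MD5 sums before deleting" idea, in Python. It is only an illustration, not how the BOINC client works: the function names and the `expected_md5` mapping (file name to the checksum the project still wants) are hypothetical, and how a client would obtain those checksums is left open.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_reset(project_dir: Path, expected_md5: dict) -> tuple:
    """Split the files in project_dir into (keep, delete) lists of names.

    A file is kept only if the project still lists it (hypothetical
    expected_md5 input) and its checksum matches; everything else,
    including damaged copies, is marked for deletion.
    """
    keep, delete = [], []
    for path in sorted(project_dir.iterdir()):
        if not path.is_file():
            continue
        wanted = expected_md5.get(path.name)
        if wanted is not None and md5_of(path) == wanted:
            keep.append(path.name)
        else:
            delete.append(path.name)
    return keep, delete
```

With a plan like this, a reset would only re-download files that are missing or damaged, which is the bandwidth saving being argued for.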

To anybody interested, please feel free to correct or comment as you see fit.

Edit: Initially, I had read just Richard's bug report to DA, on the BOINC alpha list. I've now read the extra discussion about tags and I believe Nicolas Alvarez is fully correct in what he says. I also believe it goes a bit further than that.

Under LS, when the client makes a work request, the server tries to send work appropriate for the files the client has. Even if the client only has a subset of suitable files, the server will augment the subset at the time of sending the new task. I read somewhere about the next interesting stage. If the scheduler has no tasks that 'fit' the data files on the client, either fully or partially, it will not immediately choose a new data subset. It will first set a flag for the WUG, requesting the generation of new tasks for the existing data subset. After a short timeout, the scheduler will check for any new tasks. If it can't find any it will assume that there really are no more to be had and it will request the deletion of those files for which it has failed to find a task (and for which the WUG has failed to create more).
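The sequence described above can be sketched roughly as follows. This is a simplified Python model of the description in this post, not the project's scheduler code: `answer_work_request`, `tasks_by_file` and `generate_more` are hypothetical names, and `generate_more` stands in for the whole "flag the WUG, wait for the timeout, check again" step.

```python
from typing import Callable


def answer_work_request(
    client_files: list,
    tasks_by_file: dict,
    generate_more: Callable[[str], list],
) -> dict:
    """Return {"send": [task names], "delete": [file names]}.

    client_files:  data files the client reports having
    tasks_by_file: tasks currently available, keyed by data file
    generate_more: hypothetical stand-in for asking the work unit
                   generator for more tasks and re-checking later
    """
    # First choice: any task that fits a file the client already has.
    send = [task for name in client_files
            for task in tasks_by_file.get(name, [])]
    if send:
        return {"send": send, "delete": []}

    # Nothing fits: ask for more work per file before giving up on it.
    delete = []
    for name in client_files:
        fresh = generate_more(name)
        if fresh:
            send.extend(fresh)
        else:
            # No task found and none could be generated: request deletion.
            delete.append(name)
    return {"send": send, "delete": delete}
```

Note that in this model a file is given up as soon as one request finds nothing for it, which is exactly the behaviour that hurts when resends turn up later.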

The flaw in all this, which really bites at times like now, is that, yes, there are no new S5R6 tasks to be had. There is going to be a steady stream of (unpredictable) resends, for quite a while, which will indeed need the very files that are now in the client's state file. From actual experiment, I know that is a sentence of death, even though the files themselves may not be deleted for quite a while since the tasks that need the files may be way down the cache somewhere. It's a pity that files couldn't continue to be reported to the scheduler and then become if the scheduler subsequently discovered that it had some appropriate resends.

I've actually done the experiment (on a previous run transition) of continually removing the flags and been able to sustain a host just on resends for the existing data subset for several weeks without ever having to get new data subsets. It made an amazing difference to that host's bandwidth requirements.

So, this whole discussion about LS, excessive disk usage, excessive bandwidth and files has given me an idea for a new LS feature. The feature would require that LS data files that are both and should always be reported to the scheduler by the client, since they really are still 'available'. There should also be a client settable preference for with a default of zero (but perhaps a user settable value of a week or two - depending on the keenness of the user to help clean up 'dregs'). If the last task depending on a file has been completed and returned, don't actually kill the sticky file until any delay has expired. The receipt of a task during the delay countdown would reset the value back to the full value. This should have the effect of finally deleting the files once the resend stream has really been exhausted.
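A minimal sketch of the delay countdown proposed above, in Python. Nothing here exists in BOINC: the class, its fields and the `delete_delay` parameter are hypothetical names chosen for illustration, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StickyFile:
    """Tracks one sticky data file and the tasks that still depend on it."""
    name: str
    pending_tasks: int = 0
    idle_since: Optional[float] = None  # when the last dependent task was returned

    def task_received(self) -> None:
        # A new task (for example a resend) needs this file again,
        # so any running countdown is reset.
        self.pending_tasks += 1
        self.idle_since = None

    def task_returned(self, now: float) -> None:
        self.pending_tasks -= 1
        if self.pending_tasks == 0:
            # Last dependent task is done: start the countdown.
            self.idle_since = now

    def may_delete(self, now: float, delete_delay: float) -> bool:
        """True once the file has been idle for at least delete_delay seconds."""
        if self.pending_tasks > 0 or self.idle_since is None:
            return False
        return now - self.idle_since >= delete_delay
```

With `delete_delay` left at zero the file can go as soon as its last task is returned, which matches the suggested default; a value of a week or two keeps it around for resends.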

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2952260189
RAC: 698660

RE: ... day job

Message 97294 in response to message 97293

Quote:
... day job ...


Technically, I'm self-employed, so it's easy (too easy) to talk the boss into giving me time off. In practice, things have changed a great deal in the 20 years I've been doing consultancy: computers have become commodities, almost installing themselves, and software has become much more sophisticated, making it more difficult for a jobbing amateur to write something useful to a professional standard. Since I'm within a couple of years of retirement anyway, I'm happy to take the resulting silent phone as 'early release'.

Quote:
... lunch ...


Since they closed the last remaining shop in my village, lunch has involved a one-hour walk to the next village through fields and woods, and along a canal bank. It's an excellent time for drafting...

Quote:
... bug reports ...


Having been on the receiving end of user reports on software I've written (nothing so crude as a bug, of course), I'm a fully paid-up member of the software developers' Trade Union. On behalf of comrades everywhere: quote error messages verbatim, and send in relevant diagnostic logs, or we strike. One out, all out.

Quote:
... ...


Thank goodness I covered myself by saying '... (or at least has the effect) "don't remove this file reference on project reset".' Nicolas has pointed out that it has a more extensive meaning: we'll have to think more deeply about the project's response to the consultation which I'm sure is coming our way (!).

Quote:
... reset ...


It'll be easier to work out what a project reset should do if we think about the circumstances under which it might be (ought to be?) used. I can think of several:

* (At Einstein) End of Beta run, return to standard application
* User settings (time fractions, DCF, debt) messed up
* Project sent bad app/task/estimates (not Einstein, of course...)
* Too many files hanging about (the one that started this discussion)
* 'BOINC told me to....'

That last one, of course, is the infamous

18-Feb-2010 22:01:42 [Einstein@Home] Task h1_1114.55_S5R4__630_S5R6a_1 exited with zero status but no 'finished' file
18-Feb-2010 22:01:42 [Einstein@Home] If this happens repeatedly you may need to reset the project.


or, as it appears in the Simple User Interface,

Looking at that list, I think that in the majority of cases it would be better to keep the Locality Scheduling data files. The exception, of course, is the one that started this thread: though looking at that opening post, the OP only had 90MB, which is about the minimum possible for one set of locality files and an executable. We could have got away with a simple "Einstein searches for needles in a very, very big haystack".

Nicolas quotes 'current documentation' as saying

Quote:
The file may also be deleted at any time by the core client in order to honor limits on disk-space usage.


I wasn't aware of that nuance, and I suspect others in this discussion may not have been either. If someone could test and demonstrate that the process is working properly, then perhaps the mods could prepare suitable guidance ( "Don't Panic!" ) for next time the subject comes up. If not, I suppose I've drawn the short straw for the next bug report.....

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

Gary must've eaten chicken,

Gary must've eaten chicken, duck or goose, as we're getting all these feathers stuck into us. ;-)

I must say, my vision of doing a project reset means that everything gets set back to initial values, no files are left behind. That's why I asked for the projects directory to be cleaned out completely. And to be honest, I didn't take into account the client_state.xml file. :-(

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2952260189
RAC: 698660

RE: RE: Jord's Trac

Message 97296 in response to message 97290

Quote:
Quote:
Jord's Trac ticket was closed. :D

I see, but I also see from the code that it will delete everything in the directory. Good... but for if you run an anonymous platform from there... then you do not want to delete everything on a project reset. ;-)


Heads up and FYI:

Resetting with an app_info active doesn't delete the files - just tried it while wrestling with a conundrum at SETI Beta.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

RE: RE: RE: Jord's Trac

Message 97297 in response to message 97296

Quote:
Quote:
Quote:
Jord's Trac ticket was closed. :D

I see, but I also see from the code that it will delete everything in the directory. Good... but for if you run an anonymous platform from there... then you do not want to delete everything on a project reset. ;-)

Heads up and FYI:

Resetting with an app_info active doesn't delete the files - just tried it while wrestling with a conundrum at SETI Beta.


I never tried it. It's probably also why David hasn't responded to it. Just dismissed me. I can live with that. :)

John
Joined: 26 Dec 05
Posts: 2
Credit: 24060617
RAC: 0

Thanks Gary, I apologize if

Message 97298 in response to message 97284

Thanks Gary,
I apologize if my attitude seems extreme. I am a developer, and I can't count the number of times I have had a server shut down by some lazy programmer who wrote a greedy algorithm which consumed resources like it was the only thing that was ever going to run. I'm not saying the developers here are lazy, but the appearance is that E@H is grabbing resources in a way that reminds me of those times.

I agree that it may be a problem for BOINC rather than the individual project, but doesn't the project have some input into BOINC? As a significant participant, I would expect they would have at least some input on what bugs are prioritized. But then, maybe since S@H is the one not getting their fair share, I should mention it over there instead :)

Since you asked, my cache is set to update every 1 day, and keep an additional 1 day of work. I learned not to over cache long ago. But that is still generally 50+ work units. Also I have a 15K drive in this machine, so it is only a 150GB. With Windows, a couple of development environments, cygwin, and a couple of games, I'm down to about 16GB free. BOINC is getting almost 10% of that. I already moved my music collection off to another computer, so you can't blame that :)

Does the project control the localization algorithm or does BOINC do that too? Because it seems that could be a large part of the problem. It appears to download a new set of *S5R? files for each new task, in spite of there being 200-300 files already on my system. It's as though it isn't successfully allocating the tasks that I have files for, but also not allowing my system to remove ones that aren't needed. Which brings me to another question of who controls what: from the log, something is aware that E@H needs 90MB more, but there is only 70MB free. If it knows that it needs at least 20MB more, what is the use of deleting a single 3.5MB file and waiting 4 hours to contact the server to find out that it now needs 16.5MB more? Could it be set up to more actively purge files it doesn't currently need?

In the spirit of experimentation, I upgraded to BOINC 6.10.43 and dropped my cache to 1GB. It preloaded the cache up to about 800MB, and got into the need 90MB loop I mentioned above. Interestingly, the new version seemed to leave 100MB or so for the other project (800+90 57 units to be cleared. At one unit per 4 hours, 228 hours or 9+ days) during which it will not only not be able to get new tasks for the other projects, but unless the localization starts working, won't be able to run anything itself. Actually, 100MB of that is STSP jobs that will clear when they finish running. So maybe just 4 days or so.
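The arithmetic behind those figures can be checked with a few lines of Python. This is only a worked example using the numbers quoted in this post (90MB needed, 70MB free, one 3.5MB file freed per 4-hour server contact, 57 units to clear); the function name is made up for the illustration.

```python
import math


def hours_to_free(needed_mb: float, free_mb: float,
                  freed_per_cycle_mb: float, hours_per_cycle: float) -> float:
    """Hours until enough space is free, if one file is purged per cycle."""
    shortfall = max(0.0, needed_mb - free_mb)
    cycles = math.ceil(shortfall / freed_per_cycle_mb)
    return cycles * hours_per_cycle


# 20MB short at 3.5MB per 4 hours: 6 cycles, so 24 hours.
print(hours_to_free(90, 70, 3.5, 4))

# 57 units at one unit per 4 hours: 228 hours, or 9.5 days.
print(57 * 4, 57 * 4 / 24)
```

So even the small 20MB shortfall takes a full day to clear at one file per contact, which is why a more active purge would help.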

John

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250254614
RAC: 35394

Originally the new locality

Originally the new locality scheduling introduced with S5GCE was meant to keep better track of which files are still useful and which can and should be deleted on the clients. However we ran out of S5R6 work and couldn't produce enough ABP work for all the work requests, so we needed to come up with something fast. We ended up running the new scheduler without the new file deletion scheme for now.

Unfortunately the old file deletion code wasn't aware of the new S5R7 data files, so these weren't deleted. For now I patched the old scheduler code such that it should work with the new S5R7 files from now on. This means the S5R7 files you get with tasks from now on should be taken care of by the old scheduler code.

However, the client won't get 'delete' messages for the old S5R7 files that belong to tasks already delivered (more precisely: the files for which no work could be generated an hour ago). These have to be deleted either manually or by the client when the disk usage of the project reaches the limit specified in your preferences.

If you want to free up disk space now and don't care about bandwidth, you could manually delete the files whose names end in _S5R7. Make sure the client doesn't keep any reference (Gary or Richard may have more detailed instructions), or it will download the files again even if they are no longer needed.
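As a cautious first step, a short Python sketch like the one below could list the candidate files and the space they occupy before anything is deleted. It only reads the directory and deletes nothing; the function names are made up for this example, and the location of the Einstein@Home project directory depends on your installation, so it is passed in as an argument.

```python
from pathlib import Path


def find_s5r7_files(project_dir: Path) -> list:
    """Return the files in project_dir whose names end in _S5R7, sorted."""
    return sorted(
        p for p in project_dir.iterdir()
        if p.is_file() and p.name.endswith("_S5R7")
    )


def total_mb(paths: list) -> float:
    """Total size of the given files in MB (1 MB = 1,000,000 bytes)."""
    return sum(p.stat().st_size for p in paths) / 1e6
```

Calling `find_s5r7_files(Path(...))` with your own project directory gives the list to review. Removing the files is then a manual decision, and the warning above about references kept by the client still applies.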

In any case you may want to revisit your disk space settings.

I'll keep an eye on this issue and try to find a way to get the useless files deleted on the clients by commands sent from the server side without harming the files that are still needed.

Sorry for the inconvenience.

BM

