CUDA jobs marked as invalid



Message boards : Problems and Bug Reports : CUDA jobs marked as invalid

Aron
Joined: Sep 29 06
Posts: 7
Credit: 91,715,916
RAC: 0
Message 109989 - Posted 30 Jan 2011 8:33:38 UTC

    Last modified: 30 Jan 2011 8:34:11 UTC

    Hi,

    Sorry if this has already been discussed (I could not find a specific post about it). Some of my computers with CUDA GPUs, and apparently everyone else's too, return tasks as "Completed, marked as invalid" (see http://einstein.phys.uwm.edu/show_host_detail.php?hostid=3825357 or check the top 20 list for any computer with one or more GPUs). This seems to be common with GPU jobs: some of them end with this status. Why does it happen, and how can one avoid it?

    Thanks! Best,
    Aron

    Bikeman (Heinz-Bernd Eggenstein)
    Forum moderator
    Project administrator
    Project developer
    Joined: Aug 28 06
    Posts: 3020
    Credit: 46,832,889
    RAC: 60,678
    Message 109990 - Posted 30 Jan 2011 8:56:53 UTC

      Last modified: 30 Jan 2011 8:57:57 UTC

      Hi!

      There is a known "cross validation" issue which is discussed in this thread: http://einstein.phys.uwm.edu/forum_thread.php?id=8666&nowrap=true#109649. This means that different hosts will return slightly different results for the same workunit, depending on whether the task was run on CPU or GPU, or even on which model of GPU.
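As an illustration of how results for the same workunit can differ slightly between hosts, here is a minimal Python sketch. It assumes (the thread does not state this) that the differences come from floating-point arithmetic being evaluated in a different order on different hardware. The `results_agree` helper and its tolerance are hypothetical and are not the project's actual validator.

```python
import math

def results_agree(a, b, rel_tol=1e-5):
    """Hypothetical check: True if two result vectors match within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))

# Adding the same numbers in a different order (as a CPU and a GPU might)
# can change the last bits of a floating-point result.
values = [0.1, 0.2, 0.3]
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# A bit-for-bit comparison fails, while a tolerance-based one succeeds.
bitwise_equal = left_to_right == right_to_left
tolerant_equal = results_agree([left_to_right], [right_to_left])
print(bitwise_equal, tolerant_equal)
```

A comparison of this kind has to pick a tolerance: too tight and correct results from different hardware are rejected, too loose and genuinely wrong results slip through.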

      In addition, I had a situation myself recently where a GPU consistently returned results that would not validate with other results. After a reboot, the results validated again.

      So it's currently not unusual to get an invalid result from time to time, unfortunately. But if you only get invalid results, a reboot would be a good idea, and one should check cooling and reduce overclocking, if applicable.

      HB

      Aron
      Joined: Sep 29 06
      Posts: 7
      Credit: 91,715,916
      RAC: 0
      Message 110090 - Posted 1 Feb 2011 14:08:45 UTC - in response to Message 109990.

        Hi,

        Ok, I see. Thank you for your prompt reply.

        Best,
        Aron

        Faolan
        Joined: Mar 9 11
        Posts: 1
        Credit: 2,253
        RAC: 0
        Message 111111 - Posted 15 Mar 2011 15:39:30 UTC - in response to Message 110090.

          I am working with an old PC (Dell Optiplex 320) with a CUDA-capable Gigabyte G210 graphics card, and I am getting a lot of errors.

          I read somewhere that I must update to the latest BOINC software, which I have done, and I also updated to the latest Nvidia drivers today.

          I hope the number of errors will go down. We will see.

          mikey
          Joined: Jan 22 05
          Posts: 781
          Credit: 4,407,038
          RAC: 224
          Message 111135 - Posted 16 Mar 2011 10:52:56 UTC - in response to Message 111111.

            Last modified: 16 Mar 2011 10:54:28 UTC

            I see a few things that may be causing your problems. First, you are checkpointing about every minute:
            [18:58:31][2168][INFO ] Checkpoint committed!
            [18:59:38][2168][INFO ] Checkpoint committed!
            [19:00:44][2168][INFO ] Checkpoint committed!
            [19:01:50][2168][INFO ] Checkpoint committed!
            [19:02:52][2168][INFO ] Checkpoint committed!
            [19:03:53][2168][INFO ] Checkpoint committed!
            [19:04:53][2168][INFO ] Checkpoint committed!
            [19:05:54][2168][INFO ] Checkpoint committed!
            [19:06:54][2168][INFO ] Checkpoint committed!
            [19:07:59][2168][INFO ] Checkpoint committed!
            If you raise that to, say, every 5, 10 or even 15 minutes, it will cut down on disk writes. Second, if 'leave applications in memory while suspended' is not already enabled, turn it on so that suspended applications ARE kept in memory. Both settings can be changed either on the website or in the BOINC Manager. Changing them on the website applies them globally; changing them in the BOINC Manager applies them to that PC only. I have my checkpoint interval set to 900 seconds (15 minutes).
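The checkpoint cadence in the log excerpt above can be checked with a short script. This is only a sketch: the log lines are copied from the post, while the `checkpoint_intervals` helper is a hypothetical name introduced here and assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt in the post above.
LOG = """\
[18:58:31][2168][INFO ] Checkpoint committed!
[18:59:38][2168][INFO ] Checkpoint committed!
[19:00:44][2168][INFO ] Checkpoint committed!
[19:01:50][2168][INFO ] Checkpoint committed!
[19:02:52][2168][INFO ] Checkpoint committed!
[19:03:53][2168][INFO ] Checkpoint committed!
[19:04:53][2168][INFO ] Checkpoint committed!
[19:05:54][2168][INFO ] Checkpoint committed!
[19:06:54][2168][INFO ] Checkpoint committed!
[19:07:59][2168][INFO ] Checkpoint committed!
"""

# Capture the leading HH:MM:SS stamp of each checkpoint line.
STAMP = re.compile(r"^\s*\[(\d{2}:\d{2}:\d{2})\].*Checkpoint committed!")

def checkpoint_intervals(log_text):
    """Return the seconds between consecutive checkpoints (same-day stamps only)."""
    times = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

intervals = checkpoint_intervals(LOG)
mean_interval = sum(intervals) / len(intervals)
print(intervals)
print(f"mean interval: {mean_interval:.1f} s")
```

For this excerpt the gaps all fall between 60 and 67 seconds, which matches the "every minute" observation.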

            The third thing I see is that your PC has only 2 GB of RAM; adding more will let it 'breathe' and run much better. Right now it is using the hard drive as 'virtual' RAM, and raising the physical amount will stop it from having to do that. I see you are running the 64-bit version of Windows 7, which means you are not confined to about 3.5 GB but are limited only by the physical limitations of the board. A PDF on the internet says your PC can have a maximum of 4 GB of memory. I would look at the costs involved and see whether they are worth it to you. A quick look at this website
            http://www.crucial.com/store/listparts.aspx?model=OptiPlex%20380%20Desktop says it is 25 bucks for one 2 GB module, which means 50 bucks to upgrade. That price is for a desktop, and you did not say whether you have the mini form factor, a tower, or some other kind of case, so the price could change based on that.

            This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

            Copyright © 2013 Bruce Allen