Gigabyte Aorus 1080ti - card not initialising

    Hi there.

    Have searched various forums but haven't come across the exact issue I am having.

    Bought the 1080ti with a fault (randomly shutting down, black screen, fans at full speed, etc.). The previous owner had replaced the thermal pads, but on inspection many were too thick.

    I tested the card in Windows and everything loaded up OK. I put some load on the card and it crashed; the GPU was reporting 92 degrees C at the time.

    Switched the stock cooler out for an AIO water cooler from my 1080 FE and booted the card back up. Temps were much better and allowed some load to be applied to the card. It ran for a short while, then the card shut down again (and the PC with it). This time it wouldn't restart, as over-current protection kicked in.

    Pulled the card and diagnosed that one of the DrMOS chips had failed (isolated each VRM phase by lifting one side of the chokes connected to them in turn): 12V short to GND on the third phase from the bottom.

    I removed the DrMOS from the board to remove any shorts from the chip pins.

    All resistance measurements from the rails seem fine (PEX is at 200 ohms which seems a little high), all voltages are reading fine (PEX, mem, gpu, 1.8v, 3.3v, 5v, 12v). Can't find any shorts now on primary voltages. Checked all PCI data lines / capacitors and they check out fine.

    The card will now not start correctly: the monitor comes on, with backlight, after 30 seconds or so, but there is no POST screen (the PC will continue booting into Windows, however).

    With 2nd card connected as primary, can load windows - 1080ti card is recognised in device manager (but won't start fully), GPU-Z finds the card with missing clocks etc.

    Checked bios flash with nvflash and can read and write bios files fine.

    Tried booting MODS/MATS. Linux environment can detect card fine with lspci.

    However, when running MODS I get the output below. Is the core partially dead? I can't really find anyone else reporting these kinds of errors. (I tried various MODS versions with similar results; versions 400.xx and up are more detailed in their output.)

    Thanks in advance for any help!

    Code:
    MODS start: Fri Feb 4 06:30:36 2022 
    
    Command Line : gputest.js -skip_rm_state_init -mfg 
    
    CPU
    Foundry  : GenuineIntel
    Name   : 12th Gen Intel(R) Core(TM) i7-12700KF
    Family  : 6
    Model   : 7
    Stepping : 2
    
    Version
    MODS      : 367.56
    OperatingSystem: Linux (x86_64)
    Kernel     : 4.17.4-gentoo
    KernelDriver  : 3.87
    HostName    : tinylinux
    Smbios version [0x304] is not supported
    
    ERROR: Fuse read error
             gpu 0 dev.sub 0.0     
             --------------------------- 
    PCI Location  : 0x00, 0x05, 0x00, 0x00   
    DID      : 0x1b06           
    Raw ECID    : 0x0000000000e0224000000045b5880d91
    Raw ECID (GHS) : 0x000000016445b5880c000000090101c0
    ECID      : PHRM83-09_x02_y07      
    Device Id   : GP102            
    Revision    : a1             
    NV Base    : 0x71000000         
    FB Base    : 0x40000000         
    IRQ      : 17             
    NV_PMC_INTR_0 bit 28 high.
    Trying to clear interrupt by writing 0x0 to register 0x001140
    NV_PMC_INTR_0 bit 28 high.
    Trying to clear interrupt by writing 0x0 to register 0x001144
    NV_PMC_INTR_0 bit 30 high.
    Trying to clear interrupt by writing 0x2 to register 0x12004c
    Successfully cleared GPU's interrupt state.
     Unknown PCIE speed cap 0x4
     Unknown PCIE speed cap 0x4
    
    ** ModsDrvBreakPoint **
    
    ------------------------- BEGIN ASSERT INFO DUMP -------------------------
     invalid.
    NVRM: instSetBar0WindowToWorkspaceBase_GM200: VGA workspace base is invalid.
    NVRM: Possible bad register read: addr: 0x31c4f4, regvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE
    ACPI: Unable to evaluate dev method (_DOD) on 0:5:0.0
    GF100GpuSubdevice: FloorsweepingAffected=0
    GF100GpuSubdevice: Floorsweeping parameters present on commandline:     
    GF100GpuSubdevice: Floorsweeping parameter mask values: display=0x0 msdec=0x0 msvld=0x0 fbio_shift_override=0x0 ce=0x0 gpc=0x0 fb=0x0 fbio=0x0 fbio_shift=0x0 gpctpc[0]=0x0 gpctpc[1]=0x0 gpctpc[2]=0x0 gpctpc[3]=0x0 gpctpc[4]=0x0 gpctpc[5]=0x0 gpctpc[6]=0x0 gpctpc[7]=0x0 gpczcull[0]=0x0 gpczcull[1]=0x0 gpczcull[2]=0x0 gpczcull[3]=0x0 gpczcull[4]=0x0 gpczcull[5]=0x0 gpczcull[6]=0x0 gpczcull[7]=0x0 
    GF108PlusGpuSubdevice: Floorsweeping parameters present on commandline:  
    GF108PlusGpuSubdevice: Floorsweeping parameter mask values: pcie_lane=0x0 fbpa=0x0 spare=0x0 
    GM10xGpuSubdevice: Floorsweeping parameters present on commandline:  
    GM10xGpuSubdevice: Floorsweeping parameter mask values: nvenc=0x0 nvdec=0x0 head=0x0
    GM20xGpuSubdevice: Floorsweeping parameters present on commandline: 
    GM20xGpuSubdevice: Floorsweeping parameters mask values: fbp_rop_l2[0]=0x0 fbp_rop_l2[1]=0x0 fbp_rop_l2[2]=0x0 fbp_rop_l2[3]=0x0 fbp_rop_l2[4]=0x0 fbp_rop_l2[5]=0x0 fbp_rop_l2[6]=0x0 fbp_rop_l2[7]=0x0 fbp_rop_l2[8]=0x0 fbp_rop_l2[9]=0x0 fbp_rop_l2[10]=0x0 fbp_rop_l2[11]=0x0 fbp_rop_l2[12]=0x0 fbp_rop_l2[13]=0x0 fbp_rop_l2[14]=0x0 fbp_rop_l2[15]=0x0 
    GP10xGpuSubdevice: Floorsweeping parameters present on commandline: 
    GP10xGpuSubdevice: Floorsweeping parameters mask values: gpc_pes[0]=0x0 gpc_pes[1]=0x0 gpc_pes[2]=0x0 gpc_pes[3]=0x0 gpc_pes[4]=0x0 gpc_pes[5]=0x0 gpc_pes[6]=0x0 gpc_pes[7]=0x0 gpc_pes[8]=0x0 gpc_pes[9]=0x0 gpc_pes[10]=0x0 gpc_pes[11]=0x0 gpc_pes[12]=0x0 gpc_pes[13]=0x0 gpc_pes[14]=0x0 gpc_pes[15]=0x0 
    NVRM: DevinitPmuOffloadDevinitToPmu Devinit complete is false
    NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit_pmu.c:391 
    
    ** ModsDrvBreakPoint **
    
    -------------------------- END ASSERT INFO DUMP --------------------------
    
    ** ModsDrvBreakPoint **
    
    ------------------------- BEGIN ASSERT INFO DUMP -------------------------
    gvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE
    ACPI: Unable to evaluate dev method (_DOD) on 0:5:0.0
    GF100GpuSubdevice: FloorsweepingAffected=0
    GF100GpuSubdevice: Floorsweeping parameters present on commandline:     
    GF100GpuSubdevice: Floorsweeping parameter mask values: display=0x0 msdec=0x0 msvld=0x0 fbio_shift_override=0x0 ce=0x0 gpc=0x0 fb=0x0 fbio=0x0 fbio_shift=0x0 gpctpc[0]=0x0 gpctpc[1]=0x0 gpctpc[2]=0x0 gpctpc[3]=0x0 gpctpc[4]=0x0 gpctpc[5]=0x0 gpctpc[6]=0x0 gpctpc[7]=0x0 gpczcull[0]=0x0 gpczcull[1]=0x0 gpczcull[2]=0x0 gpczcull[3]=0x0 gpczcull[4]=0x0 gpczcull[5]=0x0 gpczcull[6]=0x0 gpczcull[7]=0x0 
    GF108PlusGpuSubdevice: Floorsweeping parameters present on commandline:  
    GF108PlusGpuSubdevice: Floorsweeping parameter mask values: pcie_lane=0x0 fbpa=0x0 spare=0x0 
    GM10xGpuSubdevice: Floorsweeping parameters present on commandline:  
    GM10xGpuSubdevice: Floorsweeping parameter mask values: nvenc=0x0 nvdec=0x0 head=0x0
    GM20xGpuSubdevice: Floorsweeping parameters present on commandline: 
    GM20xGpuSubdevice: Floorsweeping parameters mask values: fbp_rop_l2[0]=0x0 fbp_rop_l2[1]=0x0 fbp_rop_l2[2]=0x0 fbp_rop_l2[3]=0x0 fbp_rop_l2[4]=0x0 fbp_rop_l2[5]=0x0 fbp_rop_l2[6]=0x0 fbp_rop_l2[7]=0x0 fbp_rop_l2[8]=0x0 fbp_rop_l2[9]=0x0 fbp_rop_l2[10]=0x0 fbp_rop_l2[11]=0x0 fbp_rop_l2[12]=0x0 fbp_rop_l2[13]=0x0 fbp_rop_l2[14]=0x0 fbp_rop_l2[15]=0x0 
    GP10xGpuSubdevice: Floorsweeping parameters present on commandline: 
    GP10xGpuSubdevice: Floorsweeping parameters mask values: gpc_pes[0]=0x0 gpc_pes[1]=0x0 gpc_pes[2]=0x0 gpc_pes[3]=0x0 gpc_pes[4]=0x0 gpc_pes[5]=0x0 gpc_pes[6]=0x0 gpc_pes[7]=0x0 gpc_pes[8]=0x0 gpc_pes[9]=0x0 gpc_pes[10]=0x0 gpc_pes[11]=0x0 gpc_pes[12]=0x0 gpc_pes[13]=0x0 gpc_pes[14]=0x0 gpc_pes[15]=0x0 
    NVRM: DevinitPmuOffloadDevinitToPmu Devinit complete is false
    NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit_pmu.c:391 
    
    ** ModsDrvBreakPoint **
    NVRM: Devinit on PMU failed to execute correctly!!
    NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit.c:1063 
    
    ** ModsDrvBreakPoint **
    
    -------------------------- END ASSERT INFO DUMP --------------------------
    Failed to read good Jtag Ctrl Status
    WARNING... Failed to unlock Jtag for access!
    Error 000000000818 : Gpu.Initialize Mods detected an assertion failure
    Chipset
    VID      : FFFF (Unknown)
    DID      : FFFF (Unknown)
    Rm call failed. default Disabled.
    Chipset ASPM  : Disabled
    Chipset LTR  : Enabled
    
    Error 000000000818 : Global.InitializeGpuTests Mods detected an assertion failure
    gputest.js   : 59
    mfg.spc    : 11
    boards.js   : 7
    boards.db   : 3208
    boards_gp102.db: 16
    boards_gp104.db: 196
    boards_gp106.db: 157
    
    GpuDevMgr not initialized. Device shutdowns will likely do nothing.
    
    Error Code = 000000000818 (Mods detected an assertion failure)
    
                        
     #######   ####  ######## ###   
     #######  ######  ######## ###   
     ##    ##  ##   ##   ###   
     ##    ##  ##   ##   ###   
     #######  ########   ##   ###   
     #######  ########   ##   ###   
     ##    ##  ##   ##   ###   
     ##    ##  ## ######## ######## 
     ##    ##  ## ######## ######## 
                        
    
    MODS end : Fri Feb 4 06:30:47 2022 [10.981 seconds (00:00:10.981 h:m:s)]
    Last edited by SMDFlea; 02-12-2022, 05:40 AM.
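
    For anyone comparing logs like this across MODS versions, a small script can pull out the lines that matter. This is a minimal sketch, not part of MODS: the function names are made up for illustration, the two sample lines are copied from the output above, and treating a register value that starts with "bad" as a placeholder for a failed read is an assumption based only on the log's own "Possible bad register read" wording.

```python
import re

# Two lines copied from the MODS output above, used as sample input.
SAMPLE = """\
NVRM: Possible bad register read: addr: 0x31c4f4, regvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE
Error 000000000818 : Gpu.Initialize Mods detected an assertion failure
"""

# "Error <code> : <location> <message>" lines as they appear in the log.
ERROR_RE = re.compile(r"^\s*Error (\d+) : (\S+) (.*)$", re.MULTILINE)
# Address/value pairs from "Possible bad register read" lines.
BAD_READ_RE = re.compile(r"addr: (0x[0-9a-fA-F]+), regvalue: (0x[0-9a-fA-F]+)")


def summarize(log_text):
    """Return (errors, bad_reads) found in a MODS log (hypothetical helper)."""
    errors = [(code, where, message.strip())
              for code, where, message in ERROR_RE.findall(log_text)]
    bad_reads = [(int(addr, 16), int(value, 16))
                 for addr, value in BAD_READ_RE.findall(log_text)]
    return errors, bad_reads


def has_bad_prefix(value):
    """True if a 32-bit value starts with hex digits 'bad' (assumed marker)."""
    return (value >> 20) == 0xBAD


if __name__ == "__main__":
    errors, bad_reads = summarize(SAMPLE)
    for code, where, message in errors:
        print(f"{where}: {message} (code {code})")
    for addr, value in bad_reads:
        print(f"addr {addr:#x} -> {value:#x}, placeholder-like: {has_bad_prefix(value)}")
```

    Running the same summary over the logs from each MODS version makes it easier to see whether the failure point (devinit on the PMU here) actually moves between versions.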

    #2

    After digging around some posts on here, I came across this thread regarding a 1070 with DDR5 read errors https://www.badcaps.net/forum/showthread.php?t=100720

    When I checked the MODS logs in post #3, the output there is almost identical to mine.

    It seemed the resolution to that issue was bad DDR5 chip(s).

    I went and checked the 10 pins on the PCIe bus mentioned in post #4 and they checked out fine (see below).

    OL , GROUND , .510 , .510 , GROUND, OL , GROUND , .510 , .510 , GROUND

    This suggests the core is ok?
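
    The ten readings above can be sanity-checked in a few lines. This is only a sketch: the 0.02 tolerance and the helper names are assumptions, and matching readings only show that the measured pins look alike, not that the core is healthy.

```python
# Readings copied from the post, in the order given.
READINGS = ["OL", "GROUND", ".510", ".510", "GROUND",
            "OL", "GROUND", ".510", ".510", "GROUND"]

# Over-range (OL) and ground pins carry no value worth comparing.
NON_SIGNAL = {"OL", "GROUND"}


def signal_values(readings):
    """Return the numeric readings, skipping OL and GROUND entries."""
    return [float(r) for r in readings if r not in NON_SIGNAL]


def readings_match(readings, tolerance=0.02):
    """True if all numeric readings sit within `tolerance` of each other.

    The tolerance is an assumed figure, not a value from any datasheet.
    """
    values = signal_values(readings)
    return bool(values) and (max(values) - min(values)) <= tolerance


if __name__ == "__main__":
    print("signal readings:", signal_values(READINGS))
    print("all matched:", readings_match(READINGS))
```

    A pin that read noticeably lower or higher than its neighbours would be the one to look at first.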

    One thing to mention, when I powered up the card a few days ago I am pretty sure some RAM chips felt hotter than others (although they all felt warm).

    Does this indicate FBVDDQ supply issues to specific chips maybe? Can this be checked easily per chip without removing them?

    Thanks

      #3

      Ok, looks like all 11 chips are getting FBVDDQ (at least to caps next to them).

      However, I tested for 1V8_AON, and chips M2, M3 and M4 are not getting it: they measure 800 ohms resistance to 1.8V from the choke on the power side of the board.

        #4

        Ignore the last comment - was measuring on the wrong side of those 3 caps!

          #5

          Attached is the MATS log for this card - read errors on all banks

            #6

            Ok, I have run MODS/MATS again using the unattended auto method with just the faulty card (no other GPU / iGPU).

            mods.log is now simpler, as errors were coming from the GT520 primary card, but it still fails.

            report.txt still shows read errors on all banks (I was hoping it would now just pinpoint the faulty RAM).

              #7

              Clearly something funny there, try a newer version of mods/mats

                #8

                Same with any version. Out of interest, I ran the card with no 12V connectors (just power from the PCIe slot) and the test gave the same result. With the connectors attached, voltages on all rails of the card look fine.

                  #9

                  So I hooked the BIOS chip (Winbond 25Q40EWNIG) up to a CH341A USB programmer and analysed the CS, DO and CLK pins on a USB oscilloscope. Whilst reading the ROM from the chip via NeoProgrammer, you can clearly see CS pulling, the CLK signal, and ROM data streaming from the BIOS chip.

                  When I do the same while the card is booting up, the CS line pulls fine but there is no data or CLK signal.

                  As the BIOS can be read from and written to the card using nvflash, this suggests there isn't an issue with these lines to the core, or with the ability to push or pull data via the core to the BIOS chip; the core just isn't running the initialisation sequence on start-up.

                  What other factors are at play here?

                    #10

                    Hi,

                    Check all the address lines on the top side of the card (32 in 16 pairs); I believe they must all be OK.

                    Question: which side of the DrMOS failed, the high side, the low side, or both? In other words, which MOSFETs had a short? If only the high side failed, there is most probably damage to the GPU/memory, since 12V would have been present on the GPU (I assume it was the GPU voltage, as it was the third MOSFET). If only the low side failed, there is some chance there is no damage. If both failed, you never know; it depends on which one failed first and whether it led to an overvoltage on the GPU.

                    The high temperature can be a symptom of: a) a GPU already near end of life, shortly before it dies; b) the cooling plate not being seated correctly on the GPU. This can happen through permanent deformation of the board caused by mechanical pressure and temperature over a long time (adding 0.5 mm to 1 mm thick washers under the 4 spring-loaded GPU screws may help), or through a badly done reballing of the GPU (not parallel to the board, bad GPU-to-cooler contact).

                    I'm not sure, but I also have the impression that, depending on the degree or area of the RAM fault, it may sometimes be impossible to run MATS and discover which chip or chips are faulty. The only way to verify this would be to exchange all the RAM chips for good ones. You could try replacing chip by chip, but you never know how many of them are broken, and you cannot be sure of finding all the bad ones this way. I never did this because of the effort and cost, but maybe somebody will try one day.
                    Last edited by DynaxSC; 02-18-2022, 11:01 AM.

                      #11

                      Thanks for the reply.

                      Not sure which side of the DrMOS failed; all I can say is that that specific one measured short to GND at the output (VSWH) after lifting all the chokes. The datasheet shows the LS FET connected to the VSWH pins under the chip.

                      https://cdn.badcaps-static.com/pdfs/...8c1c33edaa.pdf

                      I wouldn't pay much attention to my comments about the RAM chip temperatures: I have since checked them again (with a finger!) and they are all roughly the same. Not very scientific!

                      I don't have schematics for this specific card, I am using a Galax one which is mostly the same (power delivery is a lot more complex on this with a lot more components).

                      Where are the address lines? - do you mean from the GPU to the ram or from the PCIE to the GPU?

                      I have ordered 2 new BIOS chips (W25Q40EWNIG), which I will program and swap onto the card, as I want to rule out a fault in those.

                      Why are there 2 chips on the card with no BIOS switch? Is it for a software-based normal/OC switch between ROMs, or a backup of the 1st ROM?

                      I had downloaded the contents of the 2nd chip and flashed them to the first, to see whether corruption in the 1st chip made a difference (it didn't). I did, however, learn that the inforom data is stored in the BIOS chip above 256kb and is specific to the card. Good job I backed it up: a normal BIOS image is only 256kb, so you lose this information if the chip is erased and flashed with just a ROM file.
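
                      The backup point above can be sketched in code. This is a minimal illustration, not a tested flashing procedure: it assumes a full 512 KiB dump (the W25Q40 is a 4 Mbit part), takes the 256 KiB boundary from the post rather than from any documented layout, and uses made-up helper names.

```python
ROM_REGION = 256 * 1024   # boundary taken from the post; treat as an assumption
CHIP_SIZE = 512 * 1024    # 4 Mbit SPI flash = 512 KiB


def split_dump(dump: bytes):
    """Split a full chip dump into (rom_region, upper_region)."""
    if len(dump) != CHIP_SIZE:
        raise ValueError(f"expected {CHIP_SIZE} bytes, got {len(dump)}")
    return dump[:ROM_REGION], dump[ROM_REGION:]


def rebuild_image(new_rom: bytes, old_dump: bytes) -> bytes:
    """Combine a new ROM file with the upper region of an old full dump.

    The ROM is padded with 0xFF (the erased state of flash) up to the
    boundary so the card-specific upper region keeps its original offset.
    """
    if len(new_rom) > ROM_REGION:
        raise ValueError("ROM file is larger than the ROM region")
    _, upper = split_dump(old_dump)
    return new_rom.ljust(ROM_REGION, b"\xff") + upper


if __name__ == "__main__":
    # Dummy data standing in for a real dump and a real ROM file.
    old_dump = b"\x11" * ROM_REGION + b"\x22" * (CHIP_SIZE - ROM_REGION)
    new_rom = b"\x33" * 1000
    image = rebuild_image(new_rom, old_dump)
    print(len(image), image[ROM_REGION:] == old_dump[ROM_REGION:])
```

                      Keeping the untouched full dump somewhere safe is still the real backup; the rebuilt image is only as good as the assumed boundary.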

                      Thanks

                        #12

                        I was inspired to try swapping the bios chip because of this video from Tech Cemetery (very similar symptoms)

                        https://www.youtube.com/watch?v=9M2K...l=TechCemetery

                          #13

                          Hi Benwaterson

                          Thank you very much for this video; I just repaired an MSI GTX Aero Mini ITX thanks to it. It also had read errors on all RAM chips, worked sometimes, but stopped under load. I exchanged the BIOS chip from MX.. to Winbond, and now it's working perfectly!

                            #14

                            Hi DynaxSC - that's brilliant news, glad you got it fixed.

                            Will see if it also helps me when the chips arrive.

                              #15

                              So I got hold of a schematic for the card and it's solved the mystery of the two BIOS chips.

                              One BIOS is for DisplayPort, the other is for VR HDMI (there is a second HDMI port internal to the card for VR).

                              The CS line goes through a multiplexer (NC7SB3157P6X_NL) to switch which chip is being addressed.

                              It's possible there may be issues with the multiplexer circuit?



                              I have uploaded the schematic here (page 57 for the rom diags)

                              https://drive.google.com/drive/folde...eE?usp=sharing
                              Last edited by benwaterson; 02-20-2022, 06:14 PM.

                                #16

                                So I tested the mux chip: pin 6 is pulled low when DVI is connected, otherwise it's high (HDMI/DP). I also examined the two BIOS files I pulled from the card and (amongst other minor differences) they have different IDs:

                                IC U3 - GV-N108TAORUS-11GD_H/F60/05E3 - Version 86.02.39.00.9D (presume HDMI)
                                IC U7 - GV-N108TAORUS-11GD_D/F91/063E - Version 86.02.39.40.5F (presume DVI)

                                However, the schematic has some contradictions. Its notes say:

                                1) pin A (6) pulled low for DVI on chip U7
                                2) pin A (6) pulled high for VRHDMI on chip U3

                                But then the actual schematic drawing shows chip U3 = DP BIOS (and presumably DVI) and U7 = Default HDMI BIOS. Is this backwards?

                                The fact that there are two discrete, different BIOSes would indicate that the code within them differs in order to drive two different outputs (else why would it need two?).

                                It almost looks like the BIOSes are the wrong way around? That doesn't make much sense, as the card was working until the DrMOS failed.
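
                                One way to lay the pieces side by side is a small consistency check. The sketch below assumes that the suffix of each dump ID (_D or _H) names the output family, which is the presumption made in this post and not something the schematic confirms; all names in the code are illustrative.

```python
# Select-pin level -> chip, from the two schematic notes quoted above.
SCHEMATIC_NOTES = {"low": "U7", "high": "U3"}

# IDs read from the two dumps.
DUMP_IDS = {
    "U3": "GV-N108TAORUS-11GD_H/F60/05E3",
    "U7": "GV-N108TAORUS-11GD_D/F91/063E",
}

# Measured on the card: pin 6 is low with DVI connected, high otherwise.
MEASURED_LEVEL = {"DVI": "low", "HDMI/DP": "high"}

# Assumption (the poster's presumption): ID suffix letter -> output family.
SUFFIX_FAMILY = {"D": "DVI", "H": "HDMI/DP"}


def family_from_id(bios_id: str) -> str:
    """Return the output family implied by the _D/_H suffix of a dump ID."""
    board_name = bios_id.split("/")[0]      # e.g. GV-N108TAORUS-11GD_H
    suffix = board_name.rsplit("_", 1)[1]   # e.g. H
    return SUFFIX_FAMILY[suffix]


def check_consistency() -> dict:
    """For each connection type, report the selected chip and whether its
    dump ID implies the same output family as the connection."""
    result = {}
    for connection, level in MEASURED_LEVEL.items():
        chip = SCHEMATIC_NOTES[level]
        result[connection] = (chip, family_from_id(DUMP_IDS[chip]) == connection)
    return result


if __name__ == "__main__":
    for connection, (chip, ok) in check_consistency().items():
        print(f"{connection}: selects {chip}, BIOS matches: {ok}")
```

                                Under that assumption the notes, the pin 6 measurement and the dump IDs agree with each other, which would leave the chip labels on the drawing as the odd one out rather than the chips on the board.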

                                  #17

                                  I have many Nvidia cards with 2 BIOS chips (970, 980, 1060...) and never knew why; they don't have VR capabilities or a dual-BIOS switch.

                                  I would like to know what they are for.

                                    #18

                                    Edited my initial comment after more investigation

                                    Originally posted by benwaterson
                                    So I got hold of a schematic for the card and its solved the mystery of the two bios chips.

                                    One bios is for DVI, the other is for HDMI/DP, they switch the mux to the CS pins depending on which type of connection is present
                                    Quoted from the PDF that comes with the bios updaters from Gigabyte

                                    "AORUS 10 series have two BIOS and should change to different port
                                    when BIOS flashing, please refer as below:

                                    H Bios Flashing (with HDMI*2+DP*3)

                                    P Bios Flashing (with DVI)"
                                    Last edited by benwaterson; 02-21-2022, 03:51 PM.

                                      #19

                                      So no real update: swapping the BIOS chip didn't change anything (in fact it made things worse for a short while, as the hot air affected a few resistors around the chip, which had to be reworked).

                                      The only thing to add is that if I boot blind with the card as primary and connect to Windows via RDP, GPU-Z now shows all info apart from memory size (if the Nvidia drivers install, I get a blue screen).

                                        #20

                                        Ok, maybe some progress.

                                        Was reading this new post https://www.badcaps.net/forum/showthread.php?t=104132 and decided to check this on my board.

                                        There is no output (0V) on GPIO10_FBVREF_SEL to the MOSFETs driving VREFC on all the RAM chips.

                                        Resistance measurements are fine; there is just nothing on pin 1 (gate).

                                        There is 0.9V on pin 3 (drain), which is pulled up to FBVDDQ.

                                        Would it be a bad idea to manually apply 1.8V to pin 1?