MSI PRO Z690-A DDR4 (MS-7D25) - stuck during boot every second time - debug code 16 & A1


    Hi Folks

    I have an MSI PRO Z690-A DDR4 (MS-7D25) which underwent an LGA1700 socket replacement and had a missing resistor in the VRM section. The missing resistor RV170 (49.9 Ω) has been resoldered, and I have also replaced the damaged LGA1700 socket.

    Generally the board does boot to Windows, but not always on the first attempt; it usually needs a second press of the power button. Restarting from Windows also ends with a stuck boot process, and the board only boots after restarting it again (long press of the power button, or switching the PSU off and on). Under Windows everything works perfectly: no missing devices, and audio, LAN, USB ports, fans and RGB all work.

    The boot codes I receive on Debug Card TL631 are either:

    Debug Code 16 -> Pre-Memory System Agent initialization (System Agent module specific); according to "Newer AMI EZ-Flex BIOS codes" -> PIT channel 1 tested. This is the code shown most of the time.

    and sometimes, e.g. after changing the processor or configuration:

    Debug Code A1 -> IDE Reset; according to "Newer AMI EZ-Flex BIOS codes" -> Cache memory size tested

    (debug code source is: https://forum-en.msi.com/index.php?t...ecoded.263610/ )
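The two readings of each code can be kept side by side in a small lookup. This is only a sketch: the descriptions are the ones quoted in the post, while the key names `standard` and `ez_flex` and the `decode` helper are hypothetical labels chosen for illustration.

```python
# Both readings of each debug code, exactly as quoted in the post above.
# The key names "standard" and "ez_flex" are hypothetical labels.
POST_CODES = {
    0x16: {
        "standard": "Pre-Memory System Agent initialization (System Agent module specific)",
        "ez_flex": "PIT channel 1 tested",
    },
    0xA1: {
        "standard": "IDE Reset",
        "ez_flex": "Cache memory size tested",
    },
}


def decode(code: int) -> str:
    """Return both interpretations of a debug-card code, if the sketch knows it."""
    entry = POST_CODES.get(code)
    if entry is None:
        return f"{code:02X}: not covered by this sketch"
    return f"{code:02X}: {entry['standard']} | EZ-Flex: {entry['ez_flex']}"


if __name__ == "__main__":
    for seen in (0x16, 0xA1):
        print(decode(seen))
```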


    I have done all the standard stuff, like reflashing, downgrading and upgrading the BIOS: no change. Same with the Management Engine of the PCH, upgraded to the newest version available from Intel: no difference.
    I have also measured all the CPU signals that are, in my opinion, meaningful directly in the LGA socket. They are perfectly connected, so the socket replacement seems to have gone well.
    The RAM socket LED tester also shows all OK, and the PCIe (@CPU) LED tester shows all OK as well.
    Processor, memory, monitor and PSU work without any problem on a different board. Changing the slots or the number of memory modules makes no difference.
    The board has been inspected very, very carefully for any further missing components; no parts are missing or look damaged.

    The only strange observation is that the Management Engine of the PCH is still in production mode rather than in its normal state, but in my experience this normally has no influence.

    I'm slowly running out of ideas how to proceed, but I suspect this must be a hardware issue, e.g. some device on the board is not resetting during a soft reset.

    I have no desire to simply replace all the chips on the board one after another, so I will be grateful for any hints or ideas on how to proceed with the diagnosis.

    Rgds


    Attached please find the boardview for this board (r1.1, my board is r1.2): [MOD EDIT] LINK to boardview --> https://www.badcaps.net/forum/troubl...22#post1823322
    Last edited by SMDFlea; 09-30-2024, 03:04 AM. Reason: link to boardview

    #2
    Originally posted by DynaxSC
    Hi Folks

    I have an MSI PRO Z690-A DDR4 (MS-7D25) which underwent an LGA1700 socket replacement and had a missing resistor in the VRM section. The missing resistor RV170 (49.9 Ω) has been resoldered, and I have also replaced the damaged LGA1700 socket.

    Are the 2 resistors (RV171) (RV170) ok?
    When the problem occurs, is (L5) operating normally?


      #3
      Hi, both resistors are OK and measure the correct resistance values. L5 (CPU_1P05) is present and stable when the error occurs, and also the whole time the board is working. During a soft reset (restart from Windows) it stays present the whole time, both until the board gets stuck at debug code 16 and while code 16 is displayed (checked with a scope). During a hard reset it is absent for a while.

      BTW:
      - during debug code 16 the TL631 lights the CPU LED.
      - during debug code A1 the TL631 lights the CPU & DRAM LEDs.
      Last edited by DynaxSC; 09-30-2024, 05:23 AM.


        #4
        How is the memory set up: JEDEC default or an XMP profile?


          #5
          It is the default setup, no XMP. I also tried running the memory at a slightly higher voltage (1.3 V instead of 1.2 V): no difference.
          BTW I have closed the Manufacturing Mode of the PCH, and it made no difference, so this is ruled out.
          I also disabled all possible devices in the BIOS, e.g. LAN, audio, all USB ports, etc.: no difference.


            #6
            Did you write the BIOS version (7D25vAJ) directly to the chip?

            In some cases the software is unable to write or erase correctly...


              #7
              No, it has always been an upgrade/downgrade/upgrade of the original BIOS. Once I also used the flashback function to upgrade to the newest version. It is not possible to read or program the BIOS in-circuit with a CH341, because this board blocks the programmer; the chip must be desoldered. I had in fact already desoldered the chip, made a backup, and programmed a stock BIOS. But then I found the missing resistor, and I programmed the backup back onto the chip before soldering it to the board again.

              I might try to see what happens with a stock BIOS. I will first try to program it with fptw64.exe under Windows, so that I do not need to desolder the chip again; if that does not work, I can still desolder it. I just realized that I have already had such issues with a partly corrupted BIOS where an upgrade did not solve the problem, so it is definitely worth a try. If this works, the only thing left to do is to transfer the relevant DMI data to the stock BIOS.
              Last edited by DynaxSC; 09-30-2024, 06:35 PM.


                #8
                Programmed the latest stock BIOS ver. 1.J0, but unfortunately no difference. The BIOS is ruled out.

                BTW, the 7D25vAJ is for the DDR5 version of the board.


                  #9
                  Post your original backup + the latest manufacturer version?


                  • SMDFlea
                    SMDFlea commented
                    All bios should be posted in the "Bios & Schematic requests" sub forum

                  #10
                  Attached a zip file with both files. LINK https://www.badcaps.net/forum/troubl...6-ms-7d25_bios

                  Maybe it is also worth mentioning that while the board is stuck at debug code 16 or A1, a hard reset does not take effect immediately, but only after about 10 seconds, so something is blocking the immediate execution of the hard reset.

                  In the meantime I also compared the SIO BIOS image content to a second board (same model, same rev.) that works normally - they are identical.

                  I also compared with a scope the following Reset SIO signals between both boards, and they behave exactly the same:

                  Pin 18 – ESPI_RST# - OK - short reset present
                  Pin 26 – PLTRST# - OK - reset present
                  Pin 27 – KBRST# - all time high
                  Pin 77 – PLTRST_BU3#_R - OK – reset present
                  Pin 78 – PLTRST_BU2#_R - OK – reset present
                  Pin 79 – PLTRST_BU1#_R - OK – reset present
                  Pin 101 – SIO_RSMRST# - seems not active
                  Pin 118 – RTCRST_DET# - all time low
                  Last edited by SMDFlea; 10-02-2024, 02:53 PM. Reason: Link to bios request thread


                    #11
                    PRO Z690-A (msi.com)

                    Is this the version page for your board?


                      #12
                      Finally I solved the issue, and I'm completely astonished about this.

                      The cause is so strange because it contradicts all my past experience.

                      I have already replaced maybe 250 CPU sockets, and I was completely convinced that a board will not boot if even one line of the DMI bus is not correctly connected between CPU and PCH. That had been my experience in every case I had seen of an incorrectly replaced CPU socket, and there were some.

                      So I could not imagine that the DMI bus might be only partly working, since the board did boot to Windows after a hard reset, although not after a soft reset. Therefore I didn't check the DMI bus: I assumed that all 32 DMI lines must be correctly connected between CPU and PCH, otherwise the system would not work.

                      Now I must correct my belief: at least on LGA1700, a board can boot to Windows even if 3 DMI lines are not connected correctly. That was the case with this board. Specifically, the RX lines RXP1, RXP3 and RXP5 (LGA pins AH5, AF5, AD5) were not soldered correctly in the socket, so there was no proper connection between CPU and PCH.
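To put the finding in context, the 32 DMI lines can be enumerated and the three bad contacts marked. This is a rough sketch under stated assumptions: it assumes an x8 link with differential RX and TX pairs per lane (8 x 2 x 2 = 32, which matches the count in the post). Only RXP1, RXP3, RXP5 and their LGA pins come from the post; the remaining signal names simply follow the same pattern and are hypothetical, as are the function names.

```python
# Assumption: x8 DMI link, each lane with a differential RX and TX pair
# (8 lanes x 2 directions x 2 polarities = 32 lines, matching the post).
# Only RXP1/RXP3/RXP5 and their LGA pins are taken from the post; all
# other names are extrapolated from that pattern and are hypothetical.
LANES = 8

# Bad contacts reported in the post: signal name -> LGA pin.
BAD_CONTACTS = {"RXP1": "AH5", "RXP3": "AF5", "RXP5": "AD5"}


def dmi_lines(lanes: int = LANES) -> list:
    """List every DMI line name: RXP0..RXP7, RXN0..RXN7, TXP0.., TXN0.."""
    return [
        f"{direction}{polarity}{lane}"
        for direction in ("RX", "TX")
        for polarity in ("P", "N")
        for lane in range(lanes)
    ]


def checklist() -> list:
    """Pair each line with its status as far as the post reports it."""
    rows = []
    for name in dmi_lines():
        pin = BAD_CONTACTS.get(name)
        status = f"open at LGA pin {pin}" if pin else "not reported faulty"
        rows.append((name, status))
    return rows


if __name__ == "__main__":
    for name, status in checklist():
        print(f"{name:5s} {status}")
```

Printed as a checklist, this makes it harder to skip a line when probing the socket by hand.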

                      After I reworked the badly soldered pins, the board now starts normally every time without any problem.

                      So the conclusion for me is simple: never rely fully on your experience, as this might prevent you from considering all possible causes, especially the ones you think are the most improbable or even impossible.

                      Thank you for your support and engagement; in the end it helped me to overcome the schematic thinking that comes from experience. The cost of the learning curve was 4 full days of diagnostics, soldering and brain exercise, but the success is finally there, so I'm very glad about this.


                      BTW: the page with the BIOS files is different, this is the one I used: https://www.msi.com/Motherboard/PRO-...4/support#bios
                      The one you posted is for DDR5 version of the board.
                      Last edited by DynaxSC; 10-01-2024, 05:45 PM.


                        #13
                        I am extremely pleased to be able to try to help you!

                        I am sending the extracted and repaired file. LINK https://www.badcaps.net/forum/troubl...6-ms-7d25_bios

                        How did you test the RX lines RXP1, RXP3 and RXP5 (LGA pins AH5, AF5, AD5)? Oscilloscope or reverse conduction?
                        Last edited by SMDFlea; 10-02-2024, 02:53 PM. Reason: Link to bios request thread


                          #14
                          Hi, there are two methods I use.

                          One is a CPU socket DMI tester that recently became available on Aliexpress (it also has some pads for measuring power supply lines etc.), but I seldom use it because its quality is very low. It has relatively big vias inside the pads, and the LGA pins sometimes get stuck in the via holes and bend when the CPU bracket is closed with the tester inside. I applied some solder to these pads to close the holes and then levelled the surface again, but it is still not completely safe to use. I also have the impression that it is not manufactured precisely enough; the pads are probably displaced, so it sometimes shows no contact although the connection is there. It was also slightly too big and did not fit easily into the socket, so I had to file it down a little. Maybe the manufacturer will make a better version, as many people will probably have the same issues with it. With better quality this would be a very helpful tool.


                          The other method is more work: I do a reverse diode test (DMM positive probe to GND) directly on the socket pins, using a magnifying glass and very thin probes from Aliexpress. They are made of steel, so they are rigid despite being thin, but they cost 3x as much as normal probes. They are worth the money anyway. The reading should be around 0.3 V, slightly different on RX and TX lines; if it is not, there is either no connection or a short to GND. But measuring 32 lines is time-consuming and you need a steady hand.
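The pass/fail rule for the diode test can be written down as a small classifier. This is a minimal sketch, not a calibrated procedure: only the "around 0.3 V" figure comes from the post. The tolerance window, the short-circuit threshold, the function names and the sample readings are assumptions for illustration and should be tuned against a known-good board.

```python
from typing import Optional

NOMINAL_V = 0.3    # "around 0.3 V" per the post
TOLERANCE_V = 0.1  # hypothetical window around the nominal value
SHORT_V = 0.05     # hypothetical threshold for a short to GND


def classify(reading_v: Optional[float]) -> str:
    """Classify one diode-mode reading.

    None stands for the meter's over-limit (OL) display, i.e. no conduction.
    """
    if reading_v is None:
        return "open"
    if reading_v < SHORT_V:
        return "short to GND"
    if abs(reading_v - NOMINAL_V) <= TOLERANCE_V:
        return "ok"
    return "suspect"


def find_bad(readings: dict) -> dict:
    """Return only the lines whose reading is not classified as ok."""
    return {
        name: classify(value)
        for name, value in readings.items()
        if classify(value) != "ok"
    }


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = {"RXP0": 0.31, "RXP1": None, "TXP0": 0.28, "TXN0": 0.0}
    print(find_bad(sample))
```

Logging each reading as it is taken and running it through a rule like this avoids relying on memory across 32 measurements.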

                          Many thanks for the repaired file. Can you tell me what was wrong with it ?


                            #15
                            The reverse diode mode gives us universal values when testing lines connected directly to the PCH or an IC, for example 300, 400, 500, etc.

                            Your analysis is perfect. We learn from our mistakes and evolve.

                            The software I use to extract the executable files from the bios, automatically corrects the files...
