Artifacts from GPU Not Always a GPU Chip Fault

    Probably anyone who has repaired video cards before already knows this: video artifacts are not always caused by a bad GPU chip. I myself usually try not to assume this either. But with many modern GPUs, it seems that’s the case most of the time.

    Well, not in today’s post here. Although the GPU I’m about to show is not exactly “modern” anymore, its hardware design is still fairly relevant to modern GPUs. The video card I have is a Gigabyte Radeon HD6850 with 1 GB of GDDR5 RAM. Pictures of the video card, just for reference:




    You can also see the resistances of the GPU V_core, RAM, and GPU V_tt rails that I noted down on a sticky note, as I initially thought this GPU would need a reflow (it didn’t).

    Here is what the video displayed when hooked to a monitor:


    As you can see, the picture is mostly correct, but there were black lines running horizontally across the image. I suppose the artifact pattern may give a clue as to what has failed, if one knows more about exactly how the GPU and its RAM interface. I don’t, so when I first saw the artifacts, I assumed it was yet another video card with a failed GPU to add to my collection. I found out that was not the case almost by coincidence.

    As with all used GPUs I get (be it from eBay, Craigslist, or wherever else), I always check for missing, cracked, and broken SMD components on the back of the PCB (and take a quick glance at the front too). You might be surprised, but about 50% of the time, I get cards with at least one chipped or missing SMD component. Typically it’s the larger SMD multi-layer ceramic caps (MLCCs) that get damaged from rough handling, though not always.

    Knowing that, I perform a thorough inspection under good light. In the case of the above Gigabyte Radeon HD6850, I found three (3) small SMD ceramic caps (1208 metric, I think) missing around the RAM. All were across the RAM Vdd rail (filtering power), so I expected the card would work fine without them (a few of these missing will almost never cause an issue).

    But when I was greeted with video artifacts, I had to dig a little deeper. Having experienced a desktop DDR3 RAM stick not being recognized due to a cracked SMD array resistor (link to post here), I started checking all of the small resistors on the back of the card. When I got to the ones around the RAM, I found a pattern: four read around 1.6 kOhms, two ~60 Ohms, and one ~120 Ohms. Looking for this pattern on the 5th RAM chip (out of the 8 chips), I saw that one of the resistors was reading a short circuit across it (R2410), and two others were reading much lower resistance than expected (R2400 and R2401, IIRC).
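    The comparison described above (measure the same resistor positions on every RAM chip, then look for the chip that breaks the pattern) can be sketched in a few lines of Python. This is only an illustration: the position labels "a" to "g", the 25% tolerance, and all of the sample readings are made up for the example, and only the nominal values (four at ~1.6 kOhms, two at ~60 Ohms, one at ~120 Ohms) come from the post.

```python
# Nominal in-circuit readings for the seven resistors around a healthy RAM chip.
# Values are from the post; the position labels "a".."g" are hypothetical.
REFERENCE_OHMS = {"a": 1600, "b": 1600, "c": 1600, "d": 1600,
                  "e": 60, "f": 60, "g": 120}


def suspect_positions(measured, reference=REFERENCE_OHMS, tolerance=0.25):
    """Return {position: (measured, expected)} for readings outside tolerance.

    tolerance is a fraction of the expected value (0.25 = 25%), an assumed
    figure chosen loosely because in-circuit readings are never exact.
    """
    suspects = {}
    for position, expected in reference.items():
        value = measured[position]
        if abs(value - expected) > tolerance * expected:
            suspects[position] = (value, expected)
    return suspects


# Hypothetical meter readings, in Ohms, for a good chip and a bad chip.
good_chip = {"a": 1580, "b": 1610, "c": 1600, "d": 1590,
             "e": 61, "f": 59, "g": 121}
bad_chip = {"a": 1600, "b": 1590, "c": 400, "d": 350,
            "e": 60, "f": 61, "g": 0.3}

for name, readings in (("good chip", good_chip), ("bad chip", bad_chip)):
    print(name, suspect_positions(readings))
```

    A near-zero reading, like position "g" in the made-up bad chip, is the one to chase first, since anything wired in parallel with that resistor (such as a ceramic cap) can be the real cause of the short.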


    At first, I thought that meant perhaps a RAM chip had gone bad. But then I discovered the resistor with the short circuit across it also had a ceramic cap (C2405) in parallel with it. So what are the chances that the ceramic cap could have gone short-circuit?
    … well, it’s not frequent, but it certainly happens from time to time. And I knew it was unlikely to be the resistor, because resistors almost always go open-circuit or high-resistance when they fail.

    So knowing that SMD ceramic caps do have the tendency to short out, I removed the one showing a short circuit (C2405) and… VOILA! All resistors on the 5th RAM chip read normal resistances, just like on the other chips. I removed a ceramic cap from another RAM chip, from the same spot as the bad one on the 5th chip, and measured its value: 680 nF. Welp, I didn’t have that value, but I do know my scrap Xbox 360 motherboards have many 220 nF SMD MLCCs. So I stacked two on top of each other for a total of ~440 nF. Three (~660 nF) would have been ideal, but I didn’t feel like stacking so many on top of each other. I figured the ~440 nF capacitance should hopefully be close enough.
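    For anyone wondering about the stacking arithmetic: caps soldered on top of each other are in parallel, and parallel capacitances simply add. Here is a quick Python check using the values from the post (680 nF measured on the donor cap, 220 nF per salvaged cap). The function names are mine, and the numbers are nominal only, so real parts will differ a bit with tolerance and applied voltage.

```python
def parallel_capacitance_nf(*caps_nf):
    """Capacitors in parallel add: C_total = C1 + C2 + ..."""
    return sum(caps_nf)


def shortfall_percent(target_nf, actual_nf):
    """How far below the target the substitute is, as a percentage of the target."""
    return (target_nf - actual_nf) / target_nf * 100


ORIGINAL_NF = 680   # value measured on the cap pulled from a good RAM chip
SALVAGED_NF = 220   # nominal value of each salvaged MLCC

two_stacked = parallel_capacitance_nf(SALVAGED_NF, SALVAGED_NF)
three_stacked = parallel_capacitance_nf(SALVAGED_NF, SALVAGED_NF, SALVAGED_NF)

print(f"two stacked:   {two_stacked} nF, "
      f"{shortfall_percent(ORIGINAL_NF, two_stacked):.0f}% under the original")
print(f"three stacked: {three_stacked} nF, "
      f"{shortfall_percent(ORIGINAL_NF, three_stacked):.0f}% under the original")
```

    So two caps come out roughly a third under the original value, while three would land within a few percent of it.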


    I then inserted the video card in my test PC, crossed my fingers, and pressed the power button.
    Results: IT’S ALIVE!

    This card is working properly now!

    At first, I tested the card under a quick ATI Tool GPU test, and then some older games. It didn’t give any errors or artifacts at all.

    Eventually, I installed this GPU in the newest PC I use for gaming (no, not the Q6600/EVGA i780SLI PC – built another one that I haven’t posted yet in the “Post Your System” thread). This was back in August or thereabouts. So this Gigabyte Radeon HD6850 has been in use almost exclusively for gaming (mainly Fortnite) for close to 5 months now. It’s been very solid so far, knock on wood. The only issue I’ve had was with the fans not being wired correctly, but that’s a story for another thread.

    So there you have it: GPU video artifact problems do not necessarily mean the GPU chip is bad.

    I think I got lucky, though. Most modern video cards do tend to develop a bad GPU chip (and often artifacts from that) rather than anything else going bad, especially with problems like the card being detected only intermittently, dropping out under load, or displaying “randomly”-colored video artifacts all over the screen, particularly once the drivers are installed. Nevertheless, I’m posting this here just to show that in some rare cases, you can rescue a GPU from the dumpster and give it new life again.
    Last edited by momaka; 12-14-2019, 07:22 PM.

    #2
    Re: Artifacts from GPU Not Always a GPU Chip Fault

    wow, nice find & save!
    <--- Badcaps.net Founder

    Badcaps.net Services:

    Motherboard Repair Services

    ----------------------------------------------
    Badcaps.net Forum Members Folding Team
    http://folding.stanford.edu/
    Team : 49813
    Join in!!
    Team Stats



      #3
      Re: Artifacts from GPU Not Always a GPU Chip Fault

      Really nice read, good job!
      "The one who says it cannot be done should never interrupt the one who is doing it."



        #4
        Re: Artifacts from GPU Not Always a GPU Chip Fault

        Thanks!

        Here is also the fan repair "story", in case anyone is interested:
        https://www.badcaps.net/forum/showpo...1&postcount=29
        (Posted in that thread, because that's where I put all of my GPU cooler related posts now.)

        Indeed this is my "lucky" card. Before this one, I bought an XFX Radeon HD4850 1GD3 and HD5750 1GD5. Both had artifacts due to bad GPU, though for the HD5750, it didn't show up until I switched coolers. Either way, reflowing these didn't help. In fact, when I reflowed them for the third time, they both ended up with shorted GPUs. Ah well, you can't always win. :\ Was fun trying to get them going, nevertheless. So when I got this HD6850 to work, I was really happy.
        Last edited by momaka; 12-15-2019, 10:08 PM.



          #5
          Re: Artifacts from GPU Not Always a GPU Chip Fault

          I haven't had luck with ATi on the desktop side, but on the laptop side, they're really easy to fix. I just saved an Acer Aspire 5536 (same as my 5738 but AMD based instead of Intel) that was not posting. Reflowed the ATi M780G northbridge (I used tacflux, that's all I have around - it does its job tho), replaced the CMOS, and it sprang back to life. Currently has 4GB of RAM and a 500GB thin HGST drive, running Windows 7.

          Maybe desktop chips are too heat-sensitive compared to their laptop counterparts? I had a 4850 popcorn itself at 350°C, while an M780G chipset took the abuse without killing itself.
          Main rig:
          Gigabyte B75M-D3H
          Core i5-3470 3.60GHz
          Gigabyte Geforce GTX650 1GB GDDR5
          16GB DDR3-1600
          Samsung SH-224AB DVD-RW
          FSP Bluestorm II 500W (recapped)
          120GB ADATA + 2x Seagate Barracuda ES.2 ST31000340NS 1TB
          Delux MG760 case



            #6
            Re: Artifacts from GPU Not Always a GPU Chip Fault

            you can reduce popcorn by gently baking the pcb for a day to remove moisture - there used to be an intel document about doing it to their bga chips before assembly.



              #7
              Re: Artifacts from GPU Not Always a GPU Chip Fault

              Originally posted by stj View Post
              you can reduce popcorn by gently baking the pcb for a day to remove moisture - there used to be an intel document about doing it to their bga chips before assembly.
              It was done in the summer, and in no moisture conditions.


                #8
                Re: Artifacts from GPU Not Always a GPU Chip Fault

                Or you just stop reflowing dead chips and replace them properly. Or at least don't cook them, just 200°C for a minute or two.
                OpenBoardView — https://github.com/OpenBoardView/OpenBoardView



                  #9
                  Re: Artifacts from GPU Not Always a GPU Chip Fault

                  interesting find and excellent fix. i guess the short across the power rail caused the vram chip in question to not receive any power thus the repeating horizontal double line pattern across the screen was missing frame buffer information from the underpowered ram chip. so in this case, the artifacts are from an underpowered ram chip and not a gpu fault itself.

                  otherwise, i dont see how a bad mlcc across a power rail can cause artifacts as i've had missing mlccs across power rails before and they never caused problems or artifacts for me too like u said unless its a decoupling mlcc for the signal line then its a different story. have u checked if c2405 is meant for filtering the power line or decoupling the signal line?

                  however, your pictures of the back of the pcb look like the pcb is burnt or discolored at the back in various spots. around the dvi connector area, around the hdmi and displayport connector solder pads, around the gpu vrm coils and gpu vrm in and out cap spots, around the vram vrm coils and gpu vtt coils as well. i'm afraid the card may have overheated for some time with the misconnected fan headers and wont last long...



                    #10
                    Re: Artifacts from GPU Not Always a GPU Chip Fault

                    Originally posted by ChaosLegionnaire View Post
                    interesting find and excellent fix. i guess the short across the power rail caused the vram chip in question to not receive any power thus the repeating horizontal double line pattern across the screen was missing frame buffer information from the underpowered ram chip. so in this case, the artifacts are from an underpowered ram chip and not a gpu fault itself.
                    I don't think that ceramic cap was across a power rail - at least not one of the major ones anyways. If that was the case, that ceramic cap would have been burned into a crater in the board, as VRMs on big cards usually have enough power to do that before SCP kicks in.

                    Originally posted by ChaosLegionnaire View Post
                    however, your pictures of the back of the pcb look like the pcb is burnt or discolored at the back in various spots. around the dvi connector area, around the hdmi and displayport connector solder pads, around the gpu vrm coils and gpu vrm in and out cap spots, around the vram vrm coils and gpu vtt coils as well. i'm afraid the card may have overheated for some time with the misconnected fan headers and wont last long...
                    Yeah, I noticed that too. Well, the discoloration looks worse on the pictures than it actually is. Only the VRM area for the GPU core is discolored. As for the area under DVI connectors... it's mostly flux on there. And it doesn't look like flux from someone trying to reflow the card. Most likely just poor factory cleaning. I guess I will never truly know, though. Time will tell, I suppose, as I keep using this card. Probably not going to run into any issues right now, though - it's barely 17-18C in my room. Nice and chilly for my PCs. Me.... brrr, I don't like it, but that's just part of winter.

                    Originally posted by Dan81 View Post
                    Reflowed the ATi M780G northbridge (I used tacflux, that's all I have around - it does its job tho)
                    You don't actually need flux for the reflow. That's because it's the BGA between the GPU silicon die and the GPU substrate (the square PCB) that fails, not the BGA between the substrate and the board.

                    Originally posted by Dan81 View Post
                    Maybe desktop chips are too heat-sensitive compared to their laptop counterparts? I had a 4850 popcorn itself at 350°C, while an M780G chipset took the abuse without killing itself.
                    Not really. It's just luck of the draw.
                    Silicon made for laptops is traditionally a little "higher grade", usually because laptops don't have as much cooling available and thus are expected to run hotter. But when it comes to reflows, it's just pure luck.

                    Originally posted by stj View Post
                    you can reduce popcorn by gently baking the pcb for a day to remove moisture - there used to be an intel document about doing it to their bga chips before assembly.
                    Not just BGA chips.

                    Many SMT components are moisture-sensitive and thus typically baked before getting sealed in moisture-resistant bags and shipped off for assembly.

                    Baking for a day is a little excessive. But 1-4 hours at 100-110C will usually do the trick. I forgot what guide I read that in. It wasn't Intel. I think it was just for some SMD IC I purchased from Digikey.

                    Originally posted by Dan81 View Post
                    It was done in the summer, and in no moisture conditions.
                    Moisture gets trapped in the PCB over time. So even if you do your reflow when it's very dry, you can still popcorn a chip if there is moisture in the PCB layers, which can happen even if the PCB/device was stored in normal everyday room conditions. I usually let my boards sit for 5-10 minutes at ~100-120C before beginning the reflow process. This isn't enough to get rid of all of the moisture, but it does help.

                    Popcorning can also occur if the heat is too high.
                    Just because you do one PCB/card/board at XX degrees for TT minutes doesn't mean that the next (different) PCB will heat up the same way. This is especially true with video cards, where some have very thick copper planes from a VRM that is placed relatively far from the GPU chip, and others have similar planes but with the VRM placed very close to the GPU. Obviously the card with the bigger/longer/heavier copper planes will sink more heat, and thus won't heat up to the same temperature. Or taking it the other way around... you could do a large PCB just perfectly and then completely burn a small card following the same reflow "profile". That's what actually made me popcorn my XFX HD5750. Normal ATI HD4850 cards tend to sink quite a bit of heat. The 5750 is slightly smaller and probably needed less heat. I didn't factor that in (nor did I care to after two unsuccessful reflows), so I cranked the heat up... and that was the end of that (already dead) GPU.

                    Originally posted by piernov View Post
                    Or you just stop reflowing dead chips and replace them properly.
                    Good solution, but only if you can find the chips.
                    When you can't or they go for more than the device you're trying to repair, it's just not worth it. At that point, you just reflow it, and if lucky, extend the useful life just a little more.

                    Originally posted by piernov View Post
                    Or at least don't cook them, just 200°C for a minute or two.
                    Won't do for some of the big video cards. I tried this with a bunch of ATI-built HD4850 and 4870 cards first. None of them came back until I really cranked the heat up and hit over 205-210C for at least 10 seconds on the core (that is, 205-210C with heat source removed - i.e. actually heating the card close to proper solder reflow temperatures.)
                    Last edited by momaka; 12-16-2019, 06:29 PM.
