A while back, I purchased a Dell EMC KTN-STL3 storage enclosure so that I could move off my janky storage setup. I’m embarrassed to even talk about it now, but it was seven 4TB SAS HDDs connected to an IBM M5015, configured as a single RAID0 array and passed directly into ZFS. If that wasn’t bad enough, the PSU powering my motherboard didn’t have enough 15-pin SATA power cables, so I had to run a second PSU to power the rest of the drives. When I set it up, I didn’t understand much about storage configurations, and after a lot of trouble with the RAID controller, I was antsy to just get the system up and running. I knew the result wasn’t great, so I bought the enclosure, but I still didn’t have the equipment to interface with it correctly, so it just sat there collecting dust for a while.

A couple of months after the initial setup, my friend pointed out just how bad my configuration was. I wanted to change it, but the storage wasn’t physically available to me at the time because I was studying away from home. The setup also seemed to be working fine for the moment, so with no other choice, I just let it be. Half a year later, the day of reckoning arrived: the TrueNAS instance the storage was connected to suddenly powered off.

I walked my family through diagnosing the problem and restarting the computer, and it came back up just fine. I remotely verified that the data was intact and breathed a sigh of relief. I had documents and pictures from my old phones on there, backed up nowhere else, and I would’ve been very upset to lose them. This incident made me realize how urgent the upgrade and backup were. I planned to at least back up my data over the next few days (I couldn’t do it immediately because I had school duties to take care of), but the very next day, the computer shut off again. I asked my family to do the same thing as before, and I remotely checked the data again. The ZFS pool was offline.

Months later, I returned home and started digging into the issue. In the RAID controller interface, I found that a single drive had failed. There was no chance of recovery: RAID0 stripes data across all the disks with no parity or mirrored copies, so losing one disk meant losing pieces of everything. I cursed my own stupidity. What I should have done months earlier was immediately ask my family to shut the system down completely to prevent any further damage. Alas, the damage was done, and I just had to accept my losses and move on.
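To make that concrete, here’s a toy Python model of RAID0 striping. It’s a minimal sketch, not how the M5015 actually lays data out on disk: the 4-byte chunk size, the function names (stripe, reassemble, raid0_survival), and the 5% per-disk failure rate are all made up for illustration. It shows two things: with one disk missing, the data can’t be put back together, and the odds of the whole array surviving shrink with every disk you add, since it survives only if every single disk does.

```python
# Toy model of RAID0: data is cut into fixed-size chunks that are dealt
# round-robin across the disks, with no parity and no mirrored copies.
# CHUNK is tiny for illustration (an assumption); real controllers use much larger stripe sizes.
CHUNK = 4


def stripe(data: bytes, n_disks: int) -> list:
    """Assign chunk k of the data to disk k % n_disks."""
    disks = [[] for _ in range(n_disks)]
    for k, start in enumerate(range(0, len(data), CHUNK)):
        disks[k % n_disks].append(data[start:start + CHUNK])
    return disks


def reassemble(disks: list):
    """Read chunks back in round-robin order.

    Returns None if any disk is missing: its chunks exist nowhere else,
    so the gaps cannot be filled in.
    """
    if any(d is None for d in disks):
        return None
    out = []
    for row in range(max(len(d) for d in disks)):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


def raid0_survival(p_fail: float, n_disks: int) -> float:
    """Chance the array survives if each disk fails independently with p_fail."""
    return (1 - p_fail) ** n_disks


data = b"documents and photos from old phones"
disks = stripe(data, 7)
print(reassemble(disks) == data)   # True: all seven disks present
disks[3] = None                    # a single drive fails
print(reassemble(disks))           # None: the array cannot be rebuilt
print(raid0_survival(0.05, 7))     # hypothetical 5% per-disk failure rate
```

With that made-up 5% rate, seven disks leave roughly a 70% chance that the array makes it, versus 95% for a single disk.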

I got to work researching and consulting my friend on how to get the storage enclosure up and running. Months earlier, the same friend had recommended and bought for me an HBA (an HP H221, flashed with some firmware that I didn’t note down), which passes each disk through individually for a proper ZFS setup. The backplane of the enclosure had four SFF-8088 ports, two on the top half and two on the bottom half. The controller had two SFF-8088 ports, so I bought two male-to-male cables, hooked everything up, and loaded the disks into the enclosure.

After some stupidity (I didn’t realize you had to unscrew some parts of the SFF cable to fit it correctly!), I connected the cables and tried to boot. The controller’s BIOS came up as normal, and I waited on the “Initializing…” screen… and I waited… and I waited some more. I ate a meal and came back. Still no change. I was puzzled because, before I plugged the cables in, the system had booted past the controller’s BIOS, and even into the controller’s configuration interface, just fine. I unplugged the cables and tried booting again. Yup, for some reason, plugging in the cables was causing problems. Maybe I was plugging into the wrong ports, I thought. After all, there were four ports on the enclosure. I tried different combinations of the enclosure’s ports. Still no luck.

I had searched for EMC KTN-STL3 manuals, guides, or any other relevant information that might help, including general information about storage controllers and about the HP H221 card I had, but the information was incredibly sparse. From what little I could find, it seemed like my setup should have worked. I consulted my friend further, but we came up short. For the next few days, I did more research and fiddled around with settings. I tried disabling BIOS booting through the controller interface as per this Reddit thread (take a moment to look at the upvotes and realize how uncommon my problem was). That let me skip the controller’s BIOS and boot straight into TrueNAS, but TrueNAS then failed to start up properly, giving me errors like “Root mount waiting for: CAM”. I could only find a few threads on this error, full of information I couldn’t yet understand, but one TrueNAS thread gave me a lead: it mentioned that this error was associated with an HBA failing to initialize under UEFI. From there, I dug deeper until I knew enough to make sense of the other threads.

Going into this, I was hoping I wouldn’t have to flash my controller. Things can very easily go wrong if you’re not careful or knowledgeable, and I was neither; horror stories of irreversibly bricked controllers were flooding my mind. However, I soon realized that flashing the controller with firmware that supports UEFI was my only lead. My friend had told me about the tools, megarec and sas2flash, but they were fairly specialized and I couldn’t easily find digestible documentation for them, so they had scared me away in the past. I found a very comprehensive thread on updating LSI SAS controllers on a UEFI motherboard, and an even more detailed thread on the (cross)flashing process, and with my friend’s go-ahead, I started hacking away.

I was told to use megarec to wipe the controller’s BIOS before using sas2flash, so I backed it up with megarec and then wiped it. I then switched from the DOS shell with megarec to the EFI shell with sas2flash (set up by following the first guide) and tried to follow the guide’s steps. sas2flash -listall reported a controller with malfunctioning firmware, which was not what the guide expected, but I wasn’t fazed because I knew I had just wiped the firmware with megarec. I followed the rest of the guide, but running sas2flash -o -f xxx.bin -b x64sas2.rom -b mptsas2.rom gave me errors about a mismatch in the chipset or something. Ah, I’m flashing with the wrong firmware, I realized. After a couple of trips up and down the stairs between my workstation and the storage location trying to figure out the right firmware, I was still getting mismatch errors. I scratched my head in confusion because I was fairly certain I had gotten it right, especially after pausing to understand the crossflashing process a bit more. Reading the second guide more closely, I found out that newer versions of sas2flash refuse to crossflash, even onto a compatible card. I sat down, fully digested the second guide, and found a hacked version of sas2flash included in the minimal USB files at the bottom of the guide. Using this thread, I found firmware files compatible with the HP H221 and downloaded them from Broadcom, relying on what I’d learned from the first guide to pick the right ones. Then I copied the minimal USB files and the firmware files onto my USB drive.

Back in the EFI shell, I used the hacked sas2flash to run sas2flash -o -f 9205-8e.bin -b x64sas2.rom -b mptsas2.rom as the first guide described and… the flash succeeded! I updated the SAS address (I just made up a random one, which was fine according to the second guide) and exited the EFI shell. When I booted back up, the familiar text of the controller BIOS greeted me, and I held my breath as the “Initializing…” text showed up again. One Mississippi, two Mississippi, … I counted in my head. After a short but nerve-wracking pause, the screen flashed with new text listing the drives detected by the controller, and the system booted into TrueNAS with no problems. I checked in the web interface that the drives were indeed being detected. They were!
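For anyone else who needs to make up a SAS address, here’s a small Python sketch of what “random” can look like. It rests on assumptions I haven’t verified against the H221 specifically: that a SAS address is a 64-bit value written as 16 hex digits, and that it follows the common NAA 5 layout where the first hex digit is 5. The function name random_sas_address is my own invention. As far as I know, sas2flash writes the address through its -sasadd option (together with -o), but double-check the second guide’s instructions for your card before running anything.

```python
import secrets


def random_sas_address() -> str:
    """Return a made-up 64-bit SAS address as 16 uppercase hex digits.

    Assumption: NAA 5 (IEEE Registered) layout, so the top 4 bits are fixed
    at 0x5 and the remaining 60 bits are random. A real card would carry its
    vendor's 24-bit OUI right after the 5; this made-up address does not.
    """
    rest = secrets.randbits(60)  # 60 random bits = 15 hex digits
    return f"5{rest:015X}"


addr = random_sas_address()
print(addr)  # 16 hex digits starting with 5; different on every run
```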

This whole process of trial and error took over a week and a half, and I’m writing this only hours after the success. It was exhilarating to see the pieces finally fall into place after so much perseverance, and traces of that joy linger as I write this. I wanted to write this post to document my struggles and serve as a guiding light for the few who find themselves in a similar situation and stumble upon it. I omitted some details for the sake of readability (none that I think are relevant), but if you’re using this post to troubleshoot your own problems and are confused by some of the steps I took, contact me at [email protected] and include a reference to this post in the subject so I can find it (something like “Storage Help Required” will do). I’m by no means an expert and cannot guarantee that I’ll even be helpful, but I’ll be happy to provide the full details of what I did.

Cheers.
