
Category Archives: Hardware

ata3:00: status: DRDY ERR


The Problem

Over the course of my career I have repeatedly seen one specific error message on Linux production servers. It is associated with hard drive failures. I have seen it on systems with conventional HDDs as well as on servers that use only NVMe drives. Disks in RAID arrays are affected as well as single-drive systems.

Considering how often the problem appears, there is relatively little information about it on the internet. Sometimes the errors are printed directly to the console, and sometimes they are only visible when “dmesg” is executed.

Here is the error in question:

ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: BMDMA stat 0x25
ata3.00: failed command: READ DMA

Of course, details such as the ATA number differ depending on the configuration of the system.
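To see quickly which ATA links are reporting errors, the kernel log can be filtered. The following is a minimal sketch, assuming the usual kernel prefix of the form "ataN.DD:"; the function name ata_ports_with_errors is made up for this example.

```shell
# Read kernel log lines on stdin and print each ATA link (e.g. "ata3")
# that appears in an error-related line, once.
ata_ports_with_errors() {
    # Keep only lines that look like libata error reports ...
    grep -E 'ata[0-9]+(\.[0-9]+)?: (exception|failed command|status:|error:)' \
        | grep -oE 'ata[0-9]+' \
        | sort -u
    # ... then reduce them to the bare link name and drop duplicates.
}
```

On a live system it could be fed with "dmesg | ata_ports_with_errors" (dmesg may need root, depending on the distribution).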


The Analysis

The error basically says that something is wrong with a specific disk drive. We need to find out which physical disk is causing it.

This command works for me on Ubuntu 20.04:

ls /dev/disk/by-path -al

It shows the ATAx to /dev/sdX association on my system. Now that I know the /dev/sdX device, I need to get the serial number of the hard drive:

smartctl -a /dev/sdX

Knowing the serial number is important because, at least on HDDs, it is printed on a sticker on the disk.
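The two lookups can also be combined in a small script. This is only a sketch under assumptions: it relies on the resolved sysfs path of a disk containing an "ataN" component (the case for SATA disks on the systems I know, but not guaranteed everywhere), and on smartctl printing a "Serial Number:" line. All function names are made up for this example.

```shell
# Extract the "ataN" component from a resolved sysfs device path.
ata_port_of_path() {
    printf '%s\n' "$1" | grep -oE '/ata[0-9]+/' | head -n 1 | tr -d '/'
}

# Read `smartctl -i` output on stdin and print only the serial number.
serial_from_smartctl() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Hypothetical helper: print "ataN <TAB> /dev/sdX <TAB> serial" for every
# sd* disk. smartctl usually needs root.
list_ata_disks() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue            # no sd* disks present
        name=${dev##*/}
        port=$(ata_port_of_path "$(readlink -f "$dev")")
        serial=$(smartctl -i "/dev/$name" | serial_from_smartctl)
        printf '%s\t%s\t%s\n' "${port:-unknown}" "/dev/$name" "${serial:-unknown}"
    done
}
```

Running "sudo sh -c '. ./disks.sh; list_ata_disks'" (disks.sh being a hypothetical file holding the functions) would then give one line per disk, so the serial number belonging to the ATA number from the error message can be read off directly.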


The Fix

In at least 40 percent of the cases in which this error appeared, I was lucky: shutting down the computer, opening the case, identifying the right physical disk and replacing its SATA cable was enough.

Once the error appeared on multiple disks at the same time. The fix was to order an identical mainboard and install it. I connected the disks to the new mainboard and the error never appeared again.

Another method is to shut down the system, then unplug and reconnect the SATA cables on the same sockets to rule out a loose contact. This solved the problem for me once.

With NVMe drives it can likewise help to shut down the computer, unplug the NVMe drive and reseat it to rule out a loose contact.

But sometimes the hard disk itself is simply defective and needs to be replaced.

3ware 9500S-4LP – Raid 5 Degraded


The Problem

I had a RAID 5 failure on an old Linux server with special hardware connected. A RAID 5, as you probably know, is a redundant array of hard disks. The disks are usually connected to a RAID controller. The controller in my server is a 3ware 9500S-4LP. It is outdated but can still be bought from some web shops. Sometimes, as in this specific case, it is not possible to migrate an outdated system to new hardware.

On my server three disks were attached to the controller. I used tw_cli, which used to be available for download on the 3ware homepage. Recently I found the binaries here:

https://www.thomas-krenn.com/de/download/hide_component.1/frame.only_content/hide_category.1/hide_product.1/product.2924.html

I downloaded the tw_cli package, extracted it and started it. To get an overview, I ran:

//server> show

Ctl   Model       Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c8    9500S-4LP   4       3        1       1        4       4       -

I used the “show” command to get the basic controller information.

The next command showed detailed information about the state of unit u0.

//server> info c8 u0 
Unit 	UnitType Status 	%Cmpl 	Port 	Stripe 	Size(GB) Blocks 
-------------------------------------------------------------------------
u0 	RAID-5 	DEGRADED 	- 	- 	64K 	745.037 1562456064 
u0-0 	DISK 	DEGRADED 	- 	p3 	- 	372.519 781228032 
u0-1 	DISK 	OK 		- 	p2 	- 	372.519 781228032 
u0-2 	DISK 	OK 		- 	p0 	- 	372.519 781228032

I saw that the unit was degraded because of the failed disk on port p3.
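Reading the failed port out of the table can also be scripted. A minimal sketch, assuming the column layout shown above (unit, type, status, %Cmpl, port); the function name failed_ports is made up for this example.

```shell
# Read the output of "info cX uY" on stdin and print the port of every
# member disk whose status is not OK.
failed_ports() {
    # Field 2 is the unit type, field 3 the status, field 5 the port.
    awk '$2 == "DISK" && $3 != "OK" { print $5 }'
}
```

As far as I know tw_cli also accepts a command as arguments, so "tw_cli info c8 u0 | failed_ports" should work from a normal shell; treat that invocation as an assumption and check it against your tw_cli version.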

The Fix

I had to stop all processes that were running on the RAID 5 volume. Then I had to unmount the RAID 5 volume from the file system. The next step was to remove the degraded disk on port 3 with this command:

//server>maint remove c8 p3

Then I shut the server down. On the controller hardware I could see which SATA port the p3 disk was connected to. I replaced the disk with a newer, bigger one. The cache of the new disk was bigger than that of the old one. That is important! Then I booted the server again.

//server>maint rescan c8

This was the next important step: a “rescan” to detect the newly installed disk. Finally, to start the rebuild of the RAID, I ran:

//server>maint rebuild c8 u0 p3

After a few hours I got this:

//server> info c8 u0 
Unit   UnitType   Status       %Cmpl   Port   Stripe   Size(GB)   Blocks
-------------------------------------------------------------------------
u0     RAID-5     REBUILDING   42      -      64K      745.037    1562456064
u0-0   DISK       DEGRADED     -       p3     -        372.519    781228032
u0-1   DISK       OK           -       p2     -        372.519    781228032
u0-2   DISK       OK           -       p0     -        372.519    781228032

The rebuild took only a few hours.
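To follow the rebuild without retyping the command, the percentage can be picked out of the unit line. A minimal sketch, assuming the %Cmpl value is the fourth column of the REBUILDING line as in the output above; the function name rebuild_percent is made up for this example.

```shell
# Read the output of "info cX uY" on stdin and print the rebuild progress
# of the unit, or nothing if no unit is rebuilding.
rebuild_percent() {
    awk '$3 == "REBUILDING" {
        sub(/%$/, "", $4)   # tolerate versions that print "42%"
        print $4
        exit
    }'
}
```

Combined with the non-interactive call (an assumption, check your tw_cli version), something like "watch -n 600 'tw_cli info c8 u0'" or a loop around "tw_cli info c8 u0 | rebuild_percent" shows the progress every ten minutes.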

If the rebuild doesn’t start and you get an error message, a good alternative can be to start the rebuild from the controller BIOS.

Finally the server was as good as it was before 🙂

Cannot install Windows 11 on Samsung NVMe


Basic Information

Recently I used Ubuntu and the Linux dd tool (disk dump) to create a bootable Windows 11 USB stick. The aim was to install Windows 11 on a PC with a Samsung NVMe disk.


The Problem

When I started the Windows 11 installation, the Windows installer showed no disk and asked me to provide an appropriate storage/disk driver. I could not install Windows on the NVMe disk, no matter which drivers I provided.

I also changed several BIOS settings without success. No matter whether I changed settings such as “Secure Boot”, operating system type, “fast boot” and so on, I was not able to install Windows 11. I also tried to install Windows 10, but that failed too.

After a long search with Google I was finally able to fix the issue on my PC.


The Fix

I simply created the USB stick again, but not with dd (disk dump).

I installed WoeUSB, an open-source tool available for Linux, on my Ubuntu PC. Then I wrote the Windows ISO to the USB stick with WoeUSB.

Please note that the step called “Installing GRUB bootloader for legacy PC booting support…” took a very long time, up to 25 minutes, but the installation on the stick succeeded in my case.
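For reference, the WoeUSB call can be wrapped in a small function that checks its arguments first, since the "--device" mode wipes the whole target. This is a sketch, not the exact command I typed at the time; the function name write_windows_stick and the paths in the usage line are placeholders.

```shell
# Hypothetical wrapper: write a Windows ISO to a whole USB device with WoeUSB.
# Usage: write_windows_stick /path/to/Windows.iso /dev/sdX
write_windows_stick() {
    iso=$1
    target=$2
    if [ ! -f "$iso" ]; then
        echo "ISO not found: $iso" >&2
        return 1
    fi
    if [ ! -b "$target" ]; then
        echo "Not a block device: $target" >&2
        return 1
    fi
    # --device wipes the target and builds the bootable stick from scratch.
    sudo woeusb --device "$iso" "$target"
}
```

A call would look like "write_windows_stick ~/Downloads/Win11.iso /dev/sdX", where /dev/sdX must be the stick itself and not one of its partitions. Double-check the device name with lsblk before running it.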

That’s it. After booting from the stick, the NVMe disk was recognized by the Windows installer and no driver was required. I could simply modify the partitions and volumes and continue with the installation.
