Linux Tips Weekly

– [Instructor] Let’s take a few minutes to explore data storage devices on Linux. In between the physical hard drive and the files in your file browser, there are a few layers of organization that help keep things where they need to be. A hard drive, or a solid-state device, can be thought of as a whole bunch of empty little containers, each with its own address. These spaces, or sectors, are designed to store pieces of data that make up the files we use. Different types of storage have different sector sizes. Many hard disks use sector sizes of either 512 bytes or four kilobytes. SSDs usually present sectors of 512 bytes or four kilobytes to the system as well, even though internally they manage flash in larger units, such as erase blocks that can be 512 kilobytes or more.
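
To make the idea of addressed containers a little more concrete, here is a minimal Python sketch of the arithmetic involved. The function names and the 16 GB example capacity are assumptions made up for illustration; they don't come from any real tool.

```python
def sector_count(capacity_bytes: int, sector_size: int = 512) -> int:
    """Number of whole sectors that fit on a device of the given capacity."""
    return capacity_bytes // sector_size


def sector_offset(sector_number: int, sector_size: int = 512) -> int:
    """Byte offset at which a sector starts, counting sectors from 0."""
    return sector_number * sector_size


# A hypothetical 16 GB device (16 * 10**9 bytes) with 512-byte sectors.
print(sector_count(16 * 10**9))   # 31250000
print(sector_offset(2048))        # 1048576, which is 1 MiB into the device
```

The same device with four-kilobyte sectors would have one eighth as many addresses, which is all that "different sector sizes" really changes from this point of view.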

We can’t store a file on this kind of device without a little bit of abstraction, though. For that, we need to start organizing these sectors the device gives us, and designate groups of them to use for storage. We could use all of them, or set up an imaginary line across them to divide them into different areas or partitions. And on top of these partitions, we apply a filesystem, which is a data structure that keeps track of all of the file names and file metadata. Filesystems think in terms of blocks, or chunks of data that are all the same size. The filesystem stores these blocks in the sectors on the hard disk.
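
As a rough illustration of that imaginary line, the sketch below divides a device's sectors into two ranges. It is deliberately simplified: real partitions are recorded in a partition table and don't start at sector 0, and the function name and numbers here are hypothetical.

```python
def split_into_partitions(total_sectors: int, boundary: int) -> list:
    """Divide sectors 0..total_sectors-1 into two ranges at the boundary sector."""
    if not 0 < boundary < total_sectors:
        raise ValueError("boundary must fall strictly inside the device")
    return [range(0, boundary), range(boundary, total_sectors)]


# Hypothetical device with 31,250,000 sectors, split after the first 10,000,000.
first, second = split_into_partitions(31_250_000, 10_000_000)
print(len(first), len(second))   # 10000000 21250000
```

Each range would then get its own filesystem, which is what keeps track of the files stored inside it.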

The Ext4 format, for example, uses blocks that are four kilobytes in size by default, so files that are a few megabytes will take up hundreds of blocks each, and consequently, hundreds or thousands of sectors on a magnetic disk, depending on the sector size. The filesystem associates the files with a list of blocks where the pieces of each file are stored. The system handles writing and reading these blocks to and from disk with the help of the disk controller. Old disks didn’t have an integrated disk controller, so the operating system and the user had to be aware of the physical geometry of the disk. In that mode of working with storage, we were concerned with tracks, which are the concentric rings on a platter where the magnetic medium stored data, and cylinders, which referred to the set of tracks, one per platter surface, that the read heads could reach at any given arm position.
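
The block arithmetic is easy to check with a few lines of Python. This is only a sketch under the assumptions already mentioned, four-kilobyte blocks and either 512-byte or four-kilobyte sectors; the function names are made up for the example.

```python
def blocks_needed(file_size_bytes: int, block_size: int = 4096) -> int:
    """Blocks required to hold a file; a partly filled last block still counts."""
    return (file_size_bytes + block_size - 1) // block_size


def sectors_needed(file_size_bytes: int, block_size: int = 4096,
                   sector_size: int = 512) -> int:
    """Sectors occupied by those blocks on the underlying device."""
    return blocks_needed(file_size_bytes, block_size) * (block_size // sector_size)


three_mib = 3 * 1024 * 1024
print(blocks_needed(three_mib))                      # 768
print(sectors_needed(three_mib))                     # 6144 with 512-byte sectors
print(sectors_needed(three_mib, sector_size=4096))   # 768 with 4 KB sectors
print(blocks_needed(1))                              # 1, even for a one-byte file
```

Rounding up is the important design choice here: a file always occupies a whole number of blocks, so even a tiny file takes one full block.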

But now the disk controller handles all of the physical location stuff within a spinning magnetic disk, and these concepts don’t make much sense at all in solid-state storage, so generally speaking, we don’t have to worry about them, even though they still show up in some disk management interfaces. Let’s take a look at adding a disk to a system and creating partitions and filesystems. I’ve plugged a USB flash drive into my system here, and it’s fresh out of the package, so it’s formatted for Windows, or more precisely, formatted with the FAT32 filesystem. Linux can read and write this filesystem, which is convenient for a drive that will need to be read on different platforms, but it’s limited to storing files that are no larger than about four gigabytes.
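
The FAT32 limit comes from the format storing each file's size in a 32-bit field, so the largest possible file is one byte short of four gibibytes. A small sketch of that check, with a helper name invented for the example:

```python
# FAT32 records a file's size in 32 bits, so this is the largest size it can describe.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def fits_on_fat32(file_size_bytes: int) -> bool:
    """True if a single file of this size can be stored on a FAT32 filesystem."""
    return 0 <= file_size_bytes <= FAT32_MAX_FILE_SIZE


print(fits_on_fat32(700 * 1024**2))   # True: a 700 MiB file is fine
print(fits_on_fat32(5 * 1024**3))     # False: a 5 GiB file is too large
```

This only looks at a single file's size. Whether the file fits in the free space left on the drive is a separate question.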

Other filesystems have different limits, most of which are so large that we’re unlikely to run into problems. The limits are usually in the terabyte, petabyte, or exabyte range, thousands, millions, or billions of gigabytes. We can see the device with lsblk, which shows block devices on the system. Block devices are what Linux calls storage devices that store data in blocks, as opposed to character or raw devices, which let you write data directly. Here’s my primary hard disk, sda, or SCSI disk A, and here’s my flash drive, sdb, SCSI disk B.

The sd prefix comes from the kernel’s SCSI disk driver, which also handles disks on Serial ATA, or SATA, like my primary disk, and on USB, Universal Serial Bus, like my flash drive. Older systems with parallel ATA buses used a different driver, and those disks were called hda, hdb, and so on. For an end-user, it doesn’t really matter, though. We work with them the same way. This flash drive came with a partition with a filesystem on it, and to modify its underlying structure, we need to unmount it, or tell the system not to use it right now. I’ll unmount the filesystem with umount /dev/sdb1, where /dev/sdb1 is partition number one on disk B.
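
If you want to look at the same disk and partition layout from a script, lsblk can print JSON with its --json option. The sketch below parses a small sample of that output. The sample text is made up to resemble the two disks described here, not captured from a real machine, and the function name is hypothetical.

```python
import json

# Illustrative sample shaped like `lsblk --json -o NAME,SIZE,TYPE` output.
SAMPLE_LSBLK_JSON = """
{
  "blockdevices": [
    {"name": "sda", "size": "465.8G", "type": "disk",
     "children": [
       {"name": "sda1", "size": "512M", "type": "part"},
       {"name": "sda2", "size": "465.3G", "type": "part"}
     ]},
    {"name": "sdb", "size": "14.9G", "type": "disk",
     "children": [
       {"name": "sdb1", "size": "14.9G", "type": "part"}
     ]}
  ]
}
"""


def disks_and_partitions(lsblk_json: str) -> dict:
    """Map each whole disk to the names of the partitions found on it."""
    layout = {}
    for device in json.loads(lsblk_json)["blockdevices"]:
        if device.get("type") != "disk":
            continue  # skip loop devices, optical drives, and so on
        layout[device["name"]] = [
            child["name"]
            for child in device.get("children", [])
            if child.get("type") == "part"
        ]
    return layout


print(disks_and_partitions(SAMPLE_LSBLK_JSON))
# {'sda': ['sda1', 'sda2'], 'sdb': ['sdb1']}
```

On a real system, the JSON text could come from running lsblk through Python's subprocess module instead of the sample string. The partition names show the convention in use: the disk name followed by the partition number.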

There are two ways we could go about modifying this device. We can use fdisk from the command line, with fdisk /dev/sdb, but the fdisk interface is a little bit, well, odd, if you haven’t worked with partitions before, so I’ll use a program called GParted, a graphical version of the partition editor Parted. This gives us a graphical interface so we can see what’s going on better. I’ll open it up here from the command line, or you could find it in your Applications menu. If GParted isn’t installed, you can install it with your package manager.

Here, I can see a visual representation of the disks on my system. I’ll use the selector here in the top right to switch to sdb. This bar shows the whole device, and within it, there’s one partition with one filesystem. I’ll right-click on that and choose to remove it, then I’ll right-click again and add a partition. If you have a completely blank disk, like a hard drive right out of a retail box, you’ll need to create a partition table if there isn’t already one. Here, I can choose some of the characteristics of my new partition.

I can set the size, choose whether to create a primary, logical, or extended partition, set a filesystem, and set a label. I’ll create an Ext4 partition called My Files using the maximum size available on the disk. What we’ve done so far is to create a list of steps for GParted to process. These changes haven’t been made to the disk yet. In order to make the changes, I’ll click on the green check mark here and then I’ll click Apply. As the process completes, I can see the commands that are being run in the background.

When the operations are completed, I’ll click Close and I’ll close GParted. Now if I open up my file browser, I can see my disk in the list. In order for my user to write files to this disk, I’ll need to change the ownership with chown, giving it my user and my group, and the path to the disk, which is /media, my username, and My Files.
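
The same ownership change can be sketched with Python's os.chown. In this example a temporary directory stands in for the mount point, since changing the owner of a real, root-owned mount point would need root privileges. The function name is made up for illustration.

```python
import os
import tempfile
from typing import Tuple


def take_ownership(path: str, uid: int, gid: int) -> Tuple[int, int]:
    """Set the owner and group of path, then report what is now recorded for it."""
    os.chown(path, uid, gid)
    info = os.stat(path)
    return info.st_uid, info.st_gid


# Stand-in for the mount point; the real one would be under /media and your username.
with tempfile.TemporaryDirectory() as mount_point:
    owner = take_ownership(mount_point, os.getuid(), os.getgid())
    print(owner == (os.getuid(), os.getgid()))   # True
```

The numeric user and group IDs used here are what chown resolves your user and group names to behind the scenes.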

Now I can make changes to this disk. If you’re getting started working with disks, the terminology can be a little bit confusing, especially if you come from another platform. Historically, Windows, Linux, and macOS have used some slightly different terms to refer to the same things. In Linux, the filesystem looks unified, but there are separate filesystems on different disks, all nested under the root filesystem. In Windows, what’s called a drive is usually a filesystem, and separate drives show up under My Computer with letters instead of as paths in the filesystem.
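
One way to picture the unified Linux tree is that every path belongs to whichever filesystem is mounted at the longest matching directory above it. The sketch below shows that lookup. The list of mount points and the username alex are assumptions for the example, not read from a real system.

```python
import posixpath


def owning_mount(path: str, mount_points: list) -> str:
    """Return the mount point whose filesystem holds the given path."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mount_points:
        mount_point = posixpath.normpath(mount_point)
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        # Prefer the deepest mount point that contains the path.
        if inside and (best is None or len(mount_point) > len(best)):
            best = mount_point
    return best


# Hypothetical mount points: the root disk, a boot partition, and the flash drive.
mounts = ["/", "/boot", "/media/alex/My Files"]
print(owning_mount("/media/alex/My Files/notes.txt", mounts))   # /media/alex/My Files
print(owning_mount("/home/alex/report.odt", mounts))            # /
print(owning_mount("/bootstrap/file", mounts))                  # /
```

Matching on whole directory names, rather than on a plain string prefix, is what keeps /bootstrap from being mistaken for something under /boot.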

On a Mac, individual filesystems are sometimes called volumes and they show up separately, but they’re collected in a directory called /Volumes. So, it can take a moment to get your bearings when you switch platforms, but under the hood, storage works the same way.