Top products from r/zfs

We found 28 product mentions on r/zfs. We ranked the 51 resulting products by number of redditors who mentioned them. Here are the top 20.

Top comments that mention products on r/zfs:

u/txgsync · 6 points · r/zfs

Linking OP's problem here...

Chances are 9/10 that the CPU is not "busy", but instead bumping up against a mutex lock. Welcome to the world of high-performance ZFS, where pushing forward the state-of-the-art is often a game of mutex whac-a-mole!

Here's the relevant CPU note from the post:

> did a perf top and it shows most of the kernel time spent in _raw_spin_unlock_irqrestore in z_wr_int_4 and osq_lock in z_wr_iss.

Seeing "lock" in the name of any kernel process is often a helpful clue. So let's do some research: what is "z_wr_iss"? What is "osq_lock"?

I decided to pull down the OpenZFS source code and learn by searching/reading. Lots more reading than I can outline here.

txgsync: ~/devel$ git clone
txgsync: ~/devel$ cd openzfs/
txgsync: ~/devel/openzfs$ grep -ri z_wr_iss
txgsync: ~/devel/openzfs$ grep -ri osq_lock

Well, that was a bust. It's not in the upstream OpenZFS code. What about the zfsonlinux code?

txgsync: ~/devel$ git clone
txgsync: ~/devel$ cd zfs
txgsync: ~/devel/zfs$ grep -ri z_wr_iss
txgsync: ~/devel/zfs$ grep -ri osq_lock

Still no joy. OK, time for the big search: is it in the Linux kernel source code?

txgsync: ~/devel$ cd linux-4.4-rc8/
txgsync: ~/devel/linux-4.4-rc8$ grep -ri osq_lock

Time for a cup of coffee; even on a pair of fast, read-optimized SSDs, digging through millions of lines of code with "grep" takes several minutes.

include/linux/osq_lock.h:#ifndef __LINUX_OSQ_LOCK_H
include/linux/osq_lock.h:#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }
include/linux/osq_lock.h:static inline void osq_lock_init(struct optimistic_spin_queue *lock)
include/linux/osq_lock.h:extern bool osq_lock(struct optimistic_spin_queue *lock);
include/linux/rwsem.h:#include <linux/osq_lock.h>
include/linux/rwsem.h:#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
include/linux/mutex.h:#include <linux/osq_lock.h>
kernel/locking/Makefile:obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
kernel/locking/rwsem-xadd.c:#include <linux/osq_lock.h>
kernel/locking/rwsem-xadd.c: osq_lock_init(&sem->osq);
kernel/locking/rwsem-xadd.c: if (!osq_lock(&sem->osq))
kernel/locking/mutex.c:#include <linux/osq_lock.h>
kernel/locking/mutex.c: osq_lock_init(&lock->osq);
kernel/locking/mutex.c: if (!osq_lock(&lock->osq))
kernel/locking/osq_lock.c:#include <linux/osq_lock.h>
kernel/locking/osq_lock.c:bool osq_lock(struct optimistic_spin_queue *lock)

For those who don't read C well -- and I number myself among that distinguished group! -- here's a super-quick primer: if you see a file with ".h" at the end of the name, that's a "Header" file. Basically, it declares the functions, types, and constants that are used elsewhere in the code. It's really useful to look at headers, because often they have helpful comments to tell you what the purpose of each one is. If you see a file with ".c" at the end, that's the code that does the work rather than just declaring stuff.

It's z_wr_iss that's driving the mutex lock. There's a good chance I can ignore the locking code itself if I can figure out why we're competing over the lock, which is the actual problem. The locking code is probably fine; at least I hope it is, because a fix is probably easier to push through ZFS on Linux than through core kernel IO locking semantics. Back to grep...

txgsync: ~/devel/linux-4.4-rc8$ grep -ri z_wr_iss

MOAR COFFEE! This takes forever. Next hobby project: grok up my source code trees in ~devel; grep takes way too long.
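One way to cut down the waiting, sketched here in Python rather than grep, is to look only at C source and header files and skip everything else in the tree. This is a rough illustration, not a measured speed-up; the function name `find_identifier` is made up for the example, and unlike the `grep -ri` runs above it matches case-sensitively.

```python
import os


def find_identifier(root, needle, extensions=(".c", ".h")):
    """Return (path, line_number, line) for each source line containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # skip docs, scripts, firmware blobs, build output
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the scan
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling `find_identifier("linux-4.4-rc8", "osq_lock")` would return the same kind of path-and-line hits that grep printed above.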



And the search came up empty. Hmm. Maybe _iss is a structure that's created only when it's running, and doesn't actually exist in the code? I probably should understand what I'm pecking at a little better. Let's go back to the ZFS On Linux code:

mbarnson@txgsync: ~/devel/zfs$ grep -r z_wr

module/zfs/zio.c: "z_null", "z_rd", "z_wr", "z_fr", "z_cl", "z_ioctl"

Another clue! We've figured out the Linux kernel name of the mutex we're stuck on, and that "z_wr" is a string defined in "zio.c". Now this code looks pretty familiar to me. Let's go dive into the ZFS On Linux code and see why z_wr might be hung up on a mutex lock of type "_iss".

txgsync: ~/devel/zfs$ cd module/zfs/
txgsync: ~/devel/zfs/module/zfs$ vi zio.c

z_wr is a type of IO descriptor:

/*
 * ==========================================================================
 * I/O type descriptions
 * ==========================================================================
 */
const char *zio_type_name[ZIO_TYPES] = {
	"z_null", "z_rd", "z_wr", "z_fr", "z_cl", "z_ioctl"
};

What about that z_wr_iss thing? And competition with z_wr_int_4? I've gotta leave that unanswered for now, because it's Saturday and I have a lawn to mow.
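For what it's worth, here is one possible reading of those names. It is an assumption, not something the thread or the quoted excerpt confirms. ZFS may build each worker thread name by joining an I/O type name from zio_type_name with a short task-queue suffix, and add an index when several queues of the same kind exist. The suffix table and the function name below are assumed for illustration only.

```python
# I/O type names, copied from the zio.c excerpt quoted above.
ZIO_TYPE_NAMES = ["z_null", "z_rd", "z_wr", "z_fr", "z_cl", "z_ioctl"]

# ASSUMPTION: suffixes for issue, issue-high, interrupt and interrupt-high
# task queues. This table does not appear anywhere in the thread.
TASKQ_SUFFIXES = ["iss", "iss_h", "int", "int_h"]


def taskq_thread_names(queues_per_kind=1):
    """List the thread names this naming scheme would produce."""
    names = []
    for io_type in ZIO_TYPE_NAMES:
        for suffix in TASKQ_SUFFIXES:
            if queues_per_kind > 1:
                # several queues of one kind get a numeric index appended
                names.extend(
                    f"{io_type}_{suffix}_{index}"
                    for index in range(queues_per_kind)
                )
            else:
                names.append(f"{io_type}_{suffix}")
    return names
```

Under that assumption, z_wr_iss would be a write-issue thread and z_wr_int_4 a write-interrupt thread from the queue with index 4.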

It seems there are a few obvious -- if tentative -- conclusions:

  1. You're hung up on a mutex lock. This is probably not something that "tuning" will usually eliminate; double-check that you're not using compression, encryption, deduplication, or other obvious resource hogs.
  2. The name of the mutex lock is osq_lock in the Linux kernel. The name seems obvious: it's a queue of some sort. Could it be a write queue to access the device? A parallel test to all your devices -- without ZFS, just simultaneous writes across the stripe in some kind of raw fashion -- might turn up if this mutex is being held due to IO in general, or if it is specific to ZFS.
  3. The mutex competition appears to be between z_wr_int_4 (the write queue for 4k blocks, perhaps?) and z_wr_iss. You might be able to determine if z_wr_int_4 is what I described by re-running your test to see if the new competition is between z_wr_iss with something like z_wr_int_8 for 8k blocks instead.
  4. If I were the OP, I'd evaluate the disks one-by-one. Create a zpool of just the one drive, and run the IO test on just that drive first. If performance is good with a single-drive zpool, nuke the pool and use two drives in a stripe. Try again. See what the scale tipping point is with three drives, four drives, etc. Xen historically had challenging IO queueing when managing more than four block devices; I wonder if some legacy of this remains?
  5. You really need to see if you can reproduce this on bare metal. It seems likely that this is an artifact of virtualization under Xen. Even with paravirtualization of IO, any high-performance filesystem is really sensitive to latency in the data path. Seems more a Xen bug than a ZFS bug, but it might be work-around-able.
  6. Xen -- if I understand correctly -- uses a shared, fixed-size ring buffer and notification mechanism for I/O, just one per domU. So although you're throwing more drives at it, this moves the bottleneck from the drives to the ring buffer. If I were to pursue this further, I'd look to competition for this shared ring buffer resource as a likely candidate imposing a global throttle on all IO to the domU under your hypervisor:
    • you've filled the ring buffer,
    • Xen has to empty it and make room for more data before the lock can clear,
    • this suggests that the real governor is how long the Linux kernel mutex has to wait for Xen to poll the ring buffer again.
    • You might not observe this with forked processes in a paravirtualized kernel. ZFS is a multithreaded kernel process, so I wonder if it's being forced to use a single ring buffer for I/O in a Xen environment.

It's just a hypothesis, but I think it may have some legs and needs to be ruled out before other causes can be ruled in.
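The one-drive-at-a-time test from point 4 could be planned out roughly like this. It is only a sketch: it builds the list of commands and runs nothing. The pool name, the benchmark script, and the device names are hypothetical placeholders.

```python
def scaling_test_plan(devices, pool="testpool", benchmark="./run-io-test.sh"):
    """Build the commands for growing a striped pool one drive at a time.

    `pool` and `benchmark` are hypothetical placeholders; nothing is executed.
    """
    plan = []
    for width in range(1, len(devices) + 1):
        members = " ".join(devices[:width])
        plan.append(f"zpool create {pool} {members}")  # plain stripe, `width` drives
        plan.append(benchmark)                         # same IO test every round
        plan.append(f"zpool destroy {pool}")           # nuke the pool before widening
    return plan


# Hypothetical Xen block device names, for illustration only.
for command in scaling_test_plan(["/dev/xvdb", "/dev/xvdc", "/dev/xvdd"]):
    print(command)
```

Comparing the benchmark numbers from one round to the next should show at which stripe width throughput stops scaling.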

I was willing to dive into this a bit because I'm in the midst of some similar tests myself, and am also puzzled why the IO performance of Solaris zones so far out-strips ZFSoL under Xen; even after reading Brendan Gregg's explanation of Zones vs. KVM vs. Xen I obviously don't quite "get it" yet. I probably need to spend more time with my hands in the guts of things to know what I'm talking about.

TL;DR: You're probably tripping over a Linux kernel mutex lock that is waiting on a Xen ring buffer polling cycle; this might not have much to do with ZFS per se. Debugging Xen I/O scheduling is hard. Please file a bug.

ADDENDUM: The Oracle Cloud storage is mostly on the ZFS Storage Appliances. Why not buy a big IaaS instance from Oracle instead and know that it's ZFS under the hood at the base of the stack? The storage back-end systems have 1.5TB RAM, abundant L2ARC, huge & fast SSD SLOG, and lots of 10K drives as the backing store. We've carefully engineered our storage back-ends for huge IOPS. We're doubling-down on that approach with Solaris Zones and Docker in the Cloud with Oracle OpenStack for Solaris and Linux this year, and actively disrupting ourselves to make your life better. I administer the architecture & performance of this storage for a living, so if you're not happy with performance in the Oracle Cloud, your problem is right in my wheelhouse.

Disclaimer: I'm an Oracle employee. My opinions do not necessarily reflect those of Oracle or its affiliates.

u/Fiberton · 2 points · r/zfs

The best thing to do is buy a new case. Either this, which quite a lot of folks I know with mini-ITX builds are using (8 hot-swap 3.5" bays and 4 x 2.5"), or, if you want to use ALL your drives, a cheaper alternative. You can fit 15 x 3.5" in that, or get some 2x2.5" to 1x3.5" adapters to shove some SSDs in there too. Various companies make them; I looked quickly on Amazon. That way you can have 12 drives rather than just 6. The cheap SATA cards will fix you up, or shove this in there. Hope this helps :)

u/mercenary_sysadmin · 1 point · r/zfs

Can you link me to a good example? Preferably one suited for a homelab, ie not ridicu-enterprise-priced to the max? This is something I'd like to play with.

edit: is something like this a good example? How is the initial configuration done - BIOS-style interface accessed at POST, or is a proprietary application needed in the OS itself to configure it, or...?

u/hab136 · 1 point · r/zfs

Current: (6-1) x 4 TB = 20 TB

(3-1) x 6 TB = 12 TB
(3-1) x 4 TB = 8 TB
20 TB total

You don't gain any space by doing this, though you do prepare for the future.
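The arithmetic above is just (drives minus redundant drives) times drive size, summed over each group of drives. A minimal sketch, with the function name `usable_tb` made up for the example and ZFS overheads such as padding and metadata ignored:

```python
def usable_tb(vdevs):
    """Rough usable capacity in TB.

    `vdevs` is a list of (drive_count, redundant_drives, drive_size_tb).
    Ignores padding, metadata and other filesystem overhead.
    """
    return sum((count - redundant) * size for count, redundant, size in vdevs)


current = usable_tb([(6, 1, 4)])              # one 6-wide RAIDz1 of 4 TB drives
proposed = usable_tb([(3, 1, 6), (3, 1, 4)])  # two 3-wide RAIDz1 groups
print(current, proposed)
```

Both layouts come out at 20 TB, which is why the swap gains nothing today.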

Are you able to add more drives to your system, perhaps externally? I've personally used these Mediasonic 4-bay enclosures along with an eSATA controller (though the enclosures also support USB3). Get some black electrical tape though, because the blue lights on the enclosure are brighter than the sun. The only downside with port-splitter enclosures is that if one drive fails and knocks out the SATA bus, the other 3 drives will drop offline too. The infamous 3 TB Seagates did that, but I had other drives (both 3 TB WD and 2 TB Seagates) fail without interfering with the other drives. Nothing was permanently damaged; just had to remove the failed drive before the other 3 started working again. Also, the enclosure is not hot-swap; you have to power down to replace drives. But hey, it's $99 for 4 drive bays.

6 TB Red drives are $200 right now ($33/TB); 8 TB are $250 ($31/TB), and 10 TB are $279 ($28/TB).
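The per-TB figures are simply price divided by capacity. A tiny sketch using the prices quoted above (the function name is made up for the example):

```python
def dollars_per_tb(price, size_tb):
    """Price per terabyte for a single drive."""
    return price / size_tb


# Drive size in TB mapped to the prices quoted in this comment.
prices = {6: 200, 8: 250, 10: 279}
for size_tb, price in prices.items():
    print(f"{size_tb} TB: ${dollars_per_tb(price, size_tb):.2f}/TB")
```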

Instead of spending $600 (three 6 TB drives) and getting nothing, spend $692 ($558 for two 10 TB drives, $100 for enclosure, $30 for controller, $4 for black electrical tape) and get +10 TB by adding a pair of 10 TB drives in a mirror in an enclosure, and have another 2 bays free for future expansion.

(6-1) x 4 TB = 20 TB
(2-1) x 10 TB = 10 TB
30 TB total, $692 for +10 TB

Later buy another two 10 TB drives and put them in the two empty slots:

(6-1) x 4 TB = 20 TB
(2-1) x 10 TB = 10 TB
(2-1) x 10 TB = 10 TB
40 TB total, $558 for +10 TB

Then in the future you only have to upgrade two drives at a time, and you can replace your smallest drives with the now-replaced drives.

You can repeat this with a second enclosure, of course. :)

Don't forget that some of your drives will fail outside of warranty, which can speed your replacement plans. If a 4 TB drive fails, go ahead and replace it with a 10 TB drive. You won't see any immediate effect, but you'll turn that 20 TB RAIDz1 into 50 TB that much quicker.

Oh, and make sure you've set your recordsize to save some space! For datasets where you're mainly storing large video files, set your recordsize to 1 MB: "zfs set recordsize=1M poolname/datasetname". This only takes effect on new writes, so you'd have to re-write your existing files to see any difference. You can rewrite files with "cp -a filename tmpfile; mv tmpfile filename" for all files, or a much easier way is just create a new dataset with the proper recordsize, move all files over, then delete the old dataset and rename the new dataset.
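If you go the rewrite-in-place route, the copy-then-rename step could be scripted along these lines. This is a sketch under assumptions: the function names and the temporary suffix are made up, and `shutil.copy2` keeps permissions and timestamps but not ownership, so it is not a full stand-in for `cp -a`. Try it on a scratch dataset first.

```python
import os
import shutil


def rewrite_in_place(path):
    """Copy a file to a temporary name, then swap it over the original."""
    tmp_path = path + ".rewrite-tmp"  # hypothetical temporary suffix
    shutil.copy2(path, tmp_path)      # fresh blocks, written with the new recordsize
    os.replace(tmp_path, path)        # rename the copy over the original


def rewrite_tree(root):
    """Rewrite every regular file under root; return how many were rewritten."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                rewrite_in_place(path)
                count += 1
    return count
```

Each file needs enough free space for a second copy while it is being rewritten.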

See this spreadsheet. With 6 disks in RAIDz1 and the default 128K record size (32 sectors on the chart, at 4K per sector) you're losing 20% to parity. With 1M record size (256 sectors on the chart) you're losing only 17% to parity. 3% for free!
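Those percentages can be reproduced with a short calculation. The sketch below assumes 4K sectors, one parity sector per row of data sectors, and allocations rounded up to a multiple of (parity + 1) sectors. That is my understanding of how RAIDz sizes an allocation, not something taken from the spreadsheet itself, and the function names are made up.

```python
def raidz_allocated_sectors(data_sectors, disks, parity):
    """Sectors allocated for one block: data + parity + padding (assumed model)."""
    data_disks = disks - parity
    rows = -(-data_sectors // data_disks)   # ceiling division
    total = data_sectors + parity * rows    # `parity` extra sectors per row
    padding = (-total) % (parity + 1)       # round up to a multiple of parity + 1
    return total + padding


def parity_overhead(data_sectors, disks, parity):
    """Fraction of the allocation that is not data."""
    allocated = raidz_allocated_sectors(data_sectors, disks, parity)
    return (allocated - data_sectors) / allocated


# 6-disk RAIDz1 with 4K sectors: 128K record = 32 sectors, 1M record = 256.
for sectors in (32, 256):
    print(f"{sectors} sectors: {parity_overhead(sectors, 6, 1):.1%}")
# prints 20.0% for 32 sectors and 16.9% for 256 sectors
```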

u/zfsbest · 1 point · r/zfs

--I use an old quad-core i3 laptop with a 2-port eSATA Expresscard to connect the 4-bay Probox. Can connect it with a USB3 Expresscard as well, but I don't trust that configuration. I was also able to connect it to an older motherboard that had SATA port expansion with an internal-to-external SATA cable.


3FT eSATA to SATA male to male M/M Shielded Extender Extension HDD Cable 6Gbps


--If I need quicker scrub times, I can take the drives and put them in a 5-bay Sans Digital HDDRACK5 with a PC power supply, and hook them up to one of my SAS cards in the tower server I had built from Fry's a few years ago. It's LSI2008 with the cables routed externally.


Cable: External Mini SAS 26pin (SFF-8088) Male to 4x 7Pin Sata Cable

Cards: SAS9200-8E 8PORT Ext 6GB Sata+sas Pcie 2.0

Fan card: Titan Adjustable Dual Fan PCI Slot VGA Cooler (TTC-SC07TZ)


--Sorry for the late reply, BTW - haven't checked the forum for a few days.

u/killmasta93 · 2 points · r/zfs

Thanks for the reply,

So I only need to buy two of this cable, which go from the H200 to the SAS backplane, correct? And I would remove the other cables that go to the board, correct?


Thank you

u/melp · 1 point · r/zfs

I'd really recommend these two books for high-level administration of ZFS:

And the other one I linked has one chapter that gets into the low-level workings of ZFS:

u/old63 · 1 point · r/zfs

Thanks so much for all this!

I had found the memory and controller card below in the interim.

I think these will work. What do you think?

On this build I probably won't try to get a slog for the zil but in the future I may if we test and can hook these up to our vm hosts. Do you have any recommendations for that? I know NFS does sync writes so I think I'll need a slog if I do that.

u/monoslim · 1 point · r/zfs

Something like this maybe:

Norco DS-12D External 2U 12 Bay Hot-Swap SAS/SATA Rackmount JBOD Enclosure

u/mayhempk1 · 1 point · r/zfs


I had a power outage yesterday, and my UPS shut down at 75% battery remaining with no warning (I don't think I was writing to my ZFS array). I have a ZFS RAID 0 array of 3 WD Red 8TBs with this device: and it failed. (Yes, I understand it is 100% temporary storage and it WILL fail. Two of the disks have 6k power-on hours and the other has about 600, but I expect it to be good for at least a bit longer and not have a triple failure.) I destroyed the zpool, created it from scratch, and copied all the data over, and there were STILL a ton of checksum errors (not sure if I did zpool clear at any point, or if I needed to?). Does that mean all my drives are bad, or my enclosure is bad, or is it possible it was just a temporary issue? I turned that box off, rebooted my computer, turned it back on and created my zpool.

I am going to replace my UPS because I didn't trust it before and now I definitely don't, but I'm hoping my disks are okay? I don't have ANY SMART issues, offline_uncorrectable, reallocated_event_count, current_pending_sector, reallocated_sector_ct, etc are all fine and all have a raw_value of 0.
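A check like that could be scripted by scanning the attribute table that `smartctl -A` prints and flagging any of those four attributes with a non-zero raw value. This is a sketch only: it assumes the usual ten-column table layout with the raw value in the last column, and the sample rows in the comment are invented for illustration, not taken from my drives.

```python
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def nonzero_watched(smart_table):
    """Return {attribute: raw_value} for watched attributes whose raw value is not 0.

    Expects rows shaped like these invented examples:
      5 Reallocated_Sector_Ct  0x0033 200 200 140 Pre-fail Always - 0
    197 Current_Pending_Sector 0x0032 200 200 000 Old_age  Always - 3
    """
    problems = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # data rows start with a numeric ID and have at least ten columns
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw_value = fields[9]
            if raw_value != "0":
                problems[fields[1]] = raw_value
    return problems
```

An empty result means none of the watched counters has moved off zero.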

I would certainly like to hear your thoughts as always.

u/5mall5nail5 · 3 points · r/zfs

If interested in some info on ZFS on Linux - it's not a huge book, but it's very technical in parts.

u/ChrisOfAllTrades · 1 point · r/zfs

No, the SAS back panel will also have the single SFF-8087 port - it will look the same as on the Dell H200.

You just need a regular cable like this:

u/Liwanu · 2 points · r/zfs

That's why I bought a New-Older motherboard that uses DDR3.
My SuperMicro X8DTE-F-O is a MF Beast with 192GB of RAM.
Each 16GB stick of RAM is $40.

u/airmantharp · 1 point · r/zfs

Bumping to say that I'm using this cheaper one, but it's also on Server 2016 that is sharing the drives to FreeNAS through Hyper-V for ZFS pools, which are a combination of ports on this card and on the motherboard, ten drives total. No major complaints; the only minor one is that the attached drives show up as removable, which is scary when removing USB drives in Windows as they show up in that menu.

I've considered an LSI server pull as a replacement (I have more drives!) but haven't gotten around to it yet.