Friday, July 31, 2009

ZFS Datasets and "zfs list"

Any of you who have looked at the zfs(1M) man page will have come across the term "dataset".

A dataset can be:
a) a file system
b) a volume
c) a snapshot

Every time we create a ZFS file system we are actually creating a dataset with a setting of "type=filesystem".

Every pool starts out with a single dataset with a name that is the same as the pool name.

E.g. When we create a pool called "ttt" we automatically have a dataset called "ttt" which by default has a mounted file system called /ttt.

We can change the mount point, but we cannot change the name of this dataset.
Every time we add a new dataset to a pool it must be a "child" of an existing dataset.
E.g. A new dataset called "ttt/xxx" can be created which is a "descendant" of the "ttt" dataset.

We can then create "ttt/xxx/yyy", which is a descendant of both "ttt" and "ttt/xxx".

It is not possible to create a dataset called "ttt/xxx/yyy" if "ttt/xxx" does not already exist.
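
To make the parent/child naming rule concrete, here is a minimal sketch that prints the ancestors a dataset name implies. The "ancestors" function is a made-up helper for illustration, not a zfs subcommand, and it only manipulates the name as a string; it never talks to ZFS.

```shell
# Print every ancestor of a dataset name, top-most first.
# Hypothetical helper: pure string handling, no ZFS commands involved.
ancestors() {
    rest=$1
    prefix=""
    # Peel off one path component at a time until no "/" remains.
    while [ "$rest" != "${rest#*/}" ]; do
        component=${rest%%/*}
        prefix=${prefix:+$prefix/}$component
        echo "$prefix"
        rest=${rest#*/}
    done
}

# Lists "ttt" and then "ttt/xxx": both must exist before
# "ttt/xxx/yyy" can be created.
ancestors ttt/xxx/yyy
```

A top-level name such as "ttt" produces no output, since the pool's own dataset has no parent.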

All of the datasets we have created thus far have been mounted file systems; today won't be any different.

We will look at volumes and snapshots another day.

Today we will look at the "zfs list" command. This command is used to list ZFS datasets.


First create some temp files and a new pool:
# mkfile 119M /tmp/file1
# mkfile 119M /tmp/file2
# zpool create ttt /tmp/file1 /tmp/file2

And let's create a 50MB file in our new /ttt file system.

# mkfile 50M /ttt/50_meg_of_zeros

Now run "df -h" to see that 50MB is used.

# df -h /ttt
Filesystem Size Used Available Capacity Mounted on
ttt 196M 50M 146M 26% /ttt

If you don't see 50M under the "Used" column, try running the command again.
There may be a time lapse between creating the 50_meg_of_zeros file and having "df" report 50MB of data in the file system.

Now let's use "zfs list" to get additional information about the "ttt" dataset.

# zfs list ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 50.2M 146M 50.0M /ttt

Note that the information provided is similar to the data provided by "df -h".
The "REFER" column shows us how much data is referenced by this dataset itself, not counting its descendants.

The "USED" column refers to the amount of data used by this dataset and all the descendants of this dataset.

Since we only have a top level dataset with no descendants, the two values are roughly equal. Small discrepancies are generally due to overhead.

The "AVAIL" column shows the amount of available space in the dataset, which (not coincidentally) is equal to the free space in the pool.

Now create some file systems… this syntax works on Solaris 10 08/07 with recent patches.

# zfs create -o mountpoint=/apps_test ttt/apps # descendant of ttt
# zfs create -o mountpoint=/work_test ttt/work # descendant of ttt
# zfs create -o mountpoint=/aaa ttt/work/aaa # descendant of both ttt and ttt/work
# zfs create -o mountpoint=/bbb ttt/work/bbb # descendant of both ttt and ttt/work

If you are using an older version of Solaris 10 you may need to use the following syntax to achieve the same thing:

# zfs create ttt/apps # descendant of ttt
# zfs set mountpoint=/apps_test ttt/apps
# zfs create ttt/work # descendant of ttt
# zfs set mountpoint=/work_test ttt/work
# zfs create ttt/work/aaa # descendant of both ttt and ttt/work
# zfs set mountpoint=/aaa ttt/work/aaa
# zfs create ttt/work/bbb # descendant of both ttt and ttt/work
# zfs set mountpoint=/bbb ttt/work/bbb

Regardless of which syntax you used to create the file systems, let's move on and create another 50MB file in one of our file systems.

# mkfile 50M /aaa/50_meg_of_zeros

# df -h | egrep 'ttt|Filesystem'
Filesystem Size Used Available Capacity Mounted on
ttt 196M 50M 96M 35% /ttt
ttt/apps 196M 24K 96M 1% /apps_test
ttt/work 196M 24K 96M 1% /work_test
ttt/work/aaa 196M 50M 96M 35% /aaa
ttt/work/bbb 196M 24K 96M 1% /bbb

Now we have two file systems ("/ttt" and "/aaa"), each showing a utilization of 50MB. Nothing surprising so far.

Note you could use "df -F zfs -h"… but that will show all zfs file systems on the system. The "egrep" syntax used above limits us to the file systems that are part of the "ttt" pool.
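
One caveat with the "egrep" approach is that it matches "ttt" anywhere on the line, so a mount point or another pool's dataset containing that string would also show up. As a rough sketch, the filter below keeps only rows whose first column is the pool itself or starts with "pool/". The "filter_pool" name is a made-up helper, and the "/dev/dsk/c0t0d0s0" and "otherpool/ttt" rows in the sample input are invented purely to show the difference.

```shell
# Keep the df header plus rows whose first column is the pool itself
# or one of its datasets ("pool" or "pool/...").
# Hypothetical helper; reads df-style output on standard input.
filter_pool() {
    awk -v pool="$1" '
        NR == 1 { print; next }                           # header line
        $1 == pool || index($1, pool "/") == 1 { print }  # exact pool or child
    '
}

# Sample input so the sketch runs without a pool; on a live system
# this would be:  df -h | filter_pool ttt
filter_pool ttt <<'EOF'
Filesystem Size Used Available Capacity Mounted on
/dev/dsk/c0t0d0s0 20G 8G 12G 40% /
ttt 196M 50M 96M 35% /ttt
ttt/work/aaa 196M 50M 96M 35% /aaa
otherpool/ttt 10G 1G 9G 10% /export/ttt
EOF
```

Only the header, "ttt" and "ttt/work/aaa" rows survive; the "otherpool/ttt" row is dropped even though it contains the string "ttt".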

Now let's rerun "zfs list ttt":

# zfs list ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 100M 95.6M 50.0M /ttt

Note that the "REFER" column is still showing 50MB because /ttt still contains only one 50MB file.

But the "USED" column now shows 100MB. Remember, the USED column represents the amount of data in the dataset and all of the descendants of the dataset.

We have 50MB in "ttt" (mounted under /ttt) and 50MB in "ttt/work/aaa" (mounted under /aaa), so the total space consumed by ttt and its descendants is 100MB.

We can also use "zfs list" to look at specific datasets in the pool.

# zfs list ttt/work
NAME USED AVAIL REFER MOUNTPOINT
ttt/work 50.1M 95.6M 24.5K /work_test

Note that the "ttt/work" dataset (mounted under /work_test) contains no data so the "REFER" column shows roughly 0MB.

But the USED value of 50MB reflects the data from the descendant "ttt/work/aaa".

# zfs list ttt/work/aaa
NAME USED AVAIL REFER MOUNTPOINT
ttt/work/aaa 50.0M 95.6M 50.0M /aaa

The "ttt/work/aaa" dataset (mounted under /aaa) contains one 50MB file, but the dataset has no descendants. Therefore both USED and REFER show 50M.

If we want to recursively list all datasets that are part of the "ttt" pool (and exclude all other pools) we need to use the "-r" option and specify the pool name.

# zfs list -r ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 100M 95.6M 50.0M /ttt
ttt/apps 24.5K 95.6M 24.5K /apps_test
ttt/work 50.1M 95.6M 24.5K /work_test
ttt/work/aaa 50.0M 95.6M 50.0M /aaa
ttt/work/bbb 24.5K 95.6M 24.5K /bbb
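
As a sanity check on the USED/REFER relationship, we can add up the REFER column of the recursive listing above and compare it with the USED value of "ttt". This is only a sketch: "sum_refer" is a made-up helper, it assumes REFER is the fourth column, it understands only the K and M suffixes seen in this listing, and it assumes a pool with no snapshots, volumes or reservations (which would make USED larger than the sum).

```shell
# Sum the REFER column (4th field) of "zfs list" style output and print
# the total in megabytes. Hypothetical helper; handles only K and M suffixes.
sum_refer() {
    awk '
        NR == 1 { next }                    # skip the header line
        {
            value = $4 + 0                  # numeric part, e.g. 24.5 from "24.5K"
            unit = substr($4, length($4))   # last character, e.g. "K"
            if (unit == "M") value *= 1024  # convert megabytes to kilobytes
            total_kb += value
        }
        END { printf "%.1fM\n", total_kb / 1024 }
    '
}

# The listing from above, fed in as sample input; on a live system
# this would be:  zfs list -r ttt | sum_refer
sum_refer <<'EOF'
NAME USED AVAIL REFER MOUNTPOINT
ttt 100M 95.6M 50.0M /ttt
ttt/apps 24.5K 95.6M 24.5K /apps_test
ttt/work 50.1M 95.6M 24.5K /work_test
ttt/work/aaa 50.0M 95.6M 50.0M /aaa
ttt/work/bbb 24.5K 95.6M 24.5K /bbb
EOF
```

With the sample data this prints 100.1M, which lines up with the 100M that "zfs list" reports as USED for "ttt".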

Clean up time

# zpool destroy ttt
# rmdir /apps_test
# rmdir /work_test
# rmdir /aaa
# rmdir /bbb
# rm /tmp/file*
