Thursday, November 26, 2009

Solaris Jumpstart Howto

Here's a procedure for setting up a Solaris Jumpstart server.

# mkdir /jumpstart/image
# mkdir /jumpstart/config
# mkdir /jumpstart/share

# lofiadm -a /var/tmp/Solaris10_u5_1108.iso
/dev/lofi/1
# lofiadm /dev/lofi/1
/var/tmp/Solaris10_u5_1108.iso

# svcadm disable volfs
# mkdir -p /cdrom/cdrom0

# mount -F hsfs -o ro /dev/lofi/1 /cdrom/cdrom0

# cd /cdrom/cdrom0/Solaris_10/Tools
# ./setup_install_server /jumpstart/image
Verifying target directory...
Calculating the required disk space for the Solaris_10 product
... output skipped ...

# cd /
# umount /cdrom/cdrom0
# lofiadm -d /dev/lofi/1
# lofiadm
Block Device File

# cd /jumpstart/image/Solaris_10/Misc/jumpstart_sample
# cp ./check /jumpstart/config

# cp /etc/dfs/dfstab /etc/dfs/dfstab.org

# vi /etc/dfs/dfstab
+-------------------
| share -F nfs -o ro,anon=0 /jumpstart/config
| share -F nfs -o ro,anon=0 /jumpstart/image
| share -F nfs -o ro,anon=0 /jumpstart/share

Alternatively, share the entire /jumpstart tree with a single entry:

# vi /etc/dfs/dfstab
+-------------------
| share -F nfs -o ro,anon=0 /jumpstart

# svcadm enable nfs/server
# shareall
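
To confirm that the directories are actually exported, check that the NFS server service is online and list the active shares (which entries you see depends on which dfstab variant you used above):

# svcs nfs/server
# share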

# vi /jumpstart/config/sysidcfg
+------------------------------
| system_locale=en_US
| timezone=MET
| name_service=NONE
| terminal=dtterm
| timeserver=localhost
| root_password="WybF.D5GwZnz2"
| network_interface=primary {netmask=255.0.0.0 protocol_ipv6=no default_route=127.0.0.1}
| security_policy=NONE
| nfs4_domain=dynamic

# vi /jumpstart/config/sun4u_profile
+-----------------------------------
| install_type initial_install
| system_type standalone
| partitioning explicit
| filesys any 1024 /
| filesys any 1024 /usr
| filesys any 1024 /var
| filesys any 1024 /opt
| filesys any 1024 /export/home
| filesys any 256 swap
| cluster SUNWCreq
| package SUNWman
| package SUNWbash
| package SUNWless

# cd /jumpstart/config
# vi ./rules
+-----------
| karch sun4u - sun4u_profile -

# ./check

# vi /etc/hosts
+--------------
| 10.0.0.2 pino

# cd /jumpstart/image/Solaris_10/Tools
# ./add_install_client \
> -e 8:0:20:0:0:02 \
> -i 10.0.0.2 \
> -s tommie:/jumpstart/image \
> -c tommie:/jumpstart/config \
> -p tommie:/jumpstart/config \
> pino \
> sun4u

# svcadm enable rarp

# inetconv
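
Before booting the client, it can be useful to verify what add_install_client registered on the boot server; a quick sanity check (file contents will reflect your own client's MAC address, name and boot image):

# cat /etc/ethers
# cat /etc/bootparams
# ls /tftpboot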

On the client, bring the system down to the OBP prompt and boot it from the network to start the installation:

# init 0
ok boot net - install

Create a finish script

# vi /jumpstart/config/sun4u_after
+---------------------------------
| {
| mkdir /a/server
| mount -F nfs -o ro 10.0.0.1:/jumpstart/share /a/server
|
| cp /a/server/crontab.root /a/var/spool/cron/crontabs/root
| cp /a/server/hosts.header /a/hosts
|
| HOSTNAME=`cat /etc/nodename`
| regel=`grep $HOSTNAME /a/server/hosts.org`
| echo "$regel loghost ." >> /a/hosts
| grep -v $HOSTNAME /a/server/hosts.org >> /a/hosts
|
| mv /a/hosts /a/etc/hosts
| umount /a/server
| rmdir /a/server
|
| touch /a/noautoshutdown
| touch /a/etc/.NFS4inst_state.domain
| } > /a/server.log 2> /a/server.errlog

# vi /jumpstart/share/hosts.header
+---------------------------------
| #
| # Internet host table
| #

# vi /jumpstart/share/hosts.org
+------------------------------
| 10.0.0.1 tommie
| 10.0.0.2 pino

# crontab -l > /jumpstart/share/crontab.root

Update the rules file

# vi rules
+-----------
| karch sun4u - sun4u_profile sun4u_after

# ./check


Thursday, November 12, 2009

Sun INIT States


Solaris init states refer to the level of services provided by the system. The exact services and processes run at each init level are determined by the scripts in the /etc/rc#.d directories. The default service levels for each init state are listed below:

* 0: The system is at the PROM monitor (ok>) or security monitor (>) prompt. It is safe to shut down the system when it is at this init state.
* 1, s or S: This state is known as "single-user" or "system administrator" mode. Root is the only user on the system, and only basic kernel functions are enabled. A limited number of filesystems (usually only root and /usr) are mounted. This init state is often used for sensitive functions (such as kernel libc patches) or while troubleshooting a problem that is keeping the system from booting into multiuser mode.
* 2: Multiple users can log in. Most system services (except for NFS server and printer resource sharing) are enabled.
* 3: Normal operating state. NFS and printer sharing is enabled, where appropriate.
* 4: Usually undefined.
* 5: Associated with the boot -a command. The system is taken to init 0 and an interactive boot is started.
* 6: Reboot. This state takes the system to init state 0 and then to the default init state (usually 3, but can be redefined in the /etc/inittab file).

The init states are defined in the /etc/inittab file, which usually points at the /sbin/rc# scripts, where # is the run level. These scripts in turn examine the contents of the corresponding /etc/rc#.d directories. The scripts in these directories whose names begin with the letter K are run in "stop" mode first, in alphabetical order. Then the scripts whose names begin with the letter S are run in "start" mode, in alphabetical order.

To get to a desired run level n, each of the rc (run control) scripts from 1 to n is run. To get to run level 0, the K scripts are run in each rc#.d directory between the current run level and 0 in reverse numerical order.
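
To see which run level the system is currently at, use the who -r command; the output looks roughly like this (the date and state fields will differ on your system):

# who -r
   .       run-level 3  Nov 26 10:15     3      0  S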

In the default configuration, the rc scripts accomplish the following tasks:

* /sbin/rc0
o Stop system services/daemons.
o Terminate running processes.
o Unmount all file systems.
* /sbin/rc1
o Stop system services/daemons.
o Terminate running processes.
o Unmount all file systems.
o Bring up the system in single-user mode.
* /sbin/rc2
o Set the TIMEZONE variable.
o Stop the print and NFS services.
o Stop the vold daemon.
o Mount local filesystems, enable disk quotas (as appropriate).
o Remove temporary files.
o Create new device entries if this is the result of a boot -r.
o Save a core file if enabled.
o Configure system accounting, (as appropriate).
o Set the default router.
o Set the NIS domain.
o Set up the network interfaces appropriately.
o Start inetd.
o Start named, if appropriate.
o Start rpcbind.
o Start kerbd (the Kerberos client daemon) if appropriate.
o Start ypbind or rpc.nisd as appropriate.
o Start keyserv.
o Start statd and lockd.
o Mount NFS filesystems from /etc/vfstab.
o Start the automounter.
o Start cron.
o Start lp daemons, as appropriate.
o Start sendmail.
* /sbin/rc3
o Clean up sharetab.
o Start nfsd and mountd.
o Start rarpd and rpc.bootparamd, as appropriate.
* /sbin/rc4 is usually not defined. It can be used in a non-default configuration to achieve a tailored run level.
* /sbin/rc5
o Kill print daemons.
o Unmount local file systems.
o Kill syslogd.
o Unmount NFS file systems.
o Stop NFS services.
o Stop NIS services.
o Stop RPC services.
o Stop cron services.
o Stop statd and lockd (NFS client services).
o Kill active processes.
o Initiate an interactive boot.
* /sbin/rc6
o Stop system services/daemons.
o Terminate running processes.
o Unmount all file systems.
o Boot to the initdefault level specified in /etc/inittab.
* /sbin/rcS: This run level differs from 1 in the following particulars:
o Minimal network is established.
o System name is set.
o root, /usr and /usr/kvm filesystems are checked and mounted (if necessary).
o Pseudo file systems (/proc and /dev/fd) are mounted.
o Rebuilds device entries (for reconfiguration reboots only).


Tuesday, October 27, 2009

ZFS Cheat Sheet

Create simple striped pool:
zpool create [pool_name] [device] [device] ...
zpool create datapool c5t433127A900011C370000C00003210000d0 c5t433127B4001031250000900000540000d0

Create mirrored pool:
zpool create [pool_name] mirror [device] [device] ...
zpool create datapool mirror c5t433127A900011C370000C00003210000d0 c5t433127B4001031250000900000540000d0

Create Raid-Z pool:
zpool create [pool_name] raidz [device] [device] [device] ...
zpool create datapool raidz c5t433127A900011C370000C00003210000d0 c5t433127B4001031250000900000540000d0 c5t439257C4000019250000900000540000d0

Transform simple pool to a mirror:
zpool create [pool_name] [device]
zpool attach [pool_name] [existing_device] [new_device]
zpool create datapool c5t433127A900011C370000C00003210000d0
zpool attach datapool c5t433127A900011C370000C00003210000d0 c5t433127B4001031250000900000540000d0

Expand simple pool:
zpool create [pool_name] [device]
zpool add [pool_name] [new_device]
zpool create datapool c5t433127A900011C370000C00003210000d0
zpool add datapool c5t433127B4001031250000900000540000d0

Expand mirrored pool by attaching additional mirror:
zpool add [pool_name] mirror [new_device] [new_device]
zpool add datapool mirror c5t433127A900011C370000C00003460000d0 c5t433127B400011C370000C00003410000d0

Replace device in a pool:
zpool replace [pool_name] [old_device] [new_device]
zpool replace datapool c5t433127A900011C370000C00003410000d0 c5t433127B4001031250000900000540000d0

Destroy pool:
zpool destroy [pool_name]
zpool destroy datapool

Set pool mountpoint:
zfs set mountpoint=/path [pool_name]
zfs set mountpoint=/export/zfs datapool

Display configured pools:
zpool list
zpool list

Display pool status info:
zpool status [-v] [pool_name]
zpool status -v datapool

Display pool I/O statistics:
zpool iostat [pool_name]
zpool iostat datapool

Display pool command history:
zpool history [pool_name]
zpool history datapool

Export a pool:
zpool export [pool_name]
zpool export datapool

Import a pool:
zpool import [pool_name]
zpool import datapool

Create a filesystem:
zfs create [pool_name]/[fs_name]
zfs create datapool/filesystem

Destroy a filesystem:
zfs destroy [pool_name]/[fs_name]
zfs destroy datapool/filesystem

Rename a filesystem:
zfs rename [pool_name]/[fs_name] [pool_name]/[fs_name]
zfs rename datapool/filesystem datapool/newfilesystem

Move a filesystem:
zfs rename [pool_name]/[fs_name] [pool_name]/[fs_name]/[fs_name]
zfs rename datapool/filesystem datapool/users/filesystem

Display properties of a filesystem:
zfs get all [pool_name]/[fs_name]
zfs get all datapool/filesystem

Make a snapshot:
zfs snapshot [pool_name]/[fs_name]@[time]
zfs snapshot datapool/filesystem@friday

Roll back filesystem to its snapshot:
zfs rollback [pool_name]/[fs_name]@[time]
zfs rollback datapool/filesystem@friday

Clone a filesystem:
zfs snapshot [pool_name]/[fs_name]@[time]
zfs clone [pool_name]/[fs_name]@[time] [pool_name]/[fs_name]
zfs snapshot datapool/filesystem@today
zfs clone datapool/filesystem@today datapool/filesystemclone

Backup filesystem to a file:
zfs send [pool_name]/[fs_name]@[time] > /path/to/file
zfs send datapool/filesystem@friday > /tmp/filesystem.bkp

Restore filesystem from a file:
zfs receive [pool_name]/[fs_name] < /path/to/file
zfs receive datapool/restoredfilesystem < /tmp/filesystem.bkp

Create ZFS volume:
zfs create -V [size] [pool_name]/[vol_name]
zfs create -V 100mb datapool/zvolume
newfs /dev/zvol/dsk/datapool/zvolume


Tuesday, September 29, 2009

Clearing Sendmail queue

This one is quite old but still handy. For those of us still using sendmail who have ever felt the need to flush the sendmail queue, this post is for you.

If you need to flush sendmail's pending mail, try one of the following:

1) Manual method: delete the files under /var/spool/mail, then delete the queue files under /var/mqueue.

Then check that all mail is gone using the mailq command; everything should have been removed.

2) using command:

Run sendmail -v -q at the root prompt; it will flush all pending mail.

3) If you want to flush mail for a certain domain, user, or recipient, use one of these commands:

sendmail -qStest.com -v will delete all mail from *@test.com

sendmail -qRhotmail.com -v will delete all mail for recipients at hotmail….
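
Before and after flushing, you can list what is sitting in the queue with mailq (equivalent to sendmail -bp):

# mailq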


Tuesday, September 8, 2009

How to: Mounting an ISO image in Solaris

We can use the loopback file driver (lofi) to mount an ISO image without having to write the ISO image onto a CD or DVD.

The following procedure should help you mount an ISO image in Sun Solaris.

Attach a Block Device

sunsolaris# lofiadm -a /export/software/iso_image.iso
/dev/lofi/1

Mount the ISO Image block device

sunsolaris# mount -F hsfs -o ro /dev/lofi/1 /mnt

Where /mnt is the mount point.

This should mount the ISO image.

To confirm, change directory to /mnt and do an "ls" to list the files

sunsolaris# cd /mnt

sunsolaris# ls

If at any time you want to look at these block devices, simply type the "lofiadm" command with no argument.

sunsolaris# lofiadm
Block Device File
/dev/lofi/1 /export/software/iso_image.iso

When we are done with the files on the mounted ISO, we can unmount and detach the Block device we attached earlier:

sunsolaris# umount /mnt

sunsolaris# lofiadm -d /dev/lofi/1


Thursday, September 3, 2009

How to: IPMP Load Balancing & Resilience in Solaris

IP Multipathing (IPMP) in Sun Solaris provides load balancing and resilience for network connections across multiple Network Interface Cards (NICs).

Previously, we discussed providing resilience for network connections with multiple NICs on the system.

Now, we take it to the next step and make the network connections not only resilient but also load balanced, so that both NICs participating in IPMP are active and forward traffic. This improves network throughput and thereby the efficiency of the server, especially if it is a critical system serving many connections.

The requirements to configure IPMP for load balancing are:

1. Two Virtual IP Addresses. These IPs are used by the Applications for data.

2. A test IP address for each NIC. These IPs are not used by applications; they are only used to probe a remote target device to check connectivity.

3. Each interface must have a unique MAC address. By default on SPARC platforms, all NICs have a system-wide MAC address assigned and so they share a single MAC address (see the eeprom example after this list).

The NICs don't have to be of the same kind, but they do have to be of the same speed (10/100/1000 Mbps).
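
On SPARC hardware the per-interface MAC address behaviour is controlled by the local-mac-address? OBP variable; a quick check and change (the value shown is illustrative, and a reboot is required for the new setting to take effect):

# eeprom local-mac-address?
local-mac-address?=false
# eeprom local-mac-address?=true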

In our configuration,

192.168.1.99 – Virtual IP1

192.168.1.100 – Virtual IP2

192.168.1.101 – Test IP for ce0 (NIC1)

192.168.1.102 – Test IP for ce1 (NIC2)

appserver – Actual hostname

appserver-1 – Hostname for Data IP1

appserver-2 – Hostname for Data IP2

appserver-ce0 – Hostname for test IP on ce0 interface

appserver-ce1 – Hostname for test IP on ce1 interface

Add Host Entries in /etc/hosts

Let’s start with adding the hosts entries for the IP addresses in the /etc/hosts file.

# IPMP group appserver-ipmp
127.0.0.1 localhost
192.168.1.99 appserver-1 loghost
192.168.1.100 appserver-2 appserver loghost
192.168.1.101 appserver-ce0 loghost
192.168.1.102 appserver-ce1 loghost

We have configured a hostname for each of the Virtual IPs and the Test IPs. However, the Test IPs should not be used by applications for any network connections.

Create hostname.ce* files

For every interface on the system create a hostname.ce* file. For us, create the files

hostname.ce0 & hostname.ce1

Edit hostname.ce0

Add the following to the hostname.ce0 file. This is the primary (master) interface of the IPMP pair:

appserver-ce0 netmask + broadcast + group appserver-ipmp deprecated -failover up \
addif appserver netmask + broadcast + failover up

where
netmask – assigns the default netmask

broadcast – assigns the default broadcast value

group – specifies the IPMP group

deprecated – indicates the test interface address should not be used for data transfer

-failover – prevents the test address from failing over
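
The matching hostname.ce1 file is not shown in this post; a sketch of what it would contain, assuming the second data address (appserver-2) rides on ce1:

appserver-ce1 netmask + broadcast + group appserver-ipmp deprecated -failover up \
addif appserver-2 netmask + broadcast + failover up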

Now the configuration is complete, and the ifconfig output should look as follows:

root@ appserver:/$ ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce0: flags=9040843 mtu 1500 index 2
inet 192.168.1.99 netmask ffffff00 broadcast 192.168.1.255
groupname appserver-ipmp
ether 0:xx:xx:xx:xx:x
ce0:1: flags=1000843 mtu 1500 index 2
inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
ce1: flags=69040843 mtu 1500 index 3
inet 192.168.1.100 netmask ffffff00 broadcast 192.168.1.255
groupname appserver-ipmp
ether 0:xx:xx:xx:xx:x
ce1:1: flags=1000843 mtu 1500 index 4

inet 192.168.1.102 netmask ffffff00 broadcast 192.168.1.255

Now both NICs forward traffic. When one of the interfaces fails, the virtual IP address is transparently failed over onto the other active interface, and you will see an interface "ce1:2" created for the failed-over IP. When the link is restored, the address fails back to the ce0 interface. There should be no disruption to the network connections.
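
To exercise the failover without pulling a cable, you can temporarily detach an interface from its IPMP group with if_mpadm and then re-attach it:

# if_mpadm -d ce0
# if_mpadm -r ce0

The -d option offlines ce0 (its addresses should fail over to ce1) and -r re-attaches it, after which the addresses fail back.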


Friday, August 21, 2009

crontab: unexpected end of line

Whenever you modify your crontab file, the error “Your “crontab” on server unexpected end of line. This entry has been ignored” is sent to the user's email. This happens if there is a blank line in your crontab file.


For example, in the following crontab file there is a blank line between the last two cron jobs.

root@solaris# crontab -l
# The root crontab should be used to perform accounting data collection.
#
# The rtc command is run to adjust the real time clock if and when
# daylight savings time changes.
#
10 1 * * 0,4 /etc/cron.d/logchecker
10 2 * * 0 /usr/lib/newsyslog
15 3 * * 0 /usr/lib/fs/nfs/nfsfind

30 4 * * * /usr/local/bin/disk_check,sh

The solution for this problem is to edit the crontab file, find the blank line, and delete it. After editing, the crontab should look like the following:

root@solaris# crontab -l
# The root crontab should be used to perform accounting data collection.
#
# The rtc command is run to adjust the real time clock if and when
# daylight savings time changes.
#
10 1 * * 0,4 /etc/cron.d/logchecker
10 2 * * 0 /usr/lib/newsyslog
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
30 4 * * * /usr/local/bin/disk_check,sh
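
If you prefer to strip empty lines non-interactively instead of editing by hand, something like this works (a sketch against the root crontab; review the temporary file before loading it):

# crontab -l > /tmp/root.cron
# sed '/^$/d' /tmp/root.cron > /tmp/root.cron.clean
# crontab /tmp/root.cron.clean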


Thursday, August 20, 2009

Find WWN (world wide name) in Solaris

First off, what is WWN anyway?

World Wide Names (WWNs) are unique 8-byte (64-bit) identifiers used in SCSI and Fibre Channel, similar to the MAC addresses on a Network Interface Card (NIC).

Within the WWN family there are two kinds:

World Wide Port Name (WWpN), a WWN assigned to a port on a fabric; this is what you will be looking for most of the time.

World Wide Node Name (WWnN), a WWN assigned to a node/device on a Fibre Channel fabric.

To find the WWN numbers of your HBA card in Sun Solaris, you can use one of the following procedures:

Using fcinfo (Solaris 10 only)

This is probably the easiest way to find the WWN numbers on your HBA card. Here you can see the HBA Port WWN (WWpN) and the Node WWN (WWnN) of the two ports on the installed QLogic HBA card.

This is also useful for finding the model number, firmware version, FCode, supported and current speeds, and the port status of the HBA card/port.



root@ sunserver:/root# fcinfo hba-port | grep WWN
HBA Port WWN: 2100001b32xxxxxx
Node WWN: 2000001b32xxxxxx
HBA Port WWN: 2101001b32yyyyyy
Node WWN: 2001001b32yyyyyy

For detailed info including make & model number, firmware, FCode, current status and supported/current speeds, run:

root@ sunserver:/root# fcinfo hba-port
HBA Port WWN: 2100001b32xxxxxx
OS Device Name: /dev/cfg/c2
Manufacturer: QLogic Corp.
Model: 375-3356-02
Firmware Version: 4.04.01
FCode/BIOS Version: BIOS: 1.24; fcode: 1.24; EFI: 1.8;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2000001b32xxxxxx
HBA Port WWN: 2101001b32yyyyyy
OS Device Name: /dev/cfg/c3
Manufacturer: QLogic Corp.
Model: 375-3356-02
Firmware Version: 4.04.01
FCode/BIOS Version: BIOS: 1.24; fcode: 1.24; EFI: 1.8;
Type: unknown
State: offline
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: not established
Node WWN: 2001001b32yyyyyy



Using scli



root@ sunserver:/root# scli -i | egrep "Node Name|Port Name"
Node Name : 20-00-00-1B-32-XX-XX-XX
Port Name : 21-00-00-1B-32-XX-XX-XX
Node Name : 20-01-00-1B-32-YY-YY-YY
Port Name : 21-01-00-1B-32-YY-YY-YY



For more detailed info on the HBA cards, run scli -i without a filter. The output is similar to fcinfo but also provides the model name and serial number.



root@ sunserver:/root# scli -i
--------------------------------------------------
Host Name : sunserver
HBA Model : QLE2462
HBA Alias :
Port : 1
Port Alias :
Node Name : 20-00-00-1B-32-XX-XX-XX
Port Name : 21-00-00-1B-32-XX-XX-XX
Port ID : 11-22-33
Serial Number : AAAAAAA-bbbbbbbbbb
Driver Version : qlc-20080514-2.28
FCode Version : 1.24
Firmware Version : 4.04.01
HBA Instance : 2
OS Instance : 2
HBA ID : 2-QLE2462
OptionROM BIOS Version : 1.24
OptionROM FCode Version : 1.24
OptionROM EFI Version : 1.08
OptionROM Firmware Version : 4.00.26
Actual Connection Mode : Point to Point
Actual Data Rate : 2 Gbps
PortType (Topology) : NPort
Total Number of Devices : 2
HBA Status : Online
--------------------------------------------------
Host Name : sunserver
HBA Model : QLE2462
HBA Alias :
Port : 2
Port Alias :
Node Name : 20-01-00-1B-32-YY-YY-YY
Port Name : 21-01-00-1B-32-YY-YY-YY
Port ID : 00-00-00
Serial Number : AAAAAAA-bbbbbbbbbb
Driver Version : qlc-20080514-2.28
FCode Version : 1.24
Firmware Version : 4.04.01
HBA Instance : 3
OS Instance : 3
HBA ID : 3-QLE2462
OptionROM BIOS Version : 1.24
OptionROM FCode Version : 1.24
OptionROM EFI Version : 1.08
OptionROM Firmware Version : 4.00.26
Actual Connection Mode : Unknown
Actual Data Rate : Unknown
PortType (Topology) : Unidentified
Total Number of Devices : 0
HBA Status : Loop down



Using prtconf

root@ sunserver:/root# prtconf -vp | grep -i wwn
port-wwn: 2100001b.32xxxxxx
node-wwn: 2000001b.32xxxxxx
port-wwn: 2101001b.32yyyyyy
node-wwn: 2001001b.32yyyyyy

Using prtpicl

root@ sunserver:/root# prtpicl -v | grep wwn
:node-wwn 20 00 00 1b 32 xx xx xx
:port-wwn 21 00 00 1b 32 xx xx xx
:node-wwn 20 01 00 1b 32 yy yy yy
:port-wwn 21 01 00 1b 32 yy yy yy



Using luxadm

Run the following command to obtain the physical path to the HBA Ports

root@ sunserver:/root$ luxadm -e port
/devices/pci@400/pci@0/pci@9/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@400/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED



With the physical path obtained from the above command, we can trace the WWN numbers as follows. Here I use the physical path of the port that is connected:

root@ sunserver:/root$ luxadm -e dump_map /devices/pci@400/pci@0/pci@9/SUNW,qlc@0/fp@0,0:devctl
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 123456 0 1111111111111111 2222222222222222 0x0 (Disk device)
1 789123 0 1111111111111111 2222222222222222 0x0 (Disk device)
2 453789 0 2100001b32xxxxxx 2000001b32xxxxxx 0x1f (Unknown Type,Host Bus Adapter)


Wednesday, August 19, 2009

ZFS boot/root - backup and restore

Today we will look at backup and restore of a ZFS root pool using native ZFS tools.

The command “ufsdump” does not work on a ZFS file system. This should not surprise anybody.

“flashbackup” will supposedly work if you have all the right patches. I recommend avoiding it until Solaris 10 update 8 is released.

The command “zfs send” is used to create backup images.
The command “zfs receive” is used to restore from backup images.

Some of you may ask: “Do we need backup/restore if we have snapshots?” The answer is yes… if you are performing dangerous maintenance on a critical system, it may be wise to have more than one rollback plan.

Additionally, backup/restore may be useful when migrating an OS between servers, or shrinking the size of a root pool.

And some of you will ask: “Can’t we just restore the OS from Netbackup?” The answer is yes… but it will take much longer than using native ZFS tools.

I am not advocating that “zfs send” and “zfs receive” be a replacement for regularly scheduled Netbackup backups; instead I recommend that these commands be used when there is a high probability that a restore of a ZFS root pool will be required.

Backup a ZFS root pool

ZFS backups are done from snapshots. This ensures “single point in time” consistency.

It is simple to recursively create snapshots of all datasets in the root pool with a single command:

# SNAPNAME=`date +%Y%m%d`

# zfs snapshot -r rpool@$SNAPNAME
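
You can verify that the recursive snapshots were created before sending them anywhere:

# zfs list -t snapshot -r rpool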


Backups can be saved to local disk, remote disk, tape, DVD, punch cards, etc. I recommend using an NFS server.

It is possible to backup all the snapshots with a single command. But I don’t recommend this unless you wish to have the contents of the swap and dump devices included in the backup. Backups of swap and dump can be avoided by splitting the backup into four separate commands.


Start with a non-recursive backup of the top level dataset (rpool); then perform recursive “replication stream” backups of rpool/ROOT, rpool/home, and rpool/tools.


You might wish to embed the hostname and date in your filenames, but for this example I will use simple filenames.


# zfs send -v rpool@$SNAPNAME > /net/NFS_SERVER/BACKUP_DIR/rpool.zfs_snap

# zfs send -Rv rpool/ROOT@$SNAPNAME > /net/NFS_SERVER/BACKUP_DIR/root.zfs_snap

# zfs send -Rv rpool/tools@$SNAPNAME > /net/NFS_SERVER/BACKUP_DIR/tools.zfs_snap

# zfs send -Rv rpool/home@$SNAPNAME > /net/NFS_SERVER/BACKUP_DIR/home.zfs_snap


Now list the contents of the backup directory… we should see four backup files.


# ls -l /net/NFS_SERVER/BACKUP_DIR/

total 4957206

-rw-r--r-- 1 root root 2369414848 Aug 12 15:45 root.zfs_snap

-rw-r--r-- 1 root root 690996 Aug 12 15:46 home.zfs_snap

-rw-r--r-- 1 root root 92960 Aug 12 15:43 rpool.zfs_snap

-rw-r--r-- 1 root root 166600992 Aug 12 15:46 tools.zfs_snap


The OS backup is now complete.


You may wish to view and record the size of the swap and dump datasets for future use.


# zfs get volsize rpool/dump rpool/swap

NAME PROPERTY VALUE SOURCE

rpool/dump volsize 1G -

rpool/swap volsize 8G -


Also, if there are multiple boot environments it is a good idea to record which one is currently in use.


# df -k /

Filesystem 1024-blocks Used Available Capacity Mounted on

rpool/ROOT/blue 70189056 1476722 58790762 3% /


#############################################################################

Pick a server and boot it from cdrom or network


Pick a server to restore to.


Depending on your requirements, you can restore to the original server or to a different server.

If you restore to a different server, the CPU architecture must match that of the original server.

e.g. don’t try to restore a backup from a sun4v server to a sun4u server.

In my testing I was able to recover from a 480R to a V210 without any problems.


Boot the server from the network or cdrom using a recent release of Solaris.


ok> boot net


Wait for the server to boot.


Follow the prompts to exit the installation program (usually Esc-2, Esc-2, Esc-2, Esc-5, Esc-2).

##############################################


Prepare the disks

Pick a pair of boot disks. The disks do not need to be the same size as the disks on the original system.


If you are using the original disks on the original system you can skip this entire section and jump to the restore.


If you have an x86 system, you need to first make sure there is a valid fdisk partition with type “Solaris 2”; it needs to be marked as “active”.


Here is a sample fdisk layout:


Partition Status Type Start End Length %

========= ====== ============ ===== === ====== ===

1 Active Solaris2 1 8923 8923 100


Regardless of whether you are using a sparc or x86 system, the disks must be labelled with SMI labels and a valid VTOC.


Do not use EFI labels!


Keep in mind that:

When you create a VTOC on a sparc system it applies to the entire disk.

When you create a VTOC on an x86 system it applies to the first fdisk partition of type “Solaris 2”. i.e. in the x86 world a disk is not a disk.


Only one slice is required on each disk… generally it is recommended to use slice #0.


Usually it will make sense to use the entire disk, but if you don’t wish to have your root pool occupying the entire disk then size slice #0 to fit your needs. If you have an x86 system, you should avoid cylinder 0.


Here is a sample VTOC for a sparc server:

Part Tag Flag Cylinders Size Blocks

0 root wm 0 - 14086 68.35GB (14087/0/0) 143349312

1 unassigned wu 0 0 (0/0/0) 0

2 backup wu 0 - 14086 68.35GB (14087/0/0) 143349312

3 unassigned wm 0 0 (0/0/0) 0

4 unassigned wm 0 0 (0/0/0) 0

5 unassigned wm 0 0 (0/0/0) 0

6 unassigned wm 0 0 (0/0/0) 0

7 unassigned wm 0 0 (0/0/0) 0


Here is a sample VTOC for an x86 server.

Part Tag Flag Cylinders Size Blocks

0 root wm 1 - 8920 68.33GB (8920/0/0) 143299800

1 unassigned wm 0 0 (0/0/0) 0

2 backup wu 0 - 8920 68.34GB (8921/0/0) 143315865

3 unassigned wm 0 0 (0/0/0) 0

4 unassigned wm 0 0 (0/0/0) 0

5 unassigned wm 0 0 (0/0/0) 0

6 unassigned wm 0 0 (0/0/0) 0

7 unassigned wm 0 0 (0/0/0) 0

8 boot wu 0 - 0 7.84MB (1/0/0) 16065

9 unassigned wm 0 0 (0/0/0) 0


Install the bootblocks on both disks


For sparc systems run:


# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t2d0s0

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t3d0s0

-

For x86 systems run:


# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t2d0s0

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t3d0s0


Configure the server to use one of the new disks as the boot disk.


For sparc systems this can be done using luxadm(1M).


# luxadm set_boot_dev /dev/dsk/c0t2d0s0


##############################################

Create a new pool and perform the restore


Mount the file system where the backup images are located:


e.g.

# mount -o ro NFS_SERVER:BACKUP_DIR /mnt

# ls -1 /mnt

root.zfs_snap

home.zfs_snap

rpool.zfs_snap

tools.zfs_snap


Create a new pool using the syntax shown here. Change only the disk names.

And make sure to include the slices in the names. e.g. c0t2d0s0 and not c0t2d0.

If you only have one disk, the “mirror” keyword must be dropped.


But don’t drop the “mirror” keyword if you specify more than one disk or you will hit problems later on.


# zpool create -f -o failmode=continue -R /a -m legacy -o cachefile=/etc/zfs/zpool.cache rpool mirror c0t2d0s0 c0t3d0s0


Start the restore; the datasets will be automatically created.

# zfs receive -Fd rpool < /mnt/rpool.zfs_snap

# zfs receive -Fd rpool < /mnt/root.zfs_snap

# zfs receive -Fd rpool < /mnt/home.zfs_snap

# zfs receive -Fd rpool < /mnt/tools.zfs_snap


Optionally you may wish to view the recovered datasets


# zfs list -t filesystem -o name,mountpoint,mounted

NAME MOUNTPOINT MOUNTED

rpool legacy no

rpool/ROOT legacy no

rpool/ROOT/blue /a yes

rpool/ROOT/blue/var /a/var yes

rpool/home /a/home yes

rpool/tools none no

rpool/tools/marimba /a/opt/Marimba yes

rpool/tools/openv /a/usr/openv yes

rpool/tools/bmc /opt/bmc yes


Create datasets for swap and dump.


# zfs create -V 8G -b 8k rpool/swap

# zfs create -V 1G rpool/dump


Set the default boot environment

# zpool set bootfs=rpool/ROOT/blue rpool


Note: if you created the pool with multiple disk devices and forgot to specify the “mirror” keyword, this command will fail.


Note: if you have EFI labels on the boot disks this command will fail.


Note: if you followed the instructions above, then everything should work perfectly!


It is not necessary to rebuild the device entries prior to rebooting, even if you are migrating to completely different hardware!


In fact, the system will boot fine even if /dev/dsk and /dev/rdsk are empty directories.

If the restore is part of a migration, you may safely edit /a/etc/nodename, /a/etc/hostname*, /a/etc/inet/hosts, etc. Otherwise move on to the next step.


###########################################################

Cross Fingers, reboot, and pray


# touch /a/reconfigure

# reboot


Wait for the server to reboot. The operating system should come up looking like it did before.


If you have recovered to completely different hardware, you may need to modify the network interface files (/etc/hostname.*) to match the network devices. If necessary, all the network devices can be temporarily plumbed by running “ifconfig -a plumb”


Finally, you may notice that the snapshots still exist. They can be recursively removed at your convenience.


e.g.

# zfs list -t snapshot

NAME USED AVAIL REFER MOUNTPOINT

rpool@20090813 17K - 35.5K -

rpool/ROOT@20090813 0 - 18K -

rpool/ROOT/blue@20090813 5.07M - 1.41G -

rpool/ROOT/blue/var@20090813 634K - 312M -

rpool/home@20090813 0 - 447K -

rpool/tools@20090813 0 - 18K -

rpool/tools/marimba@20090813 353K - 125M -

rpool/tools/openv@20090813 0 - 27.1M -


# zfs destroy -r rpool@20090813


###########################################################

Reward yourself for pulling off an OS recovery in 15 minutes flat.

i.e. go have an ice cream.


Tuesday, August 18, 2009

ZuiTube - Youtube for kids


Yes, you read it right. There's a new video site catered to our little ones, and maybe a bunch of us too :). First there was Kidzui, the company behind the child-safe web browser of the same name, and now they have done it again with ZuiTube, a kid-friendly online video destination.



The main taglines for ZuiTube are "Play, Laugh, Learn and Share". It boasts of having the largest video collection for kids anywhere, videos approved by parents and teachers, channels created by kids and editors, a TV mode that plays all videos, search with suggestions, and KidRank.

ZuiTube, like Kidzui is free to use. The company makes its money by selling premium memberships that give parents deeper controls and kids more personalization tools and virtual goods.

Other players in the independent kid vid space include Totlol, which had to put up a registration wall in order to stay in business, and Kideo, which is a browser-based video player that pulls up random clips on a continuous cycle.


Thursday, August 13, 2009

OpenSSH Server Best Security Practices

OpenSSH is a FREE version of the SSH connectivity tools that technical users of the Internet rely on. Users of telnet, rlogin, and ftp may not realize that their password is transmitted across the Internet unencrypted, but it is. OpenSSH encrypts all traffic (including passwords) to effectively eliminate eavesdropping, connection hijacking, and other attacks. Additionally, OpenSSH provides secure tunneling capabilities and several authentication methods, and supports all SSH protocol versions.

Currently, almost all communications in computer networks are done without encryption. As a consequence, anyone who has access to any machine connected to the network can listen in on any communication. This is being done by hackers, curious administrators, employers, criminals, industrial spies, and governments. Some networks leak enough electromagnetic radiation that data may be captured even from a distance.

When you log in, your password goes in the network in plain text. Thus, any listener can then use your account to do any evil he likes. Many incidents have been encountered worldwide where crackers have started programs on workstations without the owner's knowledge just to listen to the network and collect passwords. Programs for doing this are available on the Internet, or can be built by a competent programmer in a few hours.

Businesses have trade secrets, patent applications in preparation, pricing information, subcontractor information, client data, personnel data, financial information, etc. Currently, anyone with access to the network (any machine on the network) can listen to anything that goes in the network, without any regard to normal access restrictions.

Many companies are not aware that information can so easily be recovered from the network. They trust that their data is safe since nobody is supposed to know that there is sensitive information in the network, or because so much other data is transferred in the network. This is not a safe policy.

Before implementing these here are the config files and locations:

Default Config Files and SSH Port

* /etc/ssh/sshd_config - OpenSSH server configuration file.
* /etc/ssh/ssh_config - OpenSSH client configuration file.
* ~/.ssh/ - Users ssh configuration directory.
* ~/.ssh/authorized_keys or ~/.ssh/authorized_keys2 - Lists the public keys (RSA or DSA) that can be used to log into the user’s account
* /etc/nologin - If this file exists, sshd refuses to let anyone except root log in.
* /etc/hosts.allow and /etc/hosts.deny : Access control lists that should be enforced by tcp-wrappers are defined here.
* SSH default port : TCP 22

Here are some OpenSSH server security best practices that you can implement:

#1: Disable OpenSSH Server

Workstations and laptops can work without an OpenSSH server. If you do not need to provide the remote login and file transfer capabilities of SSH, disable and remove the sshd server. CentOS / RHEL / Fedora Linux users can disable and remove openssh-server with the yum command:

# chkconfig sshd off
# yum erase openssh-server

Debian / Ubuntu Linux users can disable and remove the same with the apt-get command:

# apt-get remove openssh-server

You may need to update your iptables script to remove ssh exception rule. Under CentOS / RHEL / Fedora edit the files /etc/sysconfig/iptables and /etc/sysconfig/ip6tables.

Once done restart iptables service:

# service iptables restart
# service ip6tables restart

#2: Only Use SSH Protocol 2

SSH protocol version 1 (SSH-1) has man-in-the-middle attack problems and other security vulnerabilities. SSH-1 is obsolete and should be avoided at all cost. Open the sshd_config file and make sure the following line exists:

Protocol 2

#3: Limit Users' SSH Access

By default all system users can log in via SSH using their password or public key. Sometimes you create UNIX / Linux user accounts for FTP or email purposes only. However, those users can still log in to the system using ssh. They will have full access to system tools, including compilers and scripting languages such as Perl and Python, which can open network ports and do many other fancy things. One of my clients had a really outdated php script, and an attacker was able to create a new account on the system via that script. However, the attacker failed to get into the box via ssh because the account wasn't in AllowUsers.

To only allow the root, vivek and jerry users to use the system via SSH, add the following to sshd_config:

AllowUsers root vivek jerry

Alternatively, you can allow all users to login via SSH but deny only a few users, with the following line:

DenyUsers saroj anjali foo

You can also configure Linux PAM to allow or deny login via the sshd server, and you can allow or deny access to ssh by group name (see the AllowGroups sketch below).
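
sshd itself can also restrict access by group; a minimal sketch, assuming a dedicated group called sshusers exists on the system:

AllowGroups sshusers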

#4: Configure Idle Log Out Timeout Interval

Users can log in to the server via ssh, and you can set an idle timeout interval to avoid unattended ssh sessions. Open sshd_config and make sure the following values are configured:

ClientAliveInterval 300
ClientAliveCountMax 0

You are setting an idle timeout interval in seconds (300 secs = 5 minutes). After this interval has passed, the idle user will be automatically kicked out (read as logged out). See how to automatically log BASH / TCSH / SSH users out after a period of inactivity for more details.

#5: Disable .rhosts Files

Don't read the user's ~/.rhosts and ~/.shosts files. Update sshd_config with the following settings:

IgnoreRhosts yes

SSH can emulate the behavior of the obsolete rsh command, just disable insecure access via RSH.

#6: Disable Host-Based Authentication

To disable host-based authentication, update sshd_config with the following option:

HostbasedAuthentication no

#7: Disable root Login via SSH

There is no need to log in as root via ssh over a network. Normal users can use su or sudo (recommended) to gain root level access. This also makes sure you get full auditing information about who ran privileged commands on the system via sudo. To disable root login via SSH, update sshd_config with the following line:

PermitRootLogin no

#8: Enable a Warning Banner

Set a warning banner by updating sshd_config with the following line:

Banner /etc/issue


#8: Firewall SSH Port # 22

You need to firewall ssh port # 22 by updating your iptables or pf firewall configuration. Usually, the OpenSSH server should only accept connections from your LAN or other trusted remote WAN sites.

Netfilter (Iptables) Configuration

Update /etc/sysconfig/iptables (Redhat and friends specific file) to accept connection only from 192.168.1.0/24 and 202.54.1.5/29, enter:

-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -s 202.54.1.5/29 -m state --state NEW -p tcp --dport 22 -j ACCEPT

If you've dual stacked sshd with IPv6, edit /etc/sysconfig/ip6tables (Redhat and friends specific file), enter:

-A RH-Firewall-1-INPUT -s ipv6network::/ipv6mask -m tcp -p tcp --dport 22 -j ACCEPT

Replace ipv6network::/ipv6mask with actual IPv6 ranges.

*BSD PF Firewall Configuration

If you are using PF firewall update /etc/pf.conf as follows:

pass in on $ext_if inet proto tcp from {192.168.1.0/24, 202.54.1.5/29} to $ssh_server_ip port ssh flags S/SA synproxy state

#9: Change SSH Port and Limit IP Binding

By default SSH listens on all available interfaces and IP addresses on the system. Limit the ssh port binding and change the ssh port (by default brute forcing scripts only try to connect to port # 22). To bind to the 192.168.1.5 and 202.54.1.5 IPs and to port 300, add or correct the following lines:

Port 300
ListenAddress 192.168.1.5
ListenAddress 202.54.1.5

A better approach is to use proactive scripts such as fail2ban or denyhosts (see below).

#10: Use Strong SSH Passwords and Passphrase

It cannot be stressed enough how important it is to use strong user passwords and passphrases for your keys. Brute force attacks work because users choose dictionary-based passwords. You can force users to avoid dictionary-based passwords and use the john the ripper tool to find existing weak passwords. Here is a sample random password generator (put it in your ~/.bashrc):

genpasswd() {
local l=$1
[ "$l" == "" ] && l=20
tr -dc A-Za-z0-9_ < /dev/urandom | head -c ${l} | xargs
}


Run it:

genpasswd 16

Output:

uw8CnDVMwC6vOKgW

#11: Use Public Key Based Authentication

Use a public/private key pair with password protection for the private key. See how to use RSA and DSA key based authentication. Never ever use a passphrase-free key for login.
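
As an illustration, generating a passphrase-protected RSA key pair and installing the public key on a remote host might look like this (the hostname is a placeholder; ssh-copy-id ships with most Linux distributions):

$ ssh-keygen -t rsa -b 2048
$ ssh-copy-id user@remote-server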

#12: Use Keychain Based Authentication

keychain is a special bash script designed to make key-based authentication incredibly convenient and flexible. It offers various security benefits over passphrase-free keys.

#13: Chroot SSHD (Lock Down Users To Their Home Directories)

By default users are allowed to browse the server directories such as /etc/, /bin and so on. You can protect ssh using an OS-based chroot or special tools such as rssh. With the release of OpenSSH 4.8p1 or 4.9p1, you no longer have to rely on third-party hacks such as rssh or complicated chroot(1) setups to lock users to their home directories.

#14: Use TCP Wrappers

TCP Wrappers is a host-based networking ACL system used to filter network access. OpenSSH supports TCP wrappers. Just update your /etc/hosts.allow file as follows to allow SSH only from 192.168.1.2 and 172.16.23.12:

sshd : 192.168.1.2 172.16.23.12


#15: Disable Empty Passwords

You need to explicitly disallow remote login from accounts with empty passwords; update sshd_config with the following line:

PermitEmptyPasswords no

#16: Thwart SSH Crackers (Brute Force Attack)

Brute force is a method of defeating a cryptographic scheme by trying a large number of possibilities using a single or distributed computer network. To prevent brute force attacks against SSH, use the following software:

* DenyHosts is a Python based security tool for SSH servers. It is intended to prevent brute force attacks on SSH servers by monitoring invalid login attempts in the authentication log and blocking the originating IP addresses.
* Explains how to setup DenyHosts under RHEL / Fedora and CentOS Linux.
* Fail2ban is a similar program that prevents brute force attacks against SSH.
* security/sshguard-pf protect hosts from brute force attacks against ssh and other services using pf.
* security/sshguard-ipfw protect hosts from brute force attacks against ssh and other services using ipfw.
* security/sshguard-ipfilter protect hosts from brute force attacks against ssh and other services using ipfilter.
* security/sshblock block abusive SSH login attempts.
* security/sshit checks for SSH/FTP bruteforce and blocks given IPs.
* BlockHosts Automatic blocking of abusive IP hosts.
* Blacklist Get rid of those bruteforce attempts.
* Brute Force Detection A modular shell script for parsing application logs and checking for authentication failures. It does this using a rules system where application specific options are stored including regular expressions for each unique auth format.
* IPQ BDB filter May be considered as a fail2ban lite.

#17: Rate-limit Incoming Port # 22 Connections

Both netfilter and pf provide rate-limit options to perform simple throttling on incoming connections on port # 22.

Iptables Example

The following example will drop incoming connections which make more than 5 connection attempts upon port 22 within 60 seconds:

#!/bin/bash
# path to the iptables binary (not defined in the original snippet)
IPT=/sbin/iptables
inet_if=eth1
ssh_port=22
$IPT -I INPUT -p tcp --dport ${ssh_port} -i ${inet_if} -m state --state NEW -m recent --set
$IPT -I INPUT -p tcp --dport ${ssh_port} -i ${inet_if} -m state --state NEW -m recent --update --seconds 60 --hitcount 5 -j DROP


Call above script from your iptables scripts. Another config option:

$IPT -A INPUT -i ${inet_if} -p tcp --dport ${ssh_port} -m state --state NEW -m limit --limit 3/min --limit-burst 3 -j ACCEPT
$IPT -A INPUT -i ${inet_if} -p tcp --dport ${ssh_port} -m state --state ESTABLISHED -j ACCEPT
$IPT -A OUTPUT -o ${inet_if} -p tcp --sport ${ssh_port} -m state --state ESTABLISHED -j ACCEPT
# another one line example
# $IPT -A INPUT -i ${inet_if} -m state --state NEW,ESTABLISHED,RELATED -p tcp --dport 22 -m limit --limit 5/minute --limit-burst 5 -j ACCEPT

See iptables man page for more details.

*BSD PF Example

The following limits the maximum number of connections per source to 20 and rate limits the number of connections to 15 in a 5 second span. If anyone breaks these rules, they are added to our abusive_ips table and blocked from making any further connections. Finally, the flush keyword kills all states created by the matching rule which originate from the host that exceeds these limits.

sshd_server_ip="202.54.1.5"
table <abusive_ips> persist
block in quick from <abusive_ips>
pass in on $ext_if proto tcp to $sshd_server_ip port ssh flags S/SA keep state (max-src-conn 20, max-src-conn-rate 15/5, overload <abusive_ips> flush)

#18: Use Port Knocking

Port knocking is a method of externally opening ports on a firewall by generating a connection attempt on a set of prespecified closed ports. Once a correct sequence of connection attempts is received, the firewall rules are dynamically modified to allow the host which sent the connection attempts to connect over specific port(s). A sample port Knocking example for ssh using iptables:

$IPT -N stage1
$IPT -A stage1 -m recent --remove --name knock
$IPT -A stage1 -p tcp --dport 3456 -m recent --set --name knock2

$IPT -N stage2
$IPT -A stage2 -m recent --remove --name knock2
$IPT -A stage2 -p tcp --dport 2345 -m recent --set --name heaven

$IPT -N door
$IPT -A door -m recent --rcheck --seconds 5 --name knock2 -j stage2
$IPT -A door -m recent --rcheck --seconds 5 --name knock -j stage1
$IPT -A door -p tcp --dport 1234 -m recent --set --name knock

$IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPT -A INPUT -p tcp --dport 22 -m recent --rcheck --seconds 5 --name heaven -j ACCEPT
$IPT -A INPUT -p tcp --syn -j door

* fwknop is an implementation that combines port knocking and passive OS fingerprinting.
* Multiple-port knocking Netfilter/IPtables only implementation.

#19: Use Log Analyzer

These tools make your log reading life easier. They will go through your logs for a given period of time and produce a report on the areas you want, with the level of detail you want. Make sure LogLevel is set to INFO or DEBUG in sshd_config:

LogLevel INFO

#20: Patch OpenSSH and Operating Systems

It is recommended that you use tools such as yum, apt-get, freebsd-update and others to keep systems up to date with the latest security patches.

Other Options

To hide the OpenSSH version, you need to update the source code and compile OpenSSH again. Make sure the following options are enabled in sshd_config:

# Turn on privilege separation
UsePrivilegeSeparation yes

# Prevent the use of insecure home directory and key file permissions
StrictModes yes

# Turn on reverse name checking
VerifyReverseMapping yes

# Do you need port forwarding?
AllowTcpForwarding no
X11Forwarding no

# Specifies whether password authentication is allowed. The default is yes.
PasswordAuthentication no

Verify your sshd_config file before restarting / reloading changes:
# /usr/sbin/sshd -t
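
If the syntax check passes, reload or restart sshd with your platform's service manager, for example:

# service sshd reload # Red Hat / CentOS / Fedora
# /etc/init.d/ssh reload # Debian / Ubuntu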

Tighter SSH security with two-factor or three-factor (or more) authentication.


Thursday, August 6, 2009

How to kill defunct processes

If you have been a system administrator of Solaris for some time you should be familiar with such processes. Basically a defunct process, more commonly known as a zombie process, is a process that has completed execution but still has an entry in the process table; this entry is still needed to allow the process that started it to read its exit status. The term zombie process derives from the common definition of a zombie, an undead person. In the term's colorful metaphor, the child process has died but has not yet been reaped.

Whenever I encounter these zombies I normally kill the parent process that spawned them, and, worse, if that doesn't work I might have to restart the whole server. Well, I've found another option (before you start rebooting the machine).

There's a lesser-known command for reaping these defunct/zombie processes: preap.

From the man pages:

NAME
preap - force a defunct process to be reaped by its parent

SYNOPSIS
preap [-F] pid...

DESCRIPTION
A defunct (or zombie) process is one whose exit status has
yet to be reaped by its parent. The exit status is reaped
via the wait(3C), waitid(2), or waitpid(3C) system call. In
the normal course of system operation, zombies may occur,
but are typically short-lived. This may happen if a parent
exits without having reaped the exit status of some or all
of its children. In that case, those children are reparented
to PID 1. See init(1M), which periodically reaps such
processes.

An irresponsible parent process may not exit for a very long
time and thus leave zombies on the system. Since the operat-
ing system destroys nearly all components of a process
before it becomes defunct, such defunct processes do not
normally impact system operation. However, they do consume a
small amount of system memory.

preap forces the parent of the process specified by pid to
waitid(3C) for pid, if pid represents a defunct process.

preap will attempt to prevent the administrator from
unwisely reaping a child process which might soon be reaped
by the parent, if:

o The process is a child of init(1M).

o The parent process is stopped and might wait on the
child when it is again allowed to run.

o The process has been defunct for less than one minute.

So to kill a defunct process you can try:

server# ps -ef| grep -i defunct
oracle 23650 22802 0 - ? 0:01
oracle 23657 22802 0 - ? 0:01
oracle 23580 22802 0 - ? 0:00
oracle 23924 16560 0 - ? 0:00
oracle 23750 22802 0 - ? 0:01
oracle 23928 16363 0 - ? 0:00
oracle 23915 17114 0 - ? 0:00
oracle 23940 20910 0 - ? 0:00
oracle 23863 21896 0 - ? 0:00


server# ps -ef | grep -i defunct | awk '{print $2}' | xargs preap
23650: killed by signal KILL
23657: killed by signal KILL
23580: killed by signal KILL
23924: exited with status 141
23750: killed by signal KILL
23928: exited with status 141
23915: exited with status 141
23940: killed by signal KILL
23863: killed by signal KILL
23921: exited with status 141
23912: exited with status 141
23652: killed by signal KILL
23889: exited with status 141
23752: killed by signal KILL
23931: exited with status 141
23925: exited with status 141
23936: exited with status 141
23916: exited with status 141
23784: killed by signal KILL
23744: killed by signal KILL
23656: killed by signal KILL
preap: cannot examine 6343: no such process
23926: exited with status 141
23651: killed by signal KILL
23631: killed by signal KILL
23922: exited with status 141
23654: killed by signal KILL
23781: killed by signal KILL
23933: exited with status 141
23923: exited with status 141
23790: killed by signal KILL
24011: exited with status 141
23938: exited with status 141
23634: killed by signal KILL
23907: exited with status 141
23864: exited with status 141
23908: exited with status 141
23883: exited with status 141
23812: killed by signal KILL
23765: killed by signal KILL
23906: exited with status 141
23910: exited with status 141
23871: exited with status 141
23653: killed by signal KILL
23902: killed by signal KILL
23782: killed by signal KILL
23743: killed by signal KILL
server# ps -ef| grep -i defunct

If this still fails to kill them then you have to restart the mother process or the server itself.


Saturday, August 1, 2009

ZFS Tip: Comparison of SVM mirroring and ZFS mirroring

Most of us are familiar with SVM (a.k.a. Solaris Volume Manager, DiskSuite, Online DiskSuite, ODS, Solstice DiskSuite, SDS, etc.)

Under SVM we build "metadevices" from one or more disks. We can then mirror equal sized metadevices to create a "metamirror".

In this example we have two 30GB metadevices: one happens to be a concatenation of dissimilar disks, the other is striped over equal-sized disks.

The two metadevices are mirrored together to create a 30GB metamirror.
If any disk fails, the entire submirror will go offline but the metamirror will remain online.

-------------------------------------------------------------------------------|
|30GB metamirror |
| ----------------------------------------------------------- |
| |30GB submirror built from concatenation of two disks | |
| | ------------- ------------ | |
| | | 10G | | 20G | | |
| | | disk | + | disk | | |
| | ------------- ------------ | |
| ----------------------------------------------------------- |
| |
| ----------------------------------------------------------- |
| |30G submirror striped over three 10GB disks | |
| | ------------- ----------- ------------ | |
| | | 10G | | 10 | | 10G | | |
| | | disk | : | disk | : | disk | | |
| | ------------- ------------ ------------ | |
| ----------------------------------------------------------- |
| |
--------------------------------------------------------------------------------


In a ZFS pool we mirror inside the vdevs.


In the example below, we have a 45GB ZFS pool. Data in the pool is spread dynamically over three independent vdevs.

Two of the vdevs are mirrored, the third is unmirrored.

There is no requirement for the vdevs to be the same size.

The disks (or partitions) within a single vdev should be the same size or some disk space will go to waste.

If a disk in one of the mirrored vdevs fails, the vdev will enter an unmirrored state but there will be no loss of data.

If the disk in the unmirrored vdev fails, the entire vdev will fail and this will cause the entire pool to fail.
----------------------------------------------------------------------------------------------
|45GB ZFS pool |
| ---------------------- ---------------------- ----------------------- |
| | 10G mirrored vdev | | 20G mirrored vdev | | 15G unmirrored vdev | |
| | ------------- | | ------------- | | | |
| | | 10G | | | | 20G | | | ------------ | |
| | | disk | | | | disk | | | | 15G | | |
| | ------------- | : | ------------- | : | | disk | | |
| | ------------- | | ------------- | | ------------ | |
| | | 10G | | | | 20G | | | | |
| | | disk | | | | disk | | | | |
| | ------------- | | ------------- | | | |
| ---------------------- ---------------------- ----------------------- |
| |
----------------------------------------------------------------------------------------------
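
For comparison, a sketch of building a pool laid out like the diagram above (device names are placeholders; -f is needed because zpool otherwise refuses to mix mirrored and unmirrored vdevs):

# zpool create -f datapool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 c1t4d0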

Notes:
In the real world we would probably not create a pool where some vdevs are mirrored and some are not.

If we have more than three disks in a vdev we have a choice of a "three-way mirror" or we could use RAIDZ. RAIDZ is an improvement over RAID-5.

If we are using LUNs from the SAN which are already built in a RAID array, there may be little value in using mirroring or RAIDZ within a vdev.
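For reference, this is roughly how a pool like the one in the diagram could be built. A minimal sketch with hypothetical disk names: each "mirror" keyword starts a new mirrored vdev and a bare device becomes an unmirrored vdev (zpool will typically warn about the mismatched replication levels and ask for -f):

# zpool create ttt mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 c1t4d0

A raidz vdev is declared the same way, e.g. "zpool create ttt raidz c1t0d0 c1t1d0 c1t2d0".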


Friday, July 31, 2009

ZFS Datasets and "zfs list"

Any of you who have looked at the zfs(1M) man page will have come across the term "dataset".

A dataset can be:
a) a file system
b) a volume
c) a snapshot

Every time we create a ZFS file system we are actually creating a dataset with a setting of "type=filesystem".

Every pool starts out with a single dataset with a name that is the same as the pool name.

E.g. When we create a pool called "ttt" we automatically have a dataset called "ttt" which by default has a mounted file system called /ttt.

We can change the mount point, but we cannot change the name of this top-level dataset.
Every time we add a new dataset to a pool it must be a "child" of an existing dataset.
E.g. A new dataset called "ttt/xxx" can be created which is a "descendant" of the "ttt" dataset.

We can then create "ttt/xxx/yyy" which is a descendant of both "ttt" and "ttt/xxx"

It is not possible to create a dataset called "ttt/xxx/yyy" if "ttt/xxx" does not already exist.
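Depending on the ZFS version, "zfs create" may accept a -p option that creates any missing parent datasets for you; a hedged sketch:

# zfs create -p ttt/xxx/yyy    # also creates ttt/xxx if it does not exist yet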

All of the datasets we have created thus far have been mounted file systems; today won't be any different.

We will look at volumes and snapshots another day.

Today we will look at the "zfs list" command. This command is used to list ZFS datasets.


First create some temp files and a new pool:
# mkfile 119M /tmp/file1
# mkfile 119M /tmp/file2
# zpool create ttt /tmp/file1 /tmp/file2

And let's create a 50MB file in our new /ttt file system.

# mkfile 50M /ttt/50_meg_of_zeros

Now run "df -h" to see that 50MB is used.

# df -h /ttt
Filesystem Size Used Available Capacity Mounted on
ttt 196M 50M 146M 26% /ttt

If you don't see 50M under the "Used" column, try running the command again.
There may be a time lapse between creating the 50_meg_of_zeros file and having "df" report 50MB of data in the file system.
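If you do not want to wait, forcing a sync may help (a hedged suggestion; ZFS commits pending writes to disk every few seconds on its own):

# sync
# df -h /ttt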

Now let's use "zfs list" to get additional information about the "ttt" dataset.

# zfs list ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 50.2M 146M 50.0M /ttt

Note that the information provided is similar to the data provided by "df -h".
The "REFER" column shows us how much data is used by this data set.

The "USED" column refers to the amount of data used by this dataset and all the descendants of this dataset.

Since we only have a top level dataset with no descendants, the two values are roughly equal. Small discrepancies are generally due to overhead.

The "AVAIL" column shows the amount of available space in the dataset, which (not coincidentally) is equal to the free space in the pool.

Now create some file systems… this syntax works on Solaris 10 08/07 with recent patches.

# zfs create -o mountpoint=/apps_test ttt/apps # descendant of ttt
# zfs create -o mountpoint=/work_test ttt/work # descendant of ttt
# zfs create -o mountpoint=/aaa ttt/work/aaa # descendant of both ttt and ttt/work
# zfs create -o mountpoint=/bbb ttt/work/bbb # descendant of both ttt and ttt/work

If you are using an older version of Solaris 10 you may need to use the following syntax to achieve the same thing:

# zfs create ttt/apps # descendant of ttt
# zfs set mountpoint=/apps_test ttt/apps
# zfs create ttt/work # descendant of ttt
# zfs set mountpoint=/work_test ttt/work
# zfs create ttt/work/aaa # descendant of both ttt and ttt/work
# zfs set mountpoint=/aaa ttt/work/aaa
# zfs create ttt/work/bbb
# zfs set mountpoint=/bbb ttt/work/bbb # descendant of both ttt and ttt/work

Regardless of which syntax you used to create the file systems, let's move on and create another 50MB file in one of our file systems.

# mkfile 50M /aaa/50_meg_of_zeros

# df -h | egrep 'ttt|Filesystem'
Filesystem Size Used Available Capacity Mounted on
ttt 196M 50M 96M 35% /ttt
ttt/apps 196M 24K 96M 1% /apps_test
ttt/work 196M 24K 96M 1% /work_test
ttt/work/aaa 196M 50M 96M 35% /aaa
ttt/work/bbb 196M 24K 96M 1% /bbb

Now we have two file systems ("/ttt" and "/aaa") each showing a utilization of 50MB. Nothing surprising so far.

Note you could use "df -F zfs -h"… but that will show all zfs file systems on the system. The "egrep" syntax used above limits us to the file systems that are part of the "ttt" pool.

Now let's rerun "zfs list ttt":

# zfs list ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 100M 95.6M 50.0M /ttt

Note that the "REFER" column is still showing 50MB because /ttt still contains only one 50MB file.

But the "USED" column now shows 100MB. Remember, the USED column represents the amount of data in the dataset and all of the descendants of the dataset.

We have 50MB in "ttt" (mounted under /ttt) and 50MB in "ttt/work/aaa" (mounted under /aaa), the total space consumed by ttt and its descendants is 100MB.

We can also use "zfs list" to look at specific datasets in the pool.

# zfs list ttt/work
NAME USED AVAIL REFER MOUNTPOINT
ttt/work 50.1M 95.6M 24.5K /work_test

Note that the "ttt/work" dataset (mounted under /work_test) contains no data so the "REFER" column shows roughly 0MB.

But the USED value of 50MB reflects the data from the descendant "ttt/work/aaa".

# zfs list ttt/work/aaa
NAME USED AVAIL REFER MOUNTPOINT
ttt/work/aaa 50.0M 95.6M 50.0M /aaa

The "ttt/work/aaa" dataset (mounted under /aaa) contains one 50MB file but. The dataset has no descendants. Therefore both USED and REFER show 50M.

If we want to recursively list all datasets that are part of the "ttt" pool (and exclude all other pools) we need to use the "-r" option and specify the pool name.

# zfs list -r ttt
NAME USED AVAIL REFER MOUNTPOINT
ttt 100M 95.6M 50.0M /ttt
ttt/apps 24.5K 95.6M 24.5K /apps_test
ttt/work 50.1M 95.6M 24.5K /work_test
ttt/work/aaa 50.0M 95.6M 50.0M /aaa
ttt/work/bbb 24.5K 95.6M 24.5K /bbb
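As an aside (hedged, since option support varies by release), "zfs list" also accepts -t to filter by dataset type and -o to choose which columns are printed:

# zfs list -r -t filesystem ttt
# zfs list -r -o name,used,available,mountpoint ttt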

Clean up time

# zpool destroy ttt
# rmdir /apps_test
# rmdir /work_test
# rmdir /aaa
# rmdir /bbb
# rm /tmp/file*


Thursday, July 30, 2009

ZFS Tip: "zpool list", "zpool status", "zpool iostat" & "zpool history"

For those who asked, I will convert these tips to html and post on termite. If I can get this done today I will provide a URL tomorrow.
For those who did not try last week's exercises, I am afraid you will not be eligible for certificates, plaques, trophies or awards.
But the good news is that it is not too late to catch up. If you cut and paste, each exercise should take roughly two minutes.
Today we will look at some "status" or "informational" commands that will give us more information about our ZFS pools. First we need a pool and some file systems to work with. As per usual, we can build the pool on top of files instead of real disk.

Try this on a server near you:

# mkfile 119M /tmp/file1
# mkfile 119M /tmp/file2
# zpool create ttt /tmp/file1 /tmp/file2
# zfs create -o mountpoint=/apps_test ttt/apps
# zfs create -o mountpoint=/work_test ttt/work

And lets put some data in one of our file systems.

# mkfile 50M /apps_test/50_meg_of_zeros


First list all the ZFS pools on the system:

# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
tools 4.97G 121M 4.85G 2% ONLINE -
ttt 228M 50.2M 178M 22% ONLINE -

Notice that my system has two pools. The "tools" pool is created by jumpstart.

If we only want information for the "ttt" pool we can type:

# zpool list ttt
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
ttt 228M 50.2M 178M 22% ONLINE -

The next command will list all the vdevs in the pool; our pool currently has two vdevs (each vdev consists of a single 119MB file).

# zpool status ttt
pool: ttt
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
ttt ONLINE 0 0 0
/tmp/file1 ONLINE 0 0 0
/tmp/file2 ONLINE 0 0 0
errors: No known data errors

If you have a "tools" pool on your system you can run "zpool status tools" and see how a mirrored vdev is displayed. I promise I will dig into mirroring soon… but not today.

If we want to see how much data is in each vdev we can use another command:

# zpool iostat -v ttt
capacity operations bandwidth
pool used avail read write read write
------------ ----- ----- ----- ----- ----- -----
ttt 50.2M 178M 0 1 15 42.1K
/tmp/file1 24.1M 89.9M 0 0 6 20.2K
/tmp/file2 26.1M 87.9M 0 0 8 21.8K
------------ ----- ----- ----- ----- ----- -----

Notice that our 50MB file has been spread evenly over the two vdevs.
We can also add a time duration to repeatedly display statistics (similar to iostat(1M)).

# zpool iostat -v ttt 5 # this will display statistics every 5 seconds.

We can use "zpool iostat" to see how new writes are balanced over all vdevs in the pool.
Let's first add a third vdev to the pool.

# mkfile 119M /tmp/file3
# zpool add ttt /tmp/file3
# zpool iostat -v ttt
capacity operations bandwidth
pool used avail read write read write
------------ ----- ----- ----- ----- ----- -----
ttt 50.3M 292M 0 0 1 3.26K
/tmp/file1 24.1M 89.9M 0 0 0 1.51K
/tmp/file2 26.1M 87.9M 0 0 0 1.62K
/tmp/file3 8K 114M 0 18 0 80.9K
------------ ----- ----- ----- ----- ----- -----

Now we have an empty vdev. Notice that existing data has not been redistributed.
But if we start writing new data, the new data will be distributed over all vdevs (unless one or more vdevs is full).

# mkfile 50M /apps_test/50_meg_of_zeros_2
# zpool iostat -v ttt
capacity operations bandwidth
pool used avail read write read write
------------ ----- ----- ----- ----- ----- -----
ttt 100M 242M 0 0 1 6.06K
/tmp/file1 39.5M 74.5M 0 0 0 2.37K
/tmp/file2 41.6M 72.4M 0 0 0 2.48K
/tmp/file3 19.2M 94.8M 0 6 0 183K
------------ ----- ----- ----- ----- ----- -----

Let's close off with a self-explanatory command:

# zpool history ttt

History for 'ttt':
2008-02-12.11:27:02 zpool create ttt /tmp/file1 /tmp/file2
2008-02-12.11:29:29 zfs create -o mountpoint=/apps_test ttt/apps
2008-02-12.11:29:32 zfs create -o mountpoint=/work_test ttt/work
2008-02-12.16:31:00 zpool add ttt /tmp/file3
Now if you have a "tools" pool on your system, and you want to see how Jumpstart set it up, try running "zpool history tools".

Clean up time already:

# zpool destroy ttt
# rmdir /apps_test
# rmdir /work_test
# rm /tmp/file*


Wednesday, July 29, 2009

E10K: Powering on/off procedures

Powering off individual domains

1. Connect to the correct domain
1. Login to ssp as ssp and enter ${domain_name} at the Please enter SUNW_HOSTNAME: prompt.
2. If already logged into the ssp, enter domain_switch ${domain_name} in a command window.
2. ID proper domain boards by executing domain_status and noting the board numbers under the heading SYSBDS.

${SSP}:${Domain}% domain_status

DOMAIN TYPE PLATFORM OS SYSBDS

domain1 Ultra-Enterprise-10000 Plat_name 2.6 4 5

domain2 Ultra-Enterprise-10000 Plat_name 2.6 0 1

domain3 Ultra-Enterprise-10000 Plat_name 2.6 3

domain4 Ultra-Enterprise-10000 Plat_name 2.6 6 8

domain5 Ultra-Enterprise-10000 Plat_name 2.6 9 10

3. Bring the domain down.

1. Open another command window, issue domain_switch if necessary.
2. Execute netcon to start up the domain console.
3. Log in as root.
4. Execute sync;sync;sync;init 0
5. Once the system is at the OK prompt, continue.

4. In the first command window, enter power -off -sb ${brd_numbers[*]}. Board numbers are listed together with space separators.


Powering off the entire E10K

1. Bring down all E10K domains:
1. Open a command window.
2. Issue domain_switch ${domain_name} to connect to the correct domain.
3. Issue netcon to open the domain console.
4. Log in as root.
5. Execute sync;sync;sync;init 0
6. Once the domain is at the OK prompt, exit the netconsole by issuing ~. (Press/hold tilde while pressing period).
7. Iterate through the above until all domains are down.
2. Open a command window on shamash and enter power -B -off -all. When the command completes, you will hear the power switches changing position in the E10K cabinet.

3. Power off the SSP:
1. su - root
2. sync;sync;sync;init 0
3. When the OK prompt appears, turn off the power to the ssp.


Powering on individual domains

1. Open two command windows; issue domain_switch ${domain_name} as necessary.
2. In one of the windows, ID the system boards by issuing domain_status and noting the board numbers under the heading SYSBDS.

${SSP}:${Domain}% domain_status

DOMAIN TYPE PLATFORM OS SYSBDS

domain1 Ultra-Enterprise-10000 Plat_name 2.6 4 5

domain2 Ultra-Enterprise-10000 Plat_name 2.6 0 1

domain3 Ultra-Enterprise-10000 Plat_name 2.6 3

domain4 Ultra-Enterprise-10000 Plat_name 2.6 6 8

domain5 Ultra-Enterprise-10000 Plat_name 2.6 9 10

3. Issue power -on -sb ${board_numbers[*]}. Board numbers are listed together with space separators.
4. Issue bringup -A off -l7. NOTE: Space between the '-A' and 'off' and lower case L in the '-l7'.
5. In the other window, issue netcon. Wait for the OK prompt to appear, then execute boot.
6. Wait for the system to come up completely, then exit the netconsole by issuing ~. (Press/hold tilde while pressing period).


Powering on the entire E10K

1. Power on the SSP; at the OK prompt, type boot
2. Flip the power switches on the E10K.
3. Log in as ssp
1. Enter ${Plat_name} at the Please enter SUNW_HOSTNAME: prompt.
2. Open a command window; execute power -on -all
3. Open another command window. For each domain:
1. In one window, execute domain_switch ${domain_name}
2. Execute bringup -A off -l7 NOTE: Space between the '-A' and 'off' and lower case L in the '-l7'.
3. In the other command window, execute domain_switch ${domain_name} followed by netcon
4. When the OK prompt appears, execute boot

NOTES:

1. ONLY BRING UP ONE SYSTEM AT A TIME. Otherwise, the boot process will take longer than it already does!

Source:
filibeto.org


Tuesday, July 28, 2009

Moving a pool to a different server

Today we are going to move a ZFS pool from one server to another. There are several ways we could execute this exercise:

a) we could create a pool on SCSI or SAS drives and physically move the drives from one server to another
b) we could create a pool on SAN disk and then ask the Storage team to rezone the disks to another server.
c) we could create a pool on a bunch of memory sticks and move the memory sticks.
d) we could create a pool on 64MB files and ftp the files from one server to the other.

Let's use option "d" (64MB files) because we don't need special hardware.


I encourage you to give this a try; you need a pair of servers, and both servers should be set up with the same release of Solaris 10.

The servers I used are called gnat and epoxy. One is sparc, the other is x86.

If you prefer to use two sparc boxes or two x86 boxes that is fine too.

gnat is a T2000 loaded with Solaris 10 08/07-sparc
epoxy is an X4200 loaded with Solaris 10 07/07-x86

First create a pool with two vdevs… each vdev will consist of a single 64MB file (no mirroring or raidz today)

gnat# mkfile 64M /tmp/file1
gnat# mkfile 64M /tmp/file2
gnat# zpool create ttt /tmp/file1 /tmp/file2

Now add a couple of file systems mounted under "/apps_test" and "/work_test".

Earlier this week we used one command to create file systems and a second command to rename the mount points.

Today we are combining the two steps into a single command to save typing.

gnat# zfs create -o mountpoint=/apps_test ttt/apps
gnat# zfs create -o mountpoint=/work_test ttt/work

gnat# df -h | egrep 'ttt|Filesystem'
Filesystem Size Used Available Capacity Mounted on
ttt 87M 24K 87M 1% /ttt
ttt/apps 87M 24K 87M 1% /apps_test
ttt/work 87M 24K 87M 1% /work_test

Write some data to the file systems.

gnat# echo "hello world #1" > /apps_test/testfile_in_apps
gnat# echo "hello world #2" > /work_test/testfile_in_work
gnat# ls -lR /*test*
/apps_test:
total 2
-rw-r--r-- 1 root root 15 Feb 8 15:21 testfile_in_apps
/work_test:
total 2
-rw-r--r-- 1 root root 15 Feb 8 15:22 testfile_in_work

Now export the pool.

This is similar to exporting a Veritas volume group… but we don't need to bother unmounting the file systems first.

The pool will be left in a state where it can be moved to another system.

gnat# zpool export ttt

If we were using real disk, we would now physically or logically move the disk to the other server.

But since we are using 64MB files, we can simply copy them to the /tmp directory on another server.

If you use ftp make sure to do a "binary" transfer! I used scp.

gnat# scp /tmp/file1 myusername@e_p_o_x_y:/tmp
...
gnat# scp /tmp/file2 myusername@e_p_o_x_y:/tmp
...
Log into the second server. Check that the files are intact:

epoxy# ls -l /tmp/file*
-rw------- 1 waltonch unixadm 67108864 Feb 8 15:31 /tmp/file1
-rw------- 1 waltonch unixadm 67108864 Feb 8 15:35 /tmp/file2

Now import the pool.

If we were using real disks, we could ask the zpool command to examine all new disks searching for "importable" pools.

But since we are using files, we need to tell the zpool command where to look. If you have a million files in /tmp this may take a while. If /tmp is relatively empty it should be quick.
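For comparison, the real-disk version of this step looks something like the sketch below; with no pool name at all, "zpool import" simply scans and lists whatever exported pools it can find:

epoxy# zpool import          # scan the default device directory for exported pools
epoxy# zpool import ttt      # then import the listed pool by name (or by its numeric ID)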

epoxy# zpool import -d /tmp ttt

Now check it out:

epoxy# df -h | egrep 'ttt|Filesystem'
Filesystem Size Used Available Capacity Mounted on
ttt/apps 87M 26K 87M 1% /apps_test
ttt 87M 24K 87M 1% /ttt
ttt/work 87M 26K 87M 1% /work_test


epoxy# ls -lR /*test*
/apps_test:
total 2
-rw-r--r-- 1 root root 15 Feb 8 15:21 testfile_in_apps

/work_test:
total 2
-rw-r--r-- 1 root root 15 Feb 8 15:22 testfile_in_work

Many of us have exported Veritas volume groups from one machine and imported them into another.

But with Veritas we had to create the mount points, edit the vfstab file, and manually mount the file systems.

ZFS did it all! And notice that ZFS did not complain going from sparc to x86. Pretty cool folks; pretty cool.
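One caveat worth adding (hedged): if a pool was never cleanly exported, the import is refused because the pool still appears to be in use by the original host. The -f flag overrides that check and should only be used when you are certain the other server no longer has the pool imported:

epoxy# zpool import -d /tmp -f ttt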

Now we must cleanup on both servers:
epoxy# zpool destroy ttt
epoxy# rmdir /apps_test
epoxy# rmdir /work_test
epoxy# rm /tmp/file*

gnat# rmdir /apps_test
gnat# rmdir /work_test
gnat# rm /tmp/file*



Monday, July 27, 2009

ZFS Tip: Multiple vdevs in a pool

Today we will look at spanning a pool over multiple disks (or for demo purposes: multiple 64MB files).

The basic building block of a ZFS pool is called a "vdev" (a.k.a. "virtual device")
A vdev can be one of:

• a single "block device" or a "regular file" (this is what we have used so far)
• a set of mirrored "block devices" and/or "regular files"
• a "raidz" group of "block devices" and/or "regular files" (raidz is an improved version of raid5)

A pool can contain multiple vdevs.

• The total size of the pool will be equal to sum of all vdevs minus overhead.
• Vdevs do not need to be the same size.

Let's jump to it and create a pool with two vdevs… where each vdev is a simple 64MB file. In this case our pool size will be 128MB minus overhead. We will leave mirroring and raidz for another day.


Please try this on an unused Solaris 10 box:

Create two 64MB temp files (if you don't have space in /tmp you can place the files elsewhere… or even use real disk partitions)

# mkfile 64M /tmp/file1
# mkfile 64M /tmp/file2

Create a ZFS pool called "ttt" with two vdevs. The only difference from yesterday's syntax is that we are specifying two 64MB files instead of one.

# zpool create ttt /tmp/file1 /tmp/file2


And create an extra file system called ttt/qqq using the default mount point of /ttt/qqq.

# zfs create ttt/qqq
# df -h | egrep 'ttt|Filesystem' # sorry for the inconsistency: yesterday I used "df -k"; today I switched to "df -h"

Filesystem Size Used Available Capacity Mounted on
ttt 87M 25K 87M 1% /ttt
ttt/qqq 87M 24K 87M 1% /ttt/qqq

We now have 87MB of usable space; this is a bit more than double what we had with only one vdev, so it seems the ratio of overhead to usable space improves as we add vdevs.
But again, overhead is generally high because we are dealing with tiny (64MB) vdevs.
Okay, let's fill up /ttt/qqq with a bunch of zeros. This will take a minute or two to run and will end with an error.

# dd if=/dev/zero of=/ttt/qqq/large_file_full_of_zeros
write: No space left on device
177154+0 records in
177154+0 records out

We are not using quotas, so ttt/qqq was free to consume all available space. i.e. both /ttt and /ttt/qqq are now full file systems even though /ttt is virtually empty.
# df -h | egrep 'ttt|Filesystem'

Filesystem Size Used Available Capacity Mounted on
ttt 87M 25K 0K 100% /ttt
ttt/qqq 87M 87M 0K 100% /ttt/qqq


Now create a third, larger temp file:

# mkfile 109M /tmp/file3

Let's add it to the pool

# zpool add ttt /tmp/file3

If we had been using Veritas or SVM we would have had a three-step process: adding the disk, resizing the volumes, and growing the file systems.

With ZFS, as soon as disk space is added to the pool, the space becomes available to all the file systems in the pool.

So after adding a 109MB vdev to our pool, both /ttt and /ttt/qqq instantly show 104MB of available space. Very cool.

# df -h | egrep 'ttt|Filesystem'

Filesystem Size Used Available Capacity Mounted on
ttt 191M 25K 104M 1% /ttt
ttt/qqq 191M 87M 104M 46% /ttt/qqq

Notice that when talking about pools and vdevs today, I did not mention the words "striping" (raid-0) or "concatenation"… terms that we are used to seeing in the SVM and Veritas worlds.

ZFS pools don't use structured stripes or concatenations. Instead, the pool dynamically attempts to balance the data over all of its vdevs.

If we started modifying data in our ttt pool, the pool would eventually balance itself out so that the data ends up spread evenly over the entire pool.

i.e. No hot spots!

Time for cleanup.

# zpool destroy ttt
# rm /tmp/file[1-3]

Since we used the default mount points today, the directories "/ttt" and "/ttt/qqq" have been removed for us, so there is no more cleanup to do.


Sunday, July 26, 2009

Creating multiple ZFS file systems in a single pool

All of us have experienced the following scenario.
Developers ask for two file systems with specific sizes:
e.g.
/aaa 1GB
/bbb 5GB
Let’s assume that we only have 6GB available and we create file systems as requested.
A few days later /aaa is full and /bbb contains almost nothing. The developers ask for more space in /aaa.
Do you purchase new disk?
Do you backup, resize, and restore?

Or if you are running VXFS/VXVM do you start running convoluted commands to resize the file systems?

Let's look at what the situation would be like if we had used ZFS.

A ZFS pool is capable of housing multiple file systems… all file systems share the same underlying disk space.

• No rigid boundaries are created between file systems; the data from each file system is evenly distributed throughout the pool.
• By default, any file system is allowed to use any (or all) of the free space in the pool.
• If data is deleted from a file system, the space is returned to the pool as free space.

So in the example above, if we had created a 6GB pool housing both /aaa and /bbb, either file system could potentially grow to almost 6GB.

We would not get a report of a full file system until the entire pool is full. The pool won't fill up until the total data written to both file systems is roughly equal to the size of the pool.

Thus there would be nothing stopping the developers from placing 4GB in /aaa and 1GB in /bbb… this would leave approximately 1GB of space free for either file system to consume.

The behaviour can be adjusted with "reservations" and "quotas"… but let's leave that for another day.
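Just as a teaser, here is a minimal sketch of those two controls, assuming a pool named ttt with hypothetical datasets ttt/aaa and ttt/bbb:

# zfs set quota=1g ttt/aaa          # ttt/aaa can never consume more than 1GB of the pool
# zfs set reservation=5g ttt/bbb    # 5GB of pool space is held back for ttt/bbb alone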

So let's see how to create a ZFS pool with multiple file systems. Normally we would create the pool on one or more real disks, but for test purposes we can use a 64MB file. Try this on an unused server:

Create a 64MB temp file
# mkfile 64M /tmp/file1

Create a ZFS pool called "ttt" on top of the temp file.
# zpool create ttt /tmp/file1

Run df to make sure /ttt exists

# df -k | egrep 'ttt|Filesystem'
Filesystem 1024-blocks Used Available Capacity Mounted on
ttt 28160 24 28075 1% /ttt

Now create two new file systems within pool ttt

# zfs create ttt/xxx
# zfs create ttt/yyy

Now view all three file systems:

# df -k | egrep 'ttt|Filesystem'

Filesystem 1024-blocks Used Available Capacity Mounted on
ttt 28160 27 27971 1% /ttt
ttt/xxx 28160 24 27971 1% /ttt/xxx
ttt/yyy 28160 24 27971 1% /ttt/yyy

Note that ZFS file systems within a pool must be created in a hierarchical structure. The dataset named after the pool (in this case "ttt") is always the root of that hierarchy.

The mount points by default will share the same name as the file systems (prefixed with a "/").

But nobody wants to use /ttt/xxx or /ttt/yyy as mount points, so lets change the mount points.

# zfs set mountpoint=/aaa ttt/xxx
# zfs set mountpoint=/hello/world ttt/yyy
# df -k | egrep 'ttt|Filesystem'

Filesystem 1024-blocks Used Available Capacity Mounted on
ttt 28160 24 27962 1% /ttt
ttt/xxx 28160 24 27962 1% /aaa
ttt/yyy 28160 24 27962 1% /hello/world

Note that we did not have to create mount points or set anything up in /etc/vfstab. ZFS takes care of everything for us and life is great.
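If you ever do want the old vfstab-managed behaviour back, ZFS supports it via a "legacy" mount point (a hedged aside; with this setting you maintain /etc/vfstab and mount the file system yourself):

# zfs set mountpoint=legacy ttt/xxx
# mount -F zfs ttt/xxx /aaa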

And to clean up… the commands are the same as before… but you may have to manually remove some of the mount points.

# zpool destroy ttt
# rm /tmp/file1
# rmdir /aaa
# rmdir /hello/world
# rmdir /hello
