HugePages with Oracle example on RHEL 5 with 10g

 
Determine the hugepages requirement and kernel parameters (the database must be running, because the script sizes the huge page pool from Oracle's live shared memory segments)
The Perl script below first backs up sysctl.conf and limits.conf, writes the recommended and calculated values to a new copy of each file, then moves the new copies over the active files.  See the comments in the script for details of what it does.
 
Create the file hugemem.pl with the content below and run it as root (it writes to /etc) with:
    perl hugemem.pl

#!/usr/bin/perl -w
use strict;
# Make timestamped backup for sysctl.conf and limits.conf
my $timestamp = `date +%Y%m%d%H%M`;
# Strip the trailing newline so the backup file names are clean
chomp $timestamp;
system("cp /etc/sysctl.conf /etc/sysctl.conf.$timestamp");
system("cp /etc/security/limits.conf /etc/security/limits.conf.$timestamp");
# Get kernel version
my $kern = `uname -r`;
$kern =~ /^(\d\.\d)/;
$kern = $1;
my $hpg_sz = `grep Hugepagesize /proc/meminfo | awk '{print \$2}'`;
my $num_pg = 1;
my $min_pg = 0;
# Get oracle shared memory segments, initialize afterKey and smssum for the for loop below
my @ipcs_out = `ipcs -m`;
my $afterKey = 0;
my $smssum = 0;
# Find total available mem from system
my $mem = `free | grep Mem | awk '{print \$2}'`;
# Convert mem to bytes
my $totmem =  $mem * 1024;
# Get hugepagesize (in kB) of architecture we're on
my $huge = `grep Hugepagesize /proc/meminfo |awk '{print \$2}'`;
# Calculate the % of total memory for SHMMAX, in this case 75% (whole bytes)
my $max = int(($totmem * 75) / 100);
# Calculate SHMALL by dividing SHMMAX by Hugepagesize (whole number)
my $all = int($max / $huge);
# Oracle recommended semaphores
my $sem = '250 32000 100 142';
# Shared memory segments
my $mni = '4096';
# File limits recommended by oracle
my $fmax = '131072';
# Receive socket buffer size
my $rmemd = '262144';
my $rmemm = '4194304';
# Send socket buffer size
my $wmemd = '262144';
my $wmemm = '4194304';
# TCP socket buffer
my $ipv4r = '4096 262144 4194304';
my $ipv4w = '4096 262144 4194304';
# Port range
my $ipv4p = '1024 65000';
# Frequency of keepalive packets when connection is not in use
my $katime = '30';
# Kernel wait between probes
my $kintvl = '60';
# Max probes
my $kprobe = '9';
# SYN retries
my $synr = '2';
# Memory settings
# Minimize swapping for oracle
my $swap = '0';
# % of memory holding dirty pages at which pdflush starts background writeback
my $dirtyb = '3';
# % of memory holding dirty pages at which a writing process must flush them itself
my $dirtyr = '15';
# Age, in 1/100ths of a second, at which dirty page cache data becomes eligible for writeout
my $dirtye = '500';
# Interval, in 1/100ths of a second, between pdflush wakeups
my $dirtyw = '100';
# limits.conf recommended by oracle
my $nproc = '131072';
# Find size of all shared memory segments
foreach my $ipcsLine (@ipcs_out) {
        chomp $ipcsLine;
        next if ! $ipcsLine;
        if ($afterKey) {
                my @ipcsVals = split /\s+/, $ipcsLine;
                if (! $ipcsVals[6]) { $smssum += $ipcsVals[4]; }
        }
        $afterKey++ if $ipcsLine =~ /^key\s/;
}
# Determine number of huge pages needed to hold all shared mem segments
# (whole pages, plus one page of headroom)
$min_pg = int($smssum / ($hpg_sz * 1024));
$num_pg = $min_pg + 1;
# Calculate HUGETLB_POOL size in MB (used on 2.4 kernels)
my $hugetbl_pool = ($num_pg * $hpg_sz) / 1024;
# Get oracle group id
my $oracle_gid = `id -g oracle`;
chomp $oracle_gid;
# Calculate memlock (in kB) for limits.conf based upon allocated huge pages;
# the factor 1024 * 2 assumes 2 MB huge pages
my $memlock = $num_pg * 1024 * 2;
# Write out limits.conf
open OUTPL, '>/etc/security/limits.conf.hugemem' or die "Cannot write /etc/security/limits.conf.hugemem: $!";
open LIMITS, '/etc/security/limits.conf' or die "Cannot read limits.conf: $!";
while (my $linel = <LIMITS>) {
        chomp $linel;
        next if $linel =~ /memlock/;
        next if $linel =~ /End/;
        next if $linel =~ /nproc/;
        print OUTPL "$linel\n";
}
close LIMITS;
print OUTPL "oracle soft  memlock  $memlock\n";
print OUTPL "oracle hard  memlock  $memlock\n";
print OUTPL "oracle soft  nproc  $nproc\n";
print OUTPL "oracle hard  nproc  $nproc\n";
close OUTPL;
# Write out sysctl.conf
open OUTP, '>/etc/sysctl.conf.hugemem' or die "Cannot write /etc/sysctl.conf.hugemem: $!";
open SYSCTL, '/etc/sysctl.conf' or die "Cannot read sysctl.conf: $!";
while (my $line = <SYSCTL>) {
        chomp $line;
        next if $line =~ /^vm\.hugetlb_shm_group/;
        next if $line =~ /^kernel\.shmmax/;
        next if $line =~ /^kernel\.shmall/;
        next if $line =~ /^kernel\.sem/;
        next if $line =~ /^kernel\.shmmni/;
        next if $line =~ /^fs\.file-max/;
        next if $line =~ /^net\.core\.rmem_default/;
        next if $line =~ /^net\.core\.rmem_max/;
        next if $line =~ /^net\.core\.wmem_default/;
        next if $line =~ /^net\.core\.wmem_max/;
        next if $line =~ /^net\.ipv4\.tcp_rmem/;
        next if $line =~ /^net\.ipv4\.tcp_wmem/;
        next if $line =~ /^net\.ipv4\.ip_local_port_range/;
        next if $line =~ /^net\.ipv4\.tcp_keepalive_time/;
        next if $line =~ /^net\.ipv4\.tcp_keepalive_intvl/;
        next if $line =~ /^net\.ipv4\.tcp_keepalive_probes/;
        next if $line =~ /^net\.ipv4\.tcp_syn_retries/;
        next if $line =~ /^vm\.swappiness/;
        next if $line =~ /^vm\.dirty_background_ratio/;
        next if $line =~ /^vm\.dirty_ratio/;
        next if $line =~ /^vm\.dirty_expire_centisecs/;
        next if $line =~ /^vm\.dirty_writeback_centisecs/;
        if ($kern eq '2.4') {
                next if $line =~ /^vm\.hugetlb_pool/;
        } elsif ($kern eq '2.6') {
                next if $line =~ /^vm\.nr_hugepages/;
        }
        print OUTP "$line\n";
}
close SYSCTL;
if ($kern eq '2.4') {
        print OUTP "vm.hugetlb_pool = $hugetbl_pool\n";
} elsif ($kern eq '2.6') {
        print OUTP "vm.nr_hugepages = $num_pg\n";
}
print OUTP "vm.hugetlb_shm_group = $oracle_gid\n";
print OUTP "kernel.shmmax = $max\n";
print OUTP "kernel.shmall = $all\n";
print OUTP "kernel.sem = $sem\n";
print OUTP "kernel.shmmni = $mni\n";
print OUTP "fs.file-max = $fmax\n";
print OUTP "net.core.rmem_default = $rmemd\n";
print OUTP "net.core.rmem_max = $rmemm\n";
print OUTP "net.core.wmem_default = $wmemd\n";
print OUTP "net.core.wmem_max = $wmemm\n";
print OUTP "net.ipv4.tcp_rmem = $ipv4r\n";
print OUTP "net.ipv4.tcp_wmem = $ipv4w\n";
print OUTP "net.ipv4.ip_local_port_range = $ipv4p\n";
print OUTP "net.ipv4.tcp_keepalive_time = $katime\n";
print OUTP "net.ipv4.tcp_keepalive_intvl = $kintvl\n";
print OUTP "net.ipv4.tcp_keepalive_probes = $kprobe\n";
print OUTP "net.ipv4.tcp_syn_retries = $synr\n";
print OUTP "vm.swappiness = $swap\n";
print OUTP "vm.dirty_background_ratio = $dirtyb\n";
print OUTP "vm.dirty_ratio = $dirtyr\n";
print OUTP "vm.dirty_expire_centisecs = $dirtye\n";
print OUTP "vm.dirty_writeback_centisecs = $dirtyw\n";
close OUTP;
system("mv /etc/sysctl.conf.hugemem /etc/sysctl.conf");
system("mv /etc/security/limits.conf.hugemem /etc/security/limits.conf");
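
The huge page arithmetic in the script can be sketched on its own with explicit inputs. This is only an illustration: the function names hugepages_needed and memlock_kb are hypothetical, and the shared memory total used below is an assumed figure chosen to land on the page count shown in the sample output further down.

```shell
#!/bin/sh
# Sketch of the script's huge page arithmetic (function names are hypothetical).

# hugepages_needed SHM_BYTES HUGEPAGE_KB -> number of huge pages
hugepages_needed() {
    shm_bytes=$1
    hugepage_kb=$2
    # Whole pages that fit in the segments, plus one page of headroom
    echo $(( shm_bytes / (hugepage_kb * 1024) + 1 ))
}

# memlock_kb PAGES HUGEPAGE_KB -> memlock limit in kB for limits.conf
memlock_kb() {
    echo $(( $1 * $2 ))
}

# Assumed input: about 8.39 GB of shared memory segments, 2048 kB huge pages
pages=$(hugepages_needed 8391705152 2048)
echo "vm.nr_hugepages = $pages"
echo "oracle memlock  = $(memlock_kb "$pages" 2048)"
```

With these assumed inputs the sketch reports 4002 pages and a memlock of 8196096 kB. On a real system the byte total comes from the ipcs -m output and the page size from /proc/meminfo, as in the Perl script.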


/etc/sysctl.conf will be updated with similar output to below:

# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.


# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536

# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes

# Controls the maximum number of shared memory segments, in pages

vm.nr_hugepages = 4002
vm.hugetlb_shm_group = 1034

kernel.shmmax = 28450271232
kernel.shmall = 13891734
kernel.sem = 250 32000 100 142
kernel.shmmni = 4096
fs.file-max = 131072
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 4194304
net.ipv4.tcp_rmem = 4096 262144 4194304
net.ipv4.tcp_wmem = 4096 262144 4194304
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_syn_retries = 2
vm.swappiness = 0
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
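
As a rough check of the shmmax and shmall figures above, the same arithmetic can be reproduced in a few lines of shell. The function names are hypothetical, and the MemTotal value of 37044624 kB is an assumption back-calculated from the sample shmmax, not a figure stated for the original system.

```shell
#!/bin/sh
# Sketch of the script's SHMMAX/SHMALL arithmetic (function names are hypothetical).

# shmmax_bytes MEM_KB -> 75% of physical memory, in bytes
shmmax_bytes() {
    echo $(( $1 * 1024 * 75 / 100 ))
}

# shmall_value SHMMAX_BYTES HUGEPAGE_KB -> value the script writes to kernel.shmall
shmall_value() {
    echo $(( $1 / $2 ))
}

mem_kb=37044624   # assumed MemTotal, back-calculated from the sample output
max=$(shmmax_bytes "$mem_kb")
echo "kernel.shmmax = $max"
echo "kernel.shmall = $(shmall_value "$max" 2048)"
```

Note that kernel.shmall is counted in system pages rather than in huge pages, so where the system page size is 4096 bytes the value produced this way is a generous upper limit rather than an exact match for shmmax.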


sysctl -p  #run this to activate the new kernel parameters

Example limits.conf output:

oracle  soft    nofile  4096
oracle  hard    nofile  65536

oracle soft  memlock  8196096
oracle hard  memlock  8196096
oracle soft  nproc  131072
oracle hard  nproc  131072

   
Reboot after these changes so that Oracle picks up the new hugepages and limits.conf settings.
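
After the reboot it is worth confirming that the pool was actually allocated. A minimal sketch, assuming the standard HugePages_Total and HugePages_Free counters in /proc/meminfo; the function name hugepage_report is hypothetical.

```shell
#!/bin/sh
# hugepage_report [FILE] -> summarize huge page counters from a meminfo-style file
# (defaults to /proc/meminfo; the function name is hypothetical)
hugepage_report() {
    meminfo=${1:-/proc/meminfo}
    total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo")
    free=$(awk '/^HugePages_Free:/ {print $2}' "$meminfo")
    # Treat missing counters as zero so the subtraction below is always valid
    total=${total:-0}
    free=${free:-0}
    echo "total=$total free=$free used=$(( total - free ))"
}
# Usage on the database host:
#   hugepage_report
```

If the total matches vm.nr_hugepages and the free count drops once the instance is started, the SGA is most likely being placed in huge pages. A total of 0, or a free count that never moves, suggests the settings did not take effect.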
 
Also, if your SGA is set too small and you need to update your spfile, be sure to rerun this script after you've updated the spfile and restarted your database.  You will most likely need to try several settings and run through a few iterations to find the best configuration and performance.
 
vm.overcommit_memory settings (for VMs):
    0 =  kernel estimates the amount of free memory left when userspace requests more
    1 =  kernel pretends there is always enough memory until it runs out
    2 =  never overcommit
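
To see which of these modes a host is currently using, the value can be read from /proc/sys/vm/overcommit_memory and mapped to the descriptions above. A small sketch; the function name describe_overcommit is hypothetical.

```shell
#!/bin/sh
# describe_overcommit MODE -> one-line description of a vm.overcommit_memory value
# (the function name is hypothetical)
describe_overcommit() {
    case "$1" in
        0) echo "heuristic: kernel estimates free memory on each request" ;;
        1) echo "always: kernel assumes there is enough memory" ;;
        2) echo "never: kernel refuses to overcommit" ;;
        *) echo "unknown mode: $1" ;;
    esac
}
# Usage on a Linux host:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
```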
 
Check dirty pages and adjust vm.dirty_background_ratio and vm.dirty_ratio on a VM accordingly
    grep -A 1 dirty /proc/vmstat  #the lower the numbers the better
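
The counters in /proc/vmstat are page counts, so converting them to kB can make them easier to compare against the dirty ratios. A minimal sketch, assuming a 4096-byte page size unless told otherwise; the function name dirty_kb is hypothetical.

```shell
#!/bin/sh
# dirty_kb [FILE] [PAGE_BYTES] -> dirty page cache in kB from a vmstat-style file
# (the function name is hypothetical)
dirty_kb() {
    vmstat_file=${1:-/proc/vmstat}
    page_bytes=${2:-4096}   # assumed page size; check with: getconf PAGE_SIZE
    # nr_dirty is a count of pages, so scale it by the page size
    awk -v pb="$page_bytes" '$1 == "nr_dirty" {print $2 * pb / 1024}' "$vmstat_file"
}
# Usage on a Linux host:
#   dirty_kb
```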
 
Example spfile for large memory system:

*._b_tree_bitmap_plans=false
*._column_elimination_off=TRUE
*.audit_file_dest='/oracle/admin/test/audit'
*.audit_trail='os'
*.background_dump_dest='/oracle/admin/test/bdump'
*.compatible='9.2.0'
*.control_files='/testdata01/test/testctrl1','/testdata01/test/testctrl2','/oracle/admin/test/cfile/testctrl3'
*.core_dump_dest='/oracle/admin/test/cdump'
*.db_block_size=32768
*.db_cache_size=26214400000
*.db_file_multiblock_read_count=32
*.db_files=500
*.db_keep_cache_size=21474836480
*.db_name='test'
*.java_pool_size=20971520
*.job_queue_processes=4
*.large_pool_size=10485760
*.log_buffer=1048576
*.O7_DICTIONARY_ACCESSIBILITY=true
*.open_cursors=512
*.optimizer_index_caching=10
*.optimizer_index_cost_adj=80
*.parallel_max_servers=12
*.parallel_min_servers=0
*.pga_aggregate_target=16777216000
*.processes=125

*.query_rewrite_enabled='FALSE'
*.query_rewrite_integrity='stale_tolerated'
*.remote_login_passwordfile='EXCLUSIVE'
*.resource_limit=true
*.sga_max_size=45G
*.shared_pool_size=125M
*.star_transformation_enabled='true'
*.timed_statistics=true
*.undo_management='auto'
*.undo_retention=18000
*.undo_tablespace='undo'
*.user_dump_dest='/oracle/admin/test/udump'

2 comments:

Unknown said...

Here is a slightly updated version that corrects the kernal/kernel spelling error and addresses an issue where, if the system had only just been built when this script ran, the hugepages value would be less than 100 (more like 1).

http://ruffingthewitness.com/wp-content/scripts/hugepage.pl

Paul Valentino said...

Thanks KSiR, I have also placed an update on Hugepages Script