Wednesday, August 15, 2012

Java 7: Fork/Join Framework Example

The Fork/Join Framework in Java 7 is designed for work that can be broken down into smaller tasks and the results of those tasks combined to produce the final result.

In general, classes that use the Fork/Join Framework follow the following simple algorithm:

// pseudocode
Result solve(Problem problem) {
  if (problem.size < SEQUENTIAL_THRESHOLD)
    return solveSequentially(problem);
  else {
    Result left, right;
    INVOKE-IN-PARALLEL {
      left = solve(extractLeftHalf(problem));
      right = solve(extractRightHalf(problem));
    }
    return combine(left, right);
  }
}
In order to demonstrate this, I have created an example to find the maximum number from a large array using fork/join:
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MaximumFinder extends RecursiveTask<Integer> {

  private static final int SEQUENTIAL_THRESHOLD = 5;

  private final int[] data;
  private final int start;
  private final int end;

  public MaximumFinder(int[] data, int start, int end) {
    this.data = data;
    this.start = start;
    this.end = end;
  }

  public MaximumFinder(int[] data) {
    this(data, 0, data.length);
  }

  @Override
  protected Integer compute() {
    final int length = end - start;
    if (length < SEQUENTIAL_THRESHOLD) {
      return computeDirectly();
    }
    final int split = length / 2;
    final MaximumFinder left = new MaximumFinder(data, start, start + split);
    left.fork();
    final MaximumFinder right = new MaximumFinder(data, start + split, end);
    return Math.max(right.compute(), left.join());
  }

  private Integer computeDirectly() {
    System.out.println(Thread.currentThread() + " computing: " + start
                       + " to " + end);
    int max = Integer.MIN_VALUE;
    for (int i = start; i < end; i++) {
      if (data[i] > max) {
        max = data[i];
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // create a random data set
    final int[] data = new int[1000];
    final Random random = new Random();
    for (int i = 0; i < data.length; i++) {
      data[i] = random.nextInt(100);
    }

    // submit the task to the pool
    final ForkJoinPool pool = new ForkJoinPool(4);
    final MaximumFinder finder = new MaximumFinder(data);
    System.out.println(pool.invoke(finder));
  }
}
The MaximumFinder class is a RecursiveTask which is responsible for finding the maximum number from an array. If the size of the array is less than a threshold (5) then find the maximum directly, by iterating over the array. Otherwise, split the array into two halves, recurse on each half and wait for them to complete (join). Once we have the result of each half, we can find the maximum of the two and return it.

Tuesday, August 14, 2012

Analysing a Java Core Dump

In this post, I will show you how you can debug a Java core file to see what caused your JVM to crash. I will be using a core file I generated in my previous post: Generating a Java Core Dump.

There are different ways you can diagnose a JVM crash, listed below:

The hs_err_pid log file
When a fatal error occurs in the JVM, it produces an error log file called hs_err_pidXXXX.log, normally in the working directory of the process or in the temporary directory for the operating system. The top of this file contains the cause of the crash and the "problematic frame". For example, mine shows:

$ head hs_err_pid21178.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000002b1d00075c, pid=21178, tid=1076017504
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# C  [libnativelib.so+0x75c]  bar+0x10
#
There is also a stack trace:
Stack: [0x000000004012b000,0x000000004022c000],  sp=0x000000004022aac0,  free space=3fe0000000000000018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libnativelib.so+0x75c]  bar+0x10
C  [libnativelib.so+0x772]  foo+0xe
C  [libnativelib.so+0x78e]  Java_CoreDumper_core+0x1a
j  CoreDumper.core()V+0
j  CoreDumper.main([Ljava/lang/String;)V+7
v  ~StubRoutines::call_stub
V  [libjvm.so+0x3e756d]
The stack trace shows that my java method, CoreDumper.core(), called into JNI and died when the bar function was called in native code.

Debugging a Java Core Dump
In some cases, the JVM may not produce a hs_err_pid file, for example, if the native code abruptly aborts by calling the abort function. In such cases, we need to analyse the core file produced. On my machine, the operating system writes out core files to /var/tmp/cores. You can use the following command to see where your system is configured to write out core files to:

$ cat /proc/sys/kernel/core_pattern
/var/tmp/cores/%e.%p.%u.core
$ ls /var/tmp/cores
java.21178.146385.core
There are a few, different ways to look at core dumps:

1. Using gdb
GNU Debugger (gdb) can examine a core file and work out what the program was doing when it crashed.

$ gdb $JAVA_HOME/bin/java /var/tmp/cores/java.14015.146385.core
(gdb) where
#0  0x0000002a959bd26d in raise () from /lib64/tls/libc.so.6
#1  0x0000002a959bea6e in abort () from /lib64/tls/libc.so.6
#2  0x0000002b1cecf799 in bar () from libnativelib.so
#3  0x0000002b1cecf7a7 in foo () from libnativelib.so
#4  0x0000002b1cecf7c3 in Java_CoreDumper_core () from libnativelib.so
#5  0x0000002a971aac88 in ?? ()
#6  0x0000000040113800 in ?? ()
#7  0x0000002a9719fa42 in ?? ()
#8  0x000000004022ab10 in ?? ()
#9  0x0000002a9a4d5488 in ?? ()
#10 0x000000004022ab70 in ?? ()
#11 0x0000002a9a4d59c8 in ?? ()
#12 0x0000000000000000 in ?? ()
The where command prints the stack frames and shows that the bar function called abort() which caused the crash.

2. Using jstack
jstack prints stack traces of Java threads for a given core file.

$ jstack -J-d64 $JAVA_HOME/bin/java /var/tmp/cores/java.14015.146385.core
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0-b16
Deadlock Detection:

No deadlocks found.

Thread 16788: (state = BLOCKED)

Thread 16787: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Interpreted frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Interpreted frame)

Thread 16786: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 (Interpreted frame)

Thread 16780: (state = IN_NATIVE)
 - CoreDumper.core() @bci=0 (Interpreted frame)
 - CoreDumper.main(java.lang.String[]) @bci=7, line=12 (Interpreted frame)
3. Using jmap
jmap examines a core file and prints out shared object memory maps or heap memory details.
$ jmap -J-d64 $JAVA_HOME/bin/java /var/tmp/cores/java.14015.146385.core
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0-b16
0x0000000040000000      49K     /usr/sunjdk/1.6.0_21/bin/java
0x0000002a9566c000      124K    /lib64/tls/libpthread.so.0
0x0000002a95782000      47K     /usr/sunjdk/1.6.0_21/jre/lib/amd64/jli/libjli.so
0x0000002a9588c000      16K     /lib64/libdl.so.2
0x0000002a9598f000      1593K   /lib64/tls/libc.so.6
0x0000002a95556000      110K    /lib64/ld-linux-x86-64.so.2
0x0000002a95bca000      11443K  /usr/sunjdk/1.6.0_21/jre/lib/amd64/server/libjvm.so
0x0000002a96699000      625K    /lib64/tls/libm.so.6
0x0000002a9681f000      56K     /lib64/tls/librt.so.1
0x0000002a96939000      65K     /usr/sunjdk/1.6.0_21/jre/lib/amd64/libverify.so
0x0000002a96a48000      228K    /usr/sunjdk/1.6.0_21/jre/lib/amd64/libjava.so
0x0000002a96b9e000      109K    /lib64/libnsl.so.1
0x0000002a96cb6000      54K     /usr/sunjdk/1.6.0_21/jre/lib/amd64/native_threads/libhpi.so
0x0000002a96de8000      57K     /lib64/libnss_files.so.2
0x0000002a96ef4000      551K    /lib64/libnss_db.so.2
0x0000002a97086000      89K     /usr/sunjdk/1.6.0_21/jre/lib/amd64/libzip.so
0x0000002b1cecf000      6K      /home/sharfah/tmp/jni/libnativelib.so
Useful Links:
Crash course on JVM crash analysis
Generating a Java Core Dump

Monday, August 13, 2012

Generating a Java Core Dump

This post demonstrates how you can generate a Java core dump manually (using JNI).

1. Create a Java class

/**
 * A class to demonstrate core dumping.
 */
public class CoreDumper {

  // load the library
  static {
    System.loadLibrary("nativelib");
  }

  // native method declaration
  public native void core();

  public static void main(String[] args) {
    new CoreDumper().core();
  }
}
2. Compile the Java class
$ javac CoreDumper.java
$ ls
CoreDumper.class  CoreDumper.java
3. Generate the header file
$ javah -jni CoreDumper
$ ls
CoreDumper.class  CoreDumper.h  CoreDumper.java
4. Implement the native method
Copy the method declaration from the header file and create a new file called CoreDumper.c containing the implementation of this method:
#include "CoreDumper.h"

void bar() {
  // the following statements will produce a core
  int* p = NULL;
  *p = 5;

  // alternatively:
  // abort();
}

void foo() {
  bar();
}

JNIEXPORT void JNICALL Java_CoreDumper_core
  (JNIEnv *env, jobject obj) {
  foo();
}
5. Compile the native code
This command may vary based on your operating system. On my Red Hat Linux machine, I use the following command:
$ gcc -fPIC -o libnativelib.so -shared \
            -I$JAVA_HOME/include/linux/ \
            -I$JAVA_HOME/include/ \
             CoreDumper.c
$ ls
CoreDumper.class  CoreDumper.h  CoreDumper.java libnativelib.so
6. Run the program
$ java -Djava.library.path=. CoreDumper
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000002b1cecf75c, pid=18919, tid=1076017504
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# C  [libnativelib.so+0x75c]  bar+0x10
#
# An error report file with more information is saved as:
# /home/sharfah/tmp/jni/hs_err_pid18919.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)
The core file
As shown above, running the program causes it to crash and a core file is produced. On my machine, the operating system writes out core files to /var/tmp/cores. You can use the following command to see what your core file directory is configured to:
$ cat /proc/sys/kernel/core_pattern
/var/tmp/cores/%e.%p.%u.core
$ ls /var/tmp/cores
java.21178.146385.core
In my next post, I will show you how can you perform some quick analysis on a core file to see what caused the crash.

Next Post:
Analysing a Java Core Dump

Thursday, August 09, 2012

Running a command on multiple hosts

There are different ways you can run a command on multiple machines.

1. For loop
If you want to execute the same command on a few hosts, you can use a for loop as shown below:

for host in host1 host2 host3
do
    ssh $host "hostname; who -b"
done
The example above iterates over a list of hosts, and runs two commands on each one to print the name of the host and the time it was rebooted.

2. While loop
If your list of hosts is stored in a file, you can use a while loop as shown below:

while IFS= read -r host
do
    ssh -n $host "hostname; who -b"
done < /tmp/myhosts
You must provide the -n option to ssh, otherwise it will only run on the first host in your file and then the loop will terminate.

3. Parallel ssh
Parallel ssh (pssh) allows you to run a command on several hosts at the same time and is much faster than using a sequential loop if the number of hosts is large. You can specify how many parallel processes it uses to ssh to the various hosts (default is 32).

$ pssh
Usage: pssh [OPTIONS] -h hosts.txt prog [arg0] ..

  -h --hosts   hosts file (each line "host[:port] [user]")
  -l --user    username (OPTIONAL)
  -p --par     max number of parallel threads (OPTIONAL)
  -o --outdir  output directory for stdout files (OPTIONAL)
  -t --timeout timeout in seconds to do ssh to a host (OPTIONAL)
  -v --verbose turn on warning and diagnostic messages (OPTIONAL)
  -O --options SSH options (OPTIONAL)

$ pssh -h /tmp/myhosts -o /tmp/output "hostname; who -b"

Saturday, August 04, 2012

bash error: value too great for base

I came across this interesting error today:
-bash: 08: value too great for base (error token is "08")
It was coming from a script which works out the previous month by extracting the current month from the current date and then decrementing it. The code looks like this:
today="$(date +%Y%m%d)"
month=${today:4:2}
prevmonth=$((--month))
This script throws an error only if the current month is 08 or 09. I found that the reason for this is that numbers starting with 0 are interpreted as octal numbers and 8 and 9 are not in the base-8 number system, hence the error. There are more details on the bash man page:
Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where base is a decimal number between 2 and 64 represent- ing the arithmetic base, and n is a number in that base. If base# is omitted, then base 10 is used.
To fix this issue, I specified the base-10 prefix as shown below:
today="$(date +%Y%m%d)"
month=10#${today:4:2}
prevmonth=$((--month))