Extracting metadata information from files using Apache Tika

December 11th, 2011

I recently discovered a useful library called Apache Tika that makes it easy to extract metadata information from many types of files.
The ECM Alfresco makes use of Apache Tika for both metadata extraction and content transformation.
With Apache Tika, you do not have to worry about which parser to use for a given type of file: Tika first detects the MIME type of the document, then looks for a parser implementation that matches it.
Here is a basic usage of the library to extract metadata information from files such as documents (PDF/DOC/XLS), images (JPG), songs (MP3).
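Before turning to the library itself, here is a minimal sketch of the idea behind MIME type detection from the first bytes of a file ("magic" bytes). It is only an illustration and not Tika's implementation: the class and method names are hypothetical, and only three well-known signatures are checked.

```java
/** Toy MIME type sniffer; class and method names are hypothetical. */
class MagicSniffer {

    /** Returns true when data starts with the given signature bytes. */
    static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses a MIME type from the first bytes of a file. */
    static String detect(byte[] head) {
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46)) { // "%PDF"
            return "application/pdf";
        }
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {       // JPEG start-of-image marker
            return "image/jpeg";
        }
        if (startsWith(head, 0x49, 0x44, 0x33)) {       // "ID3": MP3 starting with an ID3v2 tag
            return "audio/mpeg";
        }
        return "application/octet-stream";              // unknown: generic binary
    }

    public static void main(String[] args) {
        byte[] jpegHead = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(detect(jpegHead));
    }
}
```

A real detector also looks at file names and container formats, which is exactly the kind of work the library takes off your hands.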

You can start from a Maven archetype such as quickstart. Then all you need to do is add the following two dependencies :

<dependencies>
...
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.0</version>
    </dependency>
</dependencies>

The org.apache.tika.parser.AutoDetectParser class is in charge of dispatching the incoming document to the appropriate parser. It is especially useful when the type of the document is not known in advance.

package net.celinio.tika.firstProject;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class MetaDataExtraction {

	public static void main(String[] args) {
		 
		try {
			//String resourceLocation = "d:\\tempTika\\TikainAction.pdf";
			//String resourceLocation = "d:\\tempTika\\06-takefive.mp3";			
			String resourceLocation = "d:\\tempTika\\mariniere14juillet2011.jpg";
			//String resourceLocation = "d:\\tempTika\\02b-blank-timetable.doc";
			//String resourceLocation = "d:\\tempTika\\examstudytable.doc";
			//String resourceLocation = "d:\\tempTika\\timetable.xls";
			
			File file = new File(resourceLocation);
			 
			InputStream input = new FileInputStream(file);			 
			System.out.println( file.getPath());				
			
			Metadata metadata = new Metadata();
			 
			BodyContentHandler handler = new BodyContentHandler(10*1024*1024);
			AutoDetectParser parser = new AutoDetectParser();		
	
			parser.parse(input, handler, metadata);
			input.close(); // the parser consumes the stream but does not close it
			 /*
			String content = new Tika().parseToString(f);
			//System.out.println("Content: " + content);
			//System.out.println("Content: " + handler.toString());
			System.out.println("Title: " + metadata.get(Metadata.TITLE));
			System.out.println("Last author: " + metadata.get(Metadata.LAST_AUTHOR));
			System.out.println("Last modified: " + metadata.get(Metadata.LAST_MODIFIED));
			System.out.println("Content type: " + metadata.get(Metadata.CONTENT_TYPE));
			System.out.println("Application name: " + metadata.get(Metadata.APPLICATION_NAME));
			System.out.println("Author: " + metadata.get(Metadata.AUTHOR));
			System.out.println("Line count: " + metadata.get(Metadata.LINE_COUNT));
			System.out.println("Word count: " + metadata.get(Metadata.WORD_COUNT));
			System.out.println("Page count: " + metadata.get(Metadata.PAGE_COUNT));
			System.out.println("MIME_TYPE_MAGIC: " + metadata.get(Metadata.MIME_TYPE_MAGIC));
			System.out.println("SUBJECT: " + metadata.get(Metadata.SUBJECT));
			
			*/

			String[] metadataNames = metadata.names();
			
			// Display all metadata
			for(String name : metadataNames){
				System.out.println(name + ": " + metadata.get(name));
			}
			
			}
			catch (Exception e) {
				e.printStackTrace();
			}
			 
	}
}

When creating the BodyContentHandler, I use the constructor that takes a write limit (a number of characters) because I need to raise the default limit of 100,000 characters. Otherwise a WriteLimitReachedException is raised when parsing the file TikainAction.pdf (16.4 MB) :

org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: 
Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

Here is the output for the image file mariniere14juillet2011.jpg :

d:\tempTika\mariniere14juillet2011.jpg
Number of Components: 3
Windows XP Title: Popo
Date/Time Original: 2011:07:14 14:16:10
Image Height: 600 pixels
Image Description: Popo
Data Precision: 8 bits
Sub-Sec Time Digitized: 31
tiff:BitsPerSample: 8
Windows XP Subject: Moules
date: 2011-07-14T14:16:10
exif:DateTimeOriginal: 2011-07-14T14:16:10
Component 1: Y component: Quantization table 0, Sampling factors 2 horiz/2 vert
tiff:ImageLength: 600
Component 2: Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert
Component 3: Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert
Date/Time Digitized: 2011:07:14 14:16:10
description: Popo
tiff:ImageWidth: 800
Unknown tag (0xea1c): 28 -22
Image Width: 800 pixels
Sub-Sec Time Original: 31
Content-Type: image/jpeg
Artist: Popo;Cel
Windows XP Author: Popo;Cel

And the output for the song file 06-takefive.mp3 :

d:\tempTika\06-takefive.mp3
xmpDM:releaseDate: null
xmpDM:audioChannelType: Stereo
xmpDM:album: Take Five
Author: Dave Brubeck
xmpDM:artist: Dave Brubeck
channels: 2
xmpDM:audioSampleRate: 44100
xmpDM:logComment: null
xmpDM:trackNumber: 6/8
version: MPEG 3 Layer III Version 1
xmpDM:composer: null
xmpDM:audioCompressor: MP3
title: Take Five
samplerate: 44100
xmpDM:genre: null
Content-Type: audio/mpeg

And the output for the ebook TikainAction.pdf :

d:\tempTika\TikainAction.pdf
xmpTPg:NPages: 257
Creation-Date: 2011-11-09T12:20:20Z
title: Tika in Action
created: Wed Nov 09 13:20:20 CET 2011
Licensed to: Celinio Fernandes  <xxx@yyy.com>
Last-Modified: 2011-11-16T12:25:00Z
producer: Acrobat Distiller 9.4.6 (Windows)
Author: Chris A. Mattmann, Jukka L. Zitting
Content-Type: application/pdf
creator: FrameMaker 8.0

And the output for the Word document 02b-blank-timetable.doc :

d:\tempTika\02b-blank-timetable.doc
Revision-Number: 4
Comments: 
Last-Author: CeLTS
Template: Normal.dot
Page-Count: 1
subject: 
Application-Name: Microsoft Office Word
Author: CeLTS
Word-Count: 1921
xmpTPg:NPages: 1
Edit-Time: 3600000000
Creation-Date: 2006-02-09T00:31:00Z
title: Study Timetable
Character Count: 10951
Company: Monash University
Content-Type: application/msword
Keywords: 
Last-Save-Date: 2006-10-30T05:52:00Z

As you can see, the list of metadata (title, author, image height, etc.) varies depending on which parser is used and, of course, on the type of document.
You can also search the content of the files, as Apache Tika provides access to the textual content of files.
By the way, there is a Tika GUI, a handy tool that extracts metadata when you simply drag and drop a file onto it.
To launch it, just download the jar tika-app-1.0.jar and run it :

java -jar tika-app-1.0.jar --gui


Drag and drop a file into it and read the extracted metadata :

Links :

http://tika.apache.org/
The book Tika in Action (Manning)

Categories: Apache Tika, Maven Tags:

Handling form-based file upload with GWT and the Apache Jakarta Commons FileUpload library

November 27th, 2011

Uploading files to a filesystem, a remote server, a database, etc, is a frequent need in web applications.
These files are usually sent as multipart data, which can carry content of varying types (XML, HTML, plain text, binary …).
With GWT, a good solution to handle this need is the use of the Apache Jakarta Commons FileUpload library.

First, generate a skeleton project using the gwt-maven-plugin archetype :

mvn archetype:generate  

Choose archetype number 298 which makes use of the gwt-maven-plugin and generates a simple hello world sample.

298: remote -> gwt-maven-plugin (Maven plugin for the Google Web Toolkit.)

You can easily import that project into Eclipse (File > Import …> Maven > Existing Maven projects).

Add the following dependency to the pom.xml file:

<dependency>
    <groupId>commons-fileupload</groupId>
    <artifactId>commons-fileupload</artifactId>
    <version>1.2.2</version>
</dependency>

On the client side, modify the onModuleLoad() method of the entry point class (called Firstmodule.java in my project) and add the following code at the end :

package com.mycompany.client;

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.core.client.GWT;
import com.google.gwt.event.dom.client.ClickEvent;
import com.google.gwt.event.dom.client.ClickHandler;
import com.google.gwt.event.dom.client.KeyCodes;
import com.google.gwt.event.dom.client.KeyUpEvent;
import com.google.gwt.event.dom.client.KeyUpHandler;
import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.DialogBox;
import com.google.gwt.user.client.ui.FileUpload;
import com.google.gwt.user.client.ui.FormPanel;
import com.google.gwt.user.client.ui.HTML;
import com.google.gwt.user.client.ui.Label;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.TextBox;
import com.google.gwt.user.client.ui.VerticalPanel;
import com.mycompany.shared.FieldVerifier;

/**
 * Entry point classes define <code>onModuleLoad()</code>.
 */
public class Firstmodule implements EntryPoint {
...
 /**
   * This is the entry point method.
   */
  public void onModuleLoad() {
...
 final FormPanel form = new FormPanel();	  
    VerticalPanel vPanel = new VerticalPanel(); 
    // http://google-web-toolkit.googlecode.com/svn/javadoc/latest/com/google/gwt/user/client/ui/FileUpload.html
    form.setMethod(FormPanel.METHOD_POST);
    //The HTTP request is encoded in multipart format. 
    form.setEncoding(FormPanel.ENCODING_MULTIPART); //  multipart MIME encoding
    form.setAction("/FileUploadGreeting"); // The servlet FileUploadGreeting
    
    form.setWidget(vPanel);
    
    FileUpload fileUpload = new FileUpload();
    fileUpload.setName("uploader"); // Very important    
    vPanel.add(fileUpload);    
    
    Label maxUpload =new Label();
    maxUpload.setText("Maximum upload file size: 1MB");
    vPanel.add(maxUpload);
        
    vPanel.add(new Button("Submit", new ClickHandler() {
        public void onClick(ClickEvent event) {
                form.submit();
        }
    }));
    
    RootPanel.get("uploadContainer").add(form); 
...
}     
}

You need to add the FileUpload widget inside a FormPanel widget. Set the action (servlet) that will be called when the user submits the form.
The call fileUpload.setName("uploader") is very important: you need to give the FileUpload widget a name, otherwise the upload will not work. In fact, every field under the FormPanel that you want to use needs a name so that the HttpServlet can identify it.
The HTTP request is encoded in multipart format, thanks to the call form.setEncoding(FormPanel.ENCODING_MULTIPART).
The generated HTML code will contain the following line :

<form action="FileUploadGreeting" method="POST" enctype="multipart/form-data">
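To make the role of the field name more concrete, here is a minimal sketch of what a multipart/form-data request body looks like for a single uploaded file. It builds the body by hand for illustration only (the browser does this for you); the boundary string, the file name notes.txt and the class name are assumptions.

```java
/** Builds a multipart/form-data body by hand; names are hypothetical. */
class MultipartSketch {

    /** Returns a body holding one file part, framed by the given boundary. */
    static String buildBody(String boundary, String fieldName,
                            String fileName, String contentType, String content) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
                // The field name set with setName(...) ends up in this header
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"" + crlf
                + "Content-Type: " + contentType + crlf
                + crlf
                + content + crlf
                // The closing boundary carries two extra dashes
                + "--" + boundary + "--" + crlf;
    }

    public static void main(String[] args) {
        System.out.println(
                buildBody("sketchBoundary", "uploader", "notes.txt", "text/plain", "hello"));
    }
}
```

Each part is identified by the name in its Content-Disposition header, which is why a widget without a name cannot be picked up on the server.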

On the server side, create the servlet that will be called when the user clicks on the Submit button :

package com.mycompany.server.form;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUploadBase.SizeLimitExceededException;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadFileHandler extends HttpServlet {
	
	private static final long serialVersionUID = 1L;
	
	public void doPost(HttpServletRequest request, HttpServletResponse response)
	throws ServletException, IOException {
			
	 System.out.println("Inside doPost");		
		
		// Create a factory for disk-based file items
		FileItemFactory factory = new DiskFileItemFactory();
		// Create a new file upload handler
		ServletFileUpload fileUpload  = new ServletFileUpload(factory);
		// sizeMax - The maximum allowed size, in bytes. The default value of -1 indicates, that there is no limit.
		// 1048576 bytes = 1024 Kilobytes = 1 Megabyte
		fileUpload.setSizeMax(1048576);  
		
		if (!ServletFileUpload.isMultipartContent(request)) {
			// Not a multipart request: report the error and stop here
			response.sendError(HttpServletResponse.SC_BAD_REQUEST, "error multipart request not found");
			return;
		}
		 		  		
		try {

			List<FileItem> items = fileUpload.parseRequest(request);
			
			if (items == null) {			
                response.getWriter().write("File not correctly uploaded");
                return;
          }
			
			Iterator<FileItem> iter = items.iterator();

			while (iter.hasNext()) {
				FileItem item = (FileItem) iter.next();
				
				////////////////////////////////////////////////
				// http://commons.apache.org/fileupload/using.html								
				////////////////////////////////////////////////

				//if (item.isFormField()) {															
					String fileName = item.getName();
					System.out.println("fileName is : " + fileName);	
					String typeMime = item.getContentType();
					System.out.println("typeMime is : " + typeMime);	
					int sizeInBytes = (int) item.getSize();
					System.out.println("Size in bytes is : " + sizeInBytes);	
					//byte[] file = item.get();					
					item.write(new File("fileOutput.txt"));		        							
				//}
			}
			
			PrintWriter out = response.getWriter();
			response.setHeader("Content-Type", "text/html");
			out.println("Upload OK");
			out.flush();
			out.close();

		} catch (SizeLimitExceededException e) {
			System.out.println("File size exceeds the limit : 1 MB!!" );			
		} catch (Exception e) {
			e.printStackTrace();
			PrintWriter out = response.getWriter();
			response.setHeader("Content-Type", "text/html");
			out.println("Error");
			out.flush();
			out.close();
		}
		
	}
	
	public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
		doPost(request, response);
	}

}

You can easily set a size limit with setSizeMax(), which caps the size of the complete request (1 MB here). A SizeLimitExceededException is raised if the size exceeds the limit; it is handled in the first catch block.
The parseRequest(…) method returns the list of items that were submitted.
The method isFormField() determines whether or not an item is a plain form field, as opposed to a file upload. I have commented out that test in the loop because the form only contains one field, which is the uploaded file.
You can also easily get information about the uploaded file (name, size, MIME type).
In the end, I simply write the uploaded file into a new file called fileOutput.txt which is saved at the root of the project.
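If you want to keep the original file name instead of a fixed one, a little care is needed, because the name sent by the client may include a path with some browsers. The following is a minimal sketch of two helper checks under that assumption; the class and method names are hypothetical and not part of Commons FileUpload.

```java
/** Small helper checks for an upload servlet; names are hypothetical. */
class UploadChecks {

    /** 1 MB, the same value as the limit used in the servlet. */
    static final long MAX_SIZE_BYTES = 1048576L;

    /** Keeps only the last segment of a client-supplied file name. */
    static String baseName(String clientFileName) {
        int lastSeparator = Math.max(clientFileName.lastIndexOf('/'),
                                     clientFileName.lastIndexOf('\\'));
        return clientFileName.substring(lastSeparator + 1);
    }

    /** Returns true when the size does not exceed the limit. */
    static boolean withinLimit(long sizeInBytes) {
        return sizeInBytes <= MAX_SIZE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(baseName("C:\\docs\\report.pdf"));
        System.out.println(withinLimit(2000000L));
    }
}
```

Stripping the path also avoids writing the file outside the intended folder when the name contains directory parts.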

Update the deployment descriptor file to declare the servlet and map it to a URL:

 <!-- Upload -->
	<servlet>
		<servlet-name>FileUploadGreeting</servlet-name>
		<servlet-class>com.mycompany.server.form.UploadFileHandler</servlet-class>
	</servlet>
	
	<servlet-mapping>
		<servlet-name>FileUploadGreeting</servlet-name>
		<url-pattern>/FileUploadGreeting</url-pattern>
	</servlet-mapping>

Finally compile and run the project in GWT Development Mode. Right-click anywhere in the Project Explorer and choose “Run As -> Maven Build…” and run the “gwt:run” goal:

Here is a screenshot of the page with the FileUpload widget added:

The whole code is available on GitHub : https://github.com/longbeach/GWTCommonsFileUpload

Links :
http://www.ietf.org/rfc/rfc1867.txt

Categories: Git, GWT, Maven, RIA and RDA, SCM Tags:

Blinking 8 LEDs with the Arduino microcontroller board

November 6th, 2011

I put on YouTube a short video that I made of the breadboard and the Arduino, with 8 LEDs blinking in a sequence.
The C code for that sketch is here : http://ardx.org/src/circ/CIRC02-code.txt
It is a very easy exercise, but it is also a good place to start writing your own programs, for instance by modifying the sequence of the blinking lights.

Categories: Arduino, C Tags:

Lighting my first LED with the Arduino Uno microcontroller board

November 5th, 2011

And now for something new – at least for me – here is some electronics programming.
I got the chance to meet a coworker who is very much into it and he was nice enough to provide me with the basic information to get started. So the first thing I did was to order the SparkFun inventor’s kit.
I just received it and it comes with a booklet that provides a few exercises.
That kit also contains an Arduino Uno microcontroller that you can connect to a computer through the USB port.
There is an Arduino IDE that comes with samples. These samples are programs written in C.
A program is called a sketch. You can easily upload it to the Arduino microcontroller, through the Arduino IDE menu. Here is the code (very simple) to turn on an LED for 1 second only, then off for 5 seconds, repeatedly :

void setup() {                
  // initialize the digital pin as an output.
  // Pin 13 has an LED connected on most Arduino boards:
  pinMode(13, OUTPUT);     
}

void loop() {
  digitalWrite(13, HIGH);   // set the LED on
  delay(1000);              // wait for a second
  digitalWrite(13, LOW);    // set the LED off
  delay(5000);              // wait for 5 seconds
}

Actually, these functions need to be wrapped in a main function and you need to include WProgram.h, but the Arduino IDE does it for you. A plugin for Eclipse also exists; you can find it here.
Plugging the different parts (pin headers, LED, wires, resistor) into the breadboard and the Arduino board is a piece of cake :

There is one layout sheet per exercise to pin to the breadboard (on the left).

Links :
http://www.arduino.cc
http://robotmill.com/2011/02/12/arduino-basics-blink-an-led/

Categories: Arduino, C Tags:

Generate the database schema with Hibernate3 Maven Plugin

October 19th, 2011

There is a nice Maven plugin for JPA/Hibernate that makes it possible to quickly generate the database schema (SQL) and save it in a file.
The artifactId of this plugin is hibernate3-maven-plugin.
It will scan all JPA annotations in the class files of the entities and generate the corresponding SQL statements (DDL).
A persistence.xml file is required.

  1. With version 2.2 :

Content of the pom.xml :


<build>
  <plugins>
...
<plugin>
				<groupId>org.codehaus.mojo</groupId>
				<artifactId>hibernate3-maven-plugin</artifactId>	
                                <version>2.2</version>			
				<configuration>
		       	   <components>
						<component>
							<name>hbm2ddl</name>
							<implementation>jpaconfiguration</implementation>																									
						</component>							
					</components>
					<componentProperties>
						<drop>true</drop>
						<create>true</create>
						<export>false</export>
						<format>true</format>
						<outputfilename>schema-${DataBaseUser}-${DatabaseName}.sql</outputfilename>
						<persistenceunit>myPU</persistenceunit>
						<propertyfile>src/main/resources/database.properties</propertyfile>
					</componentProperties>
			  </configuration>		
			  <dependencies>
			  	<dependency>
					<groupId>com.oracle</groupId>
					<artifactId>ojdbc14</artifactId>
					<version>10.2.0.2.0</version>
				</dependency>			  
			  </dependencies>					
		</plugin>
	  </plugins>
	</build>

Read more…

Categories: Hibernate, JPA, Maven, Oracle Tags:

Load balancing with Apache, mod_jk and Jonas

October 9th, 2011

Here are the quick steps to configure a cluster of Jonas instances, with Apache and the module mod_jk.

Instance   HTTP port   AJP connector port   jvmRoute
Jonas 1    9000        8009                 worker1
Jonas 2    9003        8010                 worker2

The Apache web server receives the client requests and forwards them to one of the Jonas instances :
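As a rough illustration of the forwarding decision, here is a minimal round-robin sketch in Java. It is not how mod_jk is implemented (its balancing methods also take factors such as lbfactor into account); the class name is hypothetical and the worker names are simply taken from the table above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain round-robin selection between workers; a simplified illustration. */
class RoundRobinBalancer {

    private final List<String> workers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> workers) {
        this.workers = workers;
    }

    /** Returns the worker that should receive the next request. */
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), workers.size());
        return workers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer =
                new RoundRobinBalancer(Arrays.asList("worker1", "worker2"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + balancer.pick());
        }
    }
}
```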

Read more…

Categories: Apache, Architecture, JOnAS Tags:

How to configure the module mod_jk with Apache and Jonas ?

October 6th, 2011

Apache is often the web server used in front of an application server (for instance Jonas).
You need a module like mod_jk to configure Apache with Jonas. Here are the details to configure that module (under RedHat Linux) :

1) Download and install the module mod_jk
mod_jk is an Apache module which can be used to forward a client HTTP request to an internal application server, using the Apache JServ Protocol (AJP).
To get it, type the following commands :

cd /usr/lib64/httpd/modules
wget http://archive.apache.org/dist/tomcat/tomcat-connectors/jk/binaries/linux/jk-1.2.31/x86_64/mod_jk-1.2.31-httpd-2.2.x.so
mv mod_jk-1.2.31-httpd-2.2.x.so mod_jk.so
chmod 755 mod_jk.so
/etc/init.d/httpd restart

The module will show up in the modules folder of Apache : /etc/httpd/modules

2) Configure Apache and mod_jk
First, you need to load the mod_jk module into Apache when it starts.
In the main Apache configuration file, /etc/httpd/conf/httpd.conf, add the following line :

LoadModule jk_module modules/mod_jk.so

Then you need to specify the path to the workers properties file and configure the contexts and the workers which will handle these contexts :

<IfModule jk_module>

JkWorkersFile /etc/httpd/conf/workers.properties
JkShmFile /etc/httpd/logs/mod_jk.shm
JkLogFile /etc/httpd/logs/mod_jk.log
JkLogLevel debug
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkRequestLogFormat "%w %m %V %T"

JKMountCopy All

# Send requests for context /blabla/* to worker named worker1
JkMount /blabla/* worker1
# Send requests for context /blabla* to worker named worker1
JkMount /blabla* worker1
# Send requests for context /blabla to worker named worker1
JkMount /blabla worker1
# Send requests for context /blabla/ to worker named worker1
JkMount /blabla/ worker1
# Send requests for context /anotherContext* to worker named worker1
JkMount /anotherContext* worker1

</IfModule>

The line JKMountCopy All is very important. It copies the mount point definitions in all the virtual hosts.
Create and add the file workers.properties to /etc/httpd/conf/ :

# Worker list
worker.list=worker1
# Define worker1
worker.worker1.port=8009
worker.worker1.host=127.0.0.1
worker.worker1.type=ajp13
worker.worker1.lbfactor=1
worker.worker1.cachesize=10
# Load-balancer
worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=worker1
worker.loadbalancer.sticky_session=1
worker.loadbalancer.local_worker_only=1

Here I have created only one worker (worker1) because I just want to forward requests to an instance of Jonas. There is no load-balancing in this configuration since there is only one single instance.
loadbalancer is not a real worker, it is responsible for the management of several “real” workers.

3) Configure Jonas
To enable AJP connections to the 8009 port of the Jonas server, you need to create an AJP connector in the $JONAS_BASE/conf/tomcat6-server.xml and $JONAS_BASE/conf/tomcat7-server.xml files :

  <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" protocol="AJP/1.3" redirectPort="9043" />
  <!-- An Engine represents the entry point (within JOnAS/Tomcat) that processes
         every request.  The Engine implementation for Tomcat stand alone
         analyzes the HTTP headers included with the request, and passes them
         on to the appropriate Host (virtual host). -->

    <!-- You should set jvmRoute to support load-balancing via AJP ie :
    <Engine name="Standalone" defaultHost="localhost" jvmRoute="jvm1">
    -->
    <Engine name="JOnAS" defaultHost="localhost" jvmRoute="worker1">
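The jvmRoute value matters because Tomcat appends it to the session identifier (for instance ABC123.worker1), and a load balancer with sticky sessions can use that suffix to send a user back to the same instance. Here is a minimal Java sketch of that lookup; the class name and the sample session identifier are hypothetical, and this is not the mod_jk code.

```java
/** Extracts the route suffix from a session id; names are hypothetical. */
class StickyRoute {

    /** Returns the part after the last dot, or null when there is no route. */
    static String routeOf(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null; // no suffix: any worker can serve the request
        }
        return sessionId.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(routeOf("ABC123.worker1"));
    }
}
```

This is why the jvmRoute of each instance has to match the name of the worker declared in workers.properties.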

Here is a schema that sums things up:

Links :
http://tomcat.apache.org/connectors-doc/ajp/ajpv13a.html
http://jonas.ow2.org/current/doc/doc-en/integrated/configuration_guide.html#N11C7D
http://tomcat.apache.org/connectors-doc/generic_howto/loadbalancers.html

Categories: Apache, Architecture, JOnAS Tags:

Start JoNaS server from Jenkins with a shell script

September 19th, 2011

Continuous integration is a process that involves the build of the project but also the deployment of the artefacts (the term used with Maven). These artefacts are archives such as EAR, WAR, JAR files.
I had to write a shell script in Jenkins that would run just after the build and that would stop the JoNaS server, deploy the artefacts and restart JoNaS.

The project is based on Maven. However to create a job that executes a shell script, the first option “Build a free-style software project” seems best, instead of the “Build a maven2/3 project” option.
The process is quite simple :
1) stop JoNaS
2) delete the previous EAR file
3) copy the EAR file that was just built to the JoNaS deploy folder
4) restart JoNaS
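The job itself uses a shell script, but steps 2 and 3 can be sketched in Java for illustration. This is a minimal sketch under assumptions: the class name, the file name myapp.ear and the temporary folders are hypothetical, and stopping and restarting JoNaS (steps 1 and 4) are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Replaces a deployed EAR with a freshly built one; names are hypothetical. */
class EarDeployer {

    /** Step 2 (delete the previous EAR) and step 3 (copy the new one). */
    static Path redeploy(Path builtEar, Path deployDir) throws IOException {
        Path target = deployDir.resolve(builtEar.getFileName());
        Files.deleteIfExists(target);
        return Files.copy(builtEar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary folders stand in for the build output and the deploy folder
        Path buildDir = Files.createTempDirectory("build");
        Path deployDir = Files.createTempDirectory("deploy");
        Path ear = Files.write(buildDir.resolve("myapp.ear"), new byte[] {1, 2, 3});
        System.out.println("Deployed to " + redeploy(ear, deployDir));
    }
}
```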

I chose to poll the SCM every 10 minutes. That means Jenkins checks every 10 minutes whether a commit occurred during the last 10 minutes. If so, it builds the project again and deploys it.

Notice the highlighted line export BUILD_ID=dontKillMe. This is very important.
I spent about 2 hours wondering why JoNaS would start and then would stop after 20 or 30 seconds.
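The reason is that Jenkins attempts to clean up after itself: by default, all the processes that carry the build ID of a job are killed once the build is over. This prevents processes from accidentally leaking and running the machine out of memory, for one. Overriding BUILD_ID makes Jenkins lose track of the JoNaS process, which therefore keeps running.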

The issue is described in this page with a funny title :
https://issues.jenkins-ci.org/browse/JENKINS-2729
Other interesting link :
http://wiki.hudson-ci.org/display/HUDSON/ProcessTreeKiller

Categories: Continuous integration, Jenkins, JOnAS Tags:

Proxy vs Reverse-proxy

September 12th, 2011

The other day I was asked by a coworker the difference between a proxy and a reverse-proxy.
These are 2 types of servers that are widely used in front of an application server. Many companies and schools filter their internal network through proxies.
So I made this drawing; I think it should come in handy for a lot of people.
Many people are familiar with the proxy server that they need to configure in their browser to access the internet. But few people are familiar with the reverse-proxy server.

To sum things up :
1) The proxy server’s main job is to cache pages so it serves them if the client asks them again.
2) The reverse-proxy server’s main job is to secure the servers as it takes the incoming requests from the internet and forwards them to the servers. Other jobs are : load balancing, filtering and also caching.
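The caching job mentioned in both points can be sketched in a few lines of Java. This is a minimal illustration only: the class name is hypothetical, the origin server is simulated by a function, and real proxies also handle expiry and cache headers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Serves pages from a cache and only asks the origin on a miss. */
class CachingProxy {

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> origin;
    private int originCalls = 0;

    CachingProxy(Function<String, String> origin) {
        this.origin = origin;
    }

    /** Returns the page for the URL, fetching it from the origin only once. */
    String get(String url) {
        String page = cache.get(url);
        if (page == null) {
            page = origin.apply(url);
            originCalls++;
            cache.put(url, page);
        }
        return page;
    }

    /** Number of requests that actually reached the origin server. */
    int originCalls() {
        return originCalls;
    }

    public static void main(String[] args) {
        CachingProxy proxy = new CachingProxy(url -> "<html>" + url + "</html>");
        proxy.get("http://example.org/");
        proxy.get("http://example.org/");
        System.out.println("origin calls: " + proxy.originCalls());
    }
}
```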

The reverse-proxy can be located in a demilitarized zone (DMZ), that is, a highly secured area, between 2 firewalls for instance.
One thing to remember :
If the firewall is removed, the client can still access the internet. If the proxy is removed, the client cannot access the internet.

Categories: Architecture Tags:

Starting JOnAS as a Service on Linux

August 18th, 2011

Here is a startup script for JOnAS, on Linux. It is a nice way to automatically start JOnAS when Linux reboots. It is quite trivial to write but I am sure it will turn out useful for anyone who is still not familiar with startup scripts on Linux. Call it jonas and save it in the directory /etc/init.d. You must be root to do that.

#! /bin/bash
# chkconfig: 2345 95 20
# description: Description of the script
# processname: jonas 
#
#jonas Start the jonas server.
#

NAME="Jonas 5.2.1"
JONAS_HOME=/home/test/jonas-full-5.2.1
JONAS_USER=test
LC_ALL=fr_FR
export JONAS_HOME  JONAS_USER LC_ALL
cd $JONAS_HOME/logs
case "$1" in
  start)
    echo -ne "Starting $NAME.\n"
    /bin/su $JONAS_USER -c "$JONAS_HOME/bin/jonas -bg start "
    ;;

  stop)
    echo -ne "Stopping $NAME.\n"
    /bin/su $JONAS_USER -c "$JONAS_HOME/bin/jonas stop "
    ;;

  *)
    echo "Usage: /etc/init.d/jonas {start|stop}"
    exit 1
    ;;
esac

exit 0

Read more…

Categories: CentOS, JOnAS, Linux Tags:

Running 2 instances of JOnAS on the same Linux server

August 7th, 2011

To run another instance of JOnAS 5.2.1 on the same server one needs to change the ports of the new instance.
Initially, the JONAS_BASE environment variable points to /home/test/JonasInstances/instance1
To create a second instance, I just run the newjb command.
2nd instance with newjb
Read more…

Categories: JOnAS Tags:

Remotely accessing the database homepage from a browser

July 24th, 2011

The last step to complete the installation of Oracle usually requires configuring the database (users, schemas, tables etc) through the apex web page.
If your server is running locally, then all you need to do is point your browser to the following URL:
http://localhost:8080/apex (or another port if you did not use the default one, 8080).

However if you have installed Oracle on a remote server, this URL will not work.
In order to make it work, I found out that you need to enable remote HTTP connections from the SQL command line :

[me@somewhere admin]# sqlplus
SQL*Plus: Release 10.2.0.1.0 - Production on Sun Jul 24 21:23:20 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Enter user-name: SYSTEM
Enter password: 
Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production
SQL> EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);
PL/SQL procedure successfully completed.

Link :
http://download.oracle.com/docs/cd/B25329_01/doc/admin.102/b25107/network.htm#BHCBCFBA

Categories: CentOS, Linux, Oracle Tags:

Running Oracle SQLPlus with Linux

July 23rd, 2011

Environment :
Linux kernel : 2.6.18-194.26.1.el5 (uname -r)
Distro : CentOS release 5.5 (Final) (cat /etc/issue)
Oracle : Oracle Database 10g Express Edition Release 10.2.0.1.0 – Production

If you get the following annoying message :

[me@somewhere]$sqlplus
Error 6 initializing SQL*Plus
Message file sp1.msb not found
SP2-0750: You may need to set ORACLE_HOME to your Oracle software directory

then do not waste your time installing patches, changing files and folders permissions etc.
The problem resides in the environment variables settings.
You need to set up the ORACLE_HOME variable correctly.

If after setting that ORACLE_HOME environment variable correctly, you get this other annoying message :

[me@somewhere ~]# sqlplus

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jul 23 17:48:03 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Enter user-name: SYSTEM
Enter password: 
ERROR:
ORA-12162: TNS:net service name is incorrectly specified

then you need to set up other environment variables (ORACLE_SID, NLS_LANG, LD_LIBRARY_PATH).

Fortunately Oracle provides a script that contains all these environment variables with the right values.
This script is called oracle_env.sh and is located here :
/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin

All you need to do is insert these lines in your .bash_profile and you’re ready to connect to SQLPlus in no time !

Categories: CentOS, Linux, Oracle Tags:

[Tutorial] Log4J with Maven profiles

June 25th, 2011

Here is a quick tutorial to get your hands dirty with the Log4j logging framework.
Log4j allows logging requests to print to multiple output destinations, also known as appenders.
There are several output destinations: console, files, sockets, emails …
First create a Maven project :
mvn archetype:generate
Choose archetype number 109 (quickstart)

Read more…

Categories: Log4j, Maven Tags:

[GWT] Reduce the number of permutations

May 31st, 2011

Here is a tip that I found on other blogs and I think is worth mentioning again. I tried it, it reduced the number of permutations from 15 to 3 only (1 permutation for IE, 1 permutation for the FR locale, 1 permutation for the EN locale). That means a compilation time of 1:05.750s instead of 2:02.828s.

In the module file (blabla.gwt.xml), you need to add this line :

<set-property name="user.agent" value="ie6" />

This of course will produce permutations for Internet Explorer 6 only.
There is also another property to define the locales :

<extend-property name="locale" values="fr" />

So this will generate 2 permutations only : 1 for IE and 1 for the FR locale.
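
One way to make sense of these numbers, assuming the GWT compiler builds one permutation per combination of property values: the count is the number of user agents multiplied by the number of locales. The sketch below is plain Java, not GWT API. The list of five user agents is only an illustrative assumption chosen to reproduce the 15 mentioned above, and it assumes the locale property keeps its default value next to fr and en.

```java
import java.util.Arrays;
import java.util.List;

final class PermutationCount {

    /** Multiplies the number of values of each property (one permutation per combination). */
    static int count(List<List<String>> propertyValues) {
        int total = 1;
        for (List<String> values : propertyValues) {
            total *= values.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical lists, for illustration only.
        List<String> allAgents = Arrays.asList("ie6", "ie8", "gecko1_8", "safari", "opera");
        List<String> ieOnly = Arrays.asList("ie6");
        List<String> locales = Arrays.asList("default", "fr", "en");

        System.out.println(count(Arrays.asList(allAgents, locales))); // 5 x 3 = 15
        System.out.println(count(Arrays.asList(ieOnly, locales)));    // 1 x 3 = 3
    }
}
```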

Here is a list of user agents :
http://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/gwt/user/UserAgent.gwt.xml

Categories: GWT Tags:

Java puzzlers

May 29th, 2011

This week I attended the very first, and very well organized, What’s Next ? conference in Paris. One of the speakers was Neal Gafter, co-author of the book “Java™ Puzzlers: Traps, Pitfalls, and Corner Cases” (2005).
He shared with us 2 or 3 puzzlers which I will share here too :
Question 1) What will the following program print ?

import java.util.Random;

public class Rhymes {
    private static Random rnd = new Random();

    public static void main(String[] args) {
        StringBuffer word = null;
        switch(rnd.nextInt(2)) {
            case 1: word = new StringBuffer('P');
            case 2: word = new StringBuffer('G');
            default: word = new StringBuffer('M');
        }
        word.append('a');
        word.append('i');
        word.append('n');
        System.out.println(word);
    }
}
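
If you want to check your answer: as far as I can tell, the program always prints "ain". Random.nextInt(2) only returns 0 or 1, the cases have no break so execution always falls through to default, and new StringBuffer('M') resolves to the StringBuffer(int capacity) constructor, which leaves the buffer empty. Here is a small sketch that isolates the last trap and shows one possible corrected variant. The class and method names are mine, not from the talk.

```java
import java.util.Random;

final class RhymesExplained {

    private static final Random rnd = new Random();

    /** Reproduces what the original switch ends up doing, whatever the random value. */
    static String original() {
        // 'M' is a char, so it is widened to the int 77 and used as a capacity:
        // the buffer starts out empty.
        StringBuffer word = new StringBuffer('M');
        word.append('a');
        word.append('i');
        word.append('n');
        return word.toString();
    }

    /** One possible fix: String arguments, a break per case, and cases starting at 0. */
    static String fixed(int n) {
        StringBuilder word;
        switch (n) {
            case 0:  word = new StringBuilder("P"); break;
            case 1:  word = new StringBuilder("G"); break;
            default: word = new StringBuilder("M"); break;
        }
        return word.append("ain").toString();
    }

    public static void main(String[] args) {
        System.out.println(original());            // prints "ain"
        System.out.println(fixed(rnd.nextInt(3))); // prints Pain, Gain or Main
    }
}
```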

Read more…

Categories: Java Tags:

Eclipse Dali vs Hibernate Tools

May 8th, 2011

The process of mapping tables to entities is greatly simplified with tools like Eclipse Dali and Hibernate Tools, both available as Eclipse plugins. They save you from mapping the tables by hand, which in my opinion is error-prone and takes more time. I really do not see why one should map them by hand when great tools like Eclipse Dali and Hibernate Tools are available.
In my book, I describe the use of the Eclipse Dali plugin to automatically generate the entities.
Lately I have also used Hibernate Tools and I have already noticed a few differences between these two tools.
I am going to list some of these differences.
Read more…

Categories: Eclipse, EJB, EJB 3.0, Hibernate, JPA Tags:

[GWT] Table with pagination and one sortable column

April 10th, 2011

I just added a very basic GWT project to my github account to display a table with pagination and one sortable column.
The code is mostly based on the official GWT tutorial :
http://code.google.com/intl/en/webtoolkit/doc/latest/DevGuideUiCellTable.html

I basically just added the SimplePager element to handle pagination :

SimplePager pager = new SimplePager();
pager.setDisplay(table);

Source :
https://github.com/longbeach/MyFirstCellTable
Demo :
http://tableaupagination.appspot.com/

Categories: Git, Google App Engine, GWT, SCM Tags:

How to rollback a transaction in GAE

April 10th, 2011

This has nothing to do with JPA.
I was trying to deploy a GWT webapp to GAE when I suddenly got an error.
The error message itself suggested the remedy :
"java.io.IOException: Error posting to URL: https://appengine.google.com/api/appversion/create?app_id=tableaupagination&version=2&
409 Conflict
Another transaction by user xxxxx is already in progress for this app and major version. That user can undo the transaction with appcfg.py's "rollback" command."

After digging into Google a bit, I found that the way to launch that rollback command under Windows is the following :
1) Figure out where the Google App Engine Java SDK directory is.
In my case, it’s under the Eclipse plugins directory :
D:\Dev\eclipse-jee-helios-SR1-win32\plugins\com.google.appengine.eclipse.sdkbundle.1.4.2_1.4.2.v201102111811

The bin folder contains the appcfg.cmd command.

2) Under a DOS command prompt, go to the workspace folder of your project and launch the following command :
"D:\Dev\eclipse-jee-helios-SR1-win32\plugins\com.google.appengine.eclipse.sdkbundle.1.4.2_1.4.2.v201102111811\appengine-java-sdk-1.4.2\bin\appcfg" rollback war

After that, you can try to deploy the app again and it should work.

Links :
http://code.google.com/intl/fr/appengine/docs/java/tools/uploadinganapp.html
http://code.google.com/intl/fr/appengine/docs/java/gettingstarted/uploading.html

Categories: Google App Engine, GWT Tags:

[Tutorial] Create your first GWT project and deploy it to GAE

April 9th, 2011

Here is a first quick and simple tutorial on how to develop and deploy your very first GWT web application to Google App Engine.

Requirements :
Eclipse
Google Web Toolkit plugin for Eclipse
A Google App Engine account + a Gmail account

So here are the steps :
Read more…

Categories: Google App Engine, GWT, Tutorials Tags:

Code for the book is now on GitHub

April 3rd, 2011

I finally added the code for the book “Les EJB 3 (avec Struts 2, JSF 2, JasperReports 3, Flex 3)” to GitHub.

To grab it, you have 2 options :
1) use Git and type :
git clone git://github.com/longbeach/VenteEnLigne.git
2) use SVN and type :
svn co http://svn.github.com/longbeach/VenteEnLigne

As a matter of fact, and surprising as it might sound, you can use SVN to grab code from GitHub 🙂

Categories: EJB, EJB 3.0, EJB 3.1, Git, SCM, Subversion Tags:

Maven integration for Eclipse JDK Warning

March 26th, 2011

After installing the M2Eclipse plugin, I get this annoying warning message every time I start Eclipse :

Read more…

Categories: Eclipse, Maven Tags:

Installing Git on Cygwin

March 16th, 2011

Installing Git on the free Unix emulator Cygwin has become pretty easy. There is no need to compile anything: you just need to download the packages. Here are the steps :
Read more…

Categories: Git, SCM Tags: