Java Schema Parser

The javadoc within the package contains more comprehensive documentation regarding the classes mentioned below.

The JSaPar package is a Java library that provides a parser for flat and CSV (Comma Separated Values) files. The concept is that a schema describes how a file should be parsed or written. The schema can be built from an XML document or constructed programmatically in Java code. The output of the parser is usually an org.jsapar.Document object that contains a list of org.jsapar.Line objects, each of which contains a list of org.jsapar.Cell objects.

Supported file formats:
  • Fixed width - Also referred to as flat file. Each cell is described only by its position within the line. The type of the line is denoted by its position within the file.
  • Fixed width control value - The same as Fixed width above except that each line type is denoted by a control value in the leading characters of each line.
  • CSV - (Comma Separated Values) Each cell is delimited by a separator character (or characters). The type of the line is denoted by its position within the file.
  • CSV control value - The same as CSV above except that each line type is denoted by a control value in the leading cell of each line.
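
The difference between the formats above can be sketched in plain Java. This is only an illustration of the concepts, not the JSaPar implementation: the class CellSplitSketch, its methods and the control values "H" and "B" are assumptions made up for this example, and CSV quoting is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: plain-Java cell splitting, not the JSaPar implementation. */
class CellSplitSketch {

    /** Fixed width: each cell is defined only by its width, that is, its position in the line. */
    static List<String> splitFixedWidth(String line, int... widths) {
        List<String> cells = new ArrayList<>();
        int start = 0;
        for (int width : widths) {
            int end = Math.min(start + width, line.length());
            // A line that is too short yields empty cells instead of an exception.
            cells.add(start < end ? line.substring(start, end).trim() : "");
            start += width;
        }
        return cells;
    }

    /** CSV: cells are delimited by a separator string (quoting is ignored in this sketch). */
    static List<String> splitCsv(String line, String separator) {
        // The limit -1 keeps trailing empty cells.
        return Arrays.asList(line.split(Pattern.quote(separator), -1));
    }

    /** Control value: the leading characters select the line type (hypothetical values). */
    static String lineTypeFor(String line) {
        if (line.startsWith("H")) {
            return "Header";
        }
        if (line.startsWith("B")) {
            return "Body";
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(splitFixedWidth("0001Erik  ", 4, 6));
        System.out.println(splitCsv("0001;Erik", ";"));
        System.out.println(lineTypeFor("B0001Erik"));
    }
}
```

In JSaPar itself the widths, separators and control values come from the schema rather than from method arguments.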

Events for each line

For very large files it can be a problem to build the complete org.jsapar.Document in memory before further processing, since it may simply take up too much memory. In that case you may choose to receive an event for each parsed line instead. You do that by registering a subclass of org.jsapar.ParsingEventListener with the org.jsapar.input.Parser. That way you can process one line at a time, freeing memory as you go along.
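
The idea of line events can be sketched without the library. The LineListener interface and the parse method below are hypothetical and written only for this illustration; the real org.jsapar.ParsingEventListener may look different, so consult the javadoc for its actual methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical per-line callback, not the JSaPar listener API. */
class LineEventSketch {

    /** Hypothetical listener that receives one parsed line at a time. */
    interface LineListener {
        void onLine(long lineNumber, String[] cells);
    }

    /** Reads one line at a time and hands it to the listener; no line is kept afterwards. */
    static long parse(Reader input, String separator, LineListener listener) {
        try {
            BufferedReader reader = new BufferedReader(input);
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                count++;
                listener.onLine(count, line.split(Pattern.quote(separator), -1));
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader input = new StringReader("Erik;Svensson\nAnna;Karlsson\n");
        long lines = parse(input, ";", (number, cells) -> System.out.println(number + ": " + cells[0]));
        System.out.println(lines + " lines processed");
    }
}
```

Because the listener sees each line only once, memory usage stays roughly constant regardless of the size of the file.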

Converter

If you are only interested in converting a file from one format into another, you can use the org.jsapar.io.Converter, where you specify the input and the output schema for the conversion. The converter uses the event mechanism under the hood, so it reads, converts and writes one line at a time. This keeps memory usage low even for large files.
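
A rough sketch of what converting one line at a time means, here from CSV to fixed width. It does not use the JSaPar Converter; the class StreamConvertSketch, its methods and the column widths are assumptions made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

/** Illustrative only: streaming CSV to fixed width, not the JSaPar Converter. */
class StreamConvertSketch {

    /** Reads, converts and writes one line at a time, so only one line is held in memory. */
    static void csvToFixedWidth(Reader in, Writer out, String separator, int... widths) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(Pattern.quote(separator), -1);
                StringBuilder converted = new StringBuilder();
                for (int i = 0; i < widths.length; i++) {
                    // Missing cells are written as blanks.
                    String cell = i < cells.length ? cells[i] : "";
                    converted.append(fit(cell, widths[i]));
                }
                out.write(converted.append('\n').toString());
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Truncates or right-pads the value with spaces to exactly the given width. */
    static String fit(String value, int width) {
        StringBuilder fitted = new StringBuilder(value);
        fitted.setLength(Math.min(value.length(), width));
        while (fitted.length() < width) {
            fitted.append(' ');
        }
        return fitted.toString();
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        csvToFixedWidth(new StringReader("Erik;Svensson\nAnna;Karlsson\n"), out, ";", 8, 10);
        System.out.print(out);
    }
}
```

With the real Converter the cell layout of both sides is taken from the two schemas instead of being passed as arguments.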

Building java objects

Use the method org.jsapar.Parser.buildJava() in order to build Java objects for each line in a file (or input). Note that in order to use this feature, the schema has to be written carefully. For instance, the line type (name) of the line within the schema has to contain the complete class name of the Java class to build for each line.
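
Why the line type needs to be a complete class name becomes clearer with a small reflection sketch. This is an assumption about the general technique, not the actual JSaPar code: the class BeanBuildSketch, the nested Person bean and the cell name "firstName" are hypothetical, and only String setters are handled.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: reflection-based bean building, not the JSaPar implementation. */
class BeanBuildSketch {

    /** Hypothetical target bean with a default constructor, a getter and a setter. */
    public static class Person {
        private String firstName;

        public Person() {
        }

        public String getFirstName() {
            return firstName;
        }

        public void setFirstName(String firstName) {
            this.firstName = firstName;
        }
    }

    /** Creates an instance of className and calls setXxx(String) for each named cell. */
    static Object buildBean(String className, Map<String, String> cells) {
        try {
            Class<?> type = Class.forName(className);
            Object bean = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                String name = cell.getKey();
                // Cell name "firstName" maps to the setter "setFirstName".
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method method = type.getMethod(setter, String.class);
                method.invoke(bean, cell.getValue());
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("firstName", "Erik");
        Person person = (Person) buildBean(Person.class.getName(), cells);
        System.out.println(person.getFirstName());
    }
}
```

A misspelled class name or a cell without a matching setter fails at run time, which is one reason the schema has to be written carefully.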

Converting java objects into a file

Use the class org.jsapar.input.JavaBuilder in order to convert Java objects into an org.jsapar.Document, which can be used to produce the output file according to a schema.

Using XML as input

It is possible to build an org.jsapar.Document from an XML document that conforms to XMLDocumentFormat.xsd (http://jsapar.tigris.org/XMLDocumentFormat/1.0). Use the class org.jsapar.input.XmlDocumentParser in order to convert an XML document into an org.jsapar.Document.

Examples

The files for the examples below are provided in the samples folder of the project. The JUnit test org.jsapar.JSaParExamplesTest.java contains a more comprehensive set of examples of how to use the package.

Example of reading a CSV file into a Document object according to an XML schema:

try (Reader schemaReader = new FileReader("samples/01_CsvSchema.xml");
     Reader fileReader = new FileReader("samples/01_Names.csv")) {
  Xml2SchemaBuilder xmlBuilder = new Xml2SchemaBuilder();
  Parser parser = new Parser(xmlBuilder.build(schemaReader));
  Document document = parser.build(fileReader);
}

Example of converting a CSV file into a Fixed width file according to two XML schemas:

Xml2SchemaBuilder xmlBuilder = new Xml2SchemaBuilder();
File outFile = new File("samples/02_Names_out.txt");
// try-with-resources closes both schema readers, the input and the output, even on failure
try (Reader inSchemaReader = new FileReader("samples/01_CsvSchema.xml");
     Reader outSchemaReader = new FileReader("samples/02_FixedWidthSchema.xml");
     Reader inReader = new FileReader("samples/01_Names.csv");
     Writer outWriter = new FileWriter(outFile)) {
  Converter converter = new Converter(xmlBuilder.build(inSchemaReader), xmlBuilder.build(outSchemaReader));
  converter.convert(inReader, outWriter);
}

Example of converting a CSV file into a list of Java objects according to an XML schema:

Xml2SchemaBuilder xmlBuilder = new Xml2SchemaBuilder();
// try-with-resources closes the schema reader and the file reader, even on failure
try (Reader schemaReader = new FileReader("samples/07_CsvSchemaToJava.xml");
     Reader fileReader = new FileReader("samples/07_Names.csv")) {
  Parser parser = new Parser(xmlBuilder.build(schemaReader));
  List parseErrors = new LinkedList();
  List people = parser.buildJava(fileReader, parseErrors);
}

If you want to run this example, you will need the class org.jsapar.TstPerson on your classpath. The class is not included in the jar file or in the binary package, but it can be found in the source package. As an alternative, you can create your own TstPerson class and modify the schema 07_CsvSchemaToJava.xml to use that class instead. The class should contain a default constructor plus getters and setters for all the attributes used in the schema.