Hyland Connect

Patrick_Sweeney · 10-09-2014

In previous versions of OnBase, the Unity API allowed users to get TIFF image data for a file. However, the Unity API decided behind the scenes how that data would be compressed, based on the file being processed.

For many users it makes sense to leave the decision to OnBase rather than choosing a compression type every time they export a document via the API. Greater control, however, gives users more options to retrieve documents the way they want. Storing image-heavy documents that are only being used for their text? It’s probably not a priority to keep those in a lossless color format that eats up storage space to no benefit. Storing important high-resolution medical images? The user may want to ensure that these documents are retrieved in the least compressed format, as the smallest detail could be the most important.

As a customer-requested feature, we have added this kind of fine-grained TIFF compression control in OnBase 14.

Implementation

This new compression control has been implemented using an ImageContentType enumeration that can be used in new overloads for the ImageDataProvider GetPage(), GetPages(), and GetDocument() methods.

An ImageContentType value can be TIFF, TIFFLossless, TIFFG4, or Jpeg. Generally speaking, the compression types break down as follows:

TIFFG4 will use the most compression and will retrieve black and white page data.
TIFF will attempt to assess the document that is being retrieved and will try to adjust to the optimal compression level for the document. (Unity will use this option by default when no compression is specified.)
TIFFLossless will also attempt to adjust the compression level for the document, but uses less compression for color documents with a high bit depth by switching to old-style Jpeg compression. As a result, some document viewers have difficulty viewing documents produced by TIFFLossless.
Jpeg will always attempt to return the page data using Jpeg compression, although it should be noted that we do not support Jpeg compression for the GetDocument() method at this time. Some document viewers may also have difficulty viewing documents produced by Jpeg compression.

A more detailed breakdown of the compression types is available in the Specifics section below.
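The one restriction in the list above, that Jpeg is not supported for GetDocument(), is easy to trip over when the compression type is chosen at runtime. As a minimal sketch, a caller could check the combination before making the call. The RetrievalMethod and RequestedType enums and the IsSupported helper below are hypothetical stand-ins written for this illustration; they are not part of the Unity API, and the rule they encode is only what this article states.

```csharp
// Hypothetical stand-ins for illustration; these are not Unity API types.
public enum RetrievalMethod { GetPage, GetPages, GetDocument }
public enum RequestedType { Tiff, TiffLossless, TiffG4, Jpeg }

public static class CompressionSupport
{
    // Returns false only for the one combination the article calls out
    // as unsupported: Jpeg requested through GetDocument().
    public static bool IsSupported(RetrievalMethod method, RequestedType type)
    {
        return !(method == RetrievalMethod.GetDocument && type == RequestedType.Jpeg);
    }
}
```

A caller that gets false back could fall back to one of the TIFF types, or retrieve the pages with GetPages() instead.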

Example

We mentioned that this new compression control is implemented through a new enumeration and new overloads of GetPage(), GetPages(), and GetDocument(). However, a simple mention is never as helpful as a good example, so let’s take a look at some real code that uses this new feature. First, let's look at how you can specify the compression type:

// Choose one of the following compression types:
ImageContentType compression = ImageContentType.Tiff;
// ImageContentType compression = ImageContentType.TiffG4;
// ImageContentType compression = ImageContentType.TiffLossless;
// ImageContentType compression = ImageContentType.Jpeg;

Here we're using a variable to store the compression type we want to use later, but you can also use the ImageContentType directly in a method call. For now though let's proceed with setting up the rest of our code to prepare to get the document. For all the document retrieval methods we'll need a valid Rendition object. For GetPage() and GetPages() we'll need a page number or a page range, and for GetPages() and GetDocument() we'll need an ImageGetDocumentProperties object.

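// Image retrieval provider from the connected Unity application (_app).
ImageDataProvider provider = _app.Core.Retrieval.Image;
// Page range for GetPages(); "1-1" covers page 1 only.
PageRangeSet pageRange = provider.CreatePageRangeSet("1-1");
// Properties object required by GetPages() and GetDocument().
ImageGetDocumentProperties imageGetProperties = provider.CreateImageGetDocumentProperties();
// Page number for GetPage().
long pageNumber = 1;
// The document to retrieve (docID is its document ID) and the rendition to read from.
Document doc = _app.Core.GetDocumentByID(docID);
Rendition rendition = doc.DefaultRenditionOfLatestRevision;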

Now that we've taken care of the setup and specified our compression type, all that's left is to retrieve the document!

// Single page.
PageData pageDataGetPage = provider.GetPage(rendition, pageNumber, compression);
// Range of pages.
PageDataList pageDataList = provider.GetPages(rendition, pageRange, imageGetProperties, compression);
// Whole document. Jpeg is not supported by GetDocument(), so use another ImageContentType for this call.
PageData pageDataGetDoc = provider.GetDocument(rendition, imageGetProperties, compression);

And here is an example of retrieving a page with G4 compression using the ImageContentType directly:

PageData pageData = provider.GetPage(rendition, pageNumber, ImageContentType.TiffG4);

Don't forget to dispose of PageData objects before leaving your scope!
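One way to make sure disposal happens even when an exception is thrown is a C# using statement. The sketch below is self-contained, so it uses a hypothetical StandInPageData class in place of the real PageData; the only assumption it makes about PageData is that it can be disposed, as the reminder above implies. The DisposalSketch helper is likewise made up for this illustration.

```csharp
using System;

// Hypothetical stand-in for PageData so the sketch compiles without the Unity API.
public sealed class StandInPageData : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class DisposalSketch
{
    // Runs 'work' against the page data and disposes it afterwards,
    // even if 'work' throws.
    public static StandInPageData UseAndDispose(Action<StandInPageData> work)
    {
        StandInPageData pageData = new StandInPageData();
        using (pageData)
        {
            work(pageData);
        }
        return pageData; // returned only so callers can inspect IsDisposed
    }
}

// With the Unity API, the same shape would look like this (not compiled here):
// using (PageData pageData = provider.GetPage(rendition, pageNumber, ImageContentType.TiffG4))
// {
//     // read the page data here
// }
```

The using block calls Dispose() when control leaves it, whether the body finishes normally or throws, so there is no separate cleanup call to forget.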

Specifics

The ability to specify compression types gives a greater degree of control to the user, but it helps to have a more in-depth understanding of what each compression type means.

TIFFG4

The ImageContentType with the most compression is TIFFG4. This compression type uses the Group 4 compression standard and, for the compression types available, will produce files of the smallest size. The caveat is that Group 4 compression assumes a bitonal image, so any documents retrieved this way will be retrieved in black and white.

TIFF

The default compression type, TIFF, will attempt to decide between G4 and LZW compression. If the original document is a Group 4 TIFF, the TIFF compression type will use G4 compression. Otherwise it will use LZW compression. In practice this means that black and white text documents will often be rendered using G4 compression, whereas color documents will be rendered using LZW compression. The LZW format compresses less than G4, but outperforms our other lossless compression types for most files. The catch is that LZW compression can underperform for files with a bit depth of 16 or more, such as high quality image documents, producing large file sizes.

TIFFLossless

TIFFLossless generally has less compression than TIFFG4 or TIFF, but can handle very high quality images more readily than both. Like TIFF, TIFFLossless determines whether the original document is a Group 4 TIFF and, if it is, processes it using G4 compression. For documents with a bit depth between 2 and 8, TIFFLossless will, like the TIFF ImageContentType, use LZW compression. But for color documents with a bit depth greater than 8, TIFFLossless will use old-style Jpeg compression. This means that TIFFLossless avoids LZW's difficulty with 16-bit files. However, the user should be aware that files produced using Jpeg compression will have a very low degree of compression, making for large file sizes. Also, old-style Jpeg compression is not compatible with some document viewers, like Windows Viewer. So to view some files saved with TIFFLossless compression, the user may need an external viewer that supports old-style Jpeg compression, such as IrfanView.

Jpeg

The Jpeg ImageContentType explicitly enforces Jpeg compression, with no attempt to use a higher compression based on document processing. So a Group 4 plain text TIFF will be processed in the same manner as a high-color Photoshop image. This compression ensures that the document is returned in its least compressed form, at the cost of accessibility and the largest output file sizes. As noted for TIFFLossless, files produced using Jpeg compression are not compatible with some viewers, and the user may need to procure an external viewer to access the documents. As a further note, OnBase does not support using the Jpeg ImageContentType with the GetDocument() method. Jpeg compression can only be used with the GetPage() and GetPages() methods.
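The selection rules described in this section can be summarized as a small decision function. This is only a sketch of the prose above, not OnBase's actual implementation: the RequestedType and Codec enums and the Predict method are hypothetical names invented for this illustration, and any case the article does not describe returns NotStated rather than a guess.

```csharp
// Hypothetical stand-ins for illustration; these are not Unity API types.
public enum RequestedType { Tiff, TiffLossless, TiffG4, Jpeg }
public enum Codec { Group4, Lzw, OldStyleJpeg, Jpeg, NotStated }

public static class CompressionSketch
{
    // Mirrors the prose rules above. Cases the article does not cover
    // return Codec.NotStated instead of guessing.
    public static Codec Predict(RequestedType requested, bool sourceIsGroup4Tiff,
                                int bitDepth, bool isColor)
    {
        switch (requested)
        {
            case RequestedType.TiffG4:
                return Codec.Group4;   // always bitonal Group 4
            case RequestedType.Jpeg:
                return Codec.Jpeg;     // always Jpeg, regardless of the source
            case RequestedType.Tiff:
                // Group 4 sources stay G4; everything else gets LZW.
                return sourceIsGroup4Tiff ? Codec.Group4 : Codec.Lzw;
            case RequestedType.TiffLossless:
                if (sourceIsGroup4Tiff) return Codec.Group4;
                if (bitDepth >= 2 && bitDepth <= 8) return Codec.Lzw;
                if (isColor && bitDepth > 8) return Codec.OldStyleJpeg;
                return Codec.NotStated;
            default:
                return Codec.NotStated;
        }
    }
}
```

For example, Predict(RequestedType.TiffLossless, false, 24, true) yields Codec.OldStyleJpeg, while the same source requested as RequestedType.Tiff yields Codec.Lzw.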

In Conclusion

In many instances the default TIFF compression will be sufficient to complete the task at hand, yet this new feature should be a welcome one for anyone trying to optimize the returned file size, guarantee the return of the highest quality image data, or ensure the compatibility of the returned document. The ability to specify the compression level of the returned file gives the user more options to determine what that file is and how that file should be used in the future.

That does it for our new feature this week; check out our feature on our new Unity API installer next time. If you have any questions or comments, be sure to leave them in the comment area below.