In previous versions of OnBase, the Unity API allowed users to get TIFF image data for a file. However, the Unity API handled how that data would be compressed behind the scenes by making determinations for the user based on the file being processed.
For many users it makes sense to leave the decision to OnBase rather than choosing a compression type every time they export a document via the API. Greater control, however, gives users more options to retrieve documents the way they want. Storing image-heavy documents that are only being used for their text? It's probably not a priority to store those in a lossless color format that eats up storage space to no benefit. Storing important high-resolution medical images? The user may want to ensure that the document is retrieved in the least compressed format, as the smallest detail could be the most important.
This was a customer-requested feature, and in OnBase 14 we have added this kind of fine-grained TIFF compression control.
This new compression control has been implemented using an ImageContentType enumeration that can be used in new overloads for the ImageDataProvider GetPage(), GetPages(), and GetDocument() methods.
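An ImageContentType value can be set to TIFF, TIFFLossless, TIFFG4, or Jpeg. Generally speaking the compression types break down as follows:
TIFF (the default): G4 compression when the original is a Group 4 TIFF, LZW compression otherwise.
TIFFLossless: G4 for Group 4 TIFF originals, LZW for bit depths between 2 and 8, and old-style Jpeg for color documents with a bit depth greater than 8.
TIFFG4: always Group 4 compression, giving the smallest files, but always in black and white.
Jpeg: always Jpeg compression, giving the least compressed output and the largest files.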
A more detailed breakdown of the compression types is available in the Specifics section below.
We mentioned that this new compression control has been implemented by use of a new enumeration and new overloads for GetPage(), GetPages(), and GetDocument(). However, a simple mention is never as helpful as a good example, so let's take a look at some real code that uses this new feature. First, let's take a look at how you can specify the compression level:
// Pick one of the following; each line is an alternative.
ImageContentType compression = ImageContentType.Tiff;
// ImageContentType compression = ImageContentType.TiffG4;
// ImageContentType compression = ImageContentType.TiffLossless;
// ImageContentType compression = ImageContentType.Jpeg;
Here we're using a variable to store the compression type we want to use later, but you can also use the ImageContentType directly in a method call. For now though let's proceed with setting up the rest of our code to prepare to get the document. For all the document retrieval methods we'll need a valid Rendition object. For GetPage() and GetPages() we'll need a page number or a page range, and for GetPages() and GetDocument() we'll need an ImageGetDocumentProperties object.
ImageDataProvider provider = _app.Core.Retrieval.Image;
PageRangeSet pageRange = provider.CreatePageRangeSet("1-1");
ImageGetDocumentProperties imageGetProperties = provider.CreateImageGetDocumentProperties();
long pageNumber = 1;
Document doc = _app.Core.GetDocumentByID(docID);
Rendition rendition = doc.DefaultRenditionOfLatestRevision;
Now that we've taken care of the setup and specified our compression type all that's left is to retrieve the document!
PageData pageDataGetPage = provider.GetPage(rendition, pageNumber, compression);
PageDataList pageDataList = provider.GetPages(rendition, pageRange, imageGetProperties, compression);
PageData pageDataGetDoc = provider.GetDocument(rendition, imageGetProperties, compression);
And here is an example of retrieving a document with G4 compression using the ImageContentType directly:
PageData pageData = provider.GetPage(rendition, pageNumber, ImageContentType.TiffG4);
Don't forget to dispose of PageData objects before leaving your scope!
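One way to make sure disposal always happens is a using block. The sketch below is a minimal illustration of that pattern, not Unity API code: StandInPageData and DisposeSketch are hypothetical names invented here so the example compiles on its own, and the only thing assumed about the real PageData is what the note above implies, namely that it can be disposed.

```csharp
using System;

// Hypothetical stand-in for PageData so this sketch compiles without the
// Unity API. In real code the object would come from provider.GetPage(...).
public sealed class StandInPageData : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class DisposeSketch
{
    // Runs the work inside a using block, so Dispose() is called when the
    // scope is left, whether the work finishes normally or throws.
    public static bool TryProcess(StandInPageData pageData, Action<StandInPageData> work)
    {
        try
        {
            using (pageData)
            {
                work(pageData);
            }
            return true;
        }
        catch (InvalidOperationException)
        {
            // The using block has already disposed pageData at this point.
            return false;
        }
    }
}
```

With the real types, the same idea would read as using (PageData pageData = provider.GetPage(rendition, pageNumber, compression)) { ... }, with all work on the page data kept inside the braces.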
The ability to specify compression types gives a greater degree of control to the user, but it would help to have a more in-depth understanding of what each compression type means.
The ImageContentType with the most compression is TIFFG4. This compression type uses the Group 4 compression standard and, for the compression types available, will produce files of the smallest size. The caveat is that Group 4 compression assumes a bitonal image, so any documents retrieved this way will be retrieved in black and white.
The default compression type, TIFF, will attempt to decide between G4 and LZW compression. If the original document is a Group 4 TIFF the TIFF compression type will use the G4 compression. Otherwise it will use the LZW compression. In practice this means that black and white text documents will often be rendered using the G4 compression, whereas color documents will be rendered using the LZW compression. The LZW format compresses less than G4, but outperforms our other lossless compression types for most files. The catch is that LZW compression can underperform for 16 bit and above files, such as high quality image documents, producing large file sizes.
TIFFLossless generally has less compression than TIFFG4 or TIFF, but can handle very high quality images more readily than both. Like TIFF, TIFFLossless determines if the original document is a Group 4 TIFF and in the case that it is, processes it using the G4 compression. For documents having a bit depth between 2 and 8, TIFFLossless will, like the TIFF ImageContentType, use LZW compression. But for color documents having a bit depth greater than 8, TIFFLossless will use old-style Jpeg compression. This means that TIFFLossless avoids LZW's difficulty with 16 bit files. However the user should be aware that the files produced using Jpeg compression will have a very low degree of compression making for large file sizes. Also the old-style Jpeg compression is not compatible with some document viewers like Windows Viewer. So to view some files saved with TIFFLossless compression the user may need an external viewer that supports old-style Jpeg compression such as IrfanView.
The Jpeg ImageContentType explicitly enforces the Jpeg compression with no attempt to use a higher compression based on document processing. So a Group 4 plain text TIFF will be processed in the same manner as a high-color Photoshop image. This compression ensures that the document is returned in its least compressed form, at the cost of accessibility and the largest output file sizes. As noted for TIFFLossless, files produced using the Jpeg compression are not compatible with some viewers, and the user may need to procure an external viewer to access the documents. As a further note, OnBase does not support using the Jpeg ImageContentType with the GetDocument() method. Jpeg compression can only be used with the GetPage() and GetPages() methods.
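The rules described in this section can be summarized as a small decision function. This is only a model of the behavior as described in this post, not OnBase's actual implementation: ContentTypeModel, CompressionModel, and CompressionRules are hypothetical names that stand in for the real types, and cases the post does not spell out are reported as NotDescribed rather than guessed.

```csharp
using System;

// Local stand-ins for illustration; these are NOT the Unity API types.
public enum ContentTypeModel { Tiff, TiffLossless, TiffG4, Jpeg }
public enum CompressionModel { Group4, Lzw, Jpeg, NotDescribed }

public static class CompressionRules
{
    // Returns the compression the post describes for a requested content type,
    // given whether the source is a Group 4 TIFF and its bit depth.
    public static CompressionModel Describe(ContentTypeModel type, bool sourceIsGroup4Tiff, int bitDepth)
    {
        switch (type)
        {
            case ContentTypeModel.TiffG4:
                // Always Group 4, so output is bitonal.
                return CompressionModel.Group4;

            case ContentTypeModel.Tiff:
                // Default: G4 for Group 4 sources, LZW for everything else.
                return sourceIsGroup4Tiff ? CompressionModel.Group4 : CompressionModel.Lzw;

            case ContentTypeModel.TiffLossless:
                if (sourceIsGroup4Tiff) return CompressionModel.Group4;
                if (bitDepth >= 2 && bitDepth <= 8) return CompressionModel.Lzw;
                // The post says "color documents" above 8 bits; this sketch
                // simplifies by keying on bit depth alone.
                if (bitDepth > 8) return CompressionModel.Jpeg;
                // Remaining cases are not covered by the post.
                return CompressionModel.NotDescribed;

            case ContentTypeModel.Jpeg:
                // Jpeg is enforced regardless of the source document.
                return CompressionModel.Jpeg;

            default:
                throw new ArgumentOutOfRangeException(nameof(type));
        }
    }

    // The post states that Jpeg is not supported with GetDocument().
    public static bool IsDescribedAsSupported(string methodName, ContentTypeModel type)
    {
        return !(methodName == "GetDocument" && type == ContentTypeModel.Jpeg);
    }
}
```

For example, CompressionRules.Describe(ContentTypeModel.TiffLossless, false, 16) yields CompressionModel.Jpeg, which matches the description of how TIFFLossless sidesteps LZW's difficulty with 16-bit files.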
For many instances the default TIFF compression will be sufficient to complete the task at hand, yet this new feature would be a welcome one for anyone trying to optimize the returned file size, guarantee the return of the highest quality image data, or ensure the compatibility of the returned document. The ability to specify the compression level of the returned file gives the user more options to determine what that file is and how that file should be used in the future.
That does it for our new feature this week; check out our feature on our new Unity API installer next time. If you have any questions or comments, be sure to leave them in the comment area below.