cancel
Showing results for 
Search instead for 
Did you mean: 

Allow multiple Keyword Values of the same type to be more efficiently added to a DocumentQuery in the Unity API

Tony_Rush
Confirmed Champ
Confirmed Champ

I was going to post this under 'Ideas' but there's no Idea category for Unity API that I saw, so I figured I'd start a discussion here to see if maybe there is already an existing Enhancement SCR out there.

What I'm looking for is the ability to allow multiple Keyword Values of the same type to be more efficiently added to a DocumentQuery in the Unity API. We are currently calling the following line of code 3000 times:

            docQuery.AddKeyword("KeyName", "KeyValue1-3000", KeywordOperator.Equal, KeywordRelation.Or);

The problem is that this is taking about 10 seconds just to generate the query; the code Adds all 3000 Keywords in a loop, then I call docQuery.ExecuteCount(), then I notice a lapse of 10 seconds before I see the query execute in SQL Query Analyzer.

I'm assuming the query building is taking so long because it's performing table look ups for each instance of AddKeyword, not to mention the Keyword in question here is in an MIKG so that complicates things. The resulting SQL query looks something like:

SELECT COUNT(DISTINCT itd.itemnum) FROM hsi.itemdata itd, hsi.keyrecorddata127 kg127_0 WHERE ((itd.status = 0)) AND ((itd.itemdate BETWEEN CONVERT(datetime, '1753-01-02 00:00:00.000', 121) AND CONVERT(datetime, '9999-12-30 23:59:59.997', 121))) AND ((itd.itemnum >= 0 and itd.itemnum <= 9223372036854775807)) AND (itd.itemnum = kg127_0.itemnum) AND (kg127_0.kg417 = N'1000004000' OR kg127_0.kg417 = N'1000517000' OR kg127_0.kg417 = N'1001041002' OR kg127_0.kg417 = N'1001051000' OR ........3000 more times.........)

The execution of the query itself is fine, taking just about 1-2 seconds. What I think would be more efficient is if there were an API method such as:

            docQuery.AddKeywords("KeyName", new string[] { "KeyValue1", "KeyValue2"....}, KeywordOperator.Equal, KeywordRelation.Or);

My question is two fold...

1) If this AddKeywords method were present in the API, would that allow Hyland to more efficiently build the query? (I don't know the back-end code, so am hoping that the MIKG and Keyword lookup would only need to occur once).

2) Is there by chance already an existing enhancement SCR that looks to address this?

3 REPLIES 3

Alex_French
Elite Collaborator
Elite Collaborator

Hi Tony,

I just did a quick test and didn't see what you're describing in our environment.  My creation of the query took no noticeable time.

We're using v17.0.1.57, with SQL Server 2014.  Our database is not huge (maybe 5 million documents).

I did make sure that I was querying an MIKG, not  a simple keyword.  I did see SQL roughly similar to yours in the Diagnostics Console.

Lots of other variables could be in play- have you tried look at all the SQL in Diagnostics Console, not just the resulting "real" query?

Tony_Rush
Confirmed Champ
Confirmed Champ

Thanks for the reply, Alex - that makes me feel hopeful that you're not experiencing the issue! We are seeing this issue at one of our customers (18.0.1.49), as well as in my own personal VM demo (18.0.1.45).

As a test, I just created a brand new Document Type and a new Alpha length 20 Keyword (single table, non-MIKG). I am getting the same results trying to add this standalone Keyword 3000 times. I also tried a random SYS DocType and the Description Keyword...same results.

Could you please try a test in version 18 if you have it available? If it's better to open a support ticket, let me know and  I can do that. Here's some code I just put together which replicates the issue on my end:

Console.WriteLine("Connected to OnBase: " + DateTime.Now);DocumentQuery docQuery = app.Core.CreateDocumentQuery();  DocumentType docType = app.Core.DocumentTypes.Find("TestKeyQuery");docQuery.AddDocumentType(docType);             KeywordType keyType = app.Core.KeywordTypes.Find("StandaloneKeyTest");foreach (string val in keyValues){    docQuery.AddKeyword(keyType.ID, val, KeywordOperator.Equal, KeywordRelation.Or);}Console.WriteLine("Keywords Done Adding: " + DateTime.Now);long queryCount = docQuery.ExecuteCount();                Console.WriteLine("Count Done: " + DateTime.Now);

I tested with 1000, 2000, and 3000 values for the keyValues array. In all cases, the WriteLines "Connected to OnBase" and "Keywords Done Adding" are the same timestamp. The ExecuteCount takes takes this many seconds:

1000 keyValues = 3 seconds

2000 keyValues = 8 seconds

3000 keyValues = 12 seconds

This is from my demo system (4000 documents with stats updated). The customer is even worse, where the 3000 keyValues tests (including the test against the SYS document/Description Keyword) takes about 35 seconds.

I did now check SQL in Diagnostics and there is no activity during the entirety of the Unity API code performing ExecuteCount(). Same with SQL Profiler: no activity during the lengthy delay. Once ExecuteCount() completes, then SQL Diagnostics immediately displays the aforementioned query having taken 15 milliseconds. So I think what this says is that the delay we're seeing is in the client-side Unity API code or within Application Server.

 

Alex_French
Elite Collaborator
Elite Collaborator

Wait... why can't I comment here... I don't understand this newfangled (now a year old?) Community 2.0 thing.

I don't have anything newer than 17.0.1.57 readily available here at Dartmouth.

I looked at your code sample and tried tweaking the only obvious difference - my sample was using .AddKeyword("keyword_type_name", ...) - with no change.  I also checked both a numeric and an alpha keyword type and values.

My next guess would have been to look at the behavior of things like having a Dataset for the Keyword Type, or having a Mask on the keyword.  I wouldn't think either of those *should* matter... and it sounds like your clean and simple test example would rule that out without knowing anything more about the real solution.


That leads me to guess that there's just a change from my version to your version that does something stupid inside the API code.  In some cases an intrepid adventurer can go decompiling and spelunking and then tell Hyland Support exactly where a problem is.

Hmmm... have you tried running the same thing multiple times, with and without disconnecting the API session, but without recycling or resetting iis/app pool/server cache?  I feel like I heard someone mention that some version (18?  Beyond 18?) would startup the AppServer faster because it would cache less data from the database at startup time, and save more loading for individual user login.  But that probably isn't relevant if you're not seeing other SQL traffic.