Hyland Connect

shailendra1 · ‎01-08-2014

During one of our performance tuning exercise, we observed that while querying for all variables using

Activity was using following query defined in the xml

<select id="selectTaskWithVariablesByQueryCriteria" parameterType="org.activiti.engine.impl.TaskQueryImpl" resultMap="taskAndVariablesResultMap">
    ${limitBefore}
    select distinct RES.*,
    VAR.ID_ as VAR_ID_, VAR.NAME_ as VAR_NAME_, VAR.TYPE_ as VAR_TYPE_, VAR.REV_ as VAR_REV_,
    VAR.PROC_INST_ID_ as VAR_PROC_INST_ID_, VAR.EXECUTION_ID_ as VAR_EXECUTION_ID_, VAR.TASK_ID_ as VAR_TASK_ID_,
    VAR.BYTEARRAY_ID_ as VAR_BYTEARRAY_ID_, VAR.DOUBLE_ as VAR_DOUBLE_,
    VAR.TEXT_ as VAR_TEXT_, VAR.TEXT2_ as VAR_TEXT2_, VAR.LONG_ as VAR_LONG_
    ${limitBetween}
    <include refid="selectTaskWithVariablesByQueryCriteriaSql"/>
    ${orderBy}
    ${limitAfter}
</select>

Having a look at the query it seems that the distinct part is unnecessary as using distinct in left outer join with select including values also from table OTHER THAN Base table does not make sense in my opinion. The rows of resultset of this query without distinct would be distinct anyway.
Because of distinct, the database has to work extra. For e.g., using explain plan on this query for MySQL, it shows that this uses a temporary table to copy first all the results to temp area and then process distinct which is un necessary extra work. If the act_ru_variable table size is large, then this involves huge processing.

Removing the distinct returns the same number of rows only this time the processing done by database is not required.
Of course if the query contains only columns from base tables then using distinct makes sense.

Have I overlooked something here or not aware of the reason why distinct is being used ?

shailendra1 · ‎01-08-2014

The resulting query fired is like this

select distinct RES.*,
    VAR.ID_ as VAR_ID_, VAR.NAME_ as VAR_NAME_, VAR.TYPE_ as VAR_TYPE_, VAR.REV_ as VAR_REV_,
    VAR.PROC_INST_ID_ as VAR_PROC_INST_ID_, VAR.EXECUTION_ID_ as VAR_EXECUTION_ID_, VAR.TASK_ID_ as VAR_TASK_ID_,
    VAR.BYTEARRAY_ID_ as VAR_BYTEARRAY_ID_, VAR.DOUBLE_ as VAR_DOUBLE_,
    VAR.TEXT_ as VAR_TEXT_, VAR.TEXT2_ as VAR_TEXT2_, VAR.LONG_ as VAR_LONG_
    from ACT_RU_TASK RES
    left outer join ACT_RU_VARIABLE VAR ON RES.PROC_INST_ID_ = VAR.EXECUTION_ID_ and VAR.TASK_ID_ is null;

jbarrez · ‎01-09-2014

I've removed the distinct: https://github.com/Activiti/Activiti/commit/efdcdf99549456dd9b982b3824940b747d96d4ed

All tests run green on our QA, so it seems safe to keep.
Thanks for pointing this out.

shailendra1 · ‎01-10-2014

Thanks Jorram,
Glad to contribute.

Hyland Connect

Activiti SQL query performance