Need opinions on data model

cesco75
Champ in-the-making
Hi.
I'm designing a data model with the following requirements:

1) Documents should be associated with a "person", which in Alfresco will be just a numerical ID. All other info is stored in an external DB.
2) Some docs can be associated with more than one person.
3) I will need to manage 3-4 million docs.
4) A single person usually has fewer than 100 associated documents.
5) I need to retrieve documents only for one given person (ID).
6) Documents have other properties (like date, category, …) that I may use for stats, not for search.

I think the most straightforward way to model this is a multi-value property on the docs, but I have some concerns about performance.
Without requirement 2, a good trick would have been to store the content in a folder structure where all docs of a single person are filed in the same folder: in that case a path search would have done all the work with no performance concerns…
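
To make the two approaches concrete, here is a minimal sketch of the query strings each one might lead to, written in Alfresco's Lucene-style query syntax as I understand it. The property name `my:personIds`, the folder root `/app:company_home/cm:persons`, and the `p<ID>` folder naming are all hypothetical, and the exact syntax should be checked against the Alfresco version in use.

```python
def property_query(person_id, prop="my:personIds"):
    """Query on a (hypothetical) multi-valued property holding person IDs."""
    prefix, local = prop.split(":")
    # The colon inside the property QName is escaped with a backslash.
    return f'@{prefix}\\:{local}:"{person_id}"'


def folder_query(person_id, root="/app:company_home/cm:persons"):
    """Path query for direct children of a (hypothetical) per-person folder."""
    # The folder is named "p<ID>" here only to avoid a name starting with a digit.
    return f'PATH:"{root}/cm:p{person_id}/*"'


print(property_query(12345))
print(folder_query(12345))
```

The property query still works when a document lists several person IDs, which is exactly what the folder layout cannot express without filing the same document in more than one place.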

Any better solution?
Thanks,
Cesco
4 REPLIES

jpotts
World-Class Innovator
I see three options:

1. Use a multi-value property that stores the person identifier (as you propose).

2. Use an association between the document object and an object representing the person.

3. Use a multi-value property on an object representing the person that stores the node refs of the docs.

The multi-value property approach will be more efficient from a search standpoint. Plus, it doesn't sound like you are storing anything about a person other than their identifier, so no need to waste an object on a person, which rules out options 2 and 3.
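
As a rough illustration of why a multi-value property searches well, here is a toy inverted index in Python. It is not Alfresco or Lucene code, and the class and method names are made up for this sketch. The point is only that each value of the multi-valued field becomes its own index term, so looking up one person ID touches that person's documents and nothing else.

```python
from collections import defaultdict


class DocIndex:
    """Toy inverted index: one posting set per person ID (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, person_ids):
        # Every value of the multi-valued field becomes a separate index term.
        for pid in person_ids:
            self._postings[pid].add(doc_id)

    def docs_for(self, person_id):
        # .get() avoids creating an empty entry for unknown IDs.
        return sorted(self._postings.get(person_id, ()))


index = DocIndex()
index.add("doc-1", [1001])
index.add("doc-2", [1001, 1002])  # one document related to two people
index.add("doc-3", [1002])

print(index.docs_for(1001))  # ['doc-1', 'doc-2']
```

With options 2 and 3 the same lookup would first have to resolve a person object and then follow its associations or stored node refs, which is the extra hop the multi-value property avoids.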

You said that a given person will only have about 100 documents related to them and that a given document could be related to more than one person. But how many people might a specific document be related to?

Jeff

cesco75
Champ in-the-making
Sorry I didn't answer sooner, and thanks for your suggestions:

You said that a given person will only have about 100 documents related to them and that a given document could be related to more than one person. But how many people might a specific document be related to?
Jeff

98% of the time a document is related to just one person, but sometimes one document is related to a large number of people: tens to thousands, as far as I currently understand.

Cesco

mrogers
Star Contributor
A multi-valued field containing tens to thousands of values should be OK, but if it goes much above that I'd question the requirements. In addition, there is a Lucene limit of 1000 docs for a search. I would not like to see 10,000 or more values in a multi-valued field.

When large numbers of users are associated with a document, it tends to be because they have a role rather than an individual responsibility. You could perhaps model that through group membership.

I'd be tempted to try to model the relationship with assocs (Jeff's option 2). Just my preference :)
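
The group-membership idea above could be sketched roughly as follows. This is a plain-Python toy, not Alfresco API code: the `person:` / `group:` prefixes, the function name, and the sample data are all assumptions made for illustration. A document that concerns thousands of people stores a single group value instead of thousands of person IDs, and the lookup for one person checks their own ID plus the groups they belong to.

```python
def docs_for_person(person_id, memberships, doc_principals):
    """Return documents assigned to the person directly or via a group."""
    principals = {f"person:{person_id}"}
    principals |= {f"group:{g}" for g in memberships.get(person_id, ())}
    # A linear scan is fine for a toy; a real system would query an index.
    return sorted(
        doc for doc, assigned in doc_principals.items()
        if principals & set(assigned)
    )


# Hypothetical sample data.
memberships = {1001: ["all-staff"], 1002: ["all-staff", "managers"]}
doc_principals = {
    "handbook": ["group:all-staff"],    # one value covers every employee
    "bonus-plan": ["group:managers"],
    "contract-1001": ["person:1001"],   # the common single-person case
}

print(docs_for_person(1001, memberships, doc_principals))
```

The trade-off is that group membership has to be kept up to date somewhere, which in this thread's setup would presumably be the external DB that already holds the person data.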

unknown-user
Champ on-the-rise
Hi,

How did you finally solve this? I have exactly the same requirement. We are building a Policy Management site in Alfresco. Each policy document needs to know who has agreed to that policy, so that when a user logs in, he sees only the policies still pending for him.

The number of users who agreed to a policy could be equivalent to the number of employees in the company, which would be 5K to 10K. From a performance standpoint, what's the best option: a multi-valued property or an association with a person object? Another suggestion I got from the forum was to use the Preferences service to store the list of policies agreed to by a user, but it has some cons.

Thanks