WebInspect scan corrupts Lucene indexes

chicks
Champ in-the-making
We're moving Alfresco Labs 3.0 Final to QC. Our security team scanned it with WebInspect. After last night's backup, Alfresco failed to start, complaining about an invalid protocol in the Lucene index tree. Sure enough, there are multiple nonsense directories under lucene-indexes and backup-lucene-indexes, such as "http" and many with garbage characters in their names. Somehow Alfresco, possibly through one or more of the sample web scripts, allowed the scan to create bogus directories in the Lucene index trees. Instead of simply ignoring them, Alfresco refuses to start.

We had to drop the indexes and rebuild them on startup, which is not an acceptable process. We'll try deleting the sample web scripts in the hope that this prevents the issue. Still, this looks like an oversight on the part of Alfresco's QC: surely they have run security scans against Alfresco, so it's hard to believe this hasn't cropped up before.

Thanks for any insight into resolving this MAJOR issue.
2 REPLIES

chicks
Champ in-the-making
Here's what lucene-indexes winds up looking like after a WebInspect scan:


,`@^*$;<iFrAmE sRc=hTtP                {folderId?}<iFrAmE sRc=hTtP            41076551                               system
,`@^*$;<iMg SrC=hTtP                   {folderId?}<iMg SrC=hTtP               archive                                user
{currentSpace}<iFrAmE sRc=hTtP         {noderef}<iFrAmE sRc=hTtP              avm                                    workspace
{currentSpace}<iMg SrC=hTtP            {noderef}<iMg SrC=hTtP                 baddir123                              www.webinspect.hp.com<iFrAmE sRc=hTtP
{eventId}<iFrAmE sRc=hTtP              {store_type}                           http                                   www.webinspect.hp.com<iMg SrC=hTtP
{eventId}<iMg SrC=hTtP                 {store_type}"><script>alert(097531);<  locks

We've discovered that the garbage is ignored in lucene-indexes, but Alfresco refuses to start once it has been copied to backup-lucene-indexes by the 3:00AM job. Our workaround is to run our cold backup at 2:15AM and delete backup-lucene-indexes at that time (it's only needed if you're doing warm backups). Alfresco then starts right up, before backup-lucene-indexes is re-created at 3:00AM.
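
For anyone wanting to script the cleanup step of that workaround, here is a minimal Python sketch. It is not anything Alfresco ships: the function name `remove_backup_indexes`, the `alf_data` argument, and the default directory name are assumptions, so point it at whatever your install actually uses, and only run it while Alfresco is stopped for the cold backup.

```python
import shutil
from pathlib import Path


def remove_backup_indexes(alf_data, backup_name="backup-lucene-indexes"):
    """Delete the backup index tree under alf_data.

    Returns True if a directory was removed, False if there was nothing
    to remove. Both the data root and the directory name are assumptions
    about the local install, not values read from Alfresco itself.
    """
    target = Path(alf_data) / backup_name
    if not target.is_dir():
        return False
    # Remove the whole tree, including any bogus directories the scan left behind.
    shutil.rmtree(target)
    return True
```

Scheduled from cron at 2:15AM as part of the cold backup, something like `remove_backup_indexes("/path/to/alf_data")` would clear the tree before the 3:00AM job re-creates it. The live index directory is deliberately left alone.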

These garbage directories are evidently being created by the built-in web services. We deleted all the sample web scripts hoping for improvement, but the problem persists, so the built-in web services must be the cause. WebInspect tries to follow every exposed link, passing in the variable names (such as {noderef} and {store_type}) along with garbage characters. Somehow these end up as directories in lucene-indexes.
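
To check whether a scan has left this kind of debris before the 3:00AM copy runs, a rough Python sketch like the following could list the suspicious directories. The allow-list is an assumption built from the plain-looking names in the listing above (archive, avm, locks, system, user, workspace); it is not an official list of valid Alfresco store names, and `find_unexpected_dirs` is a made-up helper name.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: names taken from the directory listing in this thread,
# not from any authoritative list of Alfresco store protocols.
EXPECTED_DIRS = frozenset(
    {"archive", "avm", "locks", "system", "user", "workspace"}
)


def find_unexpected_dirs(index_root, expected: Iterable[str] = EXPECTED_DIRS) -> List[str]:
    """Return the sorted names of subdirectories not in the allow-list.

    Only immediate subdirectories of index_root are examined; plain files
    are ignored. Nothing is deleted.
    """
    allowed = set(expected)
    return sorted(
        entry.name
        for entry in Path(index_root).iterdir()
        if entry.is_dir() and entry.name not in allowed
    )
```

The function only reports names, so the output can be reviewed by hand before anything is removed. Adjust the allow-list if your repository has additional stores.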

Alfresco, you have a MAJOR issue here, can someone please look into this?

mikeh
Star Contributor
Could you please raise this in JIRA? It will get much more attention there than on the forums.

Thanks,
Mike