<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic safely crawl all documents via webscript in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280985#M234115</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;What I am trying to do is:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Find all nodes in the repository and get their file size, then get all versions of each node and calculate the combined file size of the node and its versions.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;How can I safely crawl every document in the repository?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The SearchService easily hits the result limit, and even if I increase the limit, the searches won't return results.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Traversing the repository recursively also seems to fill up the Solr caches:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;// Recursively visit every file below the given folder listing.&lt;BR /&gt;private static void traverse(List&amp;lt;FileInfo&amp;gt; context) {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (FileInfo node : context) {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (node.isFolder()) {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // folder: descend into its children&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; traverse(fileFolderService.list(node.getNodeRef()));&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; } else {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // file: do stuff (read its size and versions)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;}&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;
&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;… :44,186&amp;nbsp; INFO&amp;nbsp; [solr.component.AsyncBuildSuggestComponent] [Suggestor-alfresco-1] Loaded suggester shingleBasedSuggestions, took 267411 ms&lt;BR /&gt;… :53,005&amp;nbsp; WARN&amp;nbsp; [cache.node.nodesTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).&lt;BR /&gt;… :17,075&amp;nbsp; WARN&amp;nbsp; [cache.node.aspectsTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.aspectsTransactionalCache' is full (65000).&lt;BR /&gt;… :17,081&amp;nbsp; WARN&amp;nbsp; [cache.node.propertiesTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.propertiesTransactionalCache' is full (65000).&lt;BR /&gt;… :19,938&amp;nbsp; WARN&amp;nbsp; [alfresco.cache.contentUrlTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.contentUrlTransactionalCache' is full (65000).&lt;BR /&gt;… :19,991&amp;nbsp; WARN&amp;nbsp; [alfresco.cache.contentDataTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.contentDataTransactionalCache' is full (65000).&lt;BR /&gt;… :49,599&amp;nbsp; WARN&amp;nbsp; [org.alfresco.nodeOwnerTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.nodeOwnerTransactionalCache' is full (40000).&lt;BR /&gt;… :27,516&amp;nbsp; WARN&amp;nbsp; [cache.node.childByNameTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.childByNameTransactionalCache' is full (65000).&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
&lt;SPAN&gt;I understand it is an antipattern to grab everything at once, but I don't know of any service or API that lets me page the results into batches.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Please enlighten me &lt;img id="smileysad" class="emoticon emoticon-smileysad" src="https://connect.hyland.com/i/smilies/16x16_smiley-sad.png" alt="Smiley Sad" title="Smiley Sad" /&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Version: 5.0.c&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 20 Jan 2016 10:44:40 GMT</pubDate>
    <dc:creator>jaeni</dc:creator>
    <dc:date>2016-01-20T10:44:40Z</dc:date>
    <item>
      <title>safely crawl all documents via webscript</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280985#M234115</link>
      <description>What I am trying to do is: find all nodes in the repo and get their file size. Also get all versions of the node and calculate the overall file size of the node and its versions. How can I safely crawl every document in the repository? The SearchService is going to hit the result limit easily, even if I in</description>
      <pubDate>Wed, 20 Jan 2016 10:44:40 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280985#M234115</guid>
      <dc:creator>jaeni</dc:creator>
      <dc:date>2016-01-20T10:44:40Z</dc:date>
    </item>
    <item>
      <title>Re: safely crawl all documents via webscript</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280986#M234116</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;It's not the Solr cache that is filling up, it's the transactional cache.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The node service does give you all results, so apart from being slow, that is OK.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The problem is that your code is executing in a single transaction, and at some point there will likely be a limit on the number of database rows that can be updated.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In those situations the Alfresco code itself uses the batch processor to break a huge transaction into smaller chunks. That's probably what you want to do here.&lt;/SPAN&gt;&lt;BR /&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Jan 2016 11:38:48 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280986#M234116</guid>
      <dc:creator>mrogers</dc:creator>
      <dc:date>2016-01-20T11:38:48Z</dc:date>
    </item>
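    <!--
    A minimal sketch of the chunking idea from the reply above, written in plain Java so it runs without Alfresco on the classpath. It only shows how one long crawl can be cut into slices that are each handled as a separate unit of work. The names BatchedCrawl, BatchWork and totalSize are hypothetical and do not come from the thread or from Alfresco. In a real repository the BatchWork callback would presumably be replaced by Alfresco's own BatchProcessor or by a RetryingTransactionHelper call that starts a new transaction per slice; that wiring is an assumption and is not shown here.

    ```java
    import java.util.Arrays;

    // Plain-Java sketch: no Alfresco classes are used, every name here is hypothetical.
    class BatchedCrawl {

        // Stand-in for "handle this slice in its own unit of work (transaction)".
        interface BatchWork {
            long sizeOf(String[] nodeIds);
        }

        // Cuts nodeIds into consecutive slices of at most batchSize entries,
        // hands each slice to work separately and sums the sizes it reports.
        static long totalSize(String[] nodeIds, int batchSize, BatchWork work) {
            if (0 >= batchSize) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            long total = 0;
            for (int start = 0; nodeIds.length > start; start += batchSize) {
                int end = Math.min(start + batchSize, nodeIds.length);
                // Each call sees only its own slice, so per-call state stays small.
                total += work.sizeOf(Arrays.copyOfRange(nodeIds, start, end));
            }
            return total;
        }
    }
    ```

    A caller would first collect node identifiers (cheap strings rather than loaded nodes), then pass a callback that looks up the content size and version sizes for one slice at a time, so no single transaction has to touch the whole repository.
    -->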
    <item>
      <title>Re: safely crawl all documents via webscript</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280987#M234117</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Is there an easy example that I could re-use? I found the user-rename tool, but I am even more puzzled now.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Jan 2016 13:33:12 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/safely-crawl-all-documents-via-webscript/m-p/280987#M234117</guid>
      <dc:creator>jaeni</dc:creator>
      <dc:date>2016-01-20T13:33:12Z</dc:date>
    </item>
  </channel>
</rss>

