CSIRO Arch Intranet Search Engine Home
What is CSIRO Arch?
CSIRO Arch is a free, open source enterprise search engine based on Apache Nutch, a popular general-purpose search engine capable of indexing billions of web pages using clusters of computers. Arch builds on Nutch and Solr and adds features that make it a powerful and efficient search engine optimized for corporate web environments. Such environments typically have one or more web sites with content for both external readers and internal use, and one or more "intranet" sites with content for internal use only. Arch can search both the externally accessible and the restricted-access sites and produces extremely high quality search results.
Corporate Search: Can We Just Get Google?
Corporate web environments are a challenging area for modern search engines. Although they may include multiple web sites and millions of pages, they are much smaller than the global Web, which makes them easier to index. However, the smaller scale of corporate environments and the more restricted access to information also make it harder to estimate the relative importance of documents found on corporate web sites. The methods used to search the global Web generally do not work well on this smaller scale, which frustrates companies that often find searches on their intranets to be of limited use.
Arch has been specifically designed to provide very high quality searches for intranet web environments. Arch makes use of web server logs and other information that is available within an organization, but not available to external search engines, to provide excellent search results. It is robust and easy for a webmaster to install and maintain, and is extremely efficient at providing relevant and up-to-date information.
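To illustrate how web server logs might feed into ranking, here is a minimal Python sketch. It is not Arch's actual method: it assumes access logs in Common Log Format, counts successful requests per page, and turns the counts into a hypothetical ranking boost. The function names, the sample log lines, and the boost formula are all illustrative assumptions.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line,
# e.g. "GET /hr/leave-policy.html HTTP/1.1" 200
LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def popularity_from_logs(lines):
    """Count successful (status 200) requests per URL path."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


def boost(hits, path):
    """Hypothetical boost in [1, 2]: 1 for unseen pages, 2 for the most requested page."""
    if not hits:
        return 1.0
    return 1.0 + hits.get(path, 0) / max(hits.values())


# Made-up sample log lines, for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /hr/leave-policy.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Jan/2024:10:01:00 +0000] "GET /hr/leave-policy.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [01/Jan/2024:10:02:00 +0000] "GET /it/vpn.html HTTP/1.1" 200 2048',
    '10.0.0.4 - - [01/Jan/2024:10:03:00 +0000] "GET /old/page.html HTTP/1.1" 404 312',
]

hits = popularity_from_logs(sample_log)
for path in ("/hr/leave-policy.html", "/it/vpn.html", "/old/page.html"):
    print(path, boost(hits, path))
```

A search engine could multiply a document's text-match score by such a boost so that pages staff actually visit rank higher, which is a signal an external search engine cannot see.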
Read more in the article Corporate Search: Can We Just Get Google?
Read more in the Arch White Paper...