As the Hadoop ecosystem has exploded into many projects, searching for the right answers when questions arise can be a challenge. That's why I was thrilled to hear about search-hadoop.com, from Sematext. It has a sister site called search-lucene where you can… search Lucene!
Search-Hadoop.com searches across projects – JIRAs, source code, mailing lists, wikis, etc. – so you can see design and API docs, as well as questions, answers and general documentation. Filtering by project is a big help, but search-hadoop also lets you see the similarities between projects.
Search Hadoop runs on Solr 3.6.1, but will be moving to Solr 4.0 this fall. Solr 4.0 introduces SolrCloud, a fully distributed mode of Solr (indices are sharded and replicated) that uses ZooKeeper for coordination.
The autocomplete feature is particularly cool. It offers several groups of suggestions separated by a lovely thin pink line, so you can easily pick the suggestion to follow. The motivation is that people searching for info often have an idea of what type of content they want to see – issues, mailing list messages, wiki pages, etc.
A couple of other cool features: you can search by author by clicking on the author name in search results, e.g. http://search-hadoop.com/?q=&fc_author=Russell+Jurney. Queries starting with a project name are automatically limited to that project, e.g. http://search-hadoop.com/?q=pig+join will show only results from Pig.
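If you want to build these search links from a script, here is a minimal Python sketch. The only things taken from the site are the `q` and `fc_author` query parameters seen in the example URLs above; the function name `build_search_url` is my own invention, and I am assuming the site accepts nothing more exotic than standard URL encoding.

```python
from urllib.parse import urlencode

# Base URL taken from the example links in this post.
BASE_URL = "http://search-hadoop.com/"


def build_search_url(query="", author=None):
    """Build a search-hadoop.com URL (hypothetical helper, not an official API).

    query  -- free-text query; start it with a project name (e.g. "pig join")
              to limit results to that project
    author -- optional author name, sent as the fc_author parameter
    """
    params = {"q": query}
    if author is not None:
        params["fc_author"] = author
    # urlencode encodes spaces as '+', matching the URLs shown above.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("pig join"))
    print(build_search_url(author="Russell Jurney"))
```

Run as a script, this should print the same two URLs used as examples above.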