Apache Nutch
Common Crawl – publicly available internet-wide crawls; the project started using Nutch in 2014.
DiscoverEd – open educational resources search prototype developed by Creative Commons.
Krugle uses Nutch to crawl web pages for code, archives and technically interesting content.