Search Research

Here we take a look at options for searching the federation and individual Fedwiki sites.

We're experimenting with full-text search of the federation. For the moment this is both easier to code and easier to host than searching the web.

This page collects some initial thoughts and experiments with searching Federated Wiki sites. What is it doing on the Decentralised Academy site?

Here are some alternatives, in A-Z order:

# Node Based Alternatives

- [ ] Norch
- [ ] search-index

# General Alternatives

- [ ] Blekko
- [x] ElasticSearch
- [ ] IndexDen
- [ ] SearchBlox
- [ ] Searchify
- [ ] Spinn3r
- [ ] Websolr

# Google Site Search

Long story short, I started to play with Google Custom Search to create a tool which helps me to search the federation.

With Linked Custom Search Engine and Topical Engines we can create all sorts of interesting dynamically created search engines for wiki.

# Sitemaps

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded - sitemaps.org
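The three requirements quoted above (XML tags, entity-escaped values, UTF-8 encoding) can be sketched in a few lines. This is a minimal illustration rather than a complete generator; the function name `build_sitemap` and the example URL are made up for the sketch.

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return a UTF-8 encoded Sitemap document for the given page URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() handles &, < and >; the extra map covers both quote characters.
        loc = escape(url, {"'": "&apos;", '"': "&quot;"})
        lines.append(f"  <url><loc>{loc}</loc></url>")
    lines.append("</urlset>")
    # The protocol requires the file itself to be UTF-8 encoded.
    return "\n".join(lines).encode("utf-8")

# Hypothetical wiki URL, used only to show the escaping of "&".
sitemap = build_sitemap(["http://wiki.example.org/view/search?a=1&b=2"])
```

The ampersand in the query string comes out as `&amp;`, which is the case most likely to be missed when URLs are pasted in by hand.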

# Elasticsearch

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents - wikipedia
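Because documents are schema-free JSON sent over REST, preparing wiki pages for indexing is mostly a matter of serialising them. The sketch below builds a newline-delimited body in the shape the `_bulk` endpoint expects: an action line followed by a document line. The index name and the `site`, `slug`, `title` and `text` fields are assumptions chosen for this example, not an established mapping.

```python
import json

def bulk_body(index, pages):
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint."""
    lines = []
    for page in pages:
        # Assumed id scheme: site and slug together identify a page in the federation.
        doc_id = f"{page['site']}/{page['slug']}"
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({
            "site": page["site"],
            "title": page["title"],
            "text": page["text"],
        }))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical page record for illustration.
body = bulk_body("fedwiki", [{
    "site": "wiki.example.org",
    "slug": "search-research",
    "title": "Search Research",
    "text": "Options for searching the federation.",
}])
```

The resulting string would be sent as the body of a POST to `/_bulk` with the content type `application/x-ndjson`; the HTTP call is left out so the sketch runs without a server.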

Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine after Apache Solr, also based on Lucene - elastic.co

"ElasticSearch 101 – a getting started tutorial" is a nice tutorial that covers getting Elasticsearch up and running on OS X or Windows for testing the REST services - joelabrahamsson.com

This year, for the first time, we have enough activity in the federation to make scraping a sound search strategy.

We have many kinds of things and they can move about. How will we ever find them, make sense of them, see what is missing, wonder at our accomplishment?

The search plugin contains a query as its text, performs that query on emit, and reports results in the page as flags, ordered by the titles that have them.

Demystifying SEO with experiments. January 27, 2015. Search engine optimization (SEO) has been one of the biggest drivers of growth for Pinterest. However, it wasn’t always easy to find winning strategies at our scale. post

We consider how we might index and query multiple sites within a distributed search database routinely hosted in federated wiki page server/editors.
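One way to picture indexing and querying multiple sites is a small inverted index keyed by word. The sketch below assumes each page is JSON with a `title` and a `story` list whose items carry `text`; the site names and the helper names `page_text`, `build_index` and `search` are invented for the example, and nothing here is distributed yet.

```python
import re
from collections import defaultdict

def page_text(page):
    """Join the text of every story item in a page's JSON."""
    return " ".join(item.get("text", "") for item in page.get("story", []))

def build_index(sites):
    """Map each word to the set of (site, title) pairs that contain it."""
    index = defaultdict(set)
    for site, pages in sites.items():
        for page in pages:
            title = page.get("title", "")
            for word in re.findall(r"\w+", (title + " " + page_text(page)).lower()):
                index[word].add((site, title))
    return index

def search(index, query):
    """Return the pages that contain every word of the query, in sorted order."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Two hypothetical sites with one page each.
sites = {
    "a.example": [{"title": "Search Research", "story": [
        {"type": "paragraph", "text": "full text search of the federation"}]}],
    "b.example": [{"title": "Sitemaps", "story": [
        {"type": "paragraph", "text": "XML tags for search engines"}]}],
}
index = build_index(sites)
results = search(index, "search federation")
```

Keeping the site name in every posting is what lets results from different servers be merged later without losing track of where a page lives.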

Tree structures lend themselves to external searching, if we choose an appropriate representation for grouped nodes. From Knuth volume 3, section 6.2.4.
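The idea of grouping nodes can be illustrated with a multiway search tree, where each node holds several sorted keys so that one read of a node (a disk page, say) rules out many keys at once. This is a simplified lookup only, with made-up names, and it omits the insertion and balancing that a real B-tree needs.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys grouped into one node
        self.children = children or []  # len(keys) + 1 subtrees, or empty for a leaf

def contains(node, key):
    """Return (found, nodes_visited) for a lookup starting at node."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:
            return False, visited
        # Descend into the subtree between keys[i - 1] and keys[i].
        node = node.children[i]
    return False, visited

# A two-level tree: the root splits the key space into three ranges.
root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
found, visited = contains(root, 15)
```

The count of visited nodes stands in for the number of external reads, which is the cost that matters when the tree does not fit in memory.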

Our scraping experience suggests we should distribute the function across all servers and move slowly enough that site owners can direct the spider's progress.
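Moving slowly can be made concrete with a queue that refuses to hand out the same site more often than a fixed delay allows, and that lets a site be dropped on request. The class name `PoliteQueue`, the delay value and the site names are assumptions for this sketch; the clock is passed in so the behaviour can be checked without waiting.

```python
import heapq

class PoliteQueue:
    """Hand out sites to visit, never the same site more often than every `delay` seconds."""

    def __init__(self, sites, delay):
        self.delay = delay
        # Each entry is (earliest time the site may be visited again, site).
        self.heap = [(0.0, site) for site in sorted(sites)]
        heapq.heapify(self.heap)

    def next_site(self, now):
        """Return a site that is due at time `now`, or None if every site is resting."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, site = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (now + self.delay, site))
        return site

    def remove(self, site):
        """Drop a site, for example when its owner asks not to be scraped."""
        self.heap = [(t, s) for t, s in self.heap if s != site]
        heapq.heapify(self.heap)

queue = PoliteQueue(["a.example", "b.example"], delay=60)
first = queue.next_site(now=0)
```

Each server could run a queue like this over its own neighbourhood, which spreads the work while keeping every individual site's load low.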

A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table. Any participating node can efficiently retrieve the value associated with a given key - wikipedia
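The key-to-node mapping at the heart of a DHT can be sketched with a hash ring: nodes and keys are hashed onto the same circle, and a key belongs to the first node found clockwise from it. This shows only the mapping, under the assumption that wiki sites act as nodes; a real DHT also has to route requests between nodes and cope with nodes joining and leaving. The names `key_hash` and `Ring` are invented here.

```python
import hashlib
from bisect import bisect_right

def key_hash(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sort nodes by their position so lookups can use binary search.
        self.points = sorted((key_hash(n), n) for n in nodes)
        self.hashes = [h for h, _ in self.points]

    def node_for(self, key):
        """Return the node responsible for key: the first node clockwise from its hash."""
        i = bisect_right(self.hashes, key_hash(key)) % len(self.points)
        return self.points[i][1]

# Hypothetical sites acting as DHT nodes.
nodes = ["a.example", "b.example", "c.example"]
ring = Ring(nodes)
owner = ring.node_for("search")
```

A useful property of this layout is that removing a node only reassigns the keys that node owned; every other key keeps its owner.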