I have three different solutions where I store documents with document_ids (a search engine, a NoSQL database, and a self-developed semantic indexing application).
I run queries against all three solutions and would like to merge the results using something similar to an SQL JOIN. This means I can sometimes have three or more result sets that I need to join on document_id.
Is MapReduce on Hadoop, or something similar, the best way to solve this problem? Each dataset can contain anywhere from 1 to 100,000 document_ids.
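To make the kind of join I mean concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the function name `join_on_document_id`, the sample values, and the assumption that each solution returns a dict keyed by document_id are all hypothetical, not how my three systems actually return data.

```python
from typing import Any, Dict


def join_on_document_id(*datasets: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Inner-join any number of result sets that are keyed by document_id."""
    if not datasets:
        return {}

    # Start from the smallest result set so the intersection stays cheap.
    common_ids = set(min(datasets, key=len))
    for dataset in datasets:
        # Iterating a dict yields its keys, i.e. the document_ids.
        common_ids.intersection_update(dataset)

    joined: Dict[str, Dict[str, Any]] = {}
    for doc_id in common_ids:
        row: Dict[str, Any] = {}
        for dataset in datasets:
            # Later datasets overwrite earlier ones on field-name clashes.
            row.update(dataset[doc_id])
        joined[doc_id] = row
    return joined


# Hypothetical results from the three solutions (made-up values).
search_hits = {"d1": {"score": 0.9}, "d2": {"score": 0.4}}
nosql_rows = {"d1": {"title": "A"}, "d3": {"title": "C"}}
semantic_hits = {"d1": {"topic": "x"}, "d2": {"topic": "y"}}

result = join_on_document_id(search_hits, nosql_rows, semantic_hits)
```

Here only `d1` appears in all three result sets, so it is the only document_id in `result`. This behaves like an inner join; an outer join would use the union of the ids instead of the intersection.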
Thanks for your time!