Left join works, but inverse does not?

Hi All,

I’m new to data point, so I appreciate the help.

I’ve been familiarizing myself with cross-database queries, and was successful in creating a simple query like:

Select *

from DB_A.Schema_A.Table_A A

left outer join DB_B.Schema_B.Table_B B on A.Column_1=B.Column_2

The results return in 2-3 min

However when I do the inverse

Select *

from DB_B.Schema_B.Table_B B

left outer join DB_A.Schema_A.Table_A A on B.Column_2=A.Column_1

It ran for hours before I finally cancelled, some data loaded so I know it worked… (also tried the Right join on the first query above)

before I launch a full IT investigation, is there anything wrong with my joins or common pitfalls I should be aware of?

Thanks in advance

Hi Brian,

on first look your queries don’t look like complicated but you never know what exactly happen in background. For investigating this you can use Explain plan which should help you with comparing difference between steps in both queries. Let me know if it helps.

Filip

The order in which you join can make a difference. Generally you want to start with whatever will give you the smallest data set first, then join to bigger data sets only looking for what matches the smaller set of data (deciding what to put first is dependent on what kind and how much data is in each table). Some optimizers will figure out the best way automatically but I have found cross platform joins don’t do this very well. Cross database joins on the same platform or server usually do okay but sometimes you have to give the optimizer some help. With multiple tables being joined you can put parantheses around the join you want it to do first forcing the optimizer to create that joined dataset before joining to the next table. There are other things that can be done as well to help the optimizer but that is a whole study in itself. Using the explain plan as suggested above is the best way to see what is taking up all the resources and will often suggest what you should do to make it run faster (like build an index on this field). Sometimes it is faster to run a query into a temp table, index the temp table and then join to the temp table in a second query rather than using an inline table/sub query (seems illogical at first but can work wonders in some cases specially with Sql Server). Sometimes you just have to play around and see what works best.

Here is some more info on the subject. [:$]

http://www.toadworld.com/products/toad-data-point/b/weblog/archive/2012/11/21/how-to-write-a-cross-database-query