I have two tables:
bpm_agent_data - 40 million records, 5 columns
bpm_loan_data - 20 million records, 5 columns
I ran the following query in Hive:
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;
This takes a very long time to complete. What is the ideal way to write this query in Hive so that the reducers do not take so much time?
Posted on 2014-05-06 11:24:36
I found a solution for the query above: put the join condition in an ON clause instead of WHERE.
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);
https://stackoverflow.com/questions/23492638
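To see whether the rewrite actually changes how Hive plans the join, one option is to compare the EXPLAIN output of the two forms. The following is only a sketch: it uses the table and column names from the question, and the aliases a and l are introduced here purely for readability. How the first form is planned may depend on the Hive version and settings, so the comments describe what to look for rather than a guaranteed result.

```sql
-- Original form: JOIN with no ON clause, join key only in WHERE.
-- Depending on the Hive version, this may be planned as a cross join
-- followed by a filter, which would push a lot of work to the reducers.
EXPLAIN
SELECT count(a.AgentID), count(l.LoanNumber)
FROM bpm_agent_data a
JOIN bpm_loan_data l
WHERE l.id = a.id;

-- Rewritten form: the join key is stated in ON, so Hive can treat it
-- as an equi-join on id.
EXPLAIN
SELECT count(a.AgentID), count(l.LoanNumber)
FROM bpm_agent_data a
JOIN bpm_loan_data l ON (l.id = a.id);
```

If the first plan shows a join operator with no join keys and the second shows one keyed on id, that difference would be consistent with the speedup described in the answer.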