Paper reading notes: Cross-Modal Relationship Inference for Grounding Referring Expressions