Interacting in Complex Environments through Large Multimodal Models