In this paper, we describe a distributed multimodal dialogue system architecture based on the concept of hybrid-VoiceXML. It uses a special hybrid-construct to integrate multiple multimedia and multimodal processes into a single dialogue in which VoiceXML provides the voice modality. The hybrid-construct in our approach serves several important functions. It provides an additional abstraction layer for dynamic dialogue generation, which can greatly improve the efficiency and flexibility of the dialogue system. Under the proposed approach, dialogue control can be exchanged between interaction channels through the interface of a dynamic XML page. Several case studies were performed, and they indicate that the proposed hybrid-VoiceXML approach is highly extensible: it can be used to build platform-independent, distributed extensions for multimodal dialogue interaction beyond voice.
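Here is a minimal sketch of how dialogue control might pass through a dynamic XML page. It assumes a server that regenerates one VoiceXML page per dialogue turn and receives the user's answer back through a standard `<submit>`. The `vxml`, `form`, `field`, `prompt`, `filled` and `submit` elements are standard VoiceXML 2.x. The `hy:hybrid` and `hy:channel` elements, the namespace `urn:example:hybrid` and the function `build_page` are hypothetical placeholders. They are not the actual syntax or processing of the hybrid-construct described in this paper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Hypothetical namespace for the illustrative hybrid element (not part of VoiceXML).
HYBRID_NS = "urn:example:hybrid"

# Serialize VoiceXML as the default namespace and the hybrid elements under "hy".
# Note: register_namespace mutates ElementTree's global prefix table.
ET.register_namespace("", VXML_NS)
ET.register_namespace("hy", HYBRID_NS)


def v(tag: str) -> str:
    """Qualified name for a standard VoiceXML element."""
    return f"{{{VXML_NS}}}{tag}"


def h(tag: str) -> str:
    """Qualified name for a hypothetical hybrid-construct element."""
    return f"{{{HYBRID_NS}}}{tag}"


def build_page(step: str, prompt: str, channels: list[str], next_url: str) -> str:
    """Generate the page for one dialogue turn (hypothetical helper)."""
    vxml = ET.Element(v("vxml"), {"version": "2.1"})
    form = ET.SubElement(vxml, v("form"), {"id": step})

    # Hypothetical hybrid element: names the non-voice channels that a
    # middleware layer would activate alongside this voice turn.
    hybrid = ET.SubElement(form, h("hybrid"))
    for name in channels:
        ET.SubElement(hybrid, h("channel"), {"name": name})

    # Standard VoiceXML field that collects the user's spoken answer.
    field = ET.SubElement(form, v("field"), {"name": "answer"})
    ET.SubElement(field, v("prompt")).text = prompt
    filled = ET.SubElement(field, v("filled"))

    # Return control to the server, which can then generate the next page.
    ET.SubElement(filled, v("submit"), {"next": next_url, "namelist": "answer"})
    return ET.tostring(vxml, encoding="unicode")
```

For example, `build_page("ask_city", "Which city?", ["map", "touch"], "http://example.com/next")` yields a page whose voice part a VoiceXML browser can run. The hybrid part lists two additional channels. Because each turn is a freshly generated document, the server can change which channels take part in the next turn without modifying the voice platform.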