This paper summarizes our recent progress in improving the automatic transcription of Arabic broadcast audio data, and some efforts to address the challenges of the broadcast conversational speech. Our efforts are aimed at improving the acoustic, pronunciation and language models taking into account specificities of the Arabic language. In previous work we demonstrated that explicit modeling of short vowels improved recognition performance, even when producing non-vocalized hypotheses. In addition to modeling short vowels, consonant gemination and nunation are now explicitly modeled, alternative pronunciations have been introduced to better represent dialectical variants, and a duration model has been integrated. In order to facilitate training on Arabic audio data with non-vocalized transcripts a generic vowel model has been introduced. Compared with the previous system (used in the 2006 GALE evaluation) the relative word error rate has been reduced by over 10%.