There is a research paper on SSRN that discusses modeling VIX in a manner similar to the one I described last week. In particular, the authors use a Box-Cox transform to "normalize" the VIX time series. They claim that the optimal Box-Cox parameter lambda is -0.4, quite different from the value implied by a normal (1) or log-normal (0) assumption. However, playing devil's advocate, I compared forecasting results for lambda = 0 vs. lambda = -0.4 and did not find either parameter to be clearly better than the other.
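For reference, here is a minimal sketch of the Box-Cox transform for the three lambda values mentioned above. The function names `box_cox` and `inv_box_cox` and the sample VIX levels are my own illustrative assumptions; they are not taken from the paper and the numbers are not real market data.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, with log(x) as the lam = 0 limit."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x**lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Inverse of box_cox, mapping transformed values back to the original scale."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

# Hypothetical VIX closing levels, for illustration only (not real data)
vix = np.array([12.5, 15.0, 18.3, 25.7, 40.2])

for lam in (1.0, 0.0, -0.4):
    z = box_cox(vix, lam)
    round_trip_ok = np.allclose(inv_box_cox(z, lam), vix)
    print(f"lambda={lam:+.1f}", np.round(z, 4), "round trip ok:", round_trip_ok)
```

With lambda = 1 the series is only shifted by one, with lambda = 0 it is log-transformed, and with lambda = -0.4 large spikes are compressed more strongly than under the log. A forecast made in the transformed space would be mapped back to VIX levels with `inv_box_cox`.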
At this point I have not tried the other suggestions from the paper, but I did find some ways to make the model more robust.