Wireless networks used for Internet of Things (IoT) are expected to largely involve cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enables flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic features of the Central Processing Unit (CPU) load variation as well as the possible complex parallelisation situations of the cloud processes makes the dynamic resource allocation an interesting research challenge. This paper models this dynamic computational resource allocation problem into a Markov Decision Process (MDP) and designs a model-based reinforcementlearning agent to optimise the dynamic resource allocation of the CPU usage. Value iteration method is used for the reinforcementlearning agent to pick up the optimal policy during the MDP. To evaluate our performance we analyse two types of processes that can be used in the cloud-based IoT networks with different levels of parallelisation capabilities, i.e., Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent rapidly converges to the optimal policy, stably performs in different parameter settings, outperforms or at least equally performs compared to a baseline algorithm in energy savings for different scenarios.