Deep Reinforcement Learning in Othello: A Deep Q-Network Approach
Keywords:
Reinforcement Learning, Deep Q-Network, Othello AI, Self-Play, Epsilon-Greedy, Q-Learning, Game AI, Strategic Learning
Abstract
This paper presents the design, implementation, and analysis of a Deep Reinforcement Learning (DRL) agent that plays the strategic board game Othello using a Deep Q-Network (DQN) architecture trained by playing against a random player. Othello has very simple rules but a very large state space (approximately 10²⁸ states), which makes it a suitable testbed for studying the learning properties of DRL algorithms in isolation. While recent studies apply architectures such as Transformers and Graph Neural Networks to Othello, a thorough baseline built on basic DRL algorithms is still missing from the literature. Our approach aims to close this gap by building a full Othello environment and training a DQN agent tabula rasa: the raw board state is its only input, and it uses neither embedded tree search nor human play data. The agent explores with an epsilon-greedy strategy whose epsilon decays linearly, and it approximates Q-values with a custom neural network implemented in NumPy. After about 300,000 training games across various epoch configurations, the agent wins 68.1% of its games against a random player, which indicates that it has learned a meaningful amount of strategy. Observation of its play reveals emergent patterns that can be interpreted as known Othello heuristics, namely corner control and mobility optimization, even though these concepts are not explicitly programmed. The agent also performs well against human players, which places it at a hard level of difficulty. This paper thus provides a reproducible DQN baseline for Othello that has so far been missing, allowing fair comparison with more complex algorithms. The implementation, results, and training procedures provided here lay the groundwork for future studies of value-based reinforcement learning in board games and beyond.
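As a rough illustration of the exploration and learning rule named in the abstract, the following is a minimal sketch of epsilon-greedy action selection with a linearly decaying epsilon, together with the standard one-step Q-learning target. It is not the paper's implementation: the function names, the schedule constants (`eps_start`, `eps_end`, `decay_steps`), the discount `gamma`, and the flat 64-entry Q-value layout are all assumptions chosen for the example. It is written in plain Python rather than NumPy so that it runs without dependencies.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=100_000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps.

    All default values are illustrative assumptions, not the paper's settings.
    """
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def epsilon_greedy(q_values, legal_moves, epsilon, rng=random):
    """Choose among legal moves only.

    With probability epsilon pick a random legal move (explore);
    otherwise pick the legal move with the highest Q-value (exploit).
    q_values is assumed to be indexable by square index 0..63.
    """
    if not legal_moves:
        return None  # no legal move available: the player has to pass
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return max(legal_moves, key=lambda move: q_values[move])


def td_target(reward, next_q_values, next_legal_moves, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    Simplifying assumption: a next state with no legal moves is treated
    like a terminal state, so the target is just the reward.
    """
    if done or not next_legal_moves:
        return reward
    return reward + gamma * max(next_q_values[m] for m in next_legal_moves)
```

In a training loop built on this sketch, `linear_epsilon(step)` would be evaluated once per move or per game, the chosen move would come from `epsilon_greedy`, and the network's output for that move would be regressed toward `td_target`. Restricting both the action choice and the maximum to legal moves is one common design choice for board games; whether the paper masks illegal moves this way is not stated in the abstract.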