Deep Reinforcement Learning in Othello: A Deep Q-Network Approach

Authors

  • Areej Fatemah Meghji Department of Software Engineering, Mehran University of Engineering and Technology
  • Dua Agha Department of Computer Science and Information Technology, NED University of Engineering and Technology, Karachi, Pakistan
  • Aleena Rafique Department of Software Engineering, Hyderabad Institute for Technology & Management Sciences, Hyderabad, Pakistan

Keywords:

Reinforcement Learning, Deep Q-Network, Othello AI, Self-Play, Epsilon-Greedy, Q-Learning, Game AI, Strategic Learning

Abstract

This research presents the design, implementation, and analysis of a Deep Reinforcement Learning (DRL) agent that plays the strategic board game Othello using a Deep Q-Network (DQN) architecture trained by playing against a random player. Although Othello's state space is very large (~10²⁸), its rules are simple, making it an ideal testbed for studying the pure learning properties of DRL algorithms. While recent studies focus on architectures such as Transformers and Graph Neural Networks for Othello, a thorough baseline using basic DRL algorithms is still missing from the literature. Our approach aims to close this gap by building a complete Othello environment and then training a DQN agent tabula rasa, with only the raw board state as input and without any embedded tree search or human play data. The agent uses epsilon-greedy exploration with linear decay and learns by approximating Q-values with a custom neural network implemented in NumPy. After about 300,000 training games across various epoch configurations, the agent wins 68.1% of its games against a random player, indicating a meaningful degree of strategic learning. Observation of its play reveals emergent patterns that can be interpreted as Othello heuristics, namely corner control and mobility optimization, although these concepts are not explicitly programmed. The agent also performs well against human players, placing it at a hard level of difficulty. This paper provides a reproducible DQN baseline for Othello that was previously missing, allowing fair comparison with more complex algorithms. The implementation, results, and training procedures provided lay the groundwork for future studies on value-based reinforcement learning for board games and beyond.
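
As a rough illustration of the exploration strategy named in the abstract, the following minimal sketch shows epsilon-greedy action selection with a linearly decaying epsilon, restricted to legal Othello moves. It is not the authors' implementation: the function names, the start and end values of epsilon, and the layout of one Q-value per board square (64 in total) are assumptions made only for this example.

```python
import numpy as np

def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal epsilon from eps_start to eps_end over total_steps.

    eps_start and eps_end are illustrative values, not taken from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, legal_moves, epsilon, rng):
    """Pick a move epsilon-greedily among the legal squares.

    q_values: array of shape (64,), one Q-value estimate per square
              (an assumed output layout for this sketch).
    legal_moves: sequence of square indices (0..63) that are legal this turn.
    """
    legal = np.asarray(legal_moves)
    if rng.random() < epsilon:
        # Explore: choose a legal move uniformly at random.
        return int(rng.choice(legal))
    # Exploit: choose the legal move with the highest estimated Q-value.
    return int(legal[np.argmax(q_values[legal])])

# Example usage with made-up Q-values and a typical set of opening moves.
rng = np.random.default_rng(0)
q = np.zeros(64)
q[19] = 1.0
move = select_action(q, [19, 26, 37, 44], linear_epsilon(50, 100), rng)
```

Masking illegal squares before taking the argmax keeps the greedy choice valid even when the network assigns a high value to a square that cannot be played.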

Author Biographies

Areej Fatemah Meghji, Department of Software Engineering, Mehran University of Engineering and Technology

Assistant Professor - Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan

Dua Agha, Department of Computer Science and Information Technology, NED University of Engineering and Technology, Karachi, Pakistan

Dua Agha is a software engineer and currently serving as a Lecturer in the Dept. of Computer Science and Information Technology (CSIT) at NED University of Engineering and Technology, Karachi, Pakistan.

Aleena Rafique, Department of Software Engineering, Hyderabad Institute for Technology & Management Sciences, Hyderabad, Pakistan

Aleena Rafique is a Software Engineer and currently serves as a Lecturer and Program Coordinator of the Software Engineering program in the Department of Computer Science and Related Studies at the Hyderabad Institute for Technology and Management Sciences (HITMS), Hyderabad, Pakistan.

Published

2025-03-30