Tic-tac-toe doesn’t call for reinforcement learning, except as an exercise or illustration. Recently, I saw several examples implementing Q-learning, all of which were rather long. I thought I’d try tic-tac-toe with Q-learning myself, using Python and TensorFlow and aiming for brevity. The project establishes two baseline strategies and then outperforms them with Q-learning. There is still plenty of room to extend the project further.