Harder, Better, Faster, Stronger

11/2/2022

Introduction

Evolution can be a slow process. However, focusing on your strengths can help speed evolution along. The theme of this milestone is unequivocally the song Harder, Better, Faster, Stronger by Daft Punk. For SingularAgent to improve as an artificial intelligence, I need to work hard in order to make it better so that SingularAgent can become faster and stronger.

Today I am announcing that SingularAgent has completed the Counting unit in the Early Math course at Khan Academy. The Counting unit contains 13 practice lessons, 2 quizzes, and 1 unit test. Each practice lesson contains 7 questions. Quiz 1 contains 7 questions, Quiz 2 contains 6 questions, and the Unit Test contains 13 questions (1 question for each practice lesson). After some training, SingularAgent was able to correctly answer each question. The YouTube videos for the first 3 practice lessons in the Counting unit are provided in my Many Methods Make Light Work blog post. The rest of the YouTube videos are embedded at the end of this page.

Harder

For this milestone I asked myself, what is the minimum that SingularAgent needs to learn before it can answer all the questions correctly? Given a question, what needs to happen in order to arrive at the correct answer? At the beginning, the work in between the question and answer is unknown.

Question -> [ ? ] -> Answer

SingularAgent knows how to do simple things very well. So if a hard task is broken down into enough pieces, then SingularAgent will be able to execute and arrive at the answer. Here is an example of a simple question:

First, I know that the Check button needs to be clicked after it is enabled.

Question -> [ ? ] -> [click Check button] -> Answer

Next, I know that the number in the question needs to be read.

Question -> [read number] -> [ ? ] -> [click Check button] -> Answer

Then, I need to drag the animal tiles into the gray box based on the number previously read.

Question -> [read number] -> [click and drag an animal tile into box, X number of times] -> [click Check button] -> Answer

Then, I need to continue splitting each step to make it even simpler to understand.

Question -> [get pixels on screen] -> [match pixels to number] -> [click and drag an animal tile into box, X number of times] -> [click Check button] -> Answer

I essentially keep breaking down each step in the process until each step is small and simple. At some point, you cannot make a step any simpler. It doesn't matter how hard the original question was to answer since now we have a simple step-by-step process to arrive at the answer. I can abstract away the complexity of a solution by doing this.
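The breakdown above can be sketched as a chain of small functions. This is only an illustration and not SingularAgent's actual code: the function names, the `state` dictionary, and the sample question text are all assumptions made for the example, and the real "get pixels on screen" and "match pixels to number" steps are replaced by simple text parsing.

```python
from typing import Callable, Dict, List

# Each step takes the current state and returns an updated copy (hypothetical design).
Step = Callable[[Dict], Dict]

def read_number(state: Dict) -> Dict:
    # Stand-in for [get pixels on screen] -> [match pixels to number].
    digits = [tok for tok in state["question_text"].split() if tok.isdigit()]
    return {**state, "number": int(digits[0])}

def drag_tiles(state: Dict) -> Dict:
    # [click and drag an animal tile into box, X number of times]
    actions = state["actions"] + ["drag tile into box"] * state["number"]
    return {**state, "actions": actions}

def click_check(state: Dict) -> Dict:
    # [click Check button]
    return {**state, "actions": state["actions"] + ["click Check"]}

def run_process(steps: List[Step], state: Dict) -> Dict:
    # Run each small step in order; no single step needs to know the whole plan.
    for step in steps:
        state = step(state)
    return state

start = {"question_text": "Drag 4 animals into the box", "actions": []}
result = run_process([read_number, drag_tiles, click_check], start)
print(result["actions"])
```

Because every step has the same shape, a step that is still too hard can be swapped for two smaller ones without touching the rest of the chain.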

Better

As SingularAgent completed more practice lessons, I started to look for ways that I could improve. A lot of the questions contained radio buttons A, B, and C that needed to be clicked in order to answer the question. For example:

So, I created a generic process that looked for the screen coordinates of the answer radio buttons. The vertical spacing between the radio buttons varied depending on the size of the picture or text in the answer to the right of each button. The radio buttons could also sit closer to the left side of the page or more towards the middle of the page, depending on the question. Once I had a generic process that could give me the coordinates, SingularAgent was able to use that information on every question that had radio buttons and could arrive at an answer more quickly.
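One way such a generic process could work is sketched below. It assumes that some earlier detection step (not shown, and not described in this post) has already produced a list of candidate circle centers as (x, y) screen coordinates. The function name and the tolerance value are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def label_radio_buttons(centers: List[Point], x_tolerance: int = 5) -> Dict[str, Point]:
    """Label vertically stacked radio buttons A, B, C, ... from top to bottom."""
    if not centers:
        return {}
    # Buttons for one question share a column, wherever that column sits on the page.
    # Use the middle of the sorted x values as the column position.
    xs = sorted(x for x, _ in centers)
    column_x = xs[len(xs) // 2]
    column = [p for p in centers if abs(p[0] - column_x) <= x_tolerance]
    # Sort top to bottom, so uneven vertical spacing does not matter.
    column.sort(key=lambda p: p[1])
    return {chr(ord("A") + i): p for i, p in enumerate(column)}

# Three buttons in one column plus one unrelated circle elsewhere on the page.
buttons = label_radio_buttons([(212, 540), (210, 300), (211, 415), (600, 120)])
print(buttons)
```

The sketch never hard-codes a position, which is the point of a generic process: the same call works whether the buttons are near the left edge or the middle of the page.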

I also ended up creating a generic process that would look for the dividers between the answers. That was useful for determining where one answer stopped and the next one started. Khan Academy would continually rotate the answers, even if the same question popped up again, so SingularAgent would have to dynamically determine the answer. In other words, answer A could now be in answer C's location, answer B could be in answer A's location, and so on.
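Here is a minimal sketch of how dividers and radio button positions could be combined to find a rotated answer. Everything in it is an assumption for illustration: the divider and button coordinates are made up, and `read_region` stands in for whatever reading ability SingularAgent actually uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

def answer_regions(divider_ys: List[int]) -> List[Tuple[int, int]]:
    # Each pair of neighboring dividers bounds one answer.
    ys = sorted(divider_ys)
    return list(zip(ys, ys[1:]))

def find_answer_letter(
    divider_ys: List[int],
    button_ys: Dict[str, int],
    read_region: Callable[[int, int], str],
    target: str,
) -> Optional[str]:
    # Look for the target by content, then report which button shares its region.
    for top, bottom in answer_regions(divider_ys):
        if read_region(top, bottom) == target:
            for letter, y in button_ys.items():
                if top <= y < bottom:
                    return letter
    return None

# Simulated screen: the text that would be read between each pair of dividers.
fake_screen = {(250, 360): "7", (360, 480): "5", (480, 600): "6"}
letter = find_answer_letter(
    [250, 360, 480, 600],
    {"A": 300, "B": 415, "C": 540},
    lambda top, bottom: fake_screen[(top, bottom)],
    "5",
)
print(letter)
```

Since the lookup goes by what the answer says rather than where it was last time, it keeps working when the answers are shuffled.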

Faster

SingularAgent is a console application that you type commands into in order to define the processes and goals it will follow. A command typically specifies a method, a parameter, a parameter value, and so on. After 10+ pages of manually typing commands for the Quiz 1 process, I knew it was time to create an import command. So now I have the ability to define the processes and goals in a text file and then import all the commands in a single go. This saves me tons of time and allows SingularAgent to quickly learn how to accomplish new things.
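The idea behind an import command can be sketched in a few lines. This is not SingularAgent's real implementation or command syntax: the sample commands, the `#` comment convention, and the `execute` callback are all hypothetical.

```python
import os
import tempfile
from typing import Callable

def import_commands(path: str, execute: Callable[[str], None]) -> int:
    """Run every command in a text file, one per line, and return how many ran."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            command = line.strip()
            # Skip blank lines and comment lines (an assumed convention).
            if not command or command.startswith("#"):
                continue
            execute(command)
            count += 1
    return count

# Demo: write a small command file, then import it in a single go.
executed = []
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "quiz1.txt")
    with open(script, "w", encoding="utf-8") as f:
        f.write("# hypothetical commands\ncreate process Quiz1\n\nadd step ReadNumber\n")
    imported = import_commands(script, executed.append)
print(imported, executed)
```

Passing `execute` in as a parameter means the same handler that processes typed commands could also process imported ones, so the file format stays identical to what would be typed at the console.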

Stronger

SingularAgent is not perfect. There are many ways that it can grow to become even stronger:

  • faster reading skills

  • faster color/object recognition skills

  • smaller and more generic methods

Take for example this question:

For this style of question, SingularAgent was given the answer. Ideally, it would first figure out what the answer is by reading the question text, reading the answers, and looking at the image, and only then click the correct radio button. SingularAgent doesn't have any concept of buttons or a coat. This type of question is a great example of something for SingularAgent to shoot for in the future. After all, artificial intelligence is essentially the pursuit of doing what was previously thought to be impossible.

Demos

The following YouTube videos show SingularAgent answering simple counting math questions. SingularAgent took anywhere from 45 seconds to a couple of minutes to answer a single question. This is due to the current speed at which it can read (seeing letters and numbers) and identify objects (seeing and grouping colors). I have some strategies for improving that going forward. Since there are long stretches of video in which nothing appears to happen, I provided links in each video's description that jump to the times when SingularAgent answers a question correctly.