The Power of Reading

11/12/2021

Introduction

It is impossible to live life without having to read. We read so often during the day that we don't even think about it. Reading is so critically important that parents are asked to read to their children every day. In my previous projects using SingularAgent, I kept running into situations in which I needed to get information (specifically letters and numbers) off of a webpage on the computer screen. Some examples include reading checking, savings, and credit card balances, and reading the last day to watch a TV show or movie. So, I slowly started to realize that being able to read text would be helpful. And by "read", I mean converting pixels of varying colors into text that SingularAgent can use.

I was using two different approaches as a workaround. The first approach was instructing SingularAgent to use the mouse to click and drag to highlight the desired text and copy it to the clipboard. The second approach was having SingularAgent open up the Google Chrome Developer Tools (right click -> Inspect) to copy the HTML and parse it. Both approaches got the job done but were slightly inefficient. It would be much quicker if SingularAgent could just look at a section of the screen on a webpage (or in an image file) and read the text off of it. Today I am announcing that SingularAgent can read size 12 Arial font with over 90% accuracy. The algorithms that I created should be flexible enough to read different fonts at any size after some training.

Details

When I first tried to code the ability to read into SingularAgent, I thought it would be nice if it could recognize a letter or number at any size. All the letters and numbers would be resized to a particular height and width and then compared. This was doable to an extent with larger fonts simply because there is space between each letter. That space clearly defines where one letter ends and the next letter begins.
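To illustrate the resize-and-compare idea, here is a minimal Python sketch. It is an illustration only, not SingularAgent's actual code: the function names, the 8x8 target size, the nearest-neighbor resizing, and the pixel-agreement score are all assumptions chosen to keep the example small.

```python
def resize_nearest(glyph, out_h, out_w):
    """Resize a 2D list of 0/1 pixels to out_h x out_w (nearest neighbor)."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [
        [glyph[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def similarity(a, b):
    """Fraction of pixels that agree between two grids of the same size."""
    total = len(a) * len(a[0])
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def classify(glyph, templates, size=(8, 8)):
    """Return the template character whose resized shape best matches glyph."""
    h, w = size
    normalized = resize_nearest(glyph, h, w)
    return max(
        templates,
        key=lambda ch: similarity(normalized, resize_nearest(templates[ch], h, w)),
    )


# Hypothetical 3x3 templates for two characters (1 = ink, 0 = background).
TEMPLATES = {
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "|": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

# The same vertical bar drawn at twice the size.
big_bar = [[0, 0, 1, 1, 0, 0] for _ in range(6)]

print(classify(big_bar, TEMPLATES))  # prints |
```

Because both the unknown glyph and the templates are resized to the same grid before comparison, the size of the original letter stops mattering, as long as each letter can be cut out cleanly first.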

However, if the average font size for a website is 14 pixels (or less), then the letters start touching each other.

How do you tell when one letter begins and another letter ends? So I decided to reduce the scope of the problem that I was trying to solve and settled on one font at a fixed size.
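The touching-letter problem can be shown with a small Python sketch. It assumes the simplest possible way of separating letters, which is cutting wherever a column of pixels is completely blank. That technique and the name `split_on_blank_columns` are assumptions for illustration, not a description of how SingularAgent works.

```python
def split_on_blank_columns(rows):
    """Return (start, end) column spans of ink separated by blank columns.

    rows is a list of equal-length strings where '#' is ink and '.' is blank.
    """
    width = len(rows[0])
    has_ink = [any(row[c] == "#" for row in rows) for c in range(width)]
    spans, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                 # a new run of ink begins
        elif not ink and start is not None:
            spans.append((start, c))  # the run ended at a blank column
            start = None
    if start is not None:
        spans.append((start, width))  # the run reaches the right edge
    return spans


# Two made-up shapes with a gap between them (large font).
spaced = [
    "##..#.#",
    "#...###",
    "##..#.#",
]

# The same two shapes with no blank column between them (small font).
touching = [
    "###.#",
    "#.###",
    "###.#",
]

print(split_on_blank_columns(spaced))    # prints [(0, 2), (4, 7)]
print(split_on_blank_columns(touching))  # prints [(0, 5)]
```

With a gap, the two shapes come out as two spans. Without one, they come out as a single span, so a blank-column split alone cannot say where one letter ends and the next begins.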

SingularAgent can read all of these keyboard characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . ? ! # $ % & ( ) , ' - ; @ [ ] ^ _ ` { } ~ + = | * \ : " / > <

I did a lot of testing using this sentence: The quick brown fox jumps over the lazy dog. I used it (in uppercase and lowercase) because it contains all the letters in the alphabet.
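As a quick check of that claim, the following Python sketch lists which letters of the alphabet a test sentence is missing. The helper name `missing_letters` is made up for this example.

```python
import string


def missing_letters(sentence):
    """Return the letters a-z that do not appear in sentence (case ignored)."""
    return sorted(set(string.ascii_lowercase) - set(sentence.lower()))


sentence = "The quick brown fox jumps over the lazy dog."
print(missing_letters(sentence))  # prints []
```

An empty list means every letter is covered. The sentence does not cover the digits or most of the punctuation in the character list above, so those need separate test text.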

The following characters were slightly difficult to distinguish from each other:

  • c vs o

  • f vs t

  • r vs n

  • rn vs m

  • ' vs " (apostrophe versus double quote)

  • I vs l vs | (capital I versus lowercase l versus vertical bar)
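One way to find pairs like these is to compare the text that was expected with the text that was actually read and count the mismatches. The Python sketch below does that with the standard library's `difflib.SequenceMatcher`. It is an assumption about how such a comparison could be done, not the method behind the accuracy figure in this post, and `compare_reading` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher


def compare_reading(expected, recognized):
    """Return (accuracy, confusions) for a recognized string.

    accuracy is the fraction of expected characters that were matched.
    confusions counts (expected_piece, recognized_piece) substitutions.
    """
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "replace":
            confusions[(expected[i1:i2], recognized[j1:j2])] += 1
    accuracy = matched / len(expected) if expected else 1.0
    return accuracy, confusions


accuracy, confusions = compare_reading("corn", "com")
print(accuracy)          # prints 0.5
print(dict(confusions))  # prints {('rn', 'm'): 1}
```

Aligning the two strings instead of comparing them position by position matters for cases like rn vs m, where the mistake changes the length of the text.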

I knew that giving SingularAgent the ability to read would be useful in the long run because most of the internet consists of text, images, and videos. I struggled for a little bit because even though I knew reading was important, I didn't want to spend too much time on one algorithm. I would rather have lots of algorithms that are small and reusable than one algorithm that only does one thing well. So there was this balance of getting correct results in a timely manner versus having everything work perfectly.

Demo

Here is a demo of SingularAgent typing, reading, and speaking. The lyrics are from Muse's song "New Born" from their album Origin of Symmetry. Just to clarify, SingularAgent was told ahead of time what to type, but it had to read the text on the screen to figure out what words to speak.