The easiest way of converting letters to numbers, commas to periods, numbers to letters, and other common OCR transcription and data entry errors is to use my free Google Sheets add-on, Paperpusher. However, if you’d rather do it manually, here’s how you do it below.
One of my favorite advantages of Google Sheets over Excel is that Google Sheets has proper regular expressions. Regular expressions are just a way to search “strings” in spreadsheets and manipulate them. So, for example, if you’ve ever used “Find & Replace” in a Word document (or just “Find”), you’ve used regular expressions.
Speaking of “Find & Replace”, let’s work with that first. I’ve found it’s underused in spreadsheets. Take the following table as an example:
There are three issues here, right? First issue is that my decimal points are commas. Perhaps my ledger is European, or perhaps my transcriptionist is subpar. Second issue is that the “8” in “28” has turned into a B. This is a pretty clear error. Third issue is that the zeroes in 2.00 have turned into “O”s.
These cells are readable to a person, but they are not readable to a computer. If you tried to sum up these cells, or check your math, you’d get an error. So we’d need to fix these cells.
The easiest option would be to use find and replace. So in the image below, as you can see, I simply typed in find “,” and replace with “.” I could do the same with the B and 8, and the O and 0. Voila, I’m done!
But this doesn’t always work. Let’s take a different option as an example, as I show in the next table.
I can’t just “find & replace” all commas with periods here. In that case, 140,000 would turn into 140.000, and 280,000 would turn into 28.000. This is when I have to use regular expressions, so I can target the right “strings”, and only replace commas when I want to.
In my cells, I’m going to look for a comma, followed by 3 numbers. If I can find that, I know I’ve found a legitimate use for a comma (to find a number in the thousands). If I can’t, then I’ve found a mistyped decimal point.
So, here’s my regular expression, using a Google formula: =REGEXMATCH(A1,”\,[0-9]+[0-9]+[0-9]+”)
To walk you through it, I’m looking to match a certain string in the cell A1. So I look for a comma, which I type in as “\,” to let Google know that I’m actually looking for a comma, and not just using a comma in the formula (it’s a symbol, not an operator). Then I look for any digit [0-9], repeated any number of times +, but there have to be at least 3 of them [0-9]+[0-9]+[0-9]+.
Before I run this on my 3 cells, I make sure that Google is recognizing my cells as text, rather than values. It gets fussy if I don’t do that. Check out the screenshot below to see how I do that. (Cell C1 is #VALUE!, because I’ve already jumped ahead to check it by multiplying A1 and B1).
Then, when I run my REGEXMATCH on my 3 cells, I get A1 is FALSE (Google did not find a comma followed by 3 digits), and A2 is true.
So what next? I use this fact to create an IF statement.
To explain this IF statement, it says If B2 is true (if I got a value of true for my regex match), I just put in cell B1. But, if it’s not, I’m going to look in B1 and replace the “\,” with a period.
This gives me the correct values at last of 140,000, 2.00, and 280,000.
For extra credit, I can combine both formulas into one massive formula that outputs the correct answer. It’s hard to read, but it prevents my formulas from taking up too many cells.
Of course, a much easier way to do all of this is to just download my free Google Sheets add-on, Paperpusher, which allows you to convert letters to numbers, numbers to letters, commas to periods, and fix other common data entry and transcription errors. If you have any questions about it, feel free to visit the subreddit!