Hello, and welcome back to the accessible PDF video series. This video focuses on optical character recognition, or OCR. What is OCR? It's a tool that converts, or attempts to convert, text in images into real, searchable text. So if you use a scanner to scan a page from a physical journal article or from a book, and then upload it to a website or a Canvas course, you're not really uploading a page of text, you're uploading just a picture of text. Users won't be able to do a Control-F search for any of the text in the image, and if your users have disabilities and rely on tools like screen readers, they won't be able to understand the content on this page. It'll be inaccessible to them.
To solve these issues, we can use an OCR tool, which reads the text off images and converts it into real text. Now, it's not magic, and it is definitely not perfect. The quality of the resulting text is highly dependent upon the quality of the source material, so only go this route if you absolutely need to.
Some tips for getting the best possible results from OCR tools: use computer-generated text whenever possible. Anything but the absolute neatest handwriting just does not get interpreted well by OCR tools, so avoid handwriting, including in the margins. Avoid documents with notes, highlights, or other markings, and with damage such as discoloration, stains, and rips. Scan items in the correct orientation, because OCR won't recognize text that is upside down or sideways. If your content is a physical document that can be easily removed from its bindings, do that. And if you have the ability to control your scanner's settings, use high-quality scanner settings. That will probably make your file size really, really big, and you don't want to upload huge PDFs, but what you can do is run the OCR tool on the original large file, and then reduce the file size before uploading it to the internet.
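To give a rough sense of why high-quality scanner settings make files so large, here is a minimal Python sketch. It is not from the video: the function name `raw_scan_bytes`, the letter-size page, the 24-bit colour depth, and the 300 and 600 dpi settings are all assumptions chosen for illustration. It only estimates the uncompressed image data for one page, so a real PDF, which is compressed, will usually be smaller.

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Estimate the uncompressed size, in bytes, of one scanned page.

    Hypothetical helper for illustration only. bytes_per_pixel=3 assumes
    24-bit colour; a scanner or PDF writer will normally compress the image.
    """
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * bytes_per_pixel


# Assumed example: a US letter page (8.5 x 11 inches) at two scan resolutions.
for dpi in (300, 600):
    size_mb = raw_scan_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi} dpi: about {size_mb:.1f} MB of raw image data per page")
```

The point of the sketch is that doubling the resolution quadruples the pixel count, which is why it can make sense to run OCR on the large original and shrink the file only afterwards.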
There are tons of tools to reduce PDF file size. Honestly, I think a lot of them seem pretty dicey, but Adobe has one that is trustworthy and legitimate. I have it listed on this slide, or you can just Google "reduce pdf file size" and it'll be one of the top options.
For the second part of this video, I want to run through a couple of demos of using the OCR tool. So I have Adobe Acrobat open, and this is an article from Willamette Week, and it's just an image of a story, so there's no real text on here. To open up the OCR tool, choose "scan and OCR" from the sidebar. You have two options for performing optical character recognition: enhance and recognize text. Enhance is pretty much the same thing as recognize text, but it also tries to de-skew the text. So if the text isn't exactly level, it'll try to level it out. In my experience, it isn't super reliable, so I just go with recognize text. I want to work on this file, and you have some setting options. Really the only one that's important is that the output reads "editable text and images", but I believe that is the default option, so you can just leave everything as-is and then choose recognize text. This sometimes takes a minute to run.
OK, here we go, it's finished running, and now I can see that, hey, I have real text. I can select text with my mouse, and I can search for some text. Let's see, I see the word "crash" in here. Great, it works. So now, if you rely on a tool like a screen reader, it will be able to understand this content and read it out to you.
As another example, I have a document, the Constitution, which obviously is a very old document. It has all kinds of markings and a lot of wear on it, and some of the text is kind of funky. For some reason, in these old documents the s's look like f's. And actually, now that I'm reading this, this isn't the Constitution, it's the Declaration of Independence, so it was mislabeled online, but it'll still work for this purpose.
It's an old document, and the text will not be read very easily by the OCR tool, but I just want to demonstrate what that looks like. So, same process, recognize text, and it'll churn for a minute... It'll churn for a long minute, sorry about this wait... There we go. Sometimes larger PDF files will take a long time for optical character recognition to run.
OK, so it's done, and this looks exactly how I expected it to look. It actually looks worse than it did before, and that's because some of the letters were interpreted correctly by OCR, but most of them weren't, so you have this weird mix of half real text and half images of text. It looks like it can be selected, but you get all this stuff, and I don't even know what it thinks it processed over here. You'll see that the recognized lines of text aren't even lined up with the actual text on the page, so who knows what's allegedly contained in this document.
I'm going to search for a term that looks like it was interpreted well. I see the word "people", which looks pretty legible, so I'll see if I can find that. Great, yeah, so it does recognize some words. But some words I don't think it will recognize, like the word "dissolve" with these funny s's. I'm going to do a search and, what is this... OK, so this was one of the auto-completed words, and it looks like it thinks this word reads d-i-f-p-o-f-c.
So it's a pretty good example of when optical character recognition will fail. I can see the value in having a scan of a document like this (well, this isn't the original Declaration of Independence, but a scan like it), because the image itself has some historical context that's valuable. But because it can't be read accurately by assistive technologies, if you want to use an image like this, make sure you also have a plain text alternative available for people who can't see this content.
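If you already have a trusted plain text version of a document, you could use it to spot words the OCR got wrong, like the one above. The following is a minimal Python sketch, not something shown in the video or built into Acrobat. The function names `words` and `suspicious_words`, the 0.8 similarity cutoff, and the sample OCR line are assumptions made up for illustration; only the misread word "difpofc" comes from the demo. It uses the standard library's `difflib` to flag OCR words that have no close match in the reference text.

```python
import difflib
import re


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def suspicious_words(ocr_text, reference_text, cutoff=0.8):
    """Return OCR words with no close match in the reference transcription.

    Hypothetical helper: cutoff is an assumed similarity threshold
    between 0 and 1, where higher values are stricter.
    """
    vocabulary = sorted(set(words(reference_text)))
    flagged = []
    for word in words(ocr_text):
        # get_close_matches returns an empty list when nothing in the
        # vocabulary is at least `cutoff` similar to the OCR word.
        if not difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff):
            flagged.append(word)
    return flagged


# Opening words of the Declaration of Independence as the trusted reference.
reference = (
    "When in the Course of human events, it becomes necessary for one "
    "people to dissolve the political bands"
)
# Made-up OCR line containing the misreading seen in the demo.
ocr_output = "one people to difpofc the political bands"

print(suspicious_words(ocr_output, reference))
```

With this sample, "people" matches the reference and passes, while "difpofc" is flagged. A check like this only tells you where the OCR text disagrees with the reference; it does not repair the PDF, so the plain text alternative is still what users of assistive technology would rely on.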
And that's all that I want to show you for this video, thank you for watching, and I will see you in the next one.