What is OCR?
Optical character recognition (OCR) is the process of automatically identifying, in an image, the characters or symbols of a given alphabet. In this post we will focus on how to use OCR on Android.
Once the text in an image has been recognized, it can be used to:
- Save it to storage.
- Process or edit it.
- Translate it into another language.
The popularity of smartphones, combined with ever-better cameras, has led to increased use of these recognition techniques and to a new category of mobile apps that rely on them.
On device or in the cloud?
Before using an OCR library, it is necessary to decide where the OCR process should take place, on the smartphone or in the cloud.
Depending on the app's requirements, each approach has its advantages and disadvantages.
If the app must, for example, recognize characters without an internet connection, the OCR engine has to run on the device itself. This also avoids uploading images to a server, which matters because the cameras on current devices produce large photos.
On the other hand, OCR libraries tend to take up a lot of space, since a data file must be downloaded for each language to be recognized, as we will explain below.
What libraries can be used?
The following Wikipedia article contains a comparison table of OCR libraries, their supported platforms, the programming languages used in their development, and other relevant information.
Link: http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software
In this post we are going to use the Tesseract library, which stands out above the rest: it is open source, has an SDK, was created by HP, and is currently developed by Google.
OCR on Android using Tesseract Library
Although Tesseract can run on a Linux server as a cloud service, in this post we will integrate the Tesseract library into an Android app, running the OCR engine on the device itself.
The original Tesseract project for Android is called Tesseract Android Tools. It contains tools for compiling the Tesseract and Leptonica libraries for use on the Android platform, and a Java API for accessing these natively compiled libraries.
Link: https://github.com/rebbix/tesseract-android-tools/tree/master/tesseract-android-tools
For our example, we are going to use a fork of Tesseract Android Tools, which adds more functionality.
Link: https://github.com/rmtheis/tess-two
OCR Example on Android
We need a few simple steps to perform OCR on Android:
- Create a new Android Studio project.
- Add the Tesseract library to the project by adding the following lines to build.gradle:

dependencies {
    compile 'com.rmtheis:tess-two:6.0.0'
}
- Create a class called TessOCR with the following code:
import android.content.Context;
import android.graphics.Bitmap;

import com.googlecode.tesseract.android.TessBaseAPI;

public class TessOCR {

    private final TessBaseAPI mTess;

    public TessOCR(Context context, String language) {
        mTess = new TessBaseAPI();
        // init() expects the datapath to contain a "tessdata" directory
        // holding <language>.traineddata.
        String datapath = context.getFilesDir() + "/tesseract/";
        mTess.init(datapath, language);
    }

    public String getOCRResult(Bitmap bitmap) {
        mTess.setImage(bitmap);
        return mTess.getUTF8Text();
    }

    public void onDestroy() {
        if (mTess != null) mTess.end();
    }
}
- The constructor needs a context (for example, the MainActivity context) and the language to recognize, which is used to start the OCR engine. The language code must be in ISO 639-2/B format, e.g. spa (Spanish) or chi (Chinese).
- Note:
- To recognize each language, a trained-data file must be downloaded and saved to device storage. In our case, it is stored in the app's data directory under /tesseract/.
- The files used to recognize each language can be found at the following link: https://github.com/tesseract-ocr/tessdata.
- The getOCRResult method returns the text recognized in the bitmap passed as an argument.
- Import the TessOCR class created in the previous step into MainActivity and create a new recognition instance with the following line:
mTessOCR = new TessOCR(this, language);
- Add to MainActivity the method that performs the character recognition:
private void doOCR(final Bitmap bitmap) {
    if (mProgressDialog == null) {
        mProgressDialog = ProgressDialog.show(ocrView, "Processing", "Doing OCR...", true);
    } else {
        mProgressDialog.show();
    }
    // Run recognition off the UI thread; getUTF8Text() can take a while.
    new Thread(new Runnable() {
        public void run() {
            final String srcText = mTessOCR.getOCRResult(bitmap);
            ocrView.runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    if (srcText != null && !srcText.equals("")) {
                        // srcText contains the recognized text
                    }
                    mTessOCR.onDestroy();
                    mProgressDialog.dismiss();
                }
            });
        }
    }).start();
}
- First, this code shows a progress dialog indicating the recognition status. It then launches a new thread that performs the recognition by calling the getOCRResult method. When recognition finishes, the dialog is dismissed and, if everything worked properly, the recognized text is available in srcText.
- Call the method from step 5 to start recognition:
doOCR(bitmap);
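One pitfall worth noting: TessBaseAPI.init() fails if the trained-data file has not been copied to device storage first, since it expects the datapath to contain a tessdata directory holding <language>.traineddata. The helper below is a minimal sketch of that copy step, written with plain java.io so the logic is not Android-specific; the class and method names are my own.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TrainedDataInstaller {

    // Copies a .traineddata stream into <datapath>/tessdata/<language>.traineddata,
    // which is the layout TessBaseAPI.init() expects. Returns the created file.
    public static File installTrainedData(InputStream in, File datapath, String language)
            throws IOException {
        File tessdata = new File(datapath, "tessdata");
        if (!tessdata.exists() && !tessdata.mkdirs()) {
            throw new IOException("Could not create " + tessdata);
        }
        File out = new File(tessdata, language + ".traineddata");
        if (out.exists()) {
            return out; // already installed, nothing to do
        }
        try (OutputStream os = new FileOutputStream(out)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                os.write(buffer, 0, read);
            }
        }
        return out;
    }
}
```

On Android you would call this before constructing TessOCR, for example with an InputStream from context.getAssets().open("tessdata/spa.traineddata") (the asset path is an assumption about your project layout) and new File(context.getFilesDir(), "tesseract") as the datapath.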
Considerations
- Recognition quality may vary depending on the image's lighting conditions, camera resolution, text font, text size, and other factors.
- To achieve the highest possible quality, it is very important to center the text in the image and to make sure the image is properly focused.
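As a sketch of the kind of preprocessing that can help under poor conditions, the helper below binarizes an image with a fixed luminance threshold, which often makes text stand out from a noisy background. It works on a plain int[] of ARGB pixels, so on Android you would move data in and out with Bitmap.getPixels() and Bitmap.setPixels() before calling setImage(); the class name and threshold value are illustrative, not part of Tesseract.

```java
public class OcrPreprocess {

    // Converts ARGB_8888 pixels (as returned by Bitmap.getPixels on Android)
    // to pure black or white using a fixed luminance threshold.
    public static int[] binarize(int[] argbPixels, int threshold) {
        int[] out = new int[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            int p = argbPixels[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Standard luma approximation (ITU-R BT.601 weights).
            int luma = (299 * r + 587 * g + 114 * b) / 1000;
            out[i] = (luma >= threshold) ? 0xFFFFFFFF : 0xFF000000;
        }
        return out;
    }
}
```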
Preview using OCR in a translator app
The following video shows part of the app I'm developing for my final degree project (TFG), where I use the OCR techniques described above.
I have an error on ocrView. How do I declare the variable?
Hi Paul, thank you for your comment.
In the project in which I wrote this code, ocrView was an activity, because I used the MVP (Model-View-Presenter) pattern and the code included in steps 4 and 5 belongs to a Presenter, not to the MainActivity. I have to update it to avoid confusion.
Answering your question, you have two options:
1. Use the code from steps 4 and 5 in a presenter, declaring ocrView as below:
private final OCRActivity ocrView;
public OCRViewPresenter(OCRActivity view) {
ocrView = view;
}
2. Implement OCR in the activity itself, using this instead of ocrView in the first occurrence and YourActivityName.this in the second one (YourActivityName.this.runOnUiThread).
Scanning is not accurate, it's giving some other values.
Hi manish, thank you for your comment.
OCR techniques are likely not to be accurate if:
– Lighting conditions are poor.
– Camera resolution is low.
– Text font is not big enough.
So try to achieve the highest possible quality: center the text in the image and focus the image properly. If you have any questions after that, please let me know.
How can I make it work in the cloud?
I could not find any tutorial.
Hello Dusan, thank you for your comment. If you are interested in implementing OCR in a server, there are different alternatives. For example, you can have a look at https://github.com/tleyden/open-ocr, which uses Tesseract and includes a REST API to upload the image you want to scan.
Hope it will be useful.
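For readers wanting a starting point, here is a minimal sketch of calling such a server from Java. It assumes open-ocr exposes a POST /ocr endpoint accepting a JSON body with img_url and engine fields, as described in that project's README; verify the exact contract against the server version you deploy. The class and method names are my own.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class OpenOcrClient {

    // Builds the JSON body for open-ocr's POST /ocr endpoint. The field names
    // (img_url, engine) follow the open-ocr README; check your server version.
    public static String buildOcrRequest(String imageUrl, String engine) {
        return "{\"img_url\":\"" + imageUrl + "\",\"engine\":\"" + engine + "\"}";
    }

    // Posts the request and returns the server's response body (the recognized text).
    public static String scan(String serverBase, String imageUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(serverBase + "/ocr").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(buildOcrRequest(imageUrl, "tesseract").getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream is = conn.getInputStream();
             Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }
}
```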
Very good work, congratulations!
Do you have any material to create this “mask” in the camera? to indicate where the person needs to align the camera
Thank you very much Renan, using a “mask” to help the person to align the camera is a good idea but I do not currently have any material about it. You can look for some Android third-party libraries that may implement it 😉
Hello, after extracting the text from the image with Tesseract, I want to use a translate API to translate that text. How can I do that? Please help me, I need to hurry because I'm doing my final exam.
Hi Trần Hữu Nghị, once Tesseract has processed the image and obtained the text, you will need a translator API to translate it; there are several, such as the ones from Google or Microsoft, among others.
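As an illustration of that second step, the snippet below builds a request URL for Google's Cloud Translation v2 REST endpoint. The endpoint and the q, target, and key parameters follow Google's public documentation, but you still need a valid API key and an HTTP client to perform the call, and the class name is my own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TranslateRequest {

    // Builds a GET request URL for the Google Cloud Translation v2 REST API.
    // q = text to translate, target = target language code, key = API key.
    public static String buildUrl(String text, String targetLang, String apiKey)
            throws UnsupportedEncodingException {
        return "https://translation.googleapis.com/language/translate/v2"
                + "?q=" + URLEncoder.encode(text, "UTF-8")
                + "&target=" + URLEncoder.encode(targetLang, "UTF-8")
                + "&key=" + URLEncoder.encode(apiKey, "UTF-8");
    }
}
```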
Hi, sorry, I want to ask about this material: I don't understand step 4. Can you explain a little more how to do the 4th step?
Hi Putri, there was a problem with the code in step 4: it was split into 4 different lines when it should be just one. Have a look at it again.
hi David ! thanks for this , may I ask if is it possible to convert the translated texts to augmented reality , and can you suggest tools that I can use ? Thank you 🙂
Hi Franz, I think that bringing the translated texts into augmented reality could be possible, but I've never tried it before. There are some libraries from Google for working with AR; have a look at https://github.com/google-ar
Hey, Which API is used in this project?
Hi Kamya, do you mean the translator API? There are several ones such as Google and Microsoft translators.
hello david
Is the Arabic language available? And can it then be translated to other languages?
Hi Tri Rahmat, the Arabic language file is at https://github.com/tesseract-ocr/tessdata/blob/master/ara.traineddata, and to translate to other languages you can use any translation API.
Thanks a lot man, your code works, helps me a lot!
Hello Mr. David,
I have stored the trained data inside the assets folder, under a tesseract folder,
but I get the error: IllegalArgumentException: Data path does not exist!
Thank you for your help in advance.