Tutorial: Container Number Detection and Recognition with PaddleOCR

The complete tutorial is available at https://hyperai.com/console/open-tutorials/containers/XJsxhLTnKNu and can be cloned and run directly.

Project Introduction

A container number is the identification number of a container used to ship export goods. It is a required field on the consignment note. The composition of a standard container number follows the ISO 6346 (1995) standard.

A standard container number consists of 11 characters, for example CBHU 123456 7, and has three parts:

  1. The first part consists of 4 English letters. The first three letters mainly identify the container owner or operator, and the fourth indicates the container type. For example, a standard container number beginning with CBHU indicates that the owner and operator is COSCO Shipping.
  2. The second part consists of 6 digits. It is the container registration code, a unique identifier for the container body.
  3. The third part is a check digit calculated from the preceding 4 letters and 6 digits by a verification rule. It is used to detect errors in the number and is the 11th character.
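The check digit rule can be illustrated in a few lines of Python. This is a minimal sketch, assuming the ISO 6346 scheme in which letters map to the values 10 to 38 with multiples of 11 skipped, each of the first ten characters is weighted by a power of two, and the weighted sum is reduced modulo 11 and then modulo 10. The names `check_digit` and `is_valid_container_number` are illustrative and are not part of PaddleOCR.

```python
import string

# Letter values: A=10, B=12, ..., Z=38, skipping multiples of 11 (11, 22, 33).
LETTER_VALUES = {}
_value = 10
for _letter in string.ascii_uppercase:
    if _value % 11 == 0:
        _value += 1
    LETTER_VALUES[_letter] = _value
    _value += 1


def check_digit(first_ten: str) -> int:
    """Compute the check digit from the 4 letters and 6 digits."""
    if len(first_ten) != 10:
        raise ValueError("expected 4 letters followed by 6 digits")
    total = 0
    for position, char in enumerate(first_ten.upper()):
        if char in LETTER_VALUES:
            value = LETTER_VALUES[char]
        elif char in string.digits:
            value = int(char)
        else:
            raise ValueError(f"unexpected character: {char!r}")
        total += value * (2 ** position)  # weights 1, 2, 4, ..., 512
    return total % 11 % 10


def is_valid_container_number(number: str) -> bool:
    """Check format (4 letters + 7 digits) and the trailing check digit."""
    compact = number.replace(" ", "").upper()
    if len(compact) != 11:
        return False
    if not all(c in LETTER_VALUES for c in compact[:4]):
        return False
    if not all(c in string.digits for c in compact[4:]):
        return False
    return check_digit(compact[:10]) == int(compact[10])


# EITU1786393 appears in the inference log later in this tutorial.
print(is_valid_container_number("EITU1786393"))
```

A check like this can be applied to recognition results to flag numbers that were probably misread.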

This tutorial uses PaddleOCR for container number detection and recognition. A detection model and a recognition model are trained separately on a small amount of data and then chained together to detect and recognize container numbers.

Environment preparation

1. On HyperAI, start a "model training" container. Select paddlepaddle-2.3 as the environment and vGPU or another GPU container as the resource.

2. Open a Terminal window in Jupyter, then run the following commands:

cd PaddleOCR-release-2.5          # enter the PaddleOCR-release-2.5 folder
pip install -r requirements.txt   # install the dependencies required by PaddleOCR
python setup.py install           # install PaddleOCR

Dataset Introduction

This tutorial uses the Container Number Dataset, which contains 3003 container images at a resolution of 1920×1080.

1. The annotation format PaddleOCR uses for detection model training is as follows, with the two fields separated by "\t":

" image file name                    image annotation information encoded with json.dumps "
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

Before json.dumps encoding, the image annotation information is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner, and transcription holds the text in that box. When transcription is "###", the text box is invalid and is skipped during training.
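To make the format concrete, the following sketch parses one detection label line with the standard library. The helper name `parse_det_label` is hypothetical, and the second box in the sample (marked "###") is invented for illustration only.

```python
import json


def parse_det_label(line: str):
    """Split one detection label line into the image path and its valid boxes."""
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = []
    for item in json.loads(annotation):
        if item["transcription"] == "###":
            continue  # invalid text box, skipped during training
        boxes.append((item["transcription"], item["points"]))
    return image_path, boxes


# The first box comes from the sample line above; the "###" box is hypothetical.
sample = (
    'ch4_test_images/img_61.jpg\t'
    '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
    '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
)
path, boxes = parse_det_label(sample)
print(path)
for text, points in boxes:
    print(text, points)
```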

2. The annotation format PaddleOCR uses for recognition model training is as follows, with the two fields separated by "\t":

" image file name                 image annotation information "

train_data/rec/train/word_001.jpg simple and reliable
train_data/rec/train/word_002.jpg Make the complicated world simpler through technology
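A quick way to sanity-check a recognition label file is to parse every line and collect the characters that occur in the labels. This is a minimal sketch; `load_rec_labels` is a hypothetical helper, and the file names in the sample lines are made up (the label texts are taken from the inference log at the end of this tutorial).

```python
def load_rec_labels(lines):
    """Parse recognition label lines into (image_path, text) pairs."""
    samples = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if "\t" not in line:
            raise ValueError(f"line {line_number}: missing tab separator")
        image_path, text = line.split("\t", 1)
        samples.append((image_path, text))
    return samples


# Hypothetical label lines in the "image path<TAB>text" format.
lines = ["0_example.jpg\tEITU1786393\n", "1_example.jpg\t45G1\n"]
samples = load_rec_labels(lines)
charset = sorted({ch for _, text in samples for ch in text})
print(len(samples), "".join(charset))
```

For container numbers the character set should normally contain only uppercase letters and digits, so unexpected characters here usually point to annotation errors.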

Data organization

Preparing data for the detection model

Split the roughly 3000 images in the dataset into a training set and a validation set at a ratio of about 2:1 (the first 2000 label lines for training, the rest for validation) by running the following code:

from tqdm import tqdm

# Read every label line from the full annotation file
filename = "all_label.txt"
with open(filename) as f:
    lines = f.readlines()

# The first 2000 lines form the training set, the remaining lines the validation set
with open('det_train_label.txt', 'w') as t, open('det_eval_label.txt', 'w') as v:
    for count, line in enumerate(tqdm(lines)):
        if count < 2000:
            t.write(line)
        else:
            v.write(line)
100%|██████████| 3003/3003 [00:00<00:00, 37908.32it/s]
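The split above takes label lines in file order. If neighbouring lines are correlated (for example, consecutive frames of the same container), a shuffled split with a fixed seed is one possible alternative. The following is a minimal sketch under that assumption; `split_labels` and the seed value are hypothetical and are not part of the original tutorial.

```python
import random


def split_labels(lines, n_train, seed=0):
    """Shuffle label lines reproducibly and split them into train and eval lists."""
    shuffled = list(lines)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in label lines; in practice these would come from all_label.txt.
demo_lines = [f"img_{i}.jpg\t[]\n" for i in range(3003)]
train_lines, eval_lines = split_labels(demo_lines, n_train=2000)
print(len(train_lines), len(eval_lines))
```

Using a fixed seed keeps the split reproducible between runs, which makes validation scores comparable.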

Preparing data for the recognition model

Following the annotations from the detection step, we crop the dataset images so that each crop contains, as far as possible, only the text region, and use these crops as recognition data. Run the following code:

import json
import math
import os

from PIL import Image, ImageDraw
from tqdm import tqdm


class Rotate(object):
    """Mask a quadrilateral text box, rotate the image by the top-edge angle, and crop it."""

    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        # Corner points in clockwise order, starting from the top-left corner
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        # Use the polygon mask as the alpha channel so pixels outside the box become transparent
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        # Build the polygon mask once and cache it
        if not self._mask:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        # Crop to the bounding box of the remaining (masked) content
        box = image.getbbox()
        return image.crop(box)

    def rotation_angle(self):
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        # Angle between the top edge of the box and a horizontal reference segment
        angle = self.angle([x1, y1, x2, y2], [0, 0, 10, 0]) * -1
        return self.image.rotate(angle, expand=True)

    def angle(self, v1, v2):
        """Return the unsigned angle, in whole degrees, between two segments [x1, y1, x2, y2]."""
        dx1 = v1[2] - v1[0]
        dy1 = v1[3] - v1[1]
        dx2 = v2[2] - v2[0]
        dy2 = v2[3] - v2[1]
        angle1 = math.atan2(dy1, dx1)
        angle1 = int(angle1 * 180 / math.pi)
        angle2 = math.atan2(dy2, dx2)
        angle2 = int(angle2 * 180 / math.pi)
        if angle1 * angle2 >= 0:
            included_angle = abs(angle1 - angle2)
        else:
            included_angle = abs(angle1) + abs(angle2)
            if included_angle > 180:
                included_angle = 360 - included_angle
        return included_angle



def image_cut_save(path, bbox, save_path):
    """
    Crop one annotated text box from an image and save it.

    :param path: image path
    :param bbox: four corner points [[x, y], ...] of the text box, clockwise from the top-left corner
    :param save_path: path where the cropped image is saved
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)

    # Top-left and bottom-right corners, used to compare the box height and width
    left, upper = bbox[0]
    right, lower = bbox[2]
    if lower - upper > right - left:
        # The box is taller than it is wide: turn the crop by 90 degrees before saving
        rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
    else:
        rotate.run().convert('RGB').save(save_path)
    return True

# Read the detection labels and create the recognition dataset
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
for index, filename in enumerate(files):
    data_dir = "RecTrainData" if index == 0 else "RecEvalData"
    os.makedirs(data_dir, exist_ok=True)
    with open(filename) as f, open('rec_' + filetypes[index] + '_label.txt', 'w') as label_file:
        for line in tqdm(f.readlines()):
            image_name = line.split("\t")[0].split("/")[-1]
            annos = json.loads(line.split("\t")[-1])
            img_path = os.path.join("/input0/images", image_name)
            for i, anno in enumerate(annos):
                # Prefix the box index so every crop from the same image gets its own file name
                data_path = os.path.join(data_dir, str(i) + "_" + image_name)
                if image_cut_save(img_path, anno["points"], data_path):
                    label_file.write(str(i) + "_" + image_name + "\t" + anno["transcription"] + "\n")
100%|██████████| 2000/2000 [02:55<00:00, 11.39it/s]
100%|██████████| 1003/1003 [01:30<00:00, 11.05it/s]
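Note that the angle method in the Rotate class returns an unsigned angle, so boxes tilted in either direction are rotated the same way. Whether that matters depends on how the boxes in the data are oriented. If a signed angle is needed, it can be derived directly from the top edge of the box, as in the minimal sketch below. The helper `top_edge_angle` is hypothetical and not part of the tutorial code; the sample points are taken from the detection annotation example earlier in this tutorial.

```python
import math


def top_edge_angle(points):
    """Signed angle in degrees of the edge from the top-left to the top-right corner.

    Image coordinates are assumed: x grows to the right and y grows downward,
    so a positive result means the right end of the edge is lower than the left end.
    """
    (x1, y1), (x2, y2) = points[0], points[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


points = [[310, 104], [416, 141], [418, 216], [312, 179]]
print(round(top_edge_angle(points), 1))
```

Pillow documents the angle argument of Image.rotate as degrees counter-clockwise, so the sign convention should be checked on a few sample crops before relying on such a value for deskewing.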

Experiment

Because the dataset is relatively small, we use the PP-OCRv3 models in PaddleOCR for detection and recognition so that the models converge better and faster. Building on PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean by 5% in Chinese scenarios and by 11% for the English and digit model. For details of the optimizations, see the PP-OCRv3 technical report.

Detection model

Detection model configuration

PaddleOCR provides many detection models; the models and their configuration files can be found under PaddleOCR-release-2.5/configs/det. Here we choose ch_PP-OCRv3_det_student, whose configuration file is PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use, the necessary settings, such as the training parameters and dataset paths, must be configured. Some key settings are shown below:

# Key training parameters (under Global)
use_gpu: true # whether to use the GPU
epoch_num: 1200 # number of training epochs
save_model_dir: ./output/ch_PP-OCR_V3_det/ # model save path
save_epoch_step: 200 # save the model every 200 epochs
eval_batch_step: [0, 100] # run an evaluation every 100 training iterations
pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/ch_PP-OCR_V3_det/best_accuracy.pdparams # pretrained model path
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /input0/images # image folder path
    label_file_list:
      - ./det_train_label.txt # label file path

Model fine-tuning

Run the following command in the notebook to fine-tune the model. The -c option takes the path to the model configuration file.

%run PaddleOCR-release-2.5/tools/train.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml

With the default hyperparameters, ch_PP-OCRv3_det_student reached an hmean of 96.96% on the validation set after 385 epochs of training, with no significant improvement thereafter.

[2022/10/11 06:36:09] ppocr INFO: best metric, hmean: 0.969551282051282, precision: 0.9577836411609498,
recall: 0.981611681990265, fps: 20.347745459258228, best_epoch: 385
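The hmean in the log is the harmonic mean of precision and recall. As a quick check, it can be recomputed from the logged precision and recall values. The function name `hmean` below is only illustrative.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the training log above.
precision = 0.9577836411609498
recall = 0.981611681990265
print(hmean(precision, recall))
```

The result agrees with the logged hmean of about 0.9696.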

Recognition model

Recognition model configuration

PaddleOCR also provides many recognition models; the models and their configuration files can be found under PaddleOCR-release-2.5/configs/rec. Here we choose ch_PP-OCRv3_rec_distillation, whose configuration file is PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use, the necessary settings, such as the training parameters and dataset paths, must be configured. Some key settings are shown below:

# Key training parameters (under Global)
use_gpu: true # whether to use the GPU
epoch_num: 1200 # number of training epochs
save_model_dir: ./output/rec_ppocr_v3_distillation # model save path
save_epoch_step: 200 # save the model every 200 epochs
eval_batch_step: [0, 100] # run an evaluation every 100 training iterations
pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/PPOCRv3/best_accuracy.pdparams # pretrained model path
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/ # image folder path
    label_file_list:
      - ./rec_train_label.txt # label file path

Model fine-tuning

Run the following command in the notebook to fine-tune the model. The -c option takes the path to the model configuration file.

%run PaddleOCR-release-2.5/tools/train.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

With the default hyperparameters, ch_PP-OCRv3_rec_distillation reached an accuracy of 96.11% on the validation set after 136 epochs of training, with no significant improvement thereafter.

[2022/10/11 20:04:28] ppocr INFO: best metric, acc: 0.9610600272522444, norm_edit_dis: 0.9927426548965615,
Teacher_acc: 0.9540291998159589, Teacher_norm_edit_dis: 0.9905629345025616, fps: 246.029195787707, best_epoch: 136
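The two recognition metrics in the log can be read as exact-match accuracy (acc) and one minus the average normalized edit distance (norm_edit_dis). The sketch below shows one common way to compute them in plain Python. It is an assumption-based illustration, not PaddleOCR's own metric code, which may differ in details such as text normalization. The prediction pairs are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rec_metrics(pairs):
    """Return exact-match accuracy and 1 - mean normalized edit distance."""
    correct = 0
    total_norm_dist = 0.0
    for predicted, target in pairs:
        if predicted == target:
            correct += 1
        longest = max(len(predicted), len(target), 1)
        total_norm_dist += edit_distance(predicted, target) / longest
    count = max(len(pairs), 1)
    return {"acc": correct / count, "norm_edit_dis": 1 - total_norm_dist / count}


# Hypothetical (prediction, ground truth) pairs.
pairs = [("EITU1786393", "EITU1786393"), ("45G1", "45G1"), ("LTU1", "LTUU1")]
print(rec_metrics(pairs))
```

A high norm_edit_dis together with a lower acc, as in the log above, suggests that most wrong predictions differ from the ground truth by only a character or two.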

Model inference

Detection model inference

Run the following command in the notebook to detect text in the test images with the fine-tuned model. Global.infer_img is the path to an image or an image folder, Global.pretrained_model is the fine-tuned model, and Global.save_res_path is the path where the inference results are saved.

%run PaddleOCR-release-2.5/tools/infer_det.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.infer_img="/input0/images" Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_res_path="./output/det_infer_res/predicts.txt"

Recognition model inference

Run the following command in the notebook to recognize text in the test images with the fine-tuned model. Global.infer_img is the path to an image or an image folder, Global.pretrained_model is the fine-tuned model, and Global.save_res_path is the path where the inference results are saved.

%run PaddleOCR-release-2.5/tools/infer_rec.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.infer_img="./RecEvalData/" Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_res_path="./output/rec_infer_res/predicts.txt"

Serial inference with the detection and recognition models

Model conversion

Before serial inference, the trained and saved models must first be converted into inference models by running the conversion commands below for the detection model and the recognition model separately. Here, -c takes the configuration file path of the model to convert, -o Global.pretrained_model specifies the model file to convert, and Global.save_inference_dir specifies where the converted inference model is stored.

# Detection model conversion
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_inference_dir="./output/det_inference/"

W1011 07:10:20.363173 544 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.1
W1011 07:10:20.366801 544 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
W1011 07:10:22.629678 544 gpu_context.cc:506] WARNING: device: 0. The installed Paddle is compiled with CUDNN 8.1, but CUDNN version in your machine is 8.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.

[2022/10/11 07:10:24] ppocr INFO: load pretrain successful from ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/10/11 07:10:27] ppocr INFO: inference model is saved to ./output/det_inference/inference

# Recognition model conversion
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_inference_dir="./output/rec_inference/"

[2022/10/11 07:10:33] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/10/11 07:10:35] ppocr INFO: inference model is saved to ./output/rec_inference/Teacher/inference
[2022/10/11 07:10:36] ppocr INFO: inference model is saved to ./output/rec_inference/Student/inference
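Before moving on, it can be useful to confirm that the exported files are where the next step expects them. Note that the recognition export writes into Teacher and Student subfolders. The sketch below assumes that the export produces files named inference.pdmodel and inference.pdiparams, which matches the "inference" prefix in the logs but should be verified against your Paddle version. The helper `missing_inference_files` is hypothetical.

```python
from pathlib import Path

# Assumed file names for an exported Paddle inference model (verify for your version).
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(model_dir):
    """Return the expected inference files that are not present in model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


# Directories taken from the conversion logs above.
for model_dir in ("./output/det_inference/", "./output/rec_inference/Student/"):
    missing = missing_inference_files(model_dir)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(model_dir, status)
```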

Serial model inference

After conversion, PaddleOCR provides a tool for chaining the detection and recognition models, so that any trained detection model and any trained recognition model can be combined into a two-stage text recognition system. An input image passes through four main stages (text detection, detection box rectification, text recognition, and score filtering) before the text positions and recognition results are output. The command is shown below: image_dir is the path to a single image or an image folder, det_model_dir is the path to the detection inference model, and rec_model_dir is the path to the recognition inference model. By default, the visualized recognition results are saved to the ./inference_results folder.

%run PaddleOCR-release-2.5/tools/infer/predict_system.py \
--image_dir="OCRTest" \
--det_model_dir="./output/det_inference/" \
--rec_model_dir="./output/rec_inference/Student/"

[2022/10/11 07:10:46] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 2, elapse : 1.0023341178894043
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 2, elapse : 0.02405834197998047
[2022/10/11 07:10:48] ppocr DEBUG: 0 Predict time of OCRTest/1-122700001-OCR-LF-C01.jpg: 1.041s
[2022/10/11 07:10:48] ppocr DEBUG: TTEMU3108252, 0.864
[2022/10/11 07:10:48] ppocr DEBUG: 22G1, 0.843
[2022/10/11 07:10:48] ppocr DEBUG: The visualized image saved in ./inference_results/1-122700001-OCR-LF-C01.jpg
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 1, elapse : 0.047757863998413086
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 1, elapse : 0.016452789306640625
[2022/10/11 07:10:48] ppocr DEBUG: 1 Predict time of OCRTest/1-122720001-OCR-AH-A01.jpg: 0.073s
[2022/10/11 07:10:48] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AH-A01.jpg
[2022/10/11 07:10:48] ppocr DEBUG: dt_boxes num : 2, elapse : 0.05301952362060547
[2022/10/11 07:10:48] ppocr DEBUG: rec_res num : 2, elapse : 0.020509719848632812
[2022/10/11 07:10:48] ppocr DEBUG: 2 Predict time of OCRTest/1-122720001-OCR-AS-B01.jpg: 0.081s
[2022/10/11 07:10:48] ppocr DEBUG: EITU1786393, 0.990
[2022/10/11 07:10:48] ppocr DEBUG: 45G1, 0.963
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AS-B01.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 2, elapse : 0.049460411071777344
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 2, elapse : 0.020053625106811523
[2022/10/11 07:10:49] ppocr DEBUG: 3 Predict time of OCRTest/1-122720001-OCR-LB-C02.jpg: 0.077s
[2022/10/11 07:10:49] ppocr DEBUG: LTU1, 0.814
[2022/10/11 07:10:49] ppocr DEBUG: 45G1, 0.997
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-LB-C02.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 2, elapse : 0.051781654357910156
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 2, elapse : 0.020511150360107422
[2022/10/11 07:10:49] ppocr DEBUG: 4 Predict time of OCRTest/1-122720001-OCR-RF-D01.jpg: 0.081s
[2022/10/11 07:10:49] ppocr DEBUG: EITU1786393, 0.966
[2022/10/11 07:10:49] ppocr DEBUG: 45G1, 0.939
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-RF-D01.jpg
[2022/10/11 07:10:49] ppocr DEBUG: dt_boxes num : 0, elapse : 0.04465031623840332
[2022/10/11 07:10:49] ppocr DEBUG: rec_res num : 0, elapse : 1.430511474609375e-06
[2022/10/11 07:10:49] ppocr DEBUG: 5 Predict time of OCRTest/1-122728001-OCR-AH-A01.jpg: 0.049s
[2022/10/11 07:10:49] ppocr DEBUG: The visualized image saved in ./inference_results/1-122728001-OCR-AH-A01.jpg
[2022/10/11 07:10:49] ppocr INFO: The predict total time is 2.9623537063598633
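The log shows that not every recognized string is a usable container number: TTEMU3108252 has an extra letter, LTU1 is truncated, and strings such as 22G1 and 45G1 are other markings on the container. One possible post-processing step is to keep only strings that match the 4-letter plus 7-digit pattern and pass the check digit test. The sketch below assumes the ISO 6346 check digit rule described at the start of this tutorial. The function names and the 0.8 score threshold are hypothetical choices, not PaddleOCR settings.

```python
import re
import string

CONTAINER_PATTERN = re.compile(r"^[A-Z]{4}[0-9]{7}$")

# Letter values: A=10, B=12, ..., Z=38, skipping multiples of 11.
LETTER_VALUES = {}
_value = 10
for _letter in string.ascii_uppercase:
    if _value % 11 == 0:
        _value += 1
    LETTER_VALUES[_letter] = _value
    _value += 1


def iso6346_check_digit(first_ten: str) -> int:
    """Weighted sum of the first ten characters, modulo 11, then modulo 10."""
    total = 0
    for position, char in enumerate(first_ten):
        value = LETTER_VALUES[char] if char in LETTER_VALUES else int(char)
        total += value * (2 ** position)
    return total % 11 % 10


def filter_container_numbers(results, min_score=0.8):
    """Keep recognized texts that look like valid container numbers."""
    valid = []
    for text, score in results:
        candidate = text.replace(" ", "").upper()
        if score < min_score or not CONTAINER_PATTERN.match(candidate):
            continue
        if iso6346_check_digit(candidate[:10]) == int(candidate[10]):
            valid.append(candidate)
    return valid


# (text, score) pairs copied from the log above.
results = [("TTEMU3108252", 0.864), ("22G1", 0.843), ("EITU1786393", 0.990),
           ("45G1", 0.963), ("LTU1", 0.814)]
print(filter_container_numbers(results))  # ['EITU1786393']
```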