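Startup errors in PyInstaller-packaged executables built on PaddleOCR
Many projects use PaddleOCR for image-to-text conversion, and shipping it to end users usually means bundling it with a packaging tool such as PyInstaller. After packaging PaddleOCR into an executable with PyInstaller, many developers find that the executable fails at startup with a "module not found" error.
The user kerneltravel analyzed several issues together with PyInstaller's error output, identified the cause, worked out a fix, and submitted patches to the official PaddleOCR repository (see PR1 and PR2; PR2 is the authoritative one).
A detailed analysis follows.
Symptoms: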
Startup error 1 of the packaged paddleocr application:
Traceback (most recent call last):
  File "main.py", line 5, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "paddleocr\__init__.py", line 14, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "paddleocr\paddleocr.py", line 33, in <module>
  File "importlib\__init__.py", line 126, in import_module
ModuleNotFoundError: No module named 'tools'
[11752] Failed to execute script 'main' due to unhandled exception!
Startup error 2:
Traceback (most recent call last):
  File "yes.py", line 1, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "paddleocr\__init__.py", line 14, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "paddleocr\paddleocr.py", line 34, in <module>
  File "importlib\__init__.py", line 127, in import_module
ModuleNotFoundError: No module named 'ppocr'
[23216] Failed to execute script 'yes' due to unhandled exception!
Startup error 3:
Traceback (most recent call last):
  File "/home/pc/Music/PD_OCR/demo.py", line 1, in <module>
    from paddleocr import PaddleOCR,tools
  File "/home/pc/.local/lib/python3.10/site-packages/paddleocr/__init__.py", line 14, in <module>
    from .paddleocr import *
  File "/home/pc/.local/lib/python3.10/site-packages/paddleocr/paddleocr.py", line 37, in <module>
    from tools.infer import predict_system
ModuleNotFoundError: No module named 'tools.infer'
Runtime environment:
Ubuntu: 22.04
paddleocr: 2.6.1.3
paddlepaddle: 2.4.2
Startup error 4:
Traceback (most recent call last):
  File "main.py", line 34, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "wkr\AllWorker.py", line 31, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "wkr\ItrWorker.py", line 26, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddleocr\__init__.py", line 14, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddleocr\paddleocr.py", line 21, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\__init__.py", line 62, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\distributed\__init__.py", line 15, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\distributed\spawn.py", line 24, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\distributed\utils\launch_utils.py", line 27, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\distributed\fleet\__init__.py", line 31, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\distributed\fleet\fleet.py", line 33, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\fluid\ir.py", line 28, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "paddle\fluid\proto\pass_desc_pb2.py", line 16, in <module>
ModuleNotFoundError: No module named 'framework_pb2'
Startup error 5:
Traceback (most recent call last):
  File "main.py", line 2, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 493, in exec_module
  File "libs\ocr.py", line 2, in <module>
ModuleNotFoundError: No module named 'paddleocr'
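Cause
Judging from the error messages, every case is a run-time failure to find a specific module under the paddleocr package (multi-level modules such as paddleocr.tools, paddleocr.ppocr, and paddleocr.tools.infer).
To gather evidence, enable more debug output when packaging.
For example, the -d all option makes the executable print the module loading process at startup: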
pyinstaller.exe -F -d all --add-data .\paddleocr;.\paddleocr --add-data .\mklml.dll;. main.py
The executable built this way writes detailed log output when run, which shows where loading failed, for example:
# paddle.text.datasets.uci_housing not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\uci_housing.pyc'
import 'paddle.text.datasets.uci_housing' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9E8880>
# paddle.text.datasets.wmt14 not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\wmt14.pyc'
import 'paddle.text.datasets.wmt14' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9E8B50>
# paddle.text.datasets.wmt16 not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\wmt16.pyc'
import 'paddle.text.datasets.wmt16' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9F20D0>
import 'paddle.text.datasets' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9DE1C0>
import 'paddle.text' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9D0BE0>
import 'paddle' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2CBAF43A0>
# tools not found in PYZ
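Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "paddleocr\__init__.py", line 14, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "paddleocr\paddleocr.py", line 33, in <module>
  File "importlib\__init__.py", line 127, in import_module
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'tools'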
[220960] Failed to execute script 'main' due to unhandled exception!
[220960] LOADER: OK.
[220960] LOADER: Manually flushing stdout and stderr
[220960] LOADER: Cleaning up Python interpreter.
# clear builtins._
# clear sys.path
# clear sys.argv
# clear sys.ps1
# clear sys.ps2
# clear sys.last_type
# clear sys.last_value
# clear sys.last_traceback
# destroy paddleocr.paddleocr
# destroy paddleocr
# clear sys.path_hooks
# clear sys.path_importer_cache
# clear sys.meta_path
# clear sys.__interactivehook__
# restore sys.stdin
# restore sys.stdout
# restore sys.stderr
# cleanup[2] removing sys
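A log like the one above can run to thousands of lines, so it helps to filter it mechanically. The following is a minimal sketch, assuming the log format shown above ("# X not found in PYZ", "import 'X'", and a final ModuleNotFoundError line). The function name scan_debug_log and the sample text are illustrative only and are not part of PyInstaller.

```python
import re
from typing import List, Optional, Tuple

# Patterns modelled on the log lines shown above (an assumption about the format).
NOT_IN_PYZ = re.compile(r"^# (\S+) not found in PYZ\s*$")
IMPORTED = re.compile(r"^import '([^']+)'")
MISSING = re.compile(r"^ModuleNotFoundError: No module named ['\"]([^'\"]+)['\"]")


def scan_debug_log(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (modules never imported after a PYZ miss, module named in the error)."""
    not_in_pyz: List[str] = []
    imported = set()
    missing: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        match = NOT_IN_PYZ.match(line)
        if match:
            not_in_pyz.append(match.group(1))
            continue
        match = IMPORTED.match(line)
        if match:
            imported.add(match.group(1))
            continue
        match = MISSING.match(line)
        if match:
            missing = match.group(1)
    # A PYZ miss is harmless when the module is later loaded from a file on disk.
    unresolved = [name for name in not_in_pyz if name not in imported]
    return unresolved, missing


sample = """\
# paddle.text.datasets.wmt16 not found in PYZ
import 'paddle.text.datasets.wmt16' # <loader object>
# tools not found in PYZ
Traceback (most recent call last):
ModuleNotFoundError: No module named 'tools'
"""
unresolved, missing = scan_debug_log(sample)
print("unresolved:", unresolved)
print("missing:", missing)
```

In the sample, paddle.text.datasets.wmt16 misses the PYZ archive but is then loaded from disk, while tools is never loaded at all, which matches the failure reported at the end of the log.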
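With the --collect-all paddleocr option added to the PyInstaller command, all five errors above disappear, because this option copies every file of the paddleocr package under site-packages into the bundle directory.
To verify this step by step instead of fixing everything in one go, check the following: when a module is brought in with --hidden-import but its submodules are not, does the executable complain only about the missing submodule? If so, the set of hidden imports was incomplete (--collect-all pulls in the named package together with all of its submodules in one go, so it is more complete).
The following build command can be used for that check: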
pyinstaller.exe -F --hidden-import paddleocr --hidden-import paddleocr.paddleocr --hidden-import paddleocr.ppocr --hidden-import paddleocr.ppocr.* --hidden-import paddleocr.ppstructure --hidden-import paddleocr.ppstructure.* --hidden-import paddleocr.tools --hidden-import paddleocr.tools.* -d all --add-data .\paddleocr;.\paddleocr --add-data .\mklml.dll;. main.py
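Remove any one of the options from this command, for example --hidden-import paddleocr.ppstructure. If the resulting executable then reports that paddleocr.ppstructure cannot be found but no longer reports that paddleocr itself is missing, the problem lies in the hidden imports, which means --collect-all paddleocr is needed to pull in the whole paddleocr package explicitly.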
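For the step-by-step check described above, writing the --hidden-import options by hand is error-prone. The sketch below generates them from a package's actual submodules using the standard library's pkgutil.walk_packages. The helper names list_submodules and hidden_import_flags are hypothetical, and the stdlib package json stands in for paddleocr so that the example runs without PaddleOCR installed.

```python
import importlib
import pkgutil
from typing import Iterable, List


def list_submodules(package_name: str) -> List[str]:
    """Return the package name followed by every submodule found on disk."""
    package = importlib.import_module(package_name)
    names = [package_name]
    # Only packages have __path__; a plain module has no submodules to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is not None:
        # Note: walk_packages imports sub-packages in order to descend into them.
        for info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
            names.append(info.name)
    return names


def hidden_import_flags(modules: Iterable[str], skip: Iterable[str] = ()) -> List[str]:
    """Build --hidden-import options, leaving out the modules under test."""
    skipped = set(skip)
    flags: List[str] = []
    for name in modules:
        if name not in skipped:
            flags += ["--hidden-import", name]
    return flags


# 'json' is a stand-in; for the real check this would be 'paddleocr'.
modules = list_submodules("json")
print(" ".join(hidden_import_flags(modules, skip={"json.tool"})))
```

Dropping one name through skip reproduces the experiment above: if only that module is then reported missing, the hidden-import list is what was incomplete. PyInstaller also ships PyInstaller.utils.hooks.collect_submodules for use in hooks and spec files, which is likely the better choice inside a real build.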
Why does only paddleocr need an extra import option?
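Because the paddleocr package ships no .pyd extension module (unlike paddlepaddle, which has libpaddle.pyd). With no .pyd file for paddleocr available at run time, the only option is to bundle the complete set of paddleocr files into the executable's directory with --collect-all. The tracebacks above also show paddleocr\paddleocr.py loading tools and ppocr through importlib.import_module at run time, and PyInstaller's import analysis does not detect modules that are imported that way.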
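The tracebacks for errors 1 and 2 pass through importlib.import_module, which suggests why a build-time scan can miss these modules. The sketch below is an illustration only and not how PyInstaller is implemented: it uses the standard ast module to show that an import statement appears in the syntax tree as an import, while import_module("...") is just an ordinary function call with a string argument. The example source is made up for the demonstration and is not PaddleOCR's actual code.

```python
import ast
from typing import List, Tuple


def split_imports(source: str) -> Tuple[List[str], List[str]]:
    """Return (names from import statements, string literals passed to import_module)."""
    static: List[str] = []
    dynamic: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            static += [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            static.append(node.module)
        elif isinstance(node, ast.Call):
            func = node.func
            # Accept both importlib.import_module(...) and a bare import_module(...).
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name == "import_module" and node.args:
                arg = node.args[0]
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                    dynamic.append(arg.value)
    return static, dynamic


# Hypothetical module source, written only for this demonstration.
example = '''
import importlib
import os
tools = importlib.import_module("tools")
ppocr = importlib.import_module("ppocr")
'''
static, dynamic = split_imports(example)
print("import statements:", static)
print("run-time imports:", dynamic)
```

Names in the second list are the kind that have to be declared to the packager explicitly, through --hidden-import or --collect-all.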
Can paddleocr be built into a .pyd file?
Unknown for now.
Solution
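Package with the --collect-all paddleocr option added to the PyInstaller command. Also make sure the PaddleOCR code in use already includes the changes to the two source files made in https://github.com/PaddlePaddle/PaddleOCR/pull/10502.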
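For builds driven from a script, the same options can be assembled in Python. This is a minimal sketch under a few assumptions: the helper names build_pyinstaller_args and run_build are hypothetical, main.py and mklml.dll are taken from the commands earlier in this article, and run_build only works where PyInstaller is installed (it calls PyInstaller.__main__.run, the documented in-process entry point).

```python
import os
from typing import Iterable, List, Optional, Tuple


def build_pyinstaller_args(
    script: str,
    collect_all: Iterable[str] = (),
    add_data: Iterable[Tuple[str, str]] = (),
    onefile: bool = True,
    debug: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a PyInstaller build."""
    args: List[str] = []
    if onefile:
        args.append("-F")
    if debug:
        args += ["-d", debug]
    for package in collect_all:
        # Collects the package's submodules, data files and binaries.
        args += ["--collect-all", package]
    for source, dest in add_data:
        # os.pathsep is ';' on Windows and ':' elsewhere, matching the
        # ';' used in the Windows commands earlier in this article.
        args += ["--add-data", f"{source}{os.pathsep}{dest}"]
    args.append(script)
    return args


def run_build(args: List[str]) -> None:
    """Run the build in-process; requires PyInstaller to be installed."""
    import PyInstaller.__main__

    PyInstaller.__main__.run(args)


args = build_pyinstaller_args(
    "main.py",
    collect_all=["paddleocr"],
    add_data=[("mklml.dll", ".")],
)
print("pyinstaller " + " ".join(args))
```

Calling run_build(args) would start the actual build. It is left out of the example so that the snippet runs on a machine without PyInstaller.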
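Summary
This problem has been reported again and again for 2-3 years in the PaddleOCR repository on GitHub, but it was never reproduced because many developers could not find the pattern behind it. Thanks to the joint effort of kerneltravel and other users, the cause is now clear and an accurate fix is available.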
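Before this was solved, many other online articles were consulted, but most of them copy from one another, and some offer fixes with serious limitations (for example, the packaged executable only works on the author's own machine). One such article does mention some of the key steps, but its author relies on settings such as
pathex=['D:/python/JobRunner/venv/Lib/site-packages/paddleocr', 'D:/python/JobRunner/venv/Lib/site-packages/paddle/libs'],
which leave the packaged executable unable to find its dependencies on other people's machines.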
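That author also pulls in dependencies such as skimage with --hidden-import=, but the packages imported that way only fit that author's own setup, and the article does not draw a general conclusion from them.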
In short, hopefully this article is of real help in packaging PaddleOCR with PyInstaller. If questions remain, leave a comment below or send kerneltravel a message on github.com.
