Luo Ning's Blog

Reading PDF files with xpdf

xpdf official site: http://www.foolabs.com/xpdf/download.html

Download xpdf
wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.04.tar.gz

Install xpdf

yum install freetype freetype-devel
tar -zxvf xpdf-3.04.tar.gz
cd xpdf-3.04
./configure --with-freetype2-includes=/usr/include/freetype2 --prefix=/usr/local/xpdf
make
make install
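
After make install it can be worth checking that the command-line tools actually landed under the prefix. The snippet below is only a rough sketch: missing_xpdf_tools is a helper name made up for this post (not part of xpdf), and it assumes the binaries are installed into PREFIX/bin, which is the usual layout for a configure --prefix build. With a custom prefix such as /usr/local/xpdf, that bin directory most likely also needs to be added to PATH.

```shell
#!/bin/sh
# missing_xpdf_tools [PREFIX]
# Print the name of every xpdf 3.04 command-line tool that is NOT an
# executable file under PREFIX/bin. PREFIX defaults to the --prefix
# used above. Hypothetical helper, not shipped with xpdf.
missing_xpdf_tools() {
    prefix="${1:-/usr/local/xpdf}"
    for tool in pdftotext pdftops pdfinfo pdffonts pdfdetach pdftoppm pdfimages; do
        # -x is true only for an existing file with the execute bit set
        [ -x "$prefix/bin/$tool" ] || echo "$tool"
    done
}

# Usage (prints nothing when every tool is present):
#   missing_xpdf_tools /usr/local/xpdf
#   export PATH=/usr/local/xpdf/bin:$PATH
```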

pdf commands:

pdf2dsc pdf2ps pdfdetach pdffonts pdfimages pdfinfo pdfopt pdftohtml pdftoppm pdftops pdftotext
pdftotext is the command that extracts the text of a PDF
pdftotext /home/www/web/coding.pdf one.txt
Parses coding.pdf and writes the extracted text to one.txt
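
The single-file command above extends naturally to a whole directory. This is a minimal sketch, not anything xpdf provides: txt_name_for and convert_dir are hypothetical helper names, and PDFTOTEXT is an assumed environment variable that lets you point at a specific binary (for example the one under /usr/local/xpdf/bin).

```shell
#!/bin/sh
# txt_name_for PATH: map /some/dir/name.pdf to /some/dir/name.txt.
# Hypothetical helper.
txt_name_for() {
    # ${1%.pdf} strips one trailing ".pdf" if present
    printf '%s.txt\n' "${1%.pdf}"
}

# convert_dir DIR: run pdftotext on every *.pdf directly inside DIR,
# writing each result next to its source file.
convert_dir() {
    dir="$1"
    for pdf in "$dir"/*.pdf; do
        # With no matches the pattern stays literal, so skip it
        [ -e "$pdf" ] || continue
        "${PDFTOTEXT:-pdftotext}" "$pdf" "$(txt_name_for "$pdf")" \
            || echo "failed: $pdf" >&2
    done
}

# Usage:
#   convert_dir /home/www/web
```

A failed conversion is reported on stderr and the loop moves on, so one broken PDF does not stop the rest.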

Chinese support for PDF parsing

wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-chinese-simplified.tar.gz
tar -zxvf xpdf-chinese-simplified.tar.gz
mkdir -p /usr/local/share/xpdf/chinese-simplified
cp -a xpdf-chinese-simplified/* /usr/local/share/xpdf/chinese-simplified
cat xpdf-chinese-simplified/add-to-xpdfrc >> /usr/local/etc/xpdfrc
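
One thing to watch with the cat >> step: running it a second time appends the same entries again. A small guard like the one below avoids that. It is only a sketch; append_once is a made-up helper, and it assumes the first line of add-to-xpdfrc is a distinctive, non-empty marker line that would not otherwise appear in xpdfrc.

```shell
#!/bin/sh
# append_once SRC DEST
# Append SRC to DEST unless DEST already contains SRC's first line as
# an exact, whole line. Hypothetical helper; assumes that first line
# is non-empty and unique enough to serve as a marker.
append_once() {
    src="$1"
    dest="$2"
    marker="$(head -n 1 "$src")"
    # -q quiet, -x match the whole line, -F fixed string (no regex)
    if [ -f "$dest" ] && grep -qxF -- "$marker" "$dest"; then
        echo "already present in $dest"
    else
        cat "$src" >> "$dest"
    fi
}

# Usage:
#   append_once xpdf-chinese-simplified/add-to-xpdfrc /usr/local/etc/xpdfrc
```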

Command for parsing a Chinese PDF

pdftotext /home/www/web/3.pdf /tmp/one.txt -enc UTF-8
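
A quick sanity check on the result is to confirm the output file really is valid UTF-8. The sketch below uses iconv for that; is_valid_utf8 is a hypothetical helper name. Note that this only checks the byte encoding. It cannot tell whether the Chinese characters were mapped correctly, so it is still worth opening the file and looking.

```shell
#!/bin/sh
# is_valid_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# Hypothetical helper; relies on the standard iconv utility.
is_valid_utf8() {
    # iconv exits non-zero when it hits an invalid byte sequence
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage:
#   is_valid_utf8 /tmp/one.txt && echo "valid UTF-8" || echo "not valid UTF-8"
```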