Abstract
We aim to develop a watermarking method for formatted text documents that is robust to printing and scanning and offers greater capacity than has previously been achieved. The end goal is a system with sufficient capacity to embed a simple authenticating watermark. The method builds on our previous multi-set embedding technique, but now uses all word spaces and treats the document as one long line. As part of these modifications, we propose a new method, based on frequency distributions, for calculating the threshold between letter spaces and word spaces, and we have modified the way threshold buffering is conducted. Experiments were carried out at two resolutions, 150 dpi and 300 dpi, on nine documents using three different watermarks. We observed an average capacity increase of 20 percent, together with improved robustness to printing and scanning. © 2007 IEEE.
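To illustrate what a frequency-distribution threshold between letter spaces and word spaces might look like, the following is a minimal sketch only. The abstract does not state the actual rule, so the sketch substitutes Otsu's between-class variance criterion as a stand-in. The function name `gap_threshold` and the sample gap widths (in pixels) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def gap_threshold(gaps):
    """Return a width t: gaps <= t count as letter spaces, gaps > t as word spaces.

    ASSUMPTION: Otsu's criterion (maximise between-class variance) is used
    here as a stand-in; it is not the thresholding rule proposed in the paper.
    """
    freq = Counter(gaps)          # frequency distribution of gap widths
    widths = sorted(freq)
    if len(widths) < 2:
        raise ValueError("need at least two distinct gap widths")

    total = len(gaps)
    total_sum = sum(w * n for w, n in freq.items())

    best_t, best_score = widths[0], -1.0
    count_lo = 0                  # number of gaps at or below the candidate
    sum_lo = 0                    # sum of those gap widths
    for w in widths[:-1]:         # the largest width cannot split the data
        count_lo += freq[w]
        sum_lo += w * freq[w]
        count_hi = total - count_lo
        mean_lo = sum_lo / count_lo
        mean_hi = (total_sum - sum_lo) / count_hi
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = count_lo * count_hi * (mean_lo - mean_hi) ** 2
        if score > best_score:
            best_t, best_score = w, score
    return best_t


# Hypothetical gap widths measured along a document treated as one long line.
sample_gaps = [2, 3, 2, 3, 2, 10, 3, 2, 11, 2, 3, 12]
threshold = gap_threshold(sample_gaps)
word_spaces = [g for g in sample_gaps if g > threshold]
```

In this sketch, every gap wider than `threshold` would be treated as a word space available for embedding. A buffer around the threshold, as the abstract mentions, could be layered on top, but its form is not specified here.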