"Digitized by Google"

jgray · 09-07-2020, 02:17 AM

The following information is obviously for use with public domain books.

It seems that books that were scanned by Google are not stamped with a watermark. Instead, each page has an image placed on it. Software that removes watermarks will not remove these images.

I downloaded a sample book to see what I could do about this.

https://ia803203.us.archive.org/16/i...s_Practice.pdf

At the bottom of each page is an image that says "Digitized by Google". Through some investigation, I found that this is really just one image in the PDF, that is referenced multiple times, on each page. If we remove this image object, the image should be removed from all pages.

The following may not work for all books scanned by Google, but since they automated the process, I'm assuming that all scanned books have the same image in them.

1 - Using a free program, "GUIpdftk", uncompress the PDF.

2 - Using Notepad++ (or other text editor that can open large files), search for "/Width 1034", which is the width of the Google image. I found it in two consecutive places.

3 - In the sample PDF, objects 24 and 25 have the specified width, and the binary data (stream / endstream) appears to be exactly the same in both. In other scanned PDF files, the object numbers will probably be different.

NOTE: it seems that object 25 is the one that is used to stamp every page in this PDF. Its removal was sufficient. However, since object 24 looks to have the same image data, I felt it was best to remove it, too.

4 - Using the text editor, delete both objects entirely. Delete everything from "24 0 obj" through the "endobj" that closes object 25. You are removing two similar (but not identical) code blocks from the PDF.

5 - Recompress the edited PDF with GUIpdftk. You should no longer see "Digitized by Google" on any pages.
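
Steps 2 to 4 can also be scripted. Below is a minimal Python sketch, assuming the PDF has already been uncompressed (step 1) and that the image objects show up as plain "N 0 obj ... endobj" blocks, as they do in the sample book. The function names are made up for illustration, and the regex is naive: binary stream data that happens to contain "endobj" could fool it, so treat this as a starting point rather than a robust tool.

```python
import re

# One indirect object: "<num> <gen> obj" ... "endobj", matched non-greedily.
OBJ_RE = re.compile(
    rb"^(\d+) (\d+) obj\b.*?\bendobj\b[ \t]*\r?\n?",
    re.DOTALL | re.MULTILINE,
)


def objects_with_width(pdf: bytes, width: int = 1034) -> list:
    """Return the numbers of objects whose dictionary has '/Width <width>'."""
    needle = re.compile(rb"/Width\s+%d\b" % width)
    hits = []
    for m in OBJ_RE.finditer(pdf):
        # Look only at the dictionary, not at the binary stream after it.
        header = m.group(0).split(b"stream", 1)[0]
        if needle.search(header):
            hits.append(int(m.group(1)))
    return hits


def remove_objects(pdf: bytes, numbers) -> bytes:
    """Delete the listed objects; every other byte is left untouched."""
    wanted = set(numbers)
    return OBJ_RE.sub(
        lambda m: b"" if int(m.group(1)) in wanted else m.group(0),
        pdf,
    )
```

Read the uncompressed file in binary mode, check that objects_with_width reports the objects you expect (24 and 25 in the sample book), then write the output of remove_objects to a new file and recompress that. Deleting bytes shifts the offsets recorded in the PDF's cross-reference table, so the result still needs to go through a tool that rewrites the file, which is presumably what the recompress step takes care of.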

BTW, you can also remove the Google title page with GUIpdftk. Use the "Remove" button. Specifying "2-end" removed the Google title page, leaving the rest of the document. This works on the compressed or uncompressed PDF.
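
For anyone who prefers to script this, the same operations can be run from the command line. The sketch below assumes the pdftk command-line tool (which GUIpdftk appears to be a front end for) is installed and on the PATH. The helper names and file names are hypothetical; the pdftk arguments themselves (uncompress, compress, cat with a page range) are standard pdftk operations.

```python
import subprocess


def uncompress_cmd(src: str, dst: str) -> list:
    """pdftk command that writes an uncompressed, text-editable copy."""
    return ["pdftk", src, "output", dst, "uncompress"]


def compress_cmd(src: str, dst: str) -> list:
    """pdftk command that recompresses an edited PDF."""
    return ["pdftk", src, "output", dst, "compress"]


def keep_pages_cmd(src: str, dst: str, page_range: str = "2-end") -> list:
    """pdftk command that keeps only page_range (2-end drops page 1)."""
    return ["pdftk", src, "cat", page_range, "output", dst]


def run(cmd: list) -> None:
    """Run a command and raise if the tool reports a failure."""
    subprocess.run(cmd, check=True)
```

For example, run(keep_pages_cmd("book.pdf", "book_no_title.pdf")) would drop the first page, matching the "2-end" setting described above.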

PoP · 09-07-2020, 11:59 AM

I have a 617-page book which I once edited in "Adobe Acrobat Pro", removing these object references page by page. Needless to say, it was tedious and error-prone.

I tried your effortless method on my book, and it worked beautifully.

Thanks for sharing.

MonkCanatella · 03-03-2023, 08:42 PM

You're a total life saver. This helped me get rid of all the "Digitized by Google" stamps that were showing up on every page of my EPUBs converted from PDFs. Signed up just to thank you!

willus · 03-04-2023, 08:50 AM

Nice analysis. I had never heard of GUIpdftk. There are many other command-line utilities that will also decompress and recompress a PDF, including qpdf, cpdf, and mubusy / pdfclean (from MuPDF distro). Some of them might even be able to remove the object you talk about without having to use a text editor, but I'd have to experiment. Thanks for the post.
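
As a rough illustration of the decompress step with those tools, here is a small Python sketch that builds the command lines. It assumes the tools are installed, and that "mutool clean" is the current name for what older MuPDF releases shipped as pdfclean. The function name and file names are hypothetical, and none of these commands removes the image object by itself.

```python
import subprocess


def decompress_cmd(tool: str, src: str, dst: str) -> list:
    """Build a command that writes an uncompressed copy of src to dst."""
    if tool == "qpdf":
        # QDF mode produces a normalized file meant for text editing.
        return ["qpdf", "--qdf", "--object-streams=disable", src, dst]
    if tool == "cpdf":
        return ["cpdf", "-decompress", src, "-o", dst]
    if tool == "mutool":
        # -d asks mutool clean to decompress all streams.
        return ["mutool", "clean", "-d", src, dst]
    raise ValueError(f"unknown tool: {tool}")


def decompress(tool: str, src: str, dst: str) -> None:
    """Run the chosen tool and raise if it reports a failure."""
    subprocess.run(decompress_cmd(tool, src, dst), check=True)
```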

sricochet · 10-10-2023, 10:52 PM

This method removes the watermark very easily. However, after opening the PDF in Acrobat, I get an error message: "An error exists on this page. Acrobat may not display the page correctly..."

The PDF opens just fine in Okular or SumatraPDF, so is it fair to assume this is Adobe's bug?
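
One possible explanation, which is a guess and not something confirmed in this thread: deleting the image object leaves each page's resources pointing at an object number that no longer exists, and the page content still tries to draw it. That would fit with Acrobat complaining while other viewers render the page anyway. The sketch below lists such dangling references in an uncompressed PDF. The function name is made up, and the regexes are naive enough that binary stream data could produce false hits.

```python
import re


def dangling_references(pdf: bytes) -> set:
    """Object numbers that are referenced ('N G R') but never defined."""
    defined = {
        int(n) for n in re.findall(rb"^(\d+) \d+ obj\b", pdf, re.MULTILINE)
    }
    referenced = {int(n) for n in re.findall(rb"\b(\d+) \d+ R\b", pdf)}
    return referenced - defined
```

If this reports the object numbers that were deleted, the remaining references (and the matching draw commands in the page content) are a likely suspect for the Acrobat message.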

barbiedolphin · 05-19-2024, 09:12 AM

Very easy method in Acrobat:
http://zengrain.com/how-to-remove-go...-domain-scans/

From the buttons on the right side of the main Acrobat window, open "Print Production > Preflight"
Select the Profiles tab (I've also selected "Acrobat Pro DC 2015 Profiles" from the topmost dropdown)
In the list of profiles, find "Create PDF layers > Put all transparent objects on layers" and double click the profile
After it's done, close the Preflight window, and open "Layers" (button on the left side of the main Acrobat window)
All the Google watermarks are now in their own layer - click the eye icon next to it in order to hide them all

Three alternatives to finish removal:

Right click the layer and open its properties, set the "Default state" to "Off", then save the file (seemingly easiest and fastest)

Or:

From the "Options" dropdown menu button (right above the list of layers), select "Flatten Layers", then save the file

Or:

From the Preflight window, find and double click "PDF fixups > Flatten transparency (high resolution)" - this alternative might sometimes result in better filesizes (just as often as not)

Last edited by barbiedolphin; 05-19-2024 at 08:39 PM.