01-22-2011, 05:17 AM | #1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
new search & replace - great, but couple of suggestions
I did a PDF to EPUB conversion and was easily able to strip the header & footer (including page numbers) with the new feature & wizards.
But I'd like to be able to toggle S&R so that when I now do, e.g., EPUB to MOBI, the S&R is not reapplied. I can opt for reset to defaults on search & replace, but then I lose my carefully constructed expressions. I "should" not need them again, but keeping them would preserve my "start over" path if I screw up the EPUB later. So an option to disable, but preserve, the field contents would be good.

Also: keep the source (PDF) used by the S&R wizard in memory after constructing the 1st expression, so that when it's needed again for the 2nd expression there is no delay.

AFAIK, conversion options are only stored once per book, not by source type within a book (which would be better)?

Last edited by cybmole; 01-22-2011 at 05:34 AM.
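A minimal sketch of the "disable but preserve" idea, in Python (calibre's own language). The `Rule` class, its `enabled` flag and `apply_rules()` are hypothetical names for illustration, not calibre's actual API, and the patterns assume a simplified HTML shape. Each rule keeps its pattern and replacement; switching it off only means it is skipped during a conversion.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One search/replace pair plus a hypothetical on/off switch."""
    pattern: str
    replacement: str
    enabled: bool = True


def apply_rules(html: str, rules: list[Rule]) -> str:
    """Apply every enabled rule in order; disabled rules are kept but skipped."""
    for rule in rules:
        if rule.enabled:
            html = re.sub(rule.pattern, rule.replacement, html)
    return html


rules = [
    Rule(r"<p>Page \d+</p>", ""),       # footer carrying the page number
    Rule(r"<p>My Book Title</p>", ""),  # running header
]

page = "<p>My Book Title</p><p>Some text.</p><p>Page 12</p>"
print(apply_rules(page, rules))  # <p>Some text.</p>

# Later EPUB -> MOBI run: switch the rules off without deleting them.
for rule in rules:
    rule.enabled = False
print(apply_rules(page, rules))  # input comes back unchanged
```

The expressions stay in the list, so "start over" is just a matter of flipping `enabled` back on.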
01-22-2011, 10:23 AM | #2 |
creator of calibre
Posts: 44,174
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Your regexes are unlikely to match both PDF input and EPUB input, so just leave them in.
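To illustrate the point, here is a small Python sketch. The markup is an assumption for illustration: the "PDF" string mimics a bare page number followed by an `<hr/>` page break, and the "EPUB" string uses class-decorated paragraphs. A footer pattern written against the first simply finds nothing in the second, so leaving it enabled costs one failed search and changes nothing.

```python
import re

# Hypothetical footer pattern written against pdftohtml-style output:
# a paragraph holding only a page number, followed by a page break.
footer = re.compile(r"<p>\s*\d+\s*</p>\s*<hr\s*/?>")

pdf_html = "<p>Chapter text.</p><p>12</p><hr/><p>More text.</p>"
epub_html = '<p class="calibre1">Chapter text.</p><p class="calibre1">More text.</p>'

# subn() returns (new_text, number_of_replacements).
print(footer.subn("", pdf_html))      # ('<p>Chapter text.</p><p>More text.</p>', 1)
print(footer.subn("", epub_html)[1])  # 0
```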
01-22-2011, 11:23 AM | #3 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
A greater slowdown is waiting for the pdftohtml subprocess to run 3 times in order to set up 3 regex filters. If that could be held in memory and reused for filters 2 & 3...?

I have to say again that in general this is very slick & MUCH better than the previous header/footer options. I now feel confident that I can filter pretty much any PDF header/footer setup.
01-22-2011, 11:44 AM | #4 |
creator of calibre
Posts: 44,174
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I'll leave that one up to user_none; the regex wizard is his baby.
01-22-2011, 04:08 PM | #5 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
In most cases the search and replace regex will not match between different inputs. Usually the regex used is also not complex enough for there to be any slowdown from running it without finding any matches.

pdftohtml only runs once per PDF input. Each search and replace expression is run over the HTML produced by pdftohtml; in fact, it runs over the HTML produced by any input plugin. Each expression is run once per page. In the case of PDF input only one page is generated, so it is only run once. In the case of an EPUB book (or any input plugin that creates multiple pages within the OEB) there is no way to avoid running the search and replace multiple times: it has to run over each independent page. If you're seeing the search and replace run multiple times, it's because the document has multiple pages and it has to be run over each one. When I say page, I'm talking about the independent XHTML files found within an OEB (EPUB).

Also, there is a bug I'm aware of with the regex input field. It should be saving all previously used regexes in the drop-down. It's not, and I plan to look into why. Once that's fixed you can enter your expressions, save, then remove them, and later find them in the list to re-enable them.
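The per-page behaviour described above can be sketched in Python as follows. The book is modelled as a hypothetical dict mapping each XHTML file name to its contents, and `run_search_replace()` is an illustrative name, not calibre's internal function. Every expression runs once per file, so a single-file PDF conversion costs one pass per expression, while a three-file EPUB costs three.

```python
import re


def run_search_replace(files: dict[str, str],
                       rules: list[tuple[str, str]]) -> tuple[dict[str, str], int]:
    """Apply each (pattern, replacement) pair to every file; count regex passes."""
    passes = 0
    result = {}
    for name, html in files.items():
        for pattern, replacement in rules:
            html = re.sub(pattern, replacement, html)
            passes += 1
        result[name] = html
    return result, passes


rules = [(r"<p>\s*\d+\s*</p>", ""), (r"<p>Running Header</p>", "")]

# PDF input: pdftohtml yields one big HTML file.
pdf_book = {"index.html": "<p>Running Header</p><p>Text.</p><p>1</p>"
                          "<p>Running Header</p><p>More.</p><p>2</p>"}
# EPUB input: several independent XHTML files.
epub_book = {f"ch{i}.xhtml": f"<p>Chapter {i}</p>" for i in range(1, 4)}

pdf_out, pdf_passes = run_search_replace(pdf_book, rules)
epub_out, epub_passes = run_search_replace(epub_book, rules)
print(pdf_passes, epub_passes)  # 2 6
print(pdf_out["index.html"])    # <p>Text.</p><p>More.</p>
```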
01-23-2011, 03:13 AM | #6 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Thanks for the response; maybe I was not totally clear about the pdftohtml bit.
I am at the stage of setting conversion fields for PDF to EPUB.
1. I take a large PDF and ask the wizard to help me create S&R 1. To do that it runs pdftohtml, which takes a few seconds.
2. I am happy with that, so I ask the wizard to now help me create S&R 2. This is where it seems to go back to the beginning and spend many seconds preparing the PDF for wizard use (even though it has done it once already in step 1). In Task Manager I see that pdftohtml is running and using lots of CPU during these seconds.

What I am suggesting is to keep the file used for wizard testing from step 1, so that it does not have to be re-created for step 2 of the setup. I am not talking about the actual conversion to EPUB. Does that make sense?

PS: on saving previous S&R expressions into a drop-down, will that then work across different books, since some expressions may be reusable elsewhere? Ideally it would be good to accumulate a user's mini library of expressions somewhere, or to have a way of sharing them on this forum.

Last edited by cybmole; 01-23-2011 at 03:16 AM.
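One possible shape for a shareable "mini library" of expressions is a plain JSON file, sketched below in Python. The file name, the name-to-pairs layout and the helper functions are all hypothetical; calibre does not necessarily store or export expressions this way.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: a name for each expression set, mapped to a list of
# [pattern, replacement] pairs. This is not a calibre file format.
Library = dict[str, list[list[str]]]


def save_library(path: Path, library: Library) -> None:
    path.write_text(json.dumps(library, indent=2), encoding="utf-8")


def load_library(path: Path) -> Library:
    if not path.exists():
        return {}  # no library yet: start empty
    return json.loads(path.read_text(encoding="utf-8"))


def add_expression(path: Path, name: str, pattern: str, replacement: str) -> None:
    """Append one pair under `name`, creating the set if needed."""
    library = load_library(path)
    library.setdefault(name, []).append([pattern, replacement])
    save_library(path, library)


with tempfile.TemporaryDirectory() as tmp:
    lib_path = Path(tmp) / "regex_library.json"
    add_expression(lib_path, "pdf page numbers", r"<p>\s*\d+\s*</p>", "")
    add_expression(lib_path, "pdf page numbers", r"<hr\s*/?>", "")
    print(load_library(lib_path))
```

Because the file is plain JSON, it could be pasted into a forum post and loaded by someone else unchanged.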
01-23-2011, 01:16 PM | #7 | ||
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Quote:
However, it sounds more like you want a global settings scratch pad where you can see all stored settings and probably which book they were stored for... If you have some idea of how it would work, open a ticket for it too. Someone might like the idea (or be bored) and implement it.
01-23-2011, 04:07 PM | #8 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
I keep a personal scratch pad in Notepad++.
Once I get slicker with regex I may not need it, but for now it is handy. I am not too fussed about the pdftohtml issue. I happened to notice the delay on one book, but I have cleared my backlog of PDF conversions for now and have no more planned. So I don't need either feature badly enough to take up a slot on the to-do lists, unless others want them too.
01-24-2011, 07:01 AM | #9 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
I've added support for caching the document in the wizard. It converts once and uses the result across all regex wizards in search and replace. It wasn't as invasive a change as I thought it would be.
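The caching idea can be sketched in Python like this. `WizardDocumentCache` and `fake_pdftohtml` are hypothetical stand-ins, not the code that was actually committed to calibre. The slow conversion runs on the first request for a given input path, and later wizards reuse the stored HTML until the cache is cleared.

```python
from typing import Callable


class WizardDocumentCache:
    """Convert each input file once and hand the same HTML to every regex wizard."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert          # the slow step, e.g. a pdftohtml run
        self._html: dict[str, str] = {}  # input path -> converted HTML

    def get(self, path: str) -> str:
        if path not in self._html:
            self._html[path] = self._convert(path)
        return self._html[path]

    def clear(self) -> None:
        """Drop cached documents, e.g. when the conversion dialog closes."""
        self._html.clear()


calls = []


def fake_pdftohtml(path: str) -> str:
    """Hypothetical stand-in for the slow conversion; records each call."""
    calls.append(path)
    return f"<html><body><p>converted {path}</p></body></html>"


cache = WizardDocumentCache(fake_pdftohtml)
for _ in range(3):  # three regex wizards opened for the same book
    cache.get("book.pdf")
print(len(calls))   # 1
```

Keying on the input path keeps the sketch simple; a real implementation would also need to invalidate the cache if the source file changed.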