11-10-2011, 07:03 AM | #1 |
Member
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
|
Using templates/Python and custom columns to extract specific data from tags
Hi Guys
I recently found out how to copy all data from one column (tags) to another custom column by using search and replace, thanks Chaley! However, I only want to copy specific data from the tags, using a template or Python function so I don't have to do it manually.
I started to learn about templates and Python last night and got pretty far. I created a column called #testcomposite, and first I tried a template to extract only known genres from the tags column:
Code:
{#testcomposite:'list_intersection(field('tags'),'Adult, Adventure, Anthologies, Biography, Childrens, Classics, Drugs, Fantasy, Food, Football, Health, History, Historical, Horror, Humour, Inspirational, Modern, Music, Mystery, Non-Fiction, Poetry, Political, Philosophy, Psychological, Reference, Religion, Romance, Science, Science Fiction, Self Help, Short Stories, Sociology, Spirituality, Suspense, Thriller, Travel, Vampires, War, Western, Writing, Young Adult',',')'}
function: getgenre, 1 param
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    list2 = 'Adult, Adventure, Anthologies, Biography, Childrens, Classics, Drugs, Fantasy, Food, Football, Health, History, Historical, Horror, Humour, Inspirational, Modern, Music, Mystery, Non-Fiction, Poetry, Political, Philosophy, Psychological, Reference, Religion, Romance, Science, Science Fiction, Self Help, Short Stories, Sociology, Spirituality, Suspense, Thriller, Travel, Vampires, War, Western, Writing, Young Adult'
    separator = ','
    l1 = [l.strip() for l in list1.split(separator) if l.strip()]
    l2 = [icu_lower(l.strip()) for l in list2.split(separator) if l.strip()]
    res = []
    for i in l1:
        if icu_lower(i) in l2:
            res.append(i)
    return ', '.join(res)
Code:
{#testcomposite:'getgenre(field('tags'))'}
However, what I would like is something like this: extract all the known genres from the tags (like above), but also, if I come across a tag which contains *mystery* (like Mystery & Detective), add the genre "Mystery" to the #testcomposite column. So something like this:
if tag item like '*horror*' or tag item='Scary' or tag item='Spooky' then add 'Horror'
etc.
Any help is appreciated, in either template or Python (or both!)
PS: I am a programmer, but Python and calibre are all very new to me, and a little lower-level than I'm used to.
PPS: I'm amazed at how flexible this program is, hats off to the creator(s)!
Thanks very much!
Last edited by smoothrolla; 11-10-2011 at 12:38 PM. |
11-10-2011, 01:27 PM | #2 |
Grand Sorcerer
Posts: 12,032
Karma: 7257323
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
First, you can make the python function faster by changing the code as follows:
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', ..., 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])
    return ', '.join(res)
You can do the 'like' examples using something like:
Code:
for item in l1lcase:
    if 'horror' in item or item in ['scary', 'spooky']:
        res.add('Horror')
        break
for item in l1lcase:
    if 'mystery' in item or 'detective' in item:
        res.add('Mystery')
        break
The set is necessary here because the added item might already be in the result; with a list it would end up being added more than once. |
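To make the point about the set concrete, here is a minimal standalone sketch of the same rule that runs outside calibre. It assumes plain `str.lower()` as a stand-in for calibre's `icu_lower`, and the function name `genres_from_tags` is made up for illustration only.

```python
def genres_from_tags(tags):
    """Return the set of genres implied by a comma-separated tag string."""
    # Assumption: str.lower() stands in for calibre's icu_lower here.
    lowered = [t.strip().lower() for t in tags.split(',') if t.strip()]
    res = set()
    # Exact match: the tag itself is the genre name.
    for tag in lowered:
        if tag == 'horror':
            res.add('Horror')
    # 'Like' match: partial match or a known synonym.
    for tag in lowered:
        if 'horror' in tag or tag in ('scary', 'spooky'):
            res.add('Horror')  # no effect if 'Horror' is already in the set
            break
    return res


print(genres_from_tags('Horror, Spooky, Gothic Horror'))
```

With a list and `append`, the same input would produce 'Horror' twice; the set keeps a single copy without any extra checks.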
11-10-2011, 04:22 PM | #3 |
Member
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
|
Thanks Charley!
I got the speeded-up script working great, thanks for that. I half understand it, slowly getting there.
I decided the code needed reworking so it looks for partial matches for all the genres I have provided (41 of them), plus the new code to map scary to Horror etc. (rather than have 41 for loops).
Here is the new code:
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', 'childrens', 'classics', 'drugs', 'fantasy', 'food', 'football', 'health', 'history', 'historical', 'horror', 'humour', 'inspirational', 'modern', 'music', 'mystery', 'non-fiction', 'poetry', 'political', 'philosophy', 'psychological', 'reference', 'religion', 'romance', 'science', 'science fiction', 'self help', 'short stories', 'sociology', 'spirituality', 'suspense', 'thriller', 'travel', 'vampires', 'war', 'western', 'writing', 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])
    for item in l1lcase:
        for item2 in l2:
            if item2 in item:
                res.add(item2)
                break
    for item in l1lcase:
        if 'scary' in item or 'spooky' in item:
            res.add('Horror')
            break
    return ', '.join(res)
Code:
for item in l1lcase:
    for item2 in l2:
        if item2 in item:
            res.add(item2)
            break
res.add(titlecase(item2))
but that throws an error.
Maybe I need to keep list2 in title case and lowercase it as I go. I'll try to figure it out, but if you can put me on the right path I would really appreciate it.
Thanks!
Last edited by smoothrolla; 11-10-2011 at 04:53 PM. |
11-10-2011, 05:03 PM | #4 |
Member
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
|
OK, I got a solution, probably inelegant though.
I create another list, in title case, of the tags I want to do a partial search for, as I decided I didn't want to search for them all (for example, science is in science fiction, so I got both tags, which I didn't really want).
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', 'childrens', 'classics', 'drugs', 'fantasy', 'food', 'football', 'health', 'history', 'historical', 'horror', 'humour', 'inspirational', 'modern', 'music', 'mystery', 'non-fiction', 'poetry', 'political', 'philosophy', 'psychological', 'reference', 'religion', 'romance', 'science', 'science fiction', 'self help', 'short stories', 'sociology', 'spirituality', 'suspense', 'thriller', 'travel', 'vampires', 'war', 'western', 'writing', 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])
    l3 = ['Adult', 'Adventure', 'Anthologies', 'Biography', 'Childrens', 'Classics', 'Drugs', 'Fantasy', 'Food', 'Football', 'Health', 'History', 'Historical', 'Horror', 'Humour', 'Inspirational', 'Modern', 'Music', 'Mystery', 'Non-Fiction', 'Poetry', 'Political', 'Philosophy', 'Psychological', 'Reference', 'Religion', 'Romance', 'Science fiction', 'Self Help', 'Short Stories', 'Sociology', 'Spirituality', 'Suspense', 'Thriller', 'Travel', 'Vampires', 'War', 'Western', 'Writing', 'Young Adult']
    for item in l1lcase:
        for item2 in l3:
            check = item2.lower()
            if check in item:
                res.add(item2)
                break
    for item in l1lcase:
        if 'scary' in item or 'spooky' in item:
            res.add('Horror')
            break
    return ', '.join(res)
Thanks again for your help! |
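The "science is in science fiction" overlap can also be handled without dropping a genre from the search list. The sketch below is one possible approach, not what the post above does: it collects every genre found in a tag and then discards any hit that is itself contained in a longer hit. The helper name `match_genres` is hypothetical, and `str.lower()` is assumed in place of calibre's `icu_lower`.

```python
def match_genres(tag, genres):
    """Return the genres whose name occurs in tag, skipping any genre
    that is contained in a longer genre which also matched."""
    tag = tag.lower()
    hits = [g for g in genres if g.lower() in tag]
    return [
        g for g in hits
        if not any(g != other and g.lower() in other.lower() for other in hits)
    ]


genres = ['Science', 'Science Fiction', 'Fantasy']
print(match_genres('Science Fiction & Fantasy', genres))
print(match_genres('Popular Science', genres))
```

A plain 'Science' tag still maps to 'Science' this way, because the shorter name is only dropped when the longer one matched the same tag.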
11-10-2011, 05:04 PM | #5 | |
Grand Sorcerer
Posts: 12,032
Karma: 7257323
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
chaley, not charley.
Code:
for idx,item in enumerate(l1lcase):
    for item2 in l2:
        if item2 in item:
            res.add(l1[idx])
            break
The enumerate function returns the index and the value (a tuple in Python terms), which in this case are the index and the lowercase version of the value. Because l1lcase and l1 are parallel arrays, l1[idx] gets the item in l1 that corresponds to the one in l1lcase, which is the originally cased version. |
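A small standalone illustration of the parallel-array idea, using made-up sample tags and `str.lower()` in place of calibre's `icu_lower` (both are assumptions for the sketch). It also shows `zip`, which pairs the two lists directly and gives the same result.

```python
tags = ['Mystery & Detective', 'Gothic Horror', 'Cooking']
tags_lower = [t.lower() for t in tags]  # parallel to tags: same length, same order
genres = ['mystery', 'horror']

matched = set()
for idx, item in enumerate(tags_lower):
    # idx is the position, item is the lowercase tag at that position
    for genre in genres:
        if genre in item:
            matched.add(tags[idx])  # same index gives the original casing
            break

# Equivalent loop with zip, which walks both lists in step
matched_zip = set()
for original, lowered in zip(tags, tags_lower):
    if any(genre in lowered for genre in genres):
        matched_zip.add(original)

print(sorted(matched))
```

Note that this adds the whole matching tag in its original case, not the genre name, which is what the loop in the post above does as well.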
11-10-2011, 05:18 PM | #6 | ||
Member
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
|
I came up with a slightly different way of doing it, posted a minute before your reply, so I'm not sure if you saw it. It's probably laughable, mind you. Thanks |
11-10-2011, 06:12 PM | #7 |
Member
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
|
I thought I would post my final approach here in case someone else finds it useful in the future.
I need to add some more tag->genre mappings (like football->sports etc.), but you get the idea.
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    # turn the tags into an array and create a lowercase version
    tagslist = [l.strip() for l in val.split(',') if l.strip()]
    tagslistlcase = [icu_lower(l) for l in tagslist]
    # my list of genres i want, and create a lowercase version
    genrelist = ['Adult', 'Adventure', 'Anthologies', 'Biography', 'Childrens', 'Classics', 'Drugs', 'Fantasy', 'Food', 'Football', 'Health', 'History', 'Historical', 'Horror', 'Humour', 'Inspirational', 'Modern', 'Music', 'Mystery', 'Non-Fiction', 'Poetry', 'Political', 'Philosophy', 'Psychological', 'Reference', 'Religion', 'Romance', 'Science', 'Science Fiction', 'Self Help', 'Short Stories', 'Sociology', 'Spirituality', 'Suspense', 'Thriller', 'Travel', 'Vampires', 'War', 'Western', 'Writing', 'Young Adult']
    genrelistlcase = [icu_lower(l) for l in genrelist]
    res = set()
    # loop through the genres
    for idx,genre in enumerate(genrelistlcase):
        # loop through the tags and see if the genre is contained in a tag
        for tag in tagslistlcase:
            if genre in tag:
                # dont add science if it was found in science fiction
                if genre != 'science' or (genre == 'science' and 'science fiction' not in tag):
                    # add to array
                    res.add(genrelist[idx])
                    break
    # final loop through the tags to look for specific tags i want to map to a genre
    for tag in tagslistlcase:
        if 'religious' in tag or 'christian' in tag:
            res.add('Religion')
        if 'children' in tag:
            res.add('Childrens')
    # join the array into a string and return
    return ', '.join(res)
Last edited by smoothrolla; 11-10-2011 at 07:13 PM. |
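As more tag->genre mappings are added, the chain of `if` statements could be replaced by a lookup table. The following is only a sketch under stated assumptions: the table name `KEYWORD_TO_GENRE` and the function `mapped_genres` are hypothetical, the 'football' -> 'Sports' entry follows the example mentioned in the post, and `str.lower()` stands in for calibre's `icu_lower`. It also sorts the result, since joining a set directly gives no guaranteed order.

```python
# Hypothetical keyword -> genre table; extend it instead of adding more ifs.
KEYWORD_TO_GENRE = {
    'religious': 'Religion',
    'christian': 'Religion',
    'children': 'Childrens',
    'football': 'Sports',
}


def mapped_genres(tags):
    """Return a sorted, comma-separated genre string for a comma-separated tag string."""
    # Assumption: str.lower() stands in for calibre's icu_lower here.
    lowered = [t.strip().lower() for t in tags.split(',') if t.strip()]
    res = set()
    for tag in lowered:
        for keyword, genre in KEYWORD_TO_GENRE.items():
            if keyword in tag:
                res.add(genre)
    # sorted() makes the column value stable from one evaluation to the next
    return ', '.join(sorted(res))


print(mapped_genres('Christian Fiction, Football, Religious studies'))
```

Inside a calibre template function the same table could sit at the top of `evaluate`, with the loop replacing the final mapping loop.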