To select a PDF document to be indexed, click the "+" icon.
You can also select a directory. All PDF documents in it are then loaded.
To delete a document, select it and click on the "-" icon.
You can change the order of the documents using drag & drop.
For each source document, you can specify the range of pages to be indexed. Enter the first and last page in the "From" and "To" columns. If you leave the fields empty, the entire document is indexed.
Double-click on the file name of a source document to see a preview of the document.
Page numbering
To assign page numbers correctly, IndexMaker needs to know which physical page carries page number 1.
If the first document starts with page number 1 or a higher number, select the first option and enter that number in the text box.
If page number 1 appears only on a later physical page, select the second option and enter the PDF page on which the page numbering starts. All pages before it are indexed with Roman numerals.
If you index several documents at once, you can specify that the numbering should be continuous. Otherwise, it will start again with page number 1 for each document.
You must make these settings before you create the index!
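The following is a minimal sketch of how the page numbering options above map physical PDF pages to printed page labels. It assumes that the Roman numerals apply only to the pages of the first document and start at "i" on physical page 1, and that they are lowercase. The function and parameter names are hypothetical, and this is not IndexMaker's implementation.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    result = []
    for value, symbol in numerals:
        while n >= value:
            result.append(symbol)
            n -= value
    return "".join(result)


def page_labels(page_counts, start_number=1, first_numbered_page=1, continuous=True):
    """Return the printed label of every physical page of every document.

    page_counts:         number of physical pages per document, in index order
    start_number:        number of the first numbered page (first option)
    first_numbered_page: physical page of the first document on which the
                         numbering starts (second option); earlier pages
                         get Roman numerals (assumption)
    continuous:          continue numbering across documents instead of
                         restarting at 1 for each further document
    """
    labels = []
    number = start_number
    for doc_index, count in enumerate(page_counts):
        if doc_index > 0 and not continuous:
            number = 1  # each further document restarts at page 1
        doc_labels = []
        for physical_page in range(1, count + 1):
            if doc_index == 0 and physical_page < first_numbered_page:
                doc_labels.append(to_roman(physical_page))
            else:
                doc_labels.append(str(number))
                number += 1
        labels.append(doc_labels)
    return labels
```

For example, for two documents with four and two pages where numbering starts on physical page 3, the call `page_labels([4, 2], first_numbered_page=3)` labels the first document "i, ii, 1, 2". With continuous numbering, the second document becomes "3, 4". Without it, the second document becomes "1, 2".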
Specify which characters are allowed in index words and which characters serve as word separators. The default settings are usually sufficient and rarely need to be changed.
However, you can add further characters if necessary. If Greek, Hebrew, Cyrillic, Katakana or Sanskrit characters should also be recognized, add them to the allowed characters by checking the corresponding checkbox.
You can specify that email addresses should not be split during indexing, even though they contain word separator characters (@ and period). You can specify the same for prices (11,99) and times (14:45).
You can also define your own conditions using regular expressions.
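As an illustration of such conditions, the following minimal sketch tokenizes text with Python's `re` module. Protected patterns for email addresses, prices and times are tried before the plain word pattern, so these tokens are not split at the separators "@", ".", "," and ":". The patterns are hypothetical examples of regular expressions you might enter. They are not the rules built into IndexMaker.

```python
import re

# Hypothetical "keep together" rules; IndexMaker's built-in patterns
# are not documented here.
PROTECTED_PATTERNS = [
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",  # email addresses such as info@example.com
    r"\d+,\d{2}",                      # prices such as 11,99
    r"\d{1,2}:\d{2}",                  # times such as 14:45
]
WORD_PATTERN = r"\w+"  # everything else: runs of letters and digits

# Protected patterns come first in the alternation, so they win over the
# plain word pattern when both could match at the same position.
TOKEN_RE = re.compile("|".join(PROTECTED_PATTERNS + [WORD_PATTERN]))


def tokenize(text):
    """Split text into index words, keeping protected tokens intact."""
    return TOKEN_RE.findall(text)


def tokenize_plain(text):
    """Split text at every non-word character, without protection."""
    return re.findall(WORD_PATTERN, text)
```

With these rules, `tokenize("Write to info@example.com at 14:45, price 11,99.")` keeps "info@example.com", "14:45" and "11,99" as single words. In contrast, `tokenize_plain` splits them into "info", "example", "com", "14", "45" and so on.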
Indexing itself is always case-sensitive.
However, you can specify whether target and stop words are compared with the text of the source documents case-sensitively or not.
If you place one term hierarchically below another, it no longer appears at its original position in the index, only below its super-term. You can, however, place a cross-reference at the original position, such as "see: ...". In the "Reference text" box, enter the text your references should begin with.
You can set the font size of the edit table here.
The filter palette offers many options to reduce or modify the entries in the index.
Instead of reducing and filtering the full text index step by step, you can alternatively define target words, i.e. specify a positive selection of words that should appear in the index.
Target words are collected in the list editor in the target word list and appear in the index list in green.
When you define a word as a target word, for example by Shift-clicking it in the index list, it is automatically added to the target word list, together with all of its substitutions, if any. In the filter palette you can see the current number of target words.
If you select "hide others" for the target words in the filter palette, only the entries that were defined as target words and found in the PDF are shown.
If no target words have been defined yet or none of the target words were found in the index, the index is empty.
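The following minimal sketch shows the "hide others" behavior described above, including the optional case-insensitive comparison. The function name and the list-based data model are hypothetical. Case folding uses Python's `str.casefold`, which may not match IndexMaker's comparison exactly.

```python
def visible_entries(index_words, target_words, hide_others=True, case_sensitive=True):
    """Return the index entries that remain visible.

    With hide_others, only entries that are also target words are kept;
    an empty target word list therefore yields an empty index.
    """
    if not hide_others:
        return list(index_words)
    if case_sensitive:
        targets = set(target_words)
        return [w for w in index_words if w in targets]
    # Case-insensitive comparison: fold both sides before comparing.
    targets = {t.casefold() for t in target_words}
    return [w for w in index_words if w.casefold() in targets]
```

For example, with the target words "tree" and "Goethe", a case-sensitive comparison against the index words "Tree", "oak" and "Goethe" keeps only "Goethe". A case-insensitive comparison also keeps "Tree".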
You can enter names of persons here in two forms: as "John Doe" or as "Doe, John". The entry appears in the index in the form you enter. If you specify "Doe, John", "John Doe" is also searched for in the text.
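The two name forms can be illustrated with a minimal sketch. It assumes that an entry containing a comma is read as "Last, First" and that both spellings are then searched, while the index keeps the form you entered. The function name is hypothetical.

```python
def name_search_forms(entry):
    """Return (index form, list of forms to search for in the text)."""
    if "," in entry:
        # "Doe, John" -> also search for "John Doe"
        last, first = (part.strip() for part in entry.split(",", 1))
        return entry, [entry, f"{first} {last}"]
    # "John Doe" is searched only as written
    return entry, [entry]
```

For example, `name_search_forms("Doe, John")` returns the index form "Doe, John" and the search forms "Doe, John" and "John Doe".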
Stop words are words too insignificant to appear in the index, such as "in, at, with, and, the, that". They are grayed out in the index list and not included in the preview or export.
You can mark a word as a stop word, for example by Option-clicking it in the index list. The word is then automatically added to the stop word list, together with all of its substitutions, if any.
In the filter palette you can see the current number of stop words. You can prevent stop words from being displayed by checking the appropriate box in the Filter Palette.
The substitution list is used to replace words with another word. This is useful, for example, if words are to be traced back to their basic form (e.g. trees -> tree). These words are displayed in gray in the index list, because their addresses are added to their basic forms in the preview and during export.
Substitutions automatically become target words as well.
When you define a substitution, for example with Ctrl-click in the index list, the word is automatically added to the substitution list. In the filter palette you can see the current number of substitutions.
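The following minimal sketch shows how the addresses of substituted words could be merged into their basic form for the preview or export. It assumes a single level of substitution (variant -> basic form) and represents addresses as page numbers. Both the data model and the function name are hypothetical.

```python
from collections import defaultdict


def apply_substitutions(addresses, substitutions):
    """Merge addresses of substituted words into their basic forms.

    addresses:     word -> list of page numbers where it was found
    substitutions: variant -> basic form (e.g. "trees" -> "tree")
    """
    merged = defaultdict(set)
    for word, pages in addresses.items():
        basic_form = substitutions.get(word, word)  # unchanged if no substitution
        merged[basic_form].update(pages)
    # Duplicate pages collapse; return sorted page lists.
    return {word: sorted(pages) for word, pages in merged.items()}
```

For example, with `{"tree": [3, 7], "trees": [5, 7], "oak": [2]}` and the substitution "trees" -> "tree", the result contains "tree" with pages 3, 5 and 7, and "oak" with page 2.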
You can use this list to group terms under a generic term, for example to place the terms oak, beech, maple, etc. under the generic term tree. In the index, the sub-terms are displayed as follows: Oak (-> Tree). The hierarchy can contain several levels. To avoid circular references, a sub-term cannot be its own super-term.
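The following minimal sketch shows how a multi-level hierarchy can reject circular references. It walks up from the proposed super-term and refuses the assignment if it reaches the proposed sub-term. It assumes that each term has at most one super-term, stored in a dictionary that is kept free of cycles. The function names are hypothetical.

```python
def would_create_cycle(parents, child, parent):
    """Return True if placing `child` under `parent` would create a cycle.

    parents maps each sub-term to its super-term and is assumed acyclic.
    """
    node = parent
    while node is not None:
        if node == child:  # child is the parent itself or one of its ancestors
            return True
        node = parents.get(node)
    return False


def display_form(term, parents):
    """Format a term as shown in the index list, e.g. "Oak (-> Tree)"."""
    parent = parents.get(term)
    return f"{term} (-> {parent})" if parent else term
```

For example, with Oak under Tree and Tree under Plant, placing Plant under Oak is rejected, and so is placing Oak under itself.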
You can make a word a hierarchical sub-term of another, for example by selecting the super-term and Cmd-clicking the word in the index list. The word is then automatically added to the hierarchy list. In the filter palette you can see the current number of hierarchies.
The font filter can be used to remove words from the index that occur in a specific font or size in the document. All fonts occurring in the document are listed. To remove words of a certain font from the index, clear the corresponding checkbox.
This can be helpful, for example, to exclude headings or footnotes, provided they are set in a font or size that distinguishes them from the rest of the text.
This function can be used with indexes created with version 5.1 or later.
The raw index initially contains redundant addresses (the locations where a word was found). You can determine how the addresses are formatted; five formats are available.
If you index several documents at the same time, you can display the file names in the addresses.
If you check "Only failed spell check" in the filter palette, only the words that failed the spell check are displayed. This makes it easy to find and correct spelling errors.
You can disable the spell checker in the preferences.
If you select "Do not index numbers" in the filter palette (see 3.2.0), entries that contain numbers are excluded. With the gear icon you can define further settings for this filter.
With the menu item "Edit > Reduce Index" you can reduce the size of the index.
This permanently deletes all entries marked as stop words and all other filtered-out words, such as numbers, from the index. They cannot be restored.
However, a smaller index is processed faster and saves memory.
Because many PDF documents represent a hyphenation at the end of a line as the character string "- " (hyphen followed by a space), IndexMaker cannot distinguish between an enumeration ("in- or outbound") and a hyphenation ("im- printed"). IndexMaker joins such words, because hyphenations are much more common. The same problem arises when a hyphenated compound is split across two lines, e.g. "Goethe-Symposium" with "Goethe-" at the end of one line and "Symposium" at the beginning of the next.
The reason is that, unlike on a typewriter, word processing programs do not insert hard line breaks, so the text can reflow when the font size changes.
Hyphenations across page breaks cannot be recognized either, because IndexMaker does not distinguish between body text and footnotes.
A hyphen between two words without a following space is preserved: "Art-Fair" keeps its hyphen.
Unfortunately, the Unicode character U+00AD (soft hyphen) cannot be processed.
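The joining rule described above can be sketched with a single regular expression. A hyphen directly after a letter, followed by whitespace (a space or a line break) and another letter, is treated as a line-end hyphenation and removed. This is an approximation of the described behavior, not IndexMaker's code. Like the behavior described, it also joins enumerations and split compounds.

```python
import re

# A letter, then "-", then whitespace, then a letter: treated as a
# line-end hyphenation. [^\W\d_] matches letters only.
HYPHEN_BREAK = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def join_hyphenation(text):
    """Remove hyphenation breaks so the two halves form one word."""
    return HYPHEN_BREAK.sub("", text)
```

With this rule, "im- printed" becomes "imprinted", while "Art-Fair" stays unchanged because no whitespace follows the hyphen. The same rule also turns "in- or outbound" into "inor outbound" and "Goethe-" plus "Symposium" on the next line into "GoetheSymposium", which shows the limitation described above.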
If you want to index full names, you can specify names of people in two forms in the connected words list: As "John Doe" or as "Doe, John". The entry in the index will appear accordingly. If you specify "Doe, John", "John Doe" will also be searched for in the text.
In preview mode you can format your index for printing or for export.
Select the components you want to output and their order. Depending on which components you select, the corresponding palettes appear below to specify the details.
You can also attach your index to the original document here and add links if necessary.
You can format the finished index in different ways, for example with different fonts, font styles and alignments.
Keep in mind that bold and italic font styles are not available for every font.
Numbers in the index:
In the preview, the index entries are sorted alphanumerically (1,2,...9,10,11,...20...A,B,C...Z).
Leading zeros in front of numbers are ignored (002x -> 2).
You can also group all numeric entries in the preview under the initial "0...". To do this, check "Summarize numbers".
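The sort order and number grouping described above can be approximated with a "natural" sort key. Digit runs compare as integers, so leading zeros are ignored (002x sorts as 2), and numbers sort before letters. The sketch also assumes that letters compare case-insensitively and that summarized entries are grouped under the initial "0...". These are assumptions, and the function names are hypothetical.

```python
import re


def sort_key(entry):
    """Natural sort key: digit runs compare numerically, text case-insensitively."""
    parts = re.split(r"(\d+)", entry)
    key = []
    for i, part in enumerate(parts):
        if i % 2:  # odd positions hold the captured digit runs
            key.append((0, int(part), ""))  # int() drops leading zeros
        elif part:
            key.append((1, 0, part.casefold()))  # text sorts after numbers
    return key


def initial_group(entry, summarize_numbers=True):
    """Return the initial under which an entry is listed."""
    first = entry[:1]
    if summarize_numbers and first.isdigit():
        return "0..."
    return first.upper()
```

For example, sorting "10", "A", "002x", "2", "B" and "1" with this key gives the order 1, 2, 002x, 10, A, B.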
Page numbers can also be right-aligned if left alignment or justification has been selected for the text.
Once you have made changes, click on "Apply" and the layout of the index will be recalculated.
If you leave the field for the separator empty, an entry will only be separated from its addresses by a space.
Alternatively, you can specify any character or string as a separator.
Specify the title and settings for the glossary here.
Specify the title and settings for the list of illustrations here.
Note that entries only appear here if something has been entered in the text field in Edit mode.
In the preview, under the tab Analysis, an evaluation of the word frequencies can be created.
The list is grouped by frequency and can be sorted in ascending and descending order.
The further appearance of the analysis depends on the settings of the index.
Once you have made changes, click on "Apply" and the layout of the analysis will be recalculated.
You can also generate a graph for the analysis.
The graph appears at the end of the analysis. You can specify the axis labels and the grid.
For the PDF export, you can specify the page number with which the index starts. This makes it easier to append the index to the source document later.
The position of the page number can also be defined by specifying the distances from the outer and lower margins.
Select the checkbox if no page number should be displayed.
Also set the font size of the page numbers here.
You can set the distances between the text and the page edges.
To swap the inner and outer distances on even and odd pages, select the "Switch left/right pages" checkbox.
Once you have made changes, click on "Apply" and the layout will be recalculated.
In the HTML output, an A-Z navigation can be included at the beginning of the document.
You can open indexes that have already been edited and saved with "Open Index".
You can merge an existing index with another one: open the first index and add the second with the menu item "Add Index...". All settings (address formatting, maximum word length, etc.) are taken from the first index.
When you save, all settings and lists are saved within the document. The file extension is .idxm.
If you later want to index other PDF documents with the same stop and target words, hierarchies and substitutions, select "Save lists without index". This saves only the lists you have created, so you can reuse them.
The index can be exported in numerous formats.
Specify here in which format your index should be exported.
Specify which character is inserted between word and annotation and between annotation and addresses, and how both are indented.
Also specify the indentation for each hierarchical level.
With the CSV export, you can import the index data into a database.
The SQL command for creating a corresponding table could look like this:
    CREATE TABLE words (
      word tinytext,
      frequency int(11) DEFAULT NULL,
      annotation text,
      addresses text,
      basicForm tinytext,
      stoppword tinytext,
      targetword tinytext,
      id int(11) unsigned NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
You can also use the CSV export to sort the index by the frequency of the words.
Unfortunately, the index cannot currently be sorted by frequency by clicking the table header. Instead, export your index as a CSV file, import the file into Excel or Numbers, and sort it there.
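As an alternative to a spreadsheet, the exported CSV file can also be sorted with a short script. This minimal sketch uses Python's `csv` module. It assumes that the export has a header row with the columns "word" and "frequency" (named after the SQL table above) and uses a comma as delimiter. Both are assumptions, so check them against your actual export.

```python
import csv
import io


def sort_by_frequency(csv_text, delimiter=","):
    """Return (word, frequency) pairs from CSV text, most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # An empty frequency field (NULL) is treated as 0.
    rows = [(row["word"], int(row["frequency"] or 0)) for row in reader]
    # sorted() is stable, so words with equal frequency keep their order.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

To process a file instead of a string, read it with `open(path, newline="", encoding="utf-8")` and pass its contents to the function.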
You can export your index in Microsoft Word format.
Note that this does not preserve the column settings, but they are easy to reconstruct in Word.
With the IndexMaker it is easily possible to create a list of figures.
To do this, choose List of Figures in the Edit menu or press Option-Command-F. IndexMaker tries to recognize all images in the PDF and lists them with a small preview.
With the checkbox you can exclude individual entries. On the right side you can set the preview size or filter the images by size.
If an image is missing, you can add more lines or delete excess lines via the menu. Added lines can be assigned to a page number.
An essential tool for editing the index are the lists used to filter the raw index.
You can open the List Editor via the Edit menu, with the List Editor button in the Filter palette, or by clicking the List icon in the window header. You can use this editor to edit or create stop word, target word, substitution, and hierarchy lists at any time.
Words that were not considered during indexing are displayed in gray.
In the menu of the editor's header, select the list you want to see or edit.
The footer of the editor contains further editing functions. You can also make all the words in the index target words or stop words and then make a negative selection.
If you want to import a table in which first and last names are in separate columns into IndexMaker as a target word list, you must first combine the two columns into one.
Then select the column (C in this case) and copy it.
Then open TextEdit and first convert the blank document to plain text via the Format menu. Paste the copied table column and save the document with the extension .txt. When saving, make sure that "Unicode (UTF-8)" is selected as the encoding.
To create a target word list from a Microsoft Word document, save the list as a UTF-8 encoded plain text file.
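If the names are available as a CSV file, a short script can combine the columns and write the UTF-8 text file in one step. This minimal sketch assumes a header row with the hypothetical column names "first" and "last" and one target word per line in the output file. Neither assumption is confirmed by this manual.

```python
import csv
import io


def combine_names(csv_text, first_col="first", last_col="last"):
    """Join first and last name columns into "First Last" entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{row[first_col].strip()} {row[last_col].strip()}" for row in reader]


def write_target_list(names, path):
    """Write one name per line as a UTF-8 encoded plain text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(names) + "\n")
```

For example, a CSV file with the rows "John,Doe" and "Jane,Roe" yields the entries "John Doe" and "Jane Roe".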
In the Word palette, you can edit the entry selected in the index, for example by duplicating it or by assigning a substitution or an umbrella term to it.
You can open the "Context" search dialog via the Window menu, with Cmd-F, or with the search icon in the window header.
If you have previously selected a term in the raw index, the context search will open immediately with the corresponding found locations.
You can use it to search the index for terms. You will then receive a list with all references including the information in which document and on which page the term can be found.
With the eye symbol you can deactivate or activate each occurrence.
Next to it you see a short text excerpt with the occurrence of the term. With the slider you can determine the size of the text section.
You can also use a wildcard (*) for the search, which you can specify before or after the search term. Alternatively, you can search for phrases by entering multiple words.
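The following minimal sketch shows how a leading or trailing wildcard could be interpreted when matching a search term against a single word. It assumes a case-sensitive comparison and that "*" matches any sequence of characters within the word. The function name is hypothetical.

```python
def matches(term, word):
    """Check a word against a term with an optional leading or trailing *."""
    core = term.strip("*")
    leading = term.startswith("*")
    trailing = term.endswith("*") and len(term) > 1
    if leading and trailing:
        return core in word          # *eich* -> contains "eich"
    if trailing:
        return word.startswith(core)  # Goethe* -> starts with "Goethe"
    if leading:
        return word.endswith(core)    # *baum -> ends with "baum"
    return word == core              # no wildcard: exact match
```

For example, "Goethe*" matches "Goethes", "*baum" matches "Eichenbaum", and "tree" without a wildcard does not match "trees".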
If you select the checkbox "show original font", the found locations are displayed in their original font.
This function can be used with indexes created from version 5.1 onwards.
In the lower part of the context search you can see graphically where in the document the current search term occurs. Each line represents 100 pages.
This function can be used with indexes created from version 5.1 onwards.
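The graphical overview can be sketched as text. Each output line covers 100 pages, and each page is shown as "|" if the term occurs on it and as "." otherwise. This rendering is a hypothetical illustration of the layout described, not IndexMaker's display.

```python
def render_occurrences(pages, total_pages, pages_per_row=100):
    """Return one text line per block of pages; "|" marks pages with hits."""
    hits = set(pages)
    lines = []
    for start in range(1, total_pages + 1, pages_per_row):
        end = min(start + pages_per_row - 1, total_pages)
        lines.append("".join("|" if p in hits else "." for p in range(start, end + 1)))
    return lines
```

For example, a 250-page document produces three lines of 100, 100 and 50 characters.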
Click on an entry in the index:

| Action | Effect | Display in the index |
|---|---|---|
| Double-click | Opens the context search | |
| Shift-click | Target word on/off | The entry turns green |
| Option-click | Stop word on/off | The entry turns gray |
| Ctrl-click | The clicked word becomes the substitute of the current selection | The basic form is shown after an arrow (->) and the word turns gray |
| Cmd-click | The clicked word becomes a hierarchical sub-item of the current selection | The umbrella term is shown in brackets after an arrow (->) |
| ctrl-s | Makes the current selection in the index list a stop word | |
| ctrl-z | Makes the current selection in the index list a target word | |
Sometimes not all occurrences of a name are found, although the name appears several times in the text. This may be caused by apostrophes.
Only the occurrences without an apostrophe are found.
One possible solution is to define multiple entries in the target word list.
The best solution is to copy the apostrophe from the PDF, add it as a word separator in the preferences, and create the index again.
Since there are a number of similar looking apostrophes, this is the safest method.
If an entry such as the name "Thomas Schmidt" actually refers to two different people with the same name, you can split the index entry.
To do this, select the entry in the raw index and duplicate it in the Word palette. Give the duplicate a unique name (e.g. "Thomas Schmidt (Hamburg)"). For the original entry, specify a substitution (e.g. "Thomas Schmidt (Munich)").
In the context search, you can now show or hide the corresponding occurrences for each of the entries.
If, for example, 11 occurrences are shown for a term in edit mode, but fewer entries can be found in the preview, this may be due to the address format. For example, formats with "f." summarize subsequent pages.
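The effect described above can be illustrated with a minimal sketch of an address format. In this hypothetical format, several occurrences on the same page count once, two consecutive pages become "12 f.", and longer runs become a range such as "20-23". IndexMaker's five formats may combine pages differently, so treat this only as an illustration of why the preview can show fewer entries than there are occurrences.

```python
def format_addresses(pages):
    """Format page numbers, combining consecutive pages (hypothetical format)."""
    pages = sorted(set(pages))  # several hits on one page count once
    parts = []
    i = 0
    while i < len(pages):
        j = i
        # Extend the run while the next page follows directly.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        if j == i:
            parts.append(str(pages[i]))
        elif j == i + 1:
            parts.append(f"{pages[i]} f.")  # page and the following page
        else:
            parts.append(f"{pages[i]}-{pages[j]}")
        i = j + 1
    return ", ".join(parts)
```

For example, nine occurrences on pages 3, 12, 13, 13, 20, 21, 22, 23 and 40 are shown as the four addresses "3, 12 f., 20-23, 40".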
Problem: An entry in the index is to be differentiated again. In this example, there are passages in the text that refer to John as a farmer and other passages that refer to John as a preacher. These are to be output individually.
1) Select "John" and create two duplicates in the word palette ("Duplicate item"). Name one duplicate "as preacher" and the other "as farmer".
2) Double-click "as preacher" in the list to open the context search. Click the eye symbol to remove the addresses that do not belong in this category. Do the same for "as farmer".
3) Select "as preacher" and enter "John" as an umbrella term in the word palette. Likewise for "as farmer".