INTRODUCTION: MOVING THEORY INTO PRACTICE / ANNE R. KENNEY AND OYA Y. RIEGER
Holistic approach to digitizing: each stage affects the next (selection > processing). Communication and planning within the institution are needed for long-term success: archivists, librarians, systems, administrators, funders, scholars, general users.
Survey:
USERS: 1) Users impatient with long waits 2) need both text and images
DISPLAY: 3) content with current image quality (120 to 200 dpi, 800 x 600 pixels, scroll one way only) 4) thumbnail 50 – 100 pixels + screen size 400 pixels high + optional full-size image 5) full-size images such as maps need zoom, pan, and peripheral view
NAVIGATION: 6) simple navigation to get to an image (flatten hierarchies, minimize scrolling and jumping, provide preliminary info) 7) navigation between images (previous, next, page, and table of contents) 8) manipulate images (zoom, compare 2 images, save, sort, export, print, annotate images, image editing)
SEARCH: 9) variety of searches 10) variety of access points (artist, title, style or period) 11) database choices
SELECTION FOR DIGITAL CONVERSION / PAULA DE STEFANO
1) Copyright – collections with stated agreements, with implied agreements, public domain, images linked to copyright information with further instructions on how to obtain permission to use the images. P. 12 has copyright basics from Georgia Harper, UT System copyright attorney
2) Accessibility – institutional and collaborative projects
3) New / better access – a) data recovery of damaged documents b) OCR text searchable
4) Content – a virtual collection needs critical mass (comprehensive). Can use RLG’s microfilm selection model, but microfilming (sometimes low-use materials) for preservation and digitizing for access are different goals. Scholarly input
5) Preservation – scanning for preservation alone is not as good as microfilming or photocopying, but scanning/printing for preservation adds the benefits of access (possibly also OCR/text searchable). Preservation of digital files is also a concern
DIGITAL BENCHMARKING FOR CONVERSION AND ACCESS / ANNE R. KENNEY
1) Digital masters > access images
2) Informational content a) conservation: will the original be damaged to get a good quality image? b) higher quality > higher cost > slower production, but may translate to lower long-term costs c) current vs future technology d) user needs
3) Proposed methodology – info on source documents in digital equivalencies, determine tolerances and deviation values, calibrate and test capture systems, visually inspect against originals, record metrics, process, settings and requirements
4) Digital copies from originals vs. microfilm or other intermediary – decide case by case, based on the material and the quality of the intermediary
5) Informational content – is the detail captured in a high quality scan really necessary for the extra effort? Eg scan a simple black and white line drawing in color at 1200 dpi, or a complex color map at 300 dpi; legibility vs. fidelity
6) Assess document attributes:
a. Physical type, size and presentation i. reflective (paper) – bound or single sheets, include border? Include ruler? Include color or grayscale bar? ii. transparent (slides) – slide or negative roll iii. will items fit on the available flatbed scanner or require other equipment? iv. physical size affects file size/quality: larger item = larger file size, or larger item = same file size at lower quality (color, resolution etc.)
b. Physical condition i. scan without damaging the original or requiring special handling ii. problems: bleed-through, faded ink, staining etc. iii. archival imaging (capturing item as artifact) vs. preservation imaging (capturing text or information on the item)
c. Document classification i. printed text, simple line art > manuscripts > halftone items > continuous tone > mixed = bitonal low resolution >> color high resolution ii. reflective properties of the items themselves (eg gold leaf) iii. tonal ranges (lightest light > darkest dark)
d. Color i. original vs “aged” color ii. limited colors (poster) vs full color range (photograph) iii. purpose of color: aesthetic, artistic or informational (maps and pie charts) iv. importance of maintaining color (any shade of blue for a map, or a specific blue for a Chagall painting) v. details about measuring color etc.
e. Detail i. measure detail by the height (h) of the smallest letter or smallest line using the QI (quality index) method ii. QI = (dpi x .039h)/2 (divide by 3 instead of 2 for bitonal) iii. details about measuring detail requirements and scanner quality (not just dpi) such as MTF (Modulation Transfer Function)
f. Conversion requirements for printed output: scanning resolution = printer resolution x enlargement factor
g. Image processing: how much to do to the archival file is controversial. Generally acceptable are reduction of greater than 8-bit/channel linear data to 8-bit nonlinear data, contrast stretching, minimal adjustments for color and tone, descreening/rescreening of halftones
h. TIFF is the accepted archival file format. Chart on p. 53-54 with file formats and properties
i. Compression for archival files – controversial; even lossless compression may lose data during file migration or other circumstances
j. Benchmarking for access: delivering access files to researchers how they want them: speed of delivery, resolution to fit on screen or standard dpi, bit-depth, compression, multi-image files (pages of a book), increasing bandwidth, complete image, quality/detail of image (can conflict with complete image). Often one of these has to be reduced to gain another
k. Problems presented by users’ monitors and color representation
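The two formulas in this chapter (QI and scanning resolution for print) are easy to try out with numbers. The sketch below is only an illustration: it assumes h is measured in millimetres (0.039 is roughly the inch-per-millimetre conversion), and the function names are made up for this example. The uncompressed file size helper is standard pixel arithmetic added to illustrate the "larger item = larger file" point; it is not a formula from the notes.

```python
MM_TO_INCH = 0.039  # conversion factor as written in the notes


def quality_index(dpi, char_height_mm, bitonal=False):
    """QI = (dpi x .039h) / 2, or / 3 for bitonal scans."""
    divisor = 3 if bitonal else 2
    return dpi * MM_TO_INCH * char_height_mm / divisor


def dpi_for_quality(target_qi, char_height_mm, bitonal=False):
    """Rearrange the QI formula to get the resolution needed for a target QI."""
    divisor = 3 if bitonal else 2
    return target_qi * divisor / (MM_TO_INCH * char_height_mm)


def scanning_resolution(printer_dpi, enlargement_factor):
    """Scanning resolution = printer resolution x enlargement factor."""
    return printer_dpi * enlargement_factor


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size: pixel count times bit depth (assumed helper)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # Hypothetical document: smallest letter 1 mm high, scanned bitonal at 600 dpi.
    print(quality_index(600, 1.0, bitonal=True))
    print(dpi_for_quality(8, 1.0, bitonal=True))
    print(scanning_resolution(300, 2))
    print(uncompressed_size_bytes(8, 10, 300, 24))
```

Comparing the same page at 1 bit and 24 bits per pixel shows why document classification (item c) drives storage cost as much as resolution does.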
ESTABLISHING A QUALITY CONTROL PROGRAM / OYA Y. RIEGER
o Quality control – by digitizing staff during production
o Quality review / assurance – final inspection by project staff
o Tempting not to have a QC program, especially for in-house work, but important to factor in the effect that going without one has on image quality, file longevity, and the institution’s web presence
o Create a QC program that suits budget, technical infrastructure, staff qualifications, materials, and available time
1. Prerequisites
a. Identify products and goals – master and derivative images, printouts, storage, metadata, converted text, and marked-up files. Review results based on product, eg 100% of OCR text conversion, but 10% of TIFF images
b. Agree on standards, eg for paper copies to replace originals for display, specify paper stock, size, toner, printer resolution etc. Low standards – must be rigorous in review to ensure products meet those minimums. Higher standards may allow more room for tolerance
c. Determine reference points: original newspaper > microfilm > TIFF > access file. Compare access file to original, or TIFF to microfilm?
d. Keep in mind future tools and technologies
Sidebar p. 64-65 color (RGB, CMYK, Hue, Saturation, Brightness)
2. Setup
a. Scope – all images or sampling. If images fail, return all or just the defective ones
b. Methods – view at 100% on monitor (caution: be aware of the limitations of the monitor; better to either re-scan or adjust the access file, not the TIFF file), use grayscale and color targets to evaluate color visually and by checking RGB values, use histograms and charts to evaluate, examine printouts by themselves or next to originals, use image quality control reports generated by quality control software
c. Controlling the QC environment – be aware of limitations of hardware, software and human viewers. Monitors – large monitor with full color capabilities (on some monitors color purity and brightness vary as much as 25% from center to the corners), LCD vs CRT, analog connection vs digital connection. Software must be compatible (if using Adobe Photoshop to view Kodak Photo CD, a plug-in is needed to ensure correct mapping of colors). Monitor must be calibrated. Human – viewing conditions (eg not facing a window, etc.), wearing neutral colors (gray, black, white) to avoid clothing reflecting in the monitor. Staff may take color vision tests to evaluate the viewer’s vision
d. Evaluate system performance – scanner resolution, linearity, flare, scanner noise, artifacts (moiré, banding, etc), color reproduction
e. Codify inspection procedures – who, how, at what stages, how extensive, instruments, hardware, software used, units of measurement. How to reject unacceptable products. In-house: include in instructions or workforms. Outsourced: specify during initial agreement who pays for correcting unacceptable products, what is returned (only defective or whole batch), what kinds of comments are necessary, what the timeline is for returning corrected versions, whether a second round of QC is required, and how the corrected products will be integrated
Sidebar p. 72-73 image quality metrics (resolution, color, tone, noise, artifacts)
3. Assessing image quality
a. Resolution – scan a resolution target to evaluate the system; p. 74 has a chart of different targets
b. Modulation Transfer Function (MTF) – more reliable for grayscale or color. Still being developed when the book was written
c. Color and tone – more important for photographs, esp. highlights and shadows; may be subjective and harder to judge, esp. considering variations of monitors and printers. Use histograms to evaluate tonal reproduction. Use color and grayscale targets to test the system
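The scope decisions above (review everything or a sample; return the whole batch or only the defective items) can be sketched in a few lines. This is a minimal illustration, not a procedure from the book: the function names, the fixed random seed, and the idea of a maximum defect rate are all assumptions made for the example.

```python
import math
import random


def select_for_review(items, fraction, seed=None):
    """Pick a random sample covering `fraction` of the batch (at least one item)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not items:
        return []
    sample_size = max(1, math.ceil(len(items) * fraction))
    rng = random.Random(seed)  # a seed makes the sample reproducible for audits
    return sorted(rng.sample(list(items), sample_size))


def batch_decision(inspection_results, max_defect_rate):
    """Accept or reject a batch from pass/fail results on the inspected sample.

    `inspection_results` is a list of booleans (True = item passed).
    `max_defect_rate` is a hypothetical tolerance agreed in advance.
    """
    if not inspection_results:
        raise ValueError("nothing was inspected")
    defects = sum(1 for passed in inspection_results if not passed)
    if defects / len(inspection_results) <= max_defect_rate:
        return "accept"
    return "reject"


if __name__ == "__main__":
    # Hypothetical batch: review 10% of TIFF masters but 100% of OCR files.
    tiffs = [f"page_{n:04d}.tif" for n in range(1, 201)]
    ocr_files = [f"page_{n:04d}.txt" for n in range(1, 201)]
    print(len(select_for_review(tiffs, 0.10, seed=42)))
    print(len(select_for_review(ocr_files, 1.0)))
```

Writing the sampling fraction and tolerance into code (or a workform) is one way to meet the "codify inspection procedures" point, since the same rule is then applied to every batch.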
METADATA: PRINCIPLES, PRACTICES, AND CHALLENGES / CARL LAGOZE AND SANDRA PAYETTE
o Data about data – should be structured and computer-readable
1. Challenges
a. Scale – webpages vs. books
b. Functionality – one size fits all (eg MARC) doesn’t fit more complex needs, eg rights, content rating, administration, preservation
c. Permanence – disappearing URLs, website changes
d. Integrity – bad metadata from bad/no training, or malicious, eg index-spamming
e. Multiple metadata dimensions and forms – resource discovery, rights management, preservation, administration, provenance, content rating
f. Extensibility – as collections grow, new needs are discovered, eg audience ratings (PG, R)
g. Community-specific – legal needs vs. subject specialists
h. Functionality and simplicity – should be simple enough for easy creation, but simplicity should not come at the cost of functionality
2. Metadata interoperability
a. Semantic – problems arise if different schemas use the same vocabulary for different purposes
b. Structural interoperability – computers are very specific; they won’t recognize ‘Bill Gates’ and ‘Gates, William H.’ as the same unless told
c. Syntactic interoperability – exchanging metadata with different structures and semantics requires a shared syntax, eg HTML, RDF, XML
3. Functional uses
a. Resource discovery – how to allow people to find what they are looking for. Scalability – sheer number of webpages and word confusion (search ‘archives’ in Google, get back issues, data archives). Intellectual property – hidden behind access-controlled barriers. Format – limited to text searching; some products being developed for better image searching. Context – Medline & coronary vs Gale & heart
b. Dublin Core – 15 elements, element qualifiers (creator – illustrator) and value qualifiers (1998-02-04 = February 4, 1998). For discovery, not presentation or navigation
c. Creating structure through 1) storing images in a hierarchical file structure that mirrors the objects, or 2) a relational database, or 3) document encoding (XML, TEI). Note: this book apparently written before EAD XML developed
d. Metadata for rights management and access control – 1) protect intellectual property rights, 2) provide access controls, 3) facilitate transactions, 4) protect digital materials from malicious or intentional attack, 5) monitor events in the life-cycle of digital resources. All must be written explicitly so that computers can understand and follow instructions
e. Metadata for administration and preservation – RLG developed 16 metadata elements that describe key aspects of image files and the digitization process, including 1) characteristics and features, eg file format, spatial resolution, resolution, 2) production and reformatting features, eg scanner make, model and serial number, and 3) identification and integrity, eg integrity checksums, links to other metadata
4. Implications for a digital imaging program
a. How will users a) locate images, b) interact with them, eg presentation and navigation; how will the program c) protect rights while still providing access, d) assure permanence, eg persistent identifiers
b. Metadata created for one function may also serve another; it should be flexible, interoperable, extensible. Use well conceived data models, structures, standards and community-based semantics and rules
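To make the Dublin Core point concrete, here is a small sketch of a record built with Python's standard XML library. The 15 element names and the namespace URI are those of the Dublin Core Metadata Element Set; the wrapping `record` element, the function name, and the sample values are made up for illustration, and the date uses the ISO 8601 form quoted in the notes (1998-02-04).

```python
import xml.etree.ElementTree as ET
from datetime import date

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build a simple XML record from (element, value) pairs.

    The <record> wrapper is a hypothetical container chosen for this
    sketch; real systems define their own.
    """
    root = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


if __name__ == "__main__":
    # Sample values are invented for the example.
    record = dublin_core_record([
        ("title", "Sample scanned map"),
        ("creator", "Unknown"),
        ("date", date(1998, 2, 4).isoformat()),
        ("format", "image/tiff"),
    ])
    print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names at entry time is one simple guard against the integrity problems listed under Challenges.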
ENHANCING ACCESS TO DIGITAL IMAGE COLLECTIONS: SYSTEM BUILDING AND IMAGE PROCESSING / JOHN PRICE-WILKIN
1. Projects must be driven by cost-effective strategies, interoperability, and user needs
a. Iterative system building: 1) know your users – conduct user-centered design and needs assessment, 2) review other projects online, 3) get started quickly with a small number of images, 4) review use of those images as you digitize additional images, 5) revise as suggested by user comments etc – don’t expect it to be perfect first time out
b. Master / Derivative – a rich master file can lead to more derivatives with a wider range of uses. Recommend TIFF as the lossless community standard. PDF contains all info but is proprietary; JPEG is a community standard but lossy. Use appropriate derivative files (see p. 52 for comparative table) and the software to use them. One software package may be very powerful but slow, another very fast but without the capabilities needed i. Criteria – proprietary vs. open; variant views, eg zoom; cost-effective in the long term; performance of host system and user’s desktop ii. Fit the file to the purpose – pan and zoom for high resolution large format images, just-in-time derivative creation
c. Enhancing access with text – joint image / text displays i. OCR, corrected OCR, keyboarding – low-error files cost more. Options range from straight OCR (more errors) to keyboarding agencies that type in the text (more costly)
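As a rough illustration of "fit the file to the purpose", the sketch below works out the pixel dimensions of two derivatives from a master image. The 100 pixel thumbnail and 400 pixel screen heights come from the user survey figures in the introduction (treating the thumbnail figure as a height is an assumption); the function names are invented, and the code only computes sizes rather than processing any image.

```python
def fit_height(width, height, target_height):
    """Scale (width, height) to a target height, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def derivative_sizes(master_width, master_height):
    """Return hypothetical derivative dimensions for one master image."""
    return {
        "thumbnail": fit_height(master_width, master_height, 100),
        "screen": fit_height(master_width, master_height, 400),
    }


if __name__ == "__main__":
    # Hypothetical master: 3000 x 4000 pixels.
    print(derivative_sizes(3000, 4000))
```

With just-in-time creation, a calculation like this would run when a user requests a view, so only the master needs to be stored.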
IMAGE MANAGEMENT SYSTEMS AND WEB DELIVERY / PETER B. HIRTLE
1. Typical features of an image management system
a. Record storage location of master images and derivatives, usually by maintaining pointers to the file locations of images in the image management system. Image may also be included
b. Search for and retrieve images
c. Provide access interface, frequently via the web
d. Provide a contextual framework for the images, eg table of contents
e. Track and control source material and generated images during digitization project
f. Create an audit trail of modifications to the images and metadata
g. Control access to images
h. Provide an inventory of completed work
i. Automate and validate data entry
j. Facilitate use of controlled language
k. Provide a home for the metadata needed for long-term access to the images
May have combination of systems, eg, one for in-house management, one for public search and display
1. How will the system be used?
a. Purpose – public or in-house? Static or frequently updated?
b. Size – number of images, size of images, number of simultaneous users. Can start small, but be sure you can upgrade later
c. Complexity and volatility – one small collection fully digitized and described, unlikely to change once it’s done, simple keyword searching > multiple collections with complex descriptions and searching, frequently updated, with title, creator, date searching
d. Authentication and audit requirements – who on staff has access to add / edit / delete images? Who has access to metadata only, to small, low resolution images, or to large high resolution images?
e. Expected demand and performance – how many people will access the system at the same time? How long can you expect them to wait during peak usage? At the same time, don’t build an overly robust system that then has few images or users
f. Available technical infrastructure: hardware, software and personnel – what is available at your own institution? Explore collaborative programs. Take staff skills into account when selecting a system (home-grown vs. off-the-shelf). How will you maintain the system as personnel leave?
g. Cost – take both long-term and short-term costs into account. Initial costs, staff time for all aspects
2. System design features
a. Flexibility and scalability to grow with the system or migrate to a larger system
b. Single vs. multiple databases – may use one database for management of physical items (how acquired, when digitized, how digitized) and another for public access to images
c. Adherence to data system, structure, and content standards – makes it easier to exchange information between systems and aids in the migration of data. Use XML, Z39.50, MARC, Dublin Core etc. when possible
d. Expected life span of the data and potential for migration. Even a non-standard database should be designed, and its data entered, to promote future migration. Even if you don’t use a standard like Dublin Core, make sure you’re consistent in data entry. Eg do not mix Jan. 2, 2007; January 2, 2007; and 1-2-07
3. Image management systems in practice (note: one example of an outdated chapter, but the advantages / disadvantages are still good to review)
a. In-house systems using desktop databases (MS Access, FileMaker Pro, Corel Paradox). Advantages: usually already have the software, more likely to find in-house technical support and other users for advice. Disadvantages: have to design own database, and have to design it well (see #2 above). Often limited in size and user access
b. In-house client / server architecture – 4th Dimension, Oracle, Informix, Sybase. Advantages: more robust than desktop databases. Disadvantages: more expensive than desktop systems. Generally better to put the extra expense toward specialized databases already designed for digital imaging management
c. Specialized desktop systems – Canto Cumulus, Image AXS Pro, and Extensis. Advantages: already designed for novices and home users. Disadvantages: design may not be good, may not be flexible
d. Specialized client / server applications. Many library catalog systems are installing imaging systems. Advantages: metadata entry and searching already built in, already have staff who know the system. Disadvantages: many library systems still require MARC format, which does not easily accommodate all metadata (how scanned, multiple derivatives). Create one record for multiple images, or a separate record for each image (eg 300 records for 300 images). Can also use a museum-oriented system (Multi MIMSY, EmbARK, ARGUS). Advantages: built for image management. Disadvantages: may not be based on standards, may not be flexible and scalable
e. SGML/XML based solutions – described in this section as a “relatively new approach”
4. Web delivery – quality of the web interface in many ways determines the ultimate success of the image management system
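The date example under system design (Jan. 2, 2007; January 2, 2007; 1-2-07) is the kind of inconsistency that can be cleaned up before migration. The sketch below normalizes those three forms to ISO 8601. It is only an illustration: the list of accepted formats is an assumption, and 1-2-07 is read as month-day-year, which would need checking against how the data was actually entered.

```python
from datetime import datetime

# Formats seen in the example above; extend as needed for real data.
# "%m-%d-%y" assumes month comes first (US style), which is an assumption.
KNOWN_FORMATS = ("%b. %d, %Y", "%B %d, %Y", "%m-%d-%y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if unrecognized."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")


if __name__ == "__main__":
    for raw in ("Jan. 2, 2007", "January 2, 2007", "1-2-07"):
        print(raw, "->", normalize_date(raw))
```

Raising an error on unrecognized input, rather than guessing, fits the "automate and validate data entry" feature listed among typical system functions.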
PROJECTS TO PROGRAMS: DEVELOPING A DIGITAL PRESERVATION POLICY / OYA Y. RIEGER
Retaining digital image collections in a usable and interpretable form for the long term.
1. Goals of digital preservation
a. bit identity – ensure content (bit stream) integrity by i. monitoring for corruption to data fixity and authenticity ii. protecting the content from undocumented alteration iii. securing the data from unauthorized use and providing media stability
b. technical context – maintain interactions among the elements of the wider digital environment by i. preserving the context (retaining the metadata files and the scripts that link these files to the images) ii. maintaining the integrity of links iii. monitoring dynamic document creation (on-the-fly conversion or periodic metadata updates)
c. provenance – maintain a record of the content’s origin and history, describing the source of the digital content and alterations that have taken place since its creation, such as image and metadata updates or changes to the storage media
d. references and usability – ensure users can easily locate, retrieve, and use the digital image collection indefinitely
5. Access vs. Preservation – preserve deteriorating originals or enhance access
6. Digital images as objects – content, metadata, and accompanying (eg MS Access) files
7. Digital images as artifacts – although the content of the images is the same, digital images allow you to navigate and zoom and do other things you cannot do with traditional images
8. Components of a digital preservation policy
a. Organizational infrastructure – establish digital preservation as an institutional responsibility with committed financial and staff support
b. Seek opportunities for cooperation and collaboration
c. Policies for selection, conversion, and reselection – select and create digital image collections with long-lasting value that take into account standard selection policies and long-term preservation concerns, are based on standards, and are well-documented with metadata
d. Preservation actions i. Store digital media with care – use industry standard digital tape or other approved reliable media, store off-site backup copies, store multiple versions with different software to prevent corruption from malfunctions, viruses or bugs, check periodically ii. Evaluate and implement preservation strategies based on factors such as image attributes, user and access requirements etc. Understand that current strategies are merely tools to control risks iii. Consider a hybrid approach: because there is no clear ideal of “long term digital”, also create analog backups such as microfilm and printouts
e. Technology forecasting – assess risks by monitoring technological changes
9. Digital preservation strategies
a. Refreshing – copying all bits exactly to an identical medium as preventive care to control aging and decay, avoid media obsolescence, and take advantage of more efficient and reliable storage technologies. Refreshing may unintentionally change bits; if merging onto a larger medium (eg CD to DVD), need to ensure labeling is correct
b. Migration – transfer digital information from one hardware and software setting to another, or from one computer generation to subsequent generations, to increase system functionality as the technology evolves. Migration requires thorough planning and analysis, and it does not scale well
c. Emulation – recreates the technical environment required to view and use a digital collection. Maintain information about the hardware and software requirements so the system can be reengineered to recreate its original functionality, look and feel; develop a scalable and extensible preservation strategy. Emulation may be costly with no guarantees, and presumes that users will understand how to use old systems
d. Technology preservation – preserves the technical environment including the software and hardware, like keeping a technological museum. Likely to be expensive and impractical
e. Digital archaeology – rescue content from damaged media or from obsolete or damaged hardware and software environments. Not documented, likely to be expensive
10. Promising digital preservation initiatives
List of projects then current in 2000: BIIF (Basic Image Interchange Format); CEDARS (CURL Exemplars in Digital Archives); Collection-based long-term preservation; Data provenance; OAIS (ISO Reference model for an open archival information system); InterPARES Project (International Research on Permanent Authentic Records in Electronic Systems); Kulturarw3 Heritage Project; NEDLIB (Networked European Deposit Library); PRISM (Preservation, Reliability, Interoperability, Security, Metadata); Preservation management of digital materials; PANDORA (Preserving and accessing networked documentary resources of Australia); Preserving Cornell’s Digital Image Collections; RLG-DLF Task Force on policy and practice for long-term retention of digital materials; Stanford archival digital library architecture: archival digital libraries repositories
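The "bit identity" goal and the warning that refreshing may unintentionally change bits both point to fixity checking: record a checksum for each file, then recompute and compare after any copy. The sketch below shows one way to do this with Python's standard library. SHA-256, the manifest layout, and the function names are choices made for this example, not something the chapter prescribes.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large image masters need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List files from the manifest that are now missing or have different bits."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A typical use would be to call `build_manifest` on the source medium before refreshing, store the result with the administrative metadata, and run `changed_files` against the new copy; an empty list means every recorded file arrived bit for bit.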
PROJECTS TO PROGRAMS: MAINSTREAMING DIGITAL IMAGING INITIATIVES / ANNE R. KENNEY
Transitioning from projects to programs. Digital collections are institutional assets; must shift thinking away from short-term experimental to long-term value. Change the view from “flashy”, often soft-money programs to an institutional collection complete with policies (selection, access etc.) and organizational funding.
1. Develop formal policies to encompass the life cycle of digital resources. Need comprehensive policies to cover all issues from access and selection to user retrieval
a. Tie policies to institutional resources
b. Analyze current digital imaging projects for efficiencies and economies
c. Document the process and the product
d. Undertake the doable
e. Staff the program with well-trained people
f. Incorporate flexibility into transition
g. Develop multi-institutional collaborations
2. Financing digital imaging programs – it will not pay for itself or be cost-effective!
a. Costs – no consensus; available figures vary tremendously depending on types of materials, software, hardware, etc. i. Personnel ii. Equipment iii. Cataloging iv. Supplies v. Contingency (trouble-shooting, training, systems support) vi. Overhead / indirect (space, utilities, administrative support)
b. Is digitization economically viable? i. Reducing institutional expenses – access to virtual files vs. physical files. Still need to take into account storage of physical files, even if digital images are primarily accessed ii. Recovering costs – charging for providing digital images
c. Do not undertake a digitizing program to save costs, but to provide additional access to materials for a changing clientele