Scratchpad


Building digital archives, descriptions and displays / Frederick Stielow © 2003


Summary:

Intended for museums, libraries, and archives seeking to offer digital archives through the World Wide Web. Based on skill levels and typical duties, the three professions compare as follows:

  1. First, archivists provide the primary descriptive model and have the most experience with digital archives. Archivists are the editors who organize, describe, and title groups of materials with unique and enduring value.
  2. Second, librarians are the most technologically advanced.
  3. Third, museums and historical associations typically have the lowest technological skills, but their interpretive skills and web displays are the most advanced.

In the virtual world these three may share a commitment to displaying holdings on the web. The typical online patron will often blur the distinction between archives, libraries and museums.

This book builds on existing professional knowledge, and is designed to teach how to “think web”.

It is very much a "how to" manual and overview written for the beginner to intermediate user, with detail down to screenshots of how to do various tasks.

I have glossed over some areas, indicated by [brackets]


Detailed notes:

Chapter 1. Planning the digitization project

  1. Team setup - Select a team to determine the following
    1. Content determination
    2. Description and cataloging
    3. Project management
    4. Graphic design
    5. Computer and web applications
  2. All projects should follow these steps
    1. Preparation
    2. Data gathering
      1. Internal
      2. External
      3. Technology
    3. Analysis
    4. Solution
    5. Pilot
      1. Alpha Stage
      2. Beta Testing
    6. Implementation


  1. Preparation - Problem definition – make sure everyone on the team is working toward the same goal. Divide the problem into current and retrospective projects
  2. Gather data
    1. Internal – what resources, staff, and institutional framework are available?
    2. External – what are similar institutions doing? What resources are available from other organizations? What are the standards and recommended practices?
    3. Technology – what are the standards and what is available in the market place?
  3. Analysis - Careful review of data – what do you want to do? Do you have or can you get the resources? What will it cost? Will you have to give something else up in return?
  4. Solutions - Consider all options from above. In general, select the easiest and least complicated solution first and try to get the biggest bang for the buck.
  5. Pilot – lay the groundwork and implement. Return to analysis if necessary
  6. Implementation – start with short or easy projects

Metadata – provides structural, administrative and descriptive information

[Book provides short history of www, http, XHTML]

SGML – mid-1980s, commissioned by ANSI. An open standard for automating book production, built on ASCII and a kernel of tags and rules.

XML – 1996 working group of 12 assigned by W3C to design “well-formulated” documents with clearly defined content. Group re-designed SGML – streamlined and conformed to the http daemon. Created a flexible, hierarchical system. Results published in 1997.

XML Pros – speedier; allows for comparisons among institutional records and between multiple institutions; scalable to web, PDAs, cell-phones and future devices; offers authentication and transaction; uses Unicode, which extends ASCII to include all the world’s writing systems.

XML Cons – verbose and resource-greedy; rule-driven; requires a parser to review and validate tags; requires stylesheets (XSL) to render fonts, margins, and headers on the web.
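
The point that XML is rule-driven and needs a parser to review the tags can be seen in a few lines of Python. This is only a sketch: the tag names in the sample fragment are made up for illustration, and the standard-library parser used here checks that tags are properly opened, closed and nested, not that they conform to a DTD such as EAD.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding-aid fragments; the tag names are illustrative only.
GOOD = "<collection><title>Mary Smith Papers</title><dates>1918-1964</dates></collection>"
# Same fragment, but the </title> close tag is missing.
BAD = "<collection><title>Mary Smith Papers<dates>1918-1964</dates></collection>"


def is_well_formed(text):
    """Return True if the parser accepts the text (tags properly nested and closed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def child_tags(text):
    """List the tag names directly under the root element."""
    root = ET.fromstring(text)
    return [child.tag for child in root]


if __name__ == "__main__":
    print(is_well_formed(GOOD), child_tags(GOOD))
    print(is_well_formed(BAD))
```

The second fragment is rejected outright, which is the strictness the notes describe: a browser would forgive a missing close tag in HTML, while an XML parser will not.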

For more info, go to www.w3.org and find XML in the school zone.

RDF – Resource Description Framework – metaresources for all metalanguages. Builds on XML namespaces (xmlns) and allows linkages to trusted formats like EAD.

Chapter 2. Understanding digitized finding aids

  1. Preparation
    1. Divide project into 1) New and 2) Retro projects
    2. Take familiar objects that are not © protected and put them on the web

[Book shows samples of finding aids re-formatted with EAD tags, and an explanation of the typical tags such as Title, Date, Size, etc.]

  1. Gather data
    1. Internal
      1. Administrative factors - Are there financial implications, are there charges for reproducing the finding aids? Does the existing staff have, or can they gain the knowledge for creating digital finding aids? Should outside expertise be hired?
      2. Usage – examine existing finding aids. How many collections are processed vs. unprocessed? What are the most popular finding aids? What collections have other concerns, e.g. copyright or other restrictions?
      3. Production – how many finding aids are typically produced in a year? How many are already in electronic format (e.g. Word) vs. typewritten only? How long and how detailed are the finding aids?
      4. Parsing the layout – are the finding aids in a standardized format? Are they already laid out in EAD format? Will it be easy to identify the Title, Dates, Size, Biography or History, Scope and Contents, etc.?
    2. External research – look at other archives' finding aids, attend training, and look to LC, SAA and other organizations for standards and recommended practices (http://www.loc.gov/ead)
  2. Analysis –
    1. Audience – must be understandable by users. Staff must be comfortable with data entry and design. Consider using established standards and procedures and training
    2. Content – review optional elements as pertaining to your institution and finding aids. Do you want to add extras such as authority control for names and places?
    3. Automation – do you have existing software? Do you need to update or purchase new software? Does it have a solid commercial base and offer good customer support? Do you have IT staff for assistance?

Recommendation: stay with what you know, such as a word processor that can save documents as .rtf.

    1. Output – Print offers the advantage of a copy at the desk that the researcher can skim or read for in-depth data. Electronic display at the monitor offers keyword searching, but is limited to what can be seen on the screen and to scrolling up and down.
    2. Recon options - review usage, importance, size and complexity
      1. Do not transfer to electronic
      2. Scan typewritten finding aids using OCR to bring into electronic format
      3. Reformat electronic finding aids to EAD
  1. Solutions

Automation – recommend word processing for simplicity, learning curve, and cost

    1. File naming options
      1. Running number 1, 2, 3…
      2. 2 position year and access number 98-013
      3. multi-unit project and records management ex. 246-uaw-vp = Collection 246 of the United Automobile Workers Vice President’s office
    2. Modeling – keep layout simple, uncluttered and close to the print version
    3. Authority control – LCSH, Getty Provenance Index, Thesaurus of Geographic Names, ISAAR (International Standard Archival Authority Record)
  1. Pilot
    1. Alpha Testing – establish model based on established conventions
      1. Open word processor, create template, keep blank template and enter data on copies
      2. Create file directory structure with folder 'holdings' and the subfolders: done, old, tool_box, working

For Recon files, keep a copy of the old finding aid in holdings/old, and save a working copy in holdings/working as an .rtf file.
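
The folder layout and the Recon filing step can be scripted so every workstation ends up with the same structure. The sketch below is an assumption about how one might do it, not a procedure from the book: the function names are hypothetical, and it assumes the old finding aid has already been saved as .rtf from the word processor, so the script only files the copies.

```python
import shutil
import tempfile
from pathlib import Path

# Subfolder names taken from the notes above.
SUBFOLDERS = ("done", "old", "tool_box", "working")


def create_holdings_tree(root):
    """Create holdings/ with its four subfolders under root; safe to run twice."""
    holdings = Path(root) / "holdings"
    for name in SUBFOLDERS:
        (holdings / name).mkdir(parents=True, exist_ok=True)
    return holdings


def stage_recon_file(holdings, source):
    """Keep the old finding aid in holdings/old and place a working copy in holdings/working."""
    source = Path(source)
    archived = holdings / "old" / source.name
    shutil.copy2(source, archived)
    working = holdings / "working" / source.name
    shutil.copy2(archived, working)
    return working


if __name__ == "__main__":
    # Demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        tree = create_holdings_tree(tmp)
        sample = Path(tmp) / "98-013.rtf"
        sample.write_text("{\\rtf1 sample finding aid}")
        print(stage_recon_file(tree, sample))
```

Staff then edit only the file in holdings/working, and the copy in holdings/old stays untouched as the record of what the finding aid looked like before conversion.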

      1. Create detailed instructions for staff on how to create finding aids
    1. Beta Testing –
      1. Keep backups!
      2. Get experienced processor to help with beta testing
      3. Begin with small, simple collections
      4. Have processor follow instructions created in Alpha stage
      5. Have processor keep notes of progress and problems
      6. Review progress
  1. Implementation
    1. Revise and correct instructions throughout the Pilot stage.
    2. Train staff and begin with staff open to learning new procedures
    3. Begin with small and simple finding aids

Look for ways these procedures will improve other areas within archives, such as name and author lists for description and reference
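
The file naming options listed under Solutions are easier to keep consistent if names are generated rather than typed. A minimal sketch, assuming a three-digit accession number (inferred from the 98-013 example) and lower-case unit abbreviations (inferred from 246-uaw-vp); the function names are hypothetical.

```python
import re

# Pattern for the "two-position year and accession number" option, e.g. 98-013.
YEAR_ACCESSION = re.compile(r"^\d{2}-\d{3}$")


def year_accession_name(year, accession):
    """Build a name like 98-013 from a full year and an accession number."""
    return f"{year % 100:02d}-{accession:03d}"


def records_management_name(collection, *units):
    """Build a name like 246-uaw-vp from a collection number and unit abbreviations."""
    parts = [str(collection)] + [unit.strip().lower() for unit in units]
    return "-".join(parts)


def is_year_accession(name):
    """Check whether an existing file name follows the year-accession option."""
    return bool(YEAR_ACCESSION.match(name))


if __name__ == "__main__":
    print(year_accession_name(1998, 13))
    print(records_management_name(246, "UAW", "VP"))
```

A check like is_year_accession can also be run over an existing folder to find files that were named by hand and drifted from the convention.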


Chapter 3. Creating effective web finding aids

Continues Ch. 2, adding HTML, hypertext, and web design. Takes in-house finding aids and documents and places them on the web.

  1. Preparation
  2. Data gathering
    1. Research other institutions' finding aids, paying attention to layout, terminology, and hypertext
    2. Use software to convert .rtf to .html files. Can use anything from Notepad or Word to Dreamweaver
    3. Test on different platforms: Explorer, Netscape, Firefox, etc.
    4. Be kind to 56K modems. Try to keep files smaller than 100K
  3. Analysis
    1. Analyze existing web pages, existing staff, knowledge, and expertise, and existing finding aid usage. Take into account patrons, what they want and how they search for it
    2. Software – recommend word processing as the simplest route. Convert .rtf to .html. Ch. 4 discusses XML EAD
    3. Output
      1. .rtf “Save As” to .html
      2. HEFA (HTML-Encoded Finding Aids) – add additional HTML coding for navigation
      3. Compare costs vs. benefits of direct conversion and HEFAs
      4. Consider a mixed approach: direct conversion for the majority, HEFAs for selected recon conversions and all new finding aids
  4. Solution
    1. Direct conversion – “Save As web page”. Minimum recommendation: title tag metadata giving Title, Dates and Repository information, e.g. “Mary Smith Papers, 1918-1964. University Archives.”
    2. HEFAs – create precoded templates. Can set up EAD-style sections (Title, Scope and Contents) and links (Series I via a=href…, “Return to top”) without actually encoding in EAD.
    3. Added encoding: Title and Keyword Metadata, Related Material, File to subfile connectors and content connectors.
  5. Pilot
    1. Alpha Stage
      1. Keep list of finding aids and status.
      2. Create another folder: /holdings/web. This will keep the “/holdings/done” as another backup.
      3. Provide links from central or home page
      4. Write instructions and train staff
    2. Beta Testing
      1. Review finding aids with team – how did they turn out? What were the problems, were the solutions adequate or does something need to be revisited? How much time did it take? Is there anything that needs to go back to analysis or solution?
  6. Implementation
    1. Update schedule if necessary based on experience
    2. Publicize new finding aids

[Detailed information on working with MS Word] Beware of MS Word’s “Bloatware”: all the extra fonts and other stylistic flourishes.
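
A precoded HEFA template of the kind described in the Solution step might look like the following. This is a sketch under assumptions: the page layout, the element ids and the function names are invented for illustration. Only the title format (“Mary Smith Papers, 1918-1964. University Archives.”), the “Return to top” link and the 100K size guideline come from the notes.

```python
from html import escape
from string import Template

# Hypothetical precoded template; ids and layout are illustrative.
PAGE = Template("""<html>
<head>
<title>$title</title>
<meta name="keywords" content="$keywords">
</head>
<body>
<h1 id="top">$title</h1>
<p><a href="#series1">Series I</a></p>
<h2 id="series1">Series I</h2>
<p>$series1</p>
<p><a href="#top">Return to top</a></p>
</body>
</html>
""")


def build_title(collection, dates, repository):
    """Title, Dates and Repository in the form recommended for the title tag."""
    return f"{collection}, {dates}. {repository}."


def render_finding_aid(collection, dates, repository, keywords, series1):
    """Fill the template, escaping characters such as & and < in the data."""
    return PAGE.substitute(
        title=escape(build_title(collection, dates, repository)),
        keywords=escape(", ".join(keywords)),
        series1=escape(series1),
    )


def within_size_budget(page, limit_bytes=100 * 1024):
    """True if the page stays under the 100K guideline for slow modems."""
    return len(page.encode("utf-8")) < limit_bytes


if __name__ == "__main__":
    html_page = render_finding_aid(
        "Mary Smith Papers", "1918-1964", "University Archives",
        ["Smith, Mary", "correspondence"], "Correspondence, 1918-1930",
    )
    print(html_page)
    print(within_size_budget(html_page))
```

Generating the page from plain data keeps Word's extra styling out of the output, which is one way to avoid the bloat the notes warn about.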

Chapter 4. Considering SGML, EAD, XML and Database options

[Skipped pure SGML information]

  1. EAD
    1. EAD Cookbook by Michael Fox has ready-set templates
    2. Library of Congress http://www.loc.gov/ead
    3. DTDs (Document Type Definitions)
      1. Tag Library of rules and acceptable terms
      2. EAD’s Tag Library has 7 components
        1. Tag: short, mnemonic code within < >
        2. Element: human name for tag
        3. Description: narrative definition and relationship to other elements
        4. May contain: list of elements that may be wrapped in or included within this element
        5. May occur within: list of possible parent elements
        6. Attributes: list of descriptive terms that can be added
        7. Examples: sample encoding showing the element in use
    4. EAD has 146 elements
      1. General (title, date, etc.)
      2. <c01> to <c12> for hierarchical relations
      3. digital archival objects <dao>
      4. formatting: <head> <tbody>
      5. “miscellaneous” <odd> other descriptive data
    5. XML requires XSL stylesheets to create displays (headers, fonts, bold, italics, margins and tabs, etc.)
  2. DBMS Alternative
    1. Primarily for institutions which already have a database, not recommended for new finding aids
    2. Common DBMS: Oracle, Informix, Lotus Notes, Sybase, FileMaker Pro, FoxPro
    3. Can export from DBMS to XML for web display
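
To illustrate the DBMS-to-XML export mentioned above, here is a small sketch that reads folder records from a database and nests them as <c01> series and <c02> files. Everything about the database is an assumption: SQLite stands in for whichever DBMS the institution uses, and the table and column names are hypothetical. The element names (dsc, c01, c02, did, unittitle, unitdate) are EAD elements, but the output is a simplified fragment that has not been validated against the EAD DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_sample_db():
    """Build an in-memory database with a hypothetical 'folders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE folders (series TEXT, title TEXT, dates TEXT)")
    conn.executemany(
        "INSERT INTO folders VALUES (?, ?, ?)",
        [
            ("Correspondence", "Letters to family", "1918-1930"),
            ("Correspondence", "Business letters", "1931-1964"),
            ("Photographs", "Portraits", "1920-1940"),
        ],
    )
    return conn


def export_to_xml(conn):
    """Group rows by series: one <c01> per series, one <c02> per folder."""
    dsc = ET.Element("dsc")
    series_nodes = {}
    rows = conn.execute("SELECT series, title, dates FROM folders ORDER BY rowid")
    for series, title, dates in rows:
        c01 = series_nodes.get(series)
        if c01 is None:
            c01 = ET.SubElement(dsc, "c01", level="series")
            did = ET.SubElement(c01, "did")
            ET.SubElement(did, "unittitle").text = series
            series_nodes[series] = c01
        c02 = ET.SubElement(c01, "c02", level="file")
        did = ET.SubElement(c02, "did")
        ET.SubElement(did, "unittitle").text = title
        ET.SubElement(did, "unitdate").text = dates
    return ET.tostring(dsc, encoding="unicode")


if __name__ == "__main__":
    print(export_to_xml(load_sample_db()))
```

The numbered component tags carry the hierarchy: a flat table with a series column becomes a two-level tree, and deeper levels would continue with <c03> and so on.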

Chapter 5. Exploring leading library and museum automation systems

  1. Libraries and cataloging – overview of automation in libraries. Most recognized automated description. Relies on short descriptions and international standards. Used by Library of Congress, OCLC, RLG, Vendors
    1. MARC field for Archives
  035  ID
  100 / 110 author
  245 title
  300 size, physical description
  520 scope, content
  545 biography, history
  600/610/611/650/651 subjects (personal name, corporate name, meeting name, topical, geographic)
  852 repository information
  856 link to repository or finding aid
    1. NUCMC – offers cataloging resources to those institutions which cannot input directly through OCLC or RLG and who are open to all researchers
    2. Archives USA – Chadwyck-Healey (ProQuest); can add your own repository and collection info
    3. Z39.50 – allows searching of other institutions' catalogs, including across multiple institutions
    4. Vendors develop programs such as library catalogs and modules and database aggregates for journals
    5. Online bibliographic utilities: RLG, OCLC
    6. Dublin Core – 15 core descriptors for online documents
    7. OCLC’s Digital Preservation Initiative to find grants and cooperative ventures for digitizing
    8. D-LIB – Digital Library Federation
  1. Museums
    1. Document and authenticate objects
    2. Marketplace smaller than libraries, but beginning to standardize
    3. [PastPerfect used as an example]
    4. MIDIIS XML : Museum Initiative for Digital Information Interchange Standards, the museum’s version of EAD
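
The MARC fields for archives and the 15 Dublin Core descriptors listed earlier in this chapter describe much of the same information, so records in one scheme can be mapped to the other. The mapping below is an illustrative assumption, not an official crosswalk and not something the book prescribes. The 15 element names are the standard Dublin Core set.

```python
# The 15 Dublin Core elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Illustrative mapping from the MARC tags in these notes to Dublin Core.
MARC_TO_DC = {
    "100": "creator", "110": "creator",
    "245": "title",
    "300": "format",
    "520": "description", "545": "description",
    "600": "subject", "610": "subject", "611": "subject", "650": "subject",
    "651": "coverage",
    "856": "identifier",
}


def to_dublin_core(marc_fields):
    """Convert (tag, value) pairs into a dict of Dublin Core element -> values."""
    record = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # tags outside the mapping (e.g. 035) are skipped
        record.setdefault(element, []).append(value)
    return record


if __name__ == "__main__":
    sample = [
        ("035", "(XX)123"),
        ("100", "Smith, Mary"),
        ("245", "Mary Smith Papers"),
        ("650", "Labor unions"),
        ("651", "Detroit (Mich.)"),
    ]
    print(to_dublin_core(sample))
```

Several MARC fields collapse into one Dublin Core element, which shows the trade-off: Dublin Core is simpler to publish on the web but loses some of the distinctions MARC records keep.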

Chapter 6. Establishing policies and techniques for digital imaging

“On the threshold of this digital age we have one foot in the hardcopy world of physical artifacts and the other searching for solid ground in the digital world”

  1. Policies
    1. Copyright: post copyright information on any self-copiers, on images, and in registration packets. State that uses beyond Fair Use, such as publication, require written permission.
    2. Intellectual freedom – see ALA’s Acceptable Use Policies
    3. Privacy – do not keep personally identifiable information beyond the bare minimum needed. Keep it only with users' consent; keep statistics without personally identifiable information
    4. Donor rights – keep in mind the difference between legal and “good will”. Keep donor trust by adhering to their stipulations in the contract or deed. May also want to restrict posting sensitive documents on the web, even if you can do so legally
    5. Appraisal / selection for initial scanning and web publication
    6. Digitization criteria
      1. Value – most important and valuable
      2. Current and potential use
      3. Physical condition (rapidly deteriorating materials), although digitization alone may not be enough once additional factors are considered
      4. Ownership – copyright issues? On loan?
      5. Ethical – relationship of donors and community
      6. Promotional
      7. Politics, grants, and funding
    7. Selection guides are available online
    8. Outsourcing
      1. On-site: look at available space, staff, equipment
      2. Off-site: vendors have all of the above, but also consider shipping costs and dangers
  2. Procedures
    1. Scanning basics
      1. Bitmap: dots or pixels indicate presence or absence of content
      2. Resolution: ppi or dpi. The higher the resolution, the better the image, but the more memory is required and the longer it takes to scan or retrieve. 300 dpi was the standard; 600 dpi is now indicated as the standard, and this may go higher again. Web images are typically 72 dpi
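
The trade-off between resolution and memory is simple arithmetic: pixel dimensions are inches times ppi, and uncompressed size is pixels times bits per pixel. The sketch below works this out for a letter-size page at the three resolutions mentioned. The page size and the 24-bit colour depth are assumptions chosen for illustration, and real files will usually be smaller once compressed.

```python
def scan_size(width_in, height_in, ppi, bits_per_pixel=24):
    """Return (width in pixels, height in pixels, uncompressed size in bytes)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


if __name__ == "__main__":
    # Letter-size page (8.5 x 11 inches) at web, former-standard and current-standard resolution.
    for ppi in (72, 300, 600):
        w, h, size = scan_size(8.5, 11, ppi)
        print(f"{ppi} ppi: {w} x {h} pixels, {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the resolution from 300 to 600 quadruples the pixel count and therefore the storage, which is why a 600 dpi master and a separate 72 dpi web copy are usually kept as different files.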

Chapter 7. Creating the web site and display options

  1. [Site overview: general website design tips]
  2. Audience: in addition to trained researchers and archives users, the web may attract general public users. Avoid specialized terminology and use commonly understood terms; write at a college level or possibly an 8th-grade level; start inventories with a short description instead of just straight box and folder numbers; sort by themes instead of straight original order
  3. Narrative elements: keep text short so it can be scanned quickly, reduce scrolling on the monitor, and substitute chronologies and organizational charts for long narratives
  4. Ongoing description: keep it changing with new and updated information, links to related materials, publications, etc.
  5. Media addition: add sound, video, etc.
  6. Online exhibits: focus on theme with images as primary display, include short descriptions and possibly links to longer explanations, keep users in mind with small monitors and slow connections, provide various navigation options: alphabetic, thematic, chronological, etc.
  7. K-12 document packets: include key documents, subject guides, and possibly lesson plans
  8. Courseware such as Blackboard and WebCT
  9. Consider outsourcing

Chapter 8. Maintaining and preserving digital archives

  1. [Web Museums: images and metadata]
  2. Institutional archives: electronic records management for born digital documents. Internal distribution and preservation
    1. Access through LAN or Intranet to staff and clients
    2. Authenticity: ensure authenticity through internal procedures, some high-security or financial concerns may require extra procedures
      1. Audit trail: software systems to automatically document chain of ownership and use
      2. Completeness: potential storage and delivery problems, have to ensure multi-faceted items remain complete
      3. Original vs. copy: is a re-created copy sufficient, or does it need to have the digital signature and/or encryption?
    3. Cost analysis and benefits: benefits must outweigh costs
      1. Does the electronic strategy simplify, or complicate retrieval and access?
      2. Are the resources accessible and used?
      3. Can it be used for promotion, marketing, financial development?
    4. Record type: fixed (created and not changed, eg letter) or mutable (updated with new data, drafts and versions)
    5. Scheduling: retention and disposition schedule, maintenance of vital records, authentication of records
    6. Security: ward off attacks, discover and recover from attacks, track and prosecute
    7. System management: ensure hardware and backups work, have disaster recovery plan
    8. Software: go to NARA and Dept. of Defense for updated criteria
  3. [Digital Libraries: e-books, e-journals, websites, etc.]
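
For the authenticity and completeness concerns above, one widely used technique is to record a checksum for each file and compare it later against copies or backups. The notes do not name a specific method, so this sketch is an assumption about how such a check might be done. It shows only that a copy is bit-for-bit identical to the original; unlike a digital signature, it says nothing about who created the file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original, copy):
    """True if the two files have identical contents."""
    return sha256_of(original) == sha256_of(copy)


if __name__ == "__main__":
    # Demonstration with throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "letter.txt"
        backup = Path(tmp) / "letter_backup.txt"
        original.write_bytes(b"Dear Mary, ...")
        backup.write_bytes(original.read_bytes())
        print(copy_matches_original(original, backup))
```

Storing the checksum alongside the retention schedule gives a simple way to confirm, at each backup or migration, that fixed records have not changed.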