LoCloud crawler-ready tagging tool (CRTT)

The LoCloud crawler-ready tagging tool (CRTT) is an experimental prototype that seeks to demonstrate how a metadata aggregation and ingestion process, such as the one employed by Europeana, may be simplified. The approach is validated using content from small cultural heritage institutions that do not currently have an established mechanism for providing content to Europeana.

The quality of each metadata record produced through the Europeana workflow will usually be higher than that of an entry in the index of one of the mainstream search providers. The efficiency with which content is ingested and the scalability of the search engine approach are, however, far greater.

When an auto-indexed URI is compared with a manually crafted metadata object in a simple free-text search, the results are similar. For more sophisticated search types, which use the detailed information contained in a metadata profile such as EDM and combine it across collections, a search engine approach is presently not able to compete on potential precision.

How to use it?

The functions available in the CRTT are shown in the UML Use Case diagram.


Log in

The tool is available at http://crtt.avinet.no

For login credentials to test the service, please contact sis at avinet.no or tgo at avinet.no.

Create a collection

  • To add a new collection, choose "Manage Collections" from the left margin menu.
  • Go to the form at the bottom of the page under the headline "Add new or update selected collection".
  • Specify the following mandatory information:
    • The name of the collection
  • Specify the following optional information:
    • A default location to be applied to all items in the collection
    • A default edm:type to be applied to all items in the collection
    • A default rights statement to be applied to all items in the collection
  • Press "Save".
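As an illustration only, the Python sketch below models a collection with its mandatory name and optional defaults. It assumes, which this guide does not state, that a default only fills in a value the item itself lacks. The class, method and the `location` field name are hypothetical; the accepted edm:type values are those defined by the Europeana Data Model (TEXT, IMAGE, SOUND, VIDEO, 3D), and edm:rights is the EDM property that holds the rights statement.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Values allowed for edm:type by the Europeana Data Model.
EDM_TYPES = {"TEXT", "IMAGE", "SOUND", "VIDEO", "3D"}


@dataclass
class Collection:
    """Hypothetical model of a CRTT collection: a mandatory name plus optional defaults."""
    name: str
    default_location: Optional[str] = None
    default_edm_type: Optional[str] = None
    default_rights: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("the collection name is mandatory")
        if self.default_edm_type is not None and self.default_edm_type not in EDM_TYPES:
            raise ValueError(f"invalid edm:type: {self.default_edm_type!r}")

    def apply_defaults(self, item: Dict[str, str]) -> Dict[str, str]:
        """Return a copy of item with missing fields filled from the collection defaults.

        Assumption: a value already present on the item takes precedence over the default.
        """
        defaults = {
            "location": self.default_location,  # hypothetical field name
            "edm:type": self.default_edm_type,
            "edm:rights": self.default_rights,
        }
        result = dict(item)
        for key, value in defaults.items():
            if value is not None and not result.get(key):
                result[key] = value
        return result


# Hypothetical collection and item.
photos = Collection(
    "Harbour photographs",
    default_edm_type="IMAGE",
    default_rights="http://creativecommons.org/licenses/by/4.0/",
)
item = photos.apply_defaults({"dc:title": "Harbour, 1910"})
print(item)
```

Defaults left unset (here the location) are simply not added, so an item only receives the values the collection actually specifies.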

From the "Manage Collections" page you can also perform operations such as deleting URLs, reindexing or recrawling the content.

That is it... move on.

Submit a URL or a sitemap to be indexed

  • Choose either "Submit URL" or "Submit Sitemap" from the left margin menu.
  • Select the collection to which the URL or sitemap should be added.
  • Enter the URL or upload the sitemap file.
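If a site does not yet have a sitemap, one can be generated. The sketch below writes a minimal file following the sitemaps.org protocol (a `urlset` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, one `url`/`loc` entry per page, at most 50,000 URLs per file) using Python's standard `xml.etree.ElementTree` (Python 3.8 or later). Which optional sitemap fields (`lastmod`, `changefreq`, `priority`) the CRTT uses, if any, is not documented here, so they are omitted; `build_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol

# Serialise the sitemap namespace as the default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(urls: Iterable[str]) -> bytes:
    """Return a UTF-8 encoded sitemap listing the given absolute URLs."""
    url_list = list(urls)
    if len(url_list) > MAX_URLS:
        raise ValueError("a single sitemap may list at most 50,000 URLs")
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in url_list:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as '&' automatically.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical item pages of a small museum website.
sitemap = build_sitemap([
    "http://www.example.org/photos/item?id=1&lang=en",
    "http://www.example.org/photos/item?id=2&lang=en",
])
print(sitemap.decode("utf-8"))
```

The resulting bytes can be saved as, for example, `sitemap.xml` and uploaded through "Submit Sitemap".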

That is it... move on.

Add custom metadata extraction rules

Since most sites use HTML more for visual formatting and layout than for structuring content, it is often necessary to define custom metadata extraction rules. To do so:

  • Choose "Manage Rules" from the left margin menu
  • Choose the collection you want to add rules to
  • Choose which element the extracted information should go into
  • Add an extraction expression using CSS selector-style syntax
  • Press "Save".

Existing rules are available at the bottom of the page and can be edited or deleted.
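The CRTT's own rule engine is not described in this guide. As a rough, hypothetical sketch of how CSS selector-style rules can map page content to metadata elements, the Python below applies a rule set (metadata element → selector) to one HTML page using only the standard library `html.parser`. It supports an assumed small subset of CSS (a tag name, `.class` and `#id`, optionally combined as in `span.creator`), with no combinators, attribute selectors or pseudo-classes, and it takes the text of the first matching element for each rule. The names `extract`, `RuleExtractor` and the example rules are illustrative, not the tool's API.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple

Selector = Tuple[Optional[str], Optional[str], Set[str]]  # (tag, id, classes)

# Elements without an end tag carry no text, so they are never captured.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


def parse_selector(selector: str) -> Selector:
    """Parse 'tag', '.class', '#id' or a combination such as 'span.creator'."""
    text = selector.strip()
    m = re.fullmatch(r"([a-zA-Z][\w-]*)?((?:[.#][\w-]+)*)", text)
    if not text or m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    parts = re.findall(r"([.#])([\w-]+)", m.group(2))
    ids = [name for kind, name in parts if kind == "#"]
    if len(ids) > 1:
        raise ValueError(f"more than one id in selector: {selector!r}")
    tag = m.group(1).lower() if m.group(1) else None
    classes = {name for kind, name in parts if kind == "."}
    return tag, (ids[0] if ids else None), classes


def matches(sel: Selector, tag: str, attrs: Dict[str, Optional[str]]) -> bool:
    """True if an element with this tag and these attributes satisfies the selector."""
    sel_tag, sel_id, sel_classes = sel
    if sel_tag is not None and sel_tag != tag:
        return False
    if sel_id is not None and attrs.get("id") != sel_id:
        return False
    return sel_classes <= set((attrs.get("class") or "").split())


class RuleExtractor(HTMLParser):
    """Collects the text of the first element matching each rule's selector."""

    def __init__(self, rules: Dict[str, str]) -> None:
        super().__init__()
        self.rules = {element: parse_selector(s) for element, s in rules.items()}
        self.result: Dict[str, str] = {}
        self.buffers: Dict[str, List[str]] = {}      # rules currently capturing text
        self.stack: List[Tuple[str, List[str]]] = []  # open tags and the rules they started

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attr_map = dict(attrs)
        started = [el for el, sel in self.rules.items()
                   if el not in self.result and el not in self.buffers
                   and matches(sel, tag, attr_map)]
        for el in started:
            self.buffers[el] = []
        self.stack.append((tag, started))

    def handle_data(self, data):
        for chunks in self.buffers.values():
            chunks.append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name and any unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, started in self.stack[i:]:
                    for el in started:
                        self._finish(el)
                del self.stack[i:]
                return
        # No matching open element: the stray end tag is ignored.

    def close(self):
        super().close()
        for el in list(self.buffers):  # elements still open at end of input
            self._finish(el)

    def _finish(self, element):
        text = "".join(self.buffers.pop(element))
        self.result[element] = " ".join(text.split())  # collapse whitespace


def extract(html: str, rules: Dict[str, str]) -> Dict[str, str]:
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.result


page = """<html><body>
<h1 class="page-title">Harbour, 1910</h1>
<div id="info"><span class="label">Photographer:</span> <span class="creator">A. Hansen</span></div>
<p class="desc">Glass plate negative.
   Digitised 2013.</p>
</body></html>"""

# Hypothetical rules for one collection: metadata element -> selector.
rules = {"dc:title": "h1.page-title", "dc:creator": "span.creator", "dc:description": "p.desc"}
print(extract(page, rules))
```

In this sketch a rule whose selector matches nothing is simply absent from the result.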

Wait ...

Crawling and indexing take place in the background.

Once the crawling has completed, you can verify the aggregated data through the "Search Demo" that is available under "Tools" in the left margin menu.