Archives for category: Database

Kristina Neumann, one of our PhD candidates in the Department of Classics at UC, will be giving a presentation soon on her work with the Google Earth database mentioned here earlier.  She has done amazing things with this database and created a series of KML files that allows her to express the reach of Antioch coinage in a stunning way.

This paper is part of the joint AIA/APA meetings in Chicago happening now. See it in session 5D at 12:30 on Saturday January 4.

In ancient times, much like now, authorities determined which foreign currency was accepted in a community. For Neumann, this made coins an ideal representation of a political relationship among cities. For example, if lots of Antiochene coins were discovered in a neighboring city, it’s likely a political agreement existed between the two governments.

Coins were also a data-rich resource for Neumann. In addition to tracking where the coins were found, she cataloged critical information about a coin — such as when it was minted and under whose authority it was made — that has been derived from the images and inscriptions imprinted on it. Other artifacts, like pottery, were less likely to have such identifiers.

Neumann uses Google Earth to convert the vast information in her coin database into a visual representation of Antioch’s political borders. The software plots which coins were found where, and in what quantity, across different historical periods. This way she can follow the transformation of Antioch’s political influence as the city was absorbed by the Roman Empire.

She has found Antioch’s civic coins were spread farther out than previously theorized, and they were particularly abundant along a known trade route. Neumann can scan centuries of change in seconds with Google Earth to show the overall contraction of Antioch’s political authority but also its continued and evolving influence in selected regions and cities — and eventually its greater integration within the empire.

Google Earth allows Kristina Neumann to track change in Antioch as it was absorbed by the Roman Empire.

Her talk is already getting some news attention, which I have been tracking here.


According to FileMaker there is an issue between iOS 7 (due to be released on Sept 18) and FileMaker Go that affects the creation of unique UUIDs, which has the potential to wreak havoc on databases that rely on those unique numbers for syncing. This bug hits our own databases, as well as the copy of the database that is hosted at this blog.

Syncing multiple copies of a database requires that each record have a truly unique identifier. This is more than a straight serial number, since two separate copies of the database can each create a record in the same table, giving both records the same serial number. Instead, our database relies on something more specific: a UUID, which is generated from several types of information.

The UUID that we use comes from a custom function written by Jeremy Bante a few years ago named UUID.New. It creates a unique number for each record based on:

  • timestamp
  • recid (an internal number generated by FileMaker)
  • the NIC (or MAC) address of the device that created the record

This is stored as a custom function to allow all tables in the database access to it.

According to FileMaker, all iOS devices under iOS 7 will return the same NIC address. This can theoretically produce the same UUID for two records if two devices create a new record in the same table at exactly the same time.

I am not very worried about this myself. The odds of two records returning the same UUID are pretty small. Also, the syncing scripts that I use rely on items other than the UUID for matching. For instance, each new record is given _DeviceCreated and _DeviceModified fields. Those are set to auto-enter the calculation [Get ( SystemPlatform ) & "-" & Get ( HostName )]. So unless both devices are iPads and are both named the same (which shouldn’t happen), they won’t supply the same data.

If you are worried about this bug you can switch from using the UUID.New function to another of Jeremy Bante’s functions named UUID.Random, which replaces the NIC portion of the UUID with a set of random numbers. Switching to this function won’t affect your old records and won’t require that you wait for FileMaker to fix FMGo.
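
To see the two approaches side by side outside of FileMaker, here is a minimal Python sketch; it is only an analogue of the idea, not Bante's functions and not anything FileMaker Go actually runs:

# A rough analogue of the two approaches, not FileMaker code.
# uuid1() mixes the host's MAC address with a timestamp, much like UUID.New;
# uuid4() is purely random, which is the idea behind UUID.Random.
import uuid

mac_based = uuid.uuid1()     # can collide if two devices report the same MAC at the same instant
random_based = uuid.uuid4()  # independent of the device's hardware address

print(mac_based)
print(random_based)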

First off, thanks to John for letting me post here. I do hope to be back as the work described below progresses.

In May and June of this year I spent just over two weeks initiating the online publication of the Kenchreai Archaeological Archive (KAA). This is an initiative of the American Excavations at Kenchreai, which conducts its work via a permit from the Greek Ministry of Culture and under the auspices of the American School of Classical Studies at Athens. Joseph Rife of Vanderbilt University is the director and I’m grateful for his permission to publish the results of this collaborative project. The focus of my work is the written and visual documentation of the excavations carried out at Kenchreai in the 1960s by the University of Chicago. These records are now in the Isthmia Museum, along with the objects that the project saved.

Cutting to the chase after the above preliminaries, I am modeling KAA using the Resource Description Framework (RDF) in combination with the principles of Linked Open Data (LOD). Before offering an introduction to RDF, I’ll say that I’m using it because RDF gives me a simple and robust structure for describing the highly variable archaeological information that I am discovering in the extant records.

Superbrief intro to RDF

What is RDF? The simplistic answer is that it’s a W3C standard for encoding information, one with a formal description at http://www.w3.org/TR/rdf-concepts/ . More usefully, RDF has at its core the concept of a triple. For its part, a triple is a three-part statement consisting of:

  • A subject: what you are talking about.
  • A predicate: the type of information that you’re saying about the subject.
  • An object: the value – meaning content – of that information.

Informally, the phrases:

  •  “Sebastian Heath”
  •  “is a”
  •  “human”

could be understood as the RDF triple:

  • Subject: me (“Sebastian Heath”)
  • Predicate: assertion of nature (“is a”)
  • Object: human being (“human”)

Or:

  • Augustus
  • held the office
  • Roman Emperor

The last example becomes more interesting when we replace the words with web addresses:
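
Here is an illustrative sketch in Python with rdflib; the DBpedia-style addresses for the subject and object, and the example.org predicate, are stand-ins I have chosen for this post rather than an authoritative vocabulary:

# An illustrative version of the Augustus triple with web addresses.
# The subject and object use DBpedia-style URIs; the predicate is a
# made-up example.org address standing in for "held the office".
from rdflib import Graph, URIRef

g = Graph()
g.add((
    URIRef("http://dbpedia.org/resource/Augustus"),       # subject
    URIRef("http://example.org/held-the-office"),         # predicate (hypothetical)
    URIRef("http://dbpedia.org/resource/Roman_emperor"),  # object
))
print(g.serialize(format="nt"))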

I’ve now used a set of publicly available web addresses to construct a triple about an historical individual. In doing so I’ve merged the ideas of Linked Open Data (LOD) into this discussion of RDF.

LOD is a set of best practices that suggests using URIs – here meaning well-constructed and stable web addresses – to identify publicly accessible resources. It further suggests that the information available at those addresses should be machine readable.

I’m now a long way from Kenchreai. Before getting back on track, here are some links to information about RDF and LOD that readers may find useful:

RDF and LOD at Kenchreai

OK, back to Kenchreai. On first diving into the 1960s notebooks, I was pleased to see that the project was very organized about creating identifiers for the archaeological phenomena they were encountering. “Archaeological phenomena” is my fancy term for things like “trench”, “layer”, “object”, “sherds found together”, etc. I happily note that I am at the early stages of understanding how these ideas were manifested at the site in the 1960s, so what follows here is highly preliminary and subject to change.

To give an example by way of a series of steps:

  • The Area E notebook for August 4, 1963 reads in part, “At level 0.90 m we started putting all sherds in [box E121].” You can see an image of that page via http://kenchreai.github.io/kaa/notebook-page-e-i-038 . Click thru on the image for more detail.
  • From box E121, a sherd was pulled and sent in to be inventoried.
  • When inventoried that sherd was assigned the ID “KE 670”. You can see the relevant entry in the inventory book via http://kenchreai.github.io/kaa/KE0670 .
  • As was the practice at Kenchreai (and at other American excavations in Greece), KE 670 was also assigned a “subject number”, in this case “P 176”. This indicated that it was the 176th piece of pottery inventoried.

I could go on, but I hope it’s clear that there are a lot of RDF triples implied in the above narrative. Taking KE 670 as our starting point, some of them are:

  • KE0670
  •  type
  •  Inventory number
  • KE0670
  •  is part of
  • box-e0121
  • KE0670
  • is the same as
  • P0176
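
To make the pseudo-RDF a bit more concrete, here is a sketch of those three statements written with Python's rdflib. dc:isPartOf is the predicate I discuss below; rdf:type and owl:sameAs are my working guesses for "type" and "is the same as", and KAA's actual vocabulary may differ:

# The three statements about KE0670 as actual triples, using Python's rdflib.
# dc:isPartOf is the predicate named later in this post; rdf:type and
# owl:sameAs are guesses for "type" and "is the same as".
from rdflib import Graph, Namespace, RDF, OWL
from rdflib.namespace import DCTERMS

KAA = Namespace("http://kenchreai.github.io/kaa/")

g = Graph()
g.add((KAA["KE0670"], RDF.type, KAA["inventory-number"]))
g.add((KAA["KE0670"], DCTERMS.isPartOf, KAA["box-e0121"]))
g.add((KAA["KE0670"], OWL.sameAs, KAA["P0176"]))

print(g.serialize(format="turtle"))  # prints the graph in Turtle notation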

If you’d like to see everything KAA is saying about KE0670 (to use the fully padded version), go to http://kenchreai.github.io/kaa/KE0670 . But please understand that that’s the temporary location of KAA. I’ll report its permanent address when that’s available.

You may have noticed that the above pseudo-RDF makes reference to other Kenchreai identifiers. One of them is box-e0121, of which KE0670 is said to be a part. KAA in turn says the following about box-e0121:

  •  box-e0121
  • type
  • excavation-box
  • box-e0121
  • is part of
  • trench-e-ii-x-1

And again in turn:

  • trench-e-ii-x-1
  • is part of
  • area-e

And yes, there are URIs for those identifiers: http://kenchreai.github.io/kaa/box-e0121 and http://kenchreai.github.io/kaa/trench-e-ii-x-1 .

I can summarize the above by saying that KAA is establishing web-based equivalents of the Kenchreai identifiers and is using RDF triples to indicate relationships between those identifiers.

The above means that I think I have a single conceptual structure, the RDF triple, that I can use to represent all information inherent in the materials – written, visual, and physical (meaning the objects themselves) – now stored in the Isthmia Museum. As implied above, I’m at the stage of putting these ideas to the test.

An additional point: because I’ve worked to choose sensible strings of characters for each ID, it’s easy to turn them into URIs. You’ve seen some already; here are some more examples:

Note that on each of those web-pages, you can click on the identifiers it references to see what KAA says about those. Furthermore, and this is important, you can scroll to the bottom of each page and click the “as rdf” link to get a machine readable representation of the data represented at that web address.

Querying the Kenchreai RDF

Because KAA is a list of triples, it’s easy to make that list available. For now see:

for two versions of all current triples.

I’m not going to go into the details of the format of those files other than saying that they’re “raw rdf”. I’ve made them available as a convenience for readers, but also so that I can demonstrate a simple query into this dataset.

“Query” is just another fancy term for “extract useful information from some data.” Part of the RDF suite of technologies is a language for describing such queries. It’s called SPARQL and its details go beyond the scope of this post.

My use case for now is finding all inventory numbers that are said to be part of “Area E”, which is just the designation for a part of the site where the project excavated. This is an interesting problem because, as you may have noticed, KAA does not explicitly say that an inventory number is part of a particular area. But I do plan to assert that inventory numbers are part of excavation boxes (when they are), and that excavation boxes are part of trenches, and then that trenches are part of areas.

The “is part of” relationship is indicated by the RDF predicate “dc:isPartOf”. “dc” stands for “Dublin Core”, which is a set of “core” terms for describing data. As in, I have a standardized way for expressing the logical relationship that one KAA identifier is part of another. Read more about “dc:isPartOf” at http://dublincore.org/documents/dcmi-terms/#terms-isPartOf .

So here’s an example of a SPARQL query that finds all inventory numbers that are part of “kaa:area-e”:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX kaa: <http://kenchreai.github.io/kaa/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?kenchreai_id
FROM <https://dl.dropboxusercontent.com/u/17002562/kenchreai.rdf>
WHERE {
  # match anything that is linked to kaa:inventory-number (e.g. by rdf:type)
  ?kenchreai_id ?p kaa:inventory-number .
  # follow one or more dc:isPartOf links until kaa:area-e is reached
  ?kenchreai_id dc:isPartOf+ kaa:area-e . }

The super important part is the ‘+’ symbol after ‘dc:isPartOf’ on the last line. That will cause a SPARQL query engine to follow all dc:isPartOf predicates to see if an identifier is said to be part of ‘kaa:area-e’. Cleverly, this builds on the pseudo-RDF for KE0670 that I presented above.

As a convenience, I have set up a link to the SPARQL query-engine (ok, “endpoint” for those in the know) at http://sparql.org/sparql.html to run that query and return readable results. That link is http://sparql.org/sparql?query=PREFIX+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+kaa%3A+%3Chttp%3A%2F%2Fkenchreai.github.io%2Fkaa%2F%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0ASELECT++%3Fkenchreai_id%0D%0AFROM+%3Chttps%3A%2F%2Fdl.dropboxusercontent.com%2Fu%2F17002562%2Fkenchreai.rdf%3E%0D%0AWHERE+%7B%0D%0A++%3Fkenchreai_id+%3Fp+kaa%3Ainventory-number+.%0D%0A++%3Fkenchreai_id+dc%3AisPartOf%2B+kaa%3Aarea-e+.%0D%0A+%7D+ORDER+BY+%3Fs&default-graph-uri=&output=xml&stylesheet=%2Fxml-to-html.xsl

Yes, it’s true; I have only done the data entry to show that KE0670 is part of Area E. But when I’ve done more work, you’ll be able to get the entire list.
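
If you would rather run the query locally, here is a self-contained sketch using Python's rdflib. The block of Turtle restates the chain of statements from earlier in this post (with rdf:type standing in for the "type" statements, which may not be exactly what KAA uses), and the query is the one above minus the FROM clause:

# A self-contained sketch: the KE0670 chain as Turtle plus the Area E query.
from rdflib import Graph

data = """
@prefix dc:  <http://purl.org/dc/terms/> .
@prefix kaa: <http://kenchreai.github.io/kaa/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

kaa:KE0670          rdf:type    kaa:inventory-number ;
                    dc:isPartOf kaa:box-e0121 .
kaa:box-e0121       rdf:type    kaa:excavation-box ;
                    dc:isPartOf kaa:trench-e-ii-x-1 .
kaa:trench-e-ii-x-1 dc:isPartOf kaa:area-e .
"""

query = """
PREFIX dc:  <http://purl.org/dc/terms/>
PREFIX kaa: <http://kenchreai.github.io/kaa/>
SELECT ?kenchreai_id
WHERE {
  ?kenchreai_id ?p kaa:inventory-number .
  ?kenchreai_id dc:isPartOf+ kaa:area-e .
}
"""

g = Graph()
g.parse(data=data, format="turtle")
for row in g.query(query):
    print(row.kenchreai_id)  # expect http://kenchreai.github.io/kaa/KE0670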

Conclusion

So, yeah, I think it’s cool that I have a simple structure to encode all data types in a format that can be queried using standards-based third-party tools. And I’m really glad that I can do this without having to define a separate table for each datatype, which is what I would have had to do had I chosen a relational model for KAA. So perhaps the major takeaway from this post is that RDF represents one way to overcome the shortcomings of the relational databases that are so prevalent on archaeological projects today. I doubt that everyone will agree with me on that point, since this discussion has been too brief to justify the conclusion. But I’ll return to this space when progress warrants it and look forward to an ongoing exchange on the ideas I’ve floated here.

[Image: Google Earth bar charts]

One of our graduate students here at UC is investigating the movement of artifacts from their origin to their archaeological find spot. She had been gathering her data in a FileMaker Pro database and wanted to be able to visualize the quantity of material either sourced or found in various cities. Since she already had a data table of cities with coordinates gathered from Google Earth, I decided to see if I could get FileMaker to talk directly to Google Earth. This database is the result.

The database consists of a table of cities, and a table of objects.

[Image: the cities table]

[Image: the objects table]

The objects have a field for Source City and a field for Find City. They also have a quantity. You can limit your query to anything: source city, find city, material, object type, dates, etc. You can view the summary results by either source or find city and export the result into a kml file that can be viewed in Google Earth.
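
To give a sense of what ends up in the exported file, here is a minimal Python sketch that writes a bare-bones KML document with one placemark per city. It is an analogue for illustration rather than the FileMaker export itself, and the city names, coordinates, and quantities are placeholder values:

# A minimal sketch of a KML export: one placemark per city, with the
# summarized quantity folded into the placemark name. Not the FileMaker-
# generated file; the values below are placeholders.
cities = [
    {"name": "Athens",  "lon": 23.73, "lat": 37.98, "quantity": 42},
    {"name": "Corinth", "lon": 22.93, "lat": 37.94, "quantity": 17},
]

placemarks = ""
for c in cities:
    placemarks += (
        "  <Placemark>\n"
        f"    <name>{c['name']} ({c['quantity']})</name>\n"
        f"    <Point><coordinates>{c['lon']},{c['lat']},0</coordinates></Point>\n"
        "  </Placemark>\n"
    )

kml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    "<Document>\n" + placemarks + "</Document>\n</kml>\n"
)

with open("summary.kml", "w", encoding="utf-8") as f:
    f.write(kml)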

This will work on an iPad as well, with FileMaker Go and Google Earth, but you will need to use an intermediary file manager (such as GoodReader) to change the extension from .txt to .kml.

[Image: the summary layout]


What I will be reading during the long Thanksgiving weekend.

OCHRE: An Online Cultural and Historical Research Environment
by J. David Schloen and Sandra R. Schloen
November 2012
This book describes an “Online Cultural and Historical Research Environment” (OCHRE) in which scholars can record, integrate, analyze, publish, and preserve their data. OCHRE is a multi-project, multi-user database system that provides a comprehensive framework for diverse kinds of information at all stages of research. It can be used for initial data acquisition and storage; for data querying and analysis; for data presentation and publication; and for long-term archiving and curation of data. The OCHRE system was designed by the co-authors of this book, David Schloen and Sandra Schloen. The software for it was written by Sandra Schloen.