Archives for category: Database

Before I can talk about putting the database on a tablet I want to show you what the full PARP:PS database looks like, and a little bit about why it looks that way.

Like most field projects, the PARP:PS database is built to reflect the reality of the excavation workflow. For instance, almost all of the work that we do, from excavation to initial pottery processing, wet sieving, and artifact registration, is done at the same place. Since we don’t have to move bags of artifacts and samples from the site to a different location for processing, we don’t have a bag inventory system. That is reflected in the database: we simply have contexts and finds and don’t track bag numbers.

The pottery is initially cleaned and read by the excavators, with some ceramicists approving the reading. The initial reading is a simple classification of the material into 23 classes (seen at the top of the Pot Quant Entry layout). The ceramic experts normally do not show up until the excavation is over. Since what they are doing is an entirely separate read, I can’t split the quantified material from one class to another. So I have an entirely separate table named Ceramic Details to hold that information.
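To make the idea of two independent reads concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical stand-ins, not the actual FileMaker schema; the only point is that the later expert read lives in its own table and never modifies the field quantification.

```python
import sqlite3

# All table and column names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Field read by the excavators: counts per broad class, per SU.
CREATE TABLE pot_quant (
    su_id       INTEGER NOT NULL,
    class_name  TEXT    NOT NULL,
    sherd_count INTEGER NOT NULL,
    PRIMARY KEY (su_id, class_name)
);
-- Later, independent read by the ceramic experts.
CREATE TABLE ceramic_details (
    detail_id   INTEGER PRIMARY KEY,
    su_id       INTEGER NOT NULL,
    description TEXT    NOT NULL
);
""")

# Invented sample rows for a single SU.
conn.execute("INSERT INTO pot_quant VALUES (1001, 'coarse ware', 42)")
conn.execute(
    "INSERT INTO ceramic_details (su_id, description) VALUES (?, ?)",
    (1001, "rim fragment, cooking pot"),
)

def field_count(su_id):
    """Total sherds recorded in the field read for one SU."""
    row = conn.execute(
        "SELECT SUM(sherd_count) FROM pot_quant WHERE su_id = ?", (su_id,)
    ).fetchone()
    return row[0]

def detail_count(su_id):
    """Number of expert detail records attached to one SU."""
    row = conn.execute(
        "SELECT COUNT(*) FROM ceramic_details WHERE su_id = ?", (su_id,)
    ).fetchone()
    return row[0]
```

Both tables point at the same SU, so the two reads can be compared side by side, but adding a detail record leaves the field counts untouched.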

We have two different phasing categories: report phases and canonical phases. Report phases are the initial phasing in the field. By the time we have examined the SUs for publication, we might have reassigned several of them from one phase to another, and that becomes the canonical phase (or simply phase).
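One way to picture the two phasing categories is as two columns on the same SU record. The sketch below is an assumption-laden illustration in Python with sqlite3: the names are hypothetical, and the rule that an unreviewed SU falls back to its report phase is my own simplification, not something the real database is known to do.

```python
import sqlite3

# Hypothetical table and column names; sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE su (
    su_id           INTEGER PRIMARY KEY,
    trench          TEXT    NOT NULL,
    report_phase    INTEGER NOT NULL,  -- phase assigned in the field
    canonical_phase INTEGER            -- NULL until reviewed for publication
);
""")
conn.executemany(
    "INSERT INTO su VALUES (?, ?, ?, ?)",
    [(1001, "A", 3, None), (1002, "A", 3, 4), (1003, "B", 5, 5)],
)

def phase_of(su_id):
    """Return the canonical phase, falling back to the report phase (assumed rule)."""
    row = conn.execute(
        "SELECT COALESCE(canonical_phase, report_phase) FROM su WHERE su_id = ?",
        (su_id,),
    ).fetchone()
    return row[0]

def reassigned():
    """List SUs whose canonical phase differs from their report phase."""
    rows = conn.execute(
        "SELECT su_id FROM su "
        "WHERE canonical_phase IS NOT NULL AND canonical_phase <> report_phase "
        "ORDER BY su_id"
    ).fetchall()
    return [r[0] for r in rows]
```

Keeping both values means the field interpretation is never overwritten, and a query like `reassigned()` shows exactly which SUs moved during study.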

The rest of the database should be fairly familiar to anyone who has worked on excavation databases before.

Right now the database houses information on contexts at the trench and SU levels. It has all of the finds, and the roughly quantified pottery. The detailed ceramic info isn’t quite finished as I haven’t yet finished working with the ceramicists to get it to their liking. The database also contains information from the science team, in the form of a list of samples that have been examined and detailed faunal information (developed with Emily Holt). We don’t yet have floral information. We track the conservation done to our artifacts. We have a fairly advanced media data table which is integrated into all other areas of the database.

The Context layouts summarize the data from the other parts of the database. There is a navigation aid on the right which allows you to move from trench to trench without doing additional searches. This aid also appears on the SU layouts.

An article written by Billur Tekkok, Sebastian Heath, and me is about to appear in Studia Troica. They have kindly allowed us to host a pre-print version of the article.

The authors present a non-technical overview of the database structures that record information about the Post-Bronze Age ceramic assemblage at Ilion. Its purpose is not to fully document the system used at Troia, but instead to identify practices that can be useful in other contexts. The article particularly stresses that it is important to assign a primary identity to all sherds that will be subject to individual study and that this identity can be re-used in such record keeping processes as drawing and photography. Further use of such identities in print and digital publication is likely to make online linking of ceramic data to contextual information easier in the future.

This article describes our method for recording what was a truly staggering amount of pottery, even for an urban site. And this post gives me the opportunity to show one of my favorite pictures of Troy:

This is the area in front of the dig house at Troy. The vertical sidewalk in the photo separates the Bronze Age team ceramic processing tent on the left from the Post Bronze Age team ceramic processing on the right. Keep in mind that this is pottery from the current year only, waiting to be processed. The tent itself is full of tables with pottery spread out for analysis.

The Troy database is probably more similar to data collection schemes found in Greece than to the work that I am posting based on the Pompeii data. Since our PARP:PS pottery is not read for publication until after the project is complete, I haven’t had to add tables to find all of the associated drawings of a piece. Another difference is how the individual numbers are assigned. The Troy database would hand out the next unique individual sherd number to the scholar. At Pompeii the scholars will come and study the ceramics during the winter when we are not there. So I have to devise a way for someone to assign a number to a ceramic without the possibility of duplication. We have a procedure in place but it is relatively untested.
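The post does not describe the procedure actually used at Pompeii, so the following is only one common way to approach the problem, not the PARP:PS method: reserve a non-overlapping block of numbers for each scholar in advance, so that numbers can be assigned offline without consulting the main database. The class name, block size, and scholar labels are all hypothetical.

```python
class BlockAllocator:
    """Reserve non-overlapping ranges of sherd numbers, one range per scholar.

    This is an illustrative assumption, not the procedure used by the project.
    """

    def __init__(self, start=1, block_size=1000):
        self._next_start = start
        self._block_size = block_size
        self._blocks = {}

    def reserve(self, scholar):
        """Return the scholar's block, creating it on first request."""
        if scholar in self._blocks:
            return self._blocks[scholar]
        block = range(self._next_start, self._next_start + self._block_size)
        self._blocks[scholar] = block
        self._next_start = block.stop
        return block


allocator = BlockAllocator(start=1, block_size=1000)
first = allocator.reserve("scholar A")
second = allocator.reserve("scholar B")
```

The trade-off is that numbers are no longer sequential across the whole project and a block can run out, but two people working apart can never produce the same number.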

Bill Caraher in his New Archaeology of the Mediterranean World blog has mentioned this blog frequently. Lately he has pondered what to do about his participation in a larger field project in the Princeton Polis Expedition. More specifically, he is trying to address how much his small study group should invest in its own data structure which may or may not be compatible with the data set of the project as a whole.

“I have been diligently reading John Walrodt’s Paperless Archaeology over the past few weeks. This blog documents in detail how a project implemented their digital workflow. From what I have seen so far, the tools that they developed and deployed served to facilitate their ongoing, in the field, research (although I am sure that there are provisions for archiving the data in a responsible way).”

He is correct, of course, in that this blog is currently focused on ongoing field research. There are things to say about data repositories, but I haven’t gotten there yet. My main focus on PARP:PS is data collection and immediate consumption and analysis of that data for preliminary publication.

It might help to realize that there are two different things at play: datasets and databases. I create databases to manage my datasets. The datasets might vary slightly from project to project and region to region, but are fairly interoperable. The variation that does exist is primarily a reflection of the variation that you get in survey/excavation/finds processing techniques from project to project. I can export the contents of any of the databases that I manage into a few dozen tables (PARP:PS currently has 35 tables) that can be key-linked using any database tool available.
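As a small illustration of what key-linking exported tables means in practice, the sketch below joins two tiny CSV exports using only the Python standard library. The column names and rows are invented for the example and do not come from the PARP:PS export.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for two exported tables; column names and values are invented.
CONTEXTS_CSV = """su_id,trench,phase
1001,A,3
1002,A,4
"""

FINDS_CSV = """find_id,su_id,material
F1,1001,bronze
F2,1001,bone
F3,1002,glass
"""

def read_table(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def finds_by_trench(contexts, finds):
    """Link finds to contexts on the shared su_id key and group them by trench."""
    trench_of = {row["su_id"]: row["trench"] for row in contexts}
    grouped = defaultdict(list)
    for find in finds:
        grouped[trench_of[find["su_id"]]].append(find["find_id"])
    return dict(grouped)

result = finds_by_trench(read_table(CONTEXTS_CSV), read_table(FINDS_CSV))
```

Because the link is nothing more than a shared key column, the same join could be done in a spreadsheet, a SQL database, or a statistics package; the dataset does not depend on FileMaker.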

But the database itself in its current FileMaker form does much more than store the data. We use it to view and summarize the data in a number of different ways. We can view all material from a single SU (Stratigraphic Unit), from a single phase (what Bill refers to as “level” in the Princeton Polis project), or a whole trench. Since we also have defined rooms and properties, we can add those to the database and view everything in those contexts as well. The database imports the images and creates the find numbers that we use to avoid data entry mistakes, and has validation routines to verify that hand-entered data is correct.
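The validation routines themselves live inside FileMaker and are not shown here. As a rough sketch of the general idea, and with every field name and rule being an assumption of mine, a check on a hand-entered find record might look like this in Python:

```python
def validate_find(record, known_sus):
    """Return a list of problems with a hand-entered find record (empty if none).

    Field names and rules are hypothetical, chosen only for illustration.
    """
    problems = []
    su = record.get("su_id", "").strip()
    if not su.isdigit():
        problems.append("su_id must be a whole number")
    elif int(su) not in known_sus:
        problems.append(f"su_id {su} is not a recorded SU")
    if not record.get("material", "").strip():
        problems.append("material is required")
    return problems


known = {1001, 1002}
ok = validate_find({"su_id": "1001", "material": "bone"}, known)
bad = validate_find({"su_id": "10x", "material": ""}, known)
missing = validate_find({"su_id": "2000", "material": "glass"}, known)
```

Returning a list of messages rather than stopping at the first error lets the person entering data fix everything in one pass.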

To put it another way: while my database might not be useful for many projects other than PARP:PS, I am pretty certain that my dataset will be useful for comparison to other projects at Pompeii or in Italy in general.

Getting back to Bill’s dilemma. His two options, as he expressed them, are to “develop a data structure best suited to answer our immediate research questions…On the other hand, I could imagine a data structure (undoubtedly more complex) best suited to preparing the Polis data for some form of digital publication (or at least archival storage). Few projects in the Eastern Mediterranean with a Byzantine focus have made their data publicly available. In this regard, the Polis data could be an important step toward making stratigraphic, typological, and chronological data from the Byzantine period available in digital form. At the same time, the two Early Christian churches represent just one part of a much larger and more complex site. Taking the time to produce a thorough and well-structured dataset could be a fool’s errand if it ends up being incompatible with other work ongoing at the site or finds very few comparable datasets elsewhere in the region.”

From my perspective this is an easy decision to make, although I haven’t seen any of the data. The attention you give to the structure of the data set is directly related to your ability to use distinct parts of that data set for analysis. It is my experience in fieldwork that inelegant solutions (Bill’s term) exist because no one has spent a great deal of time and energy to produce a more elegant one. Sometimes a project is looking for someone like Bill to show them the benefits of a better-designed data structure. I have been on both sides of that conversation: I have created data structures that have been adopted by the larger project and I have incorporated data structures that others want to use in my own. I find that the key is to prove the benefits of the new solution, and the conclusion becomes obvious. The issue of comparable datasets outside of the project isn’t, of course, Bill’s problem. If he is the first one to make such a tool available, and he publishes his data structure, it is up to those following him to try to make their data comparable.

Much of this is to say that I am shifting to posting about databases for the next few weeks. I will post a clone of the PARP:PS database in its current form (it is, of course, unfinished) as well as the files necessary to put the database on an iPad using FMTouch. The database is a complex beast and I don’t expect people to understand it right away, but I do plan on a series of posts explaining parts of the database in more detail.