Notes from Megameeting 22nd October 2007


BarbaraSierman Koninklijke Bibliotheek, Netherlands
BruceAmbacher UM
CandidaFenton HATII, Univ Glasgow
SimonLambert STFC
DonaldSawyer NASA GSFC
HelenTibbo UNC
JohnGarrett NASA/GSFC
KatiaThomaz INPE, Brazil
MarkConrad NARA
RobertDowns Center for International Earth Science Information Network (CIESIN), U Columbia

All the discussion at this meeting was conducted by chat, so the following transcript of the meeting (with a few typos corrected) is complete.

The only specific action was that DonaldSawyer volunteered to give some explicit examples of provenance or the lack thereof.

Other topics for next week:

  • "Mandatory" versus "risk assessment" approach in certification.
  • Whether provenance should follow the OAIS definition.
  • Finish reviewing Donald Sawyer's proposal on how to deal with references to authenticity.

Katia Thomaz >> (All): i am reading Don's message
Mark Conrad >> (All): I responded to Don's message.
Don Sawyer >> (All): Sorry my action was done so late.
cclrc >> (All): So ... Don's email basically proposes how to eliminate or clarify the uses of the word "authenticity" in the TRAC doc.
RobertDowns >> (All): I am reading Mark's response
cclrc >> (All): Mark, I haven't seen your reply yet.
Mark Conrad >> (All): I received it back from the listserv.
Don Sawyer >> (All): I've read Mark's comments and will have a couple responses when we get to them.
cclrc >> (All): Don't wait for me to start the discussion.  I will be able to follow and will read Mark's email as soon as it arrives.
Don Sawyer >> (All): Has everyone been able to read my response to the action item?
cclrc >> (All): Yes
BruceAmbacher >> (All): yes
RobertDowns >> (All): Yes
RobertDowns >> (All): I also have read Mark's comments to your response
Don Sawyer >> (All): We've not heard from Barbara or Candida
BarbaraSierman >> (All): I'm reading the comments
BruceAmbacher >> (All): Mark, In B1.3 what do you read as "mandatory"?  I don't see it has to be mandatory.
Mark Conrad >> (All): The mandatory is in Don's comments - not the text.
Katia Thomaz >> (All): ok for me.
candida fenton >> (All): I have just read over the comments. 
BruceAmbacher >> (All): In B1.3 I read Don's comments as saying it cannot be made mandatory.  I think we are all in agreement. Yes?
Don Sawyer >> (All): In my text I pointed out that TRAC does not require that Provenance be provided by the Producer  of information submitted to the repository.  This is explicitly stated in TRAC B6.10.
Don Sawyer >> (All): By 'mandatory', I was referring to it being a stated requirement.
Mark Conrad >> (All): Bruce, No. There should be no mandatory elements in the standard.
BarbaraSierman >> (All): The argument that this information is not always available is a good reason for not making it mandatory
Mark Conrad >> (All): How does this fit with the stated goal of this group - the proposed standard should use a risk assessment approach rather than a fully mandated approach.
Don Sawyer >> (All): Mark, i don't find the position that there should be no mandatory elements in the standard something that could work.  
Mark Conrad >> (All): How do you define risk assessment approach?
Don Sawyer >> (All): The standard, as does TRAC, has requirements that are to be met.  How an auditor scores the meeting of these requirements is another issue and not one we've addressed yet.
Mark Conrad >> (All): If it is up to the auditor to score this how do you: 1. decide what is mandatory and what is not? 2. exclude requirements like provenance prior to ingestion from the standard?
Don Sawyer >> (All): Mark, TRAC sets up a list of requirements.  These are 'mandatory' in that sense.  TRAC also says that provenance prior to ingest is not part of the requirements.  However, this does not prevent any archive or repository from requiring it.  This certification standard needs to work for a variety of repository types, not just national archives.
RobertDowns >> (All): Some producers of scientific data will not provide provenance information for the scientific data that they are providing. 
Mark Conrad >> (All): I am not talking about requirements for a national archives. I am talking about requirements for any trusted repository.
BarbaraSierman >> (All): The word "encourage" in Mark's comment leaves space for such interpretations
BruceAmbacher >> (All): While not mandatory, how can a repository expend resources and effort for an object that it can not establish the provenance for?
Mark Conrad >> (All): Robert, Why would a repository accept a deposit of scientific data without  provenance information?
BruceAmbacher >> (All): Why would any repository accept any objects it does not have/get the provenance for?
Mark Conrad >> (All): Exactly!
candida fenton >> (All): If a user is to 'trust' a repository surely they will want to know where the data held has come from (provenance)
Don Sawyer >> (All): Scientific archives often have no recourse other than to take the submitter's word that the information submitted is what he/she says it is.  So, the source is not only the source but also the implicit definition of provenance.  But not necessarily any explicit provenance.
BarbaraSierman >> (All): What about webarchiving projects? Is there always provenance info?
RobertDowns >> (All): If the repository has a relationship with the scientist that is producing the data, knows that the scientist produced the data, requests the data, then receives the data from the scientist, then the repository knows that they received the data that the scientists produced
Don Sawyer >> (All): And yes, Web archiving was going to be my next example.
cclrc >> (All): Isn't the provenance of a web archive that "this page was harvested at such a time/date and this is what it contained"?
Mark Conrad >> (All): In the case of web archiving the provenance information would include: who grabbed the data from the web, what dates, using what tools with what settings.
BruceAmbacher >> (All): So, we seem to agree that while not mandatory, any good repository will want to know the provenance of its objects and confirm the authenticity (to the extent it can).  Web archives operate on a different relationship and establish their own provenance - beginning when they harvest and noting the original source(s)
BarbaraSierman >> (All): OK, I agree for web archiving. How about legal deposit objects, is there always provenance info?
Mark Conrad >> (All): Robert, In terms of your example the provenance information would be that the data was received from this scientist who refused to divulge how the data was created.
Katia Thomaz >> (All): again, we should define provenance.
BruceAmbacher >> (All): We should use pre-existing definitions rather than create a new definition.
cclrc >> (All): We seem to be in the same position with respect to provenance as we were with authenticity.
Don Sawyer >> (All): Bruce, I agree - all archives want to know the provenance of material that comes to it.  However it may not be explicitly provided.
Katia Thomaz >> (All): i didn't say we should "create" a definition.
Mark Conrad >> (All): From the OAIS reference model: Provenance Information: The information that documents the history of the Content Information. This information tells the origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated. Examples of Provenance Information are the principal investigator who recorded the data, and the information concerning its storage, handling, and migration.
Katia Thomaz >> (All): we should equalize the concepts
Mark Conrad >> (All): Note that the OAIS definition explicitly says since origination - not since ingest/deposit.
BruceAmbacher >> (All): Katia, what does equalize mean in this context?
Don Sawyer >> (All): Mark, thanks.  Given the role of OAIS, it makes sense to use/assume OAIS definitions unless there is a problem with one of them.
Katia Thomaz >> (All): i didn't understand your question. possibly i used a wrong verb...
BarbaraSierman >> (All): As we cannot foresee what kind of material will be ingested in the repositories in the next years, and from what kind of origin it comes from, instead of trying to find a final definition, it could be better to try to find several examples of provenance info, per kind of material
Katia Thomaz >> (All): all of us should understand the same thing about the terms
BruceAmbacher >> (All): Can we agree that the goal is to know as much about the object, as far back to its creation as possible and relay that to the users?
Mark Conrad >> (All): Katia, Yes. That is why we have been talking about definitions for several weeks now.
JohnGarrett >> (All): I think the definition of provenance is correct in that it starts at origination.  But again it isn't always possible to get provenance from the beginning.
JohnGarrett >> (All): That doesn't mean that an archive doesn't preserve it.
Don Sawyer >> (All): OAIS tries to provide a framework/model of concepts and terms.  It does not define a certification standard.  TRAC is attempting to do this, and it is saying that explicit provenance prior to submission is not required for certification.
BruceAmbacher >> (All): Agreed, we all have greater confidence in certain data sets than others
Mark Conrad >> (All): Without some provenance information a traditional archives would not preserve the data.
JohnGarrett >> (All): Again even National Archives in its preservation of federal websites only gets information from the agencies of a snapshot and not any provenance back to origin of the data on the pages.
Don Sawyer >> (All): Bruce- exactly.
RobertDowns >> (All): Often, scientists point to a publication to show how their data were produced or provide a readme file with the data.
BruceAmbacher >> (All): John, the provenance of federal websites and NARA's knowledge vary by the type of harvest and the type of scheduling and appraisal that preceded the harvest or direct transfer from the agency.
BruceAmbacher >> (All): Robert, doesn't that pub or readme file constitute the provenance and documentation for that data?
RobertDowns >> (All): Yes, if it is all that is received
Mark Conrad >> (All): Robert,  Such provenance information would be very detailed. Presumably if this is a peer-reviewed publication there would actually be enough information to independently verify/validate the information.
Don Sawyer >> (All): I think we all agree that it is desirable to get as much provenance prior to submission as possible.  I think we should make this clear and leave it at that.
JohnGarrett >> (All): I agree, the more you can get the better, but you can only get what you can get.
Mark Conrad >> (All): Don, What if anything would be "mandatory" provenance information to be ingested from your point of view?
JohnGarrett >> (All): And at least for scientific archives it is better to be able to grab the content data and preserve it without getting much provenance.  Would be more of a risk to just not ingest it.
BruceAmbacher >> (All): Ah, but what is the minimum level that would mean we should reject the data?
Katia Thomaz >> (All): so DR must get provenance information
Mark Conrad >> (All): Katia, See this definition from the OAIS: Preservation Description Information (PDI): The information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, and Context information.
Katia Thomaz >> (All): i think we have been discussing the extension of this information, not the necessity
Mark Conrad >> (All): John, If the content is a string of numbers with no provenance information what have you preserved?
JohnGarrett >> (All): I think the minimum provenance is just knowing how/from whom  the archive got the content information.
Katia Thomaz >> (All): i agree.
Don Sawyer >> (All): i don't see the certification standard requiring any provenance prior to submission apart from the credibility of the producer.  We should not mix up Representation Information with provenance.
BruceAmbacher >> (All): How else can we assure the user about origins of the data and its quality?
Mark Conrad >> (All): Are we throwing out the OAIS definition of PDI?
Don Sawyer >> (All): In many science submissions, that quality is based on the reputation of the submitter.
BruceAmbacher >> (All): Don, you verify the content against what is provided and must make some statement about the bona fides of the producer.
JohnGarrett >> (All): Well I would require some format information to know what the numbers were and if I knew who the numbers came from and what they're supposed to represent, the archive would make an assessment of whether that was information they would ingest.  And I would archive it if it were in the archive's mandate.
Don Sawyer >> (All): OAIS is not a certification standard.  TRAC is where we're trying to be more explicit about what is required for certification - broadly speaking. Some repositories may require more.
JohnGarrett >> (All): I would also keep the provenance of how/from whom we got it and that there is no other provenance.  Users can then decide how much they trust the information.
BruceAmbacher >> (All): As a noted scientist I may produce excellent data in my area and crap outside my area.
Don Sawyer >> (All): Bruce - yes, but these are not easy to be explicit about.
Mark Conrad >> (All): Provenance information is listed as "necessary" in the OAIS. Provenance information in the OAIS is defined as listed above.
JohnGarrett >> (All): That is quite true.  But provenance tracks who/where/etc it comes from not an explanation of what the content is.
Mark Conrad >> (All): Provenance includes "how" the information was created.
Katia Thomaz >> (All): the content depends on the DR policies
BruceAmbacher >> (All): TRAC is trying to establish the steps/criteria that should be certified as being core to a quality archives.  Provenance is one crucial aspect to establishing confidence in the data quality.
Don Sawyer >> (All): Mark, OAIS provides a model for communication.  You conform to OAIS in communication when you use the terms in a  way that is consistent with OAIS.
Helen Tibbo >> (All): Hello all! I'm at ASIST right now.
Don Sawyer >> (All): sorry, I didn't finish my thoughts above - ignore for now
BruceAmbacher >> (All): Helen, we have been "discussing" provenance, much like we discussed authenticity last week.
JohnGarrett >> (All): I think we all agree that we always want to get as much provenance information as is useful for our archive.
Katia Thomaz >> (All): again, provenance is always necessary but its content depends on DR policies
JohnGarrett >> (All): Do we also agree that it is not always possible for all archives to get all the provenance information before the content is delivered to the archive?
BruceAmbacher >> (All): Yes to both.  So what, if anything, should be changed in TRAC?
RobertDowns >> (All): That is a realistic statement
Katia Thomaz >> (All): i think it is always possible to get some provenance
Mark Conrad >> (All): No archives gets ALL of the provenance information.
Don Sawyer >> (All): Mark, OAIS provides a model for communication.  You conform to OAIS in communication when you use the terms in a way that is consistent with OAIS.  I don't think that a statement, such as is in TRAC B6.10 that doesn't require explicit provenance prior to the ingest process (and clearly this involved negotiations with the Producer), is a violation of OAIS terms or concepts.
Mark Conrad >> (All): Don, How do you justify that when you look at the OAIS definition of provenance information?
JohnGarrett >> (All): So for me the question becomes what is the minimum provenance that is required since we can never get it all.
Katia Thomaz >> (All): what about including something in A3. Procedural accountability & policy framework
Mark Conrad >> (All): John, Which gets back to the question of whether we have "mandatory" requirements or list the ideal and use a risk assessment approach.
JohnGarrett >> (All): Perhaps the minimum amount that is acceptable is an Archives policy that needs to be adhered to.
Don Sawyer >> (All): That may be a good way to phrase it.
Helen Tibbo >> (All): You may have covered this, but what is the repository's duty (if any) in verifying the truthfulness of the provenance data?
BruceAmbacher >> (All): OAIS sets a detailed threshold.  TRAC wants a repository to document its standards and show proof the data conforms to its criteria.  And Helen, authentic data with clear, fully documented provenance can be corrupt.
JohnGarrett >> (All): My personal feeling is that it would be more useful if we had some few items that were mandatory, but I could live with everything just being scored.
Don Sawyer >> (All): Helen, I assume you are referring to provenance applying to the history of the object prior to submission, and not the history of object during submission and while under the repository's control?
Katia Thomaz >> (All): people, i must leave you now. i see you next week.
JohnGarrett >> (All): Obviously scores on some items would be more important than others.
JohnGarrett >> (All): Bye Katia
Katia Thomaz >> (All): have a nice week.
Helen Tibbo >> (All): Don, yes, the history before it comes to the repository.
BruceAmbacher >> (All): The evidence is intended to be illustrative examples of what can suffice to prove, in this case, adequate provenance.
BruceAmbacher >> (All): I am referring to evidence in TRAC's examples.
BruceAmbacher >> (All): So, what is the prep for next week?  Where are we going?
Helen Tibbo >> (All): Sorry to have come in late. I rather forgot I was on central time...
Don Sawyer >> (All): Helen, from TRAC view, none!  Not because it may not be important in many repositories, but because there are many repositories where this cannot be done.  I think this gets, again, to the primary scope of what Certification is trying to cover.  The main focus of TRAC has been on preservation and NOT on certifying that a repository meets every one of its objectives as laid out in some charter.  If it were the latter, then I think the certification task would be extremely difficult.  However, I think we need a discussion on this.
Helen Tibbo >> (All): So again, what are we envisioning for certification? If I collect more provenance data will I get a higher score in that category? Will I get a platinum rating rather than gold or silver?
Don Sawyer >> (All): We still need to finish reviewing my action item, and Mark's comments to this text.  We got off on a discussion of provenance.
BruceAmbacher >> (All): I have to go.  See you next week
Helen Tibbo >> (All): Bye!
candida fenton >> (All): I have to go too. Type next week.
JohnGarrett >> (All): Bye, I'm out of here also
Don Sawyer >> (All): Helen, the rating needs to be relative to meeting the requirements.  If it's not in the requirements, then it would not affect the score.
Don Sawyer >> (All): Bye!
cclrc >> (All): I'm not sure we have any actions specifically - but feel free to suggest!
Mark Conrad >> (All): So is provenance information in or out of the requirements?
Don Sawyer >> (All): Mark - I'm still here.  Still needs to be decided.
Mark Conrad >> (All): If provenance is in, are we using the OAIS definition?
Mark Conrad >> (All): Sounds like a good discussion for next week. 
Helen Tibbo >> (All): I think we need to continue on this as well.
Don Sawyer >> (All): I'll try to come up with some explicit examples re: provenance or the lack thereof.
cclrc >> (All): OK, thanks Don
Don Sawyer >> (All): bye, again.
Helen Tibbo >> (All): Bye
cclrc >> (All): Bye all.
RobertDowns >> (All): Bye
Mark Conrad >> (All): I think we also need to talk about "mandatory" versus "risk assessment". That is how this discussion got started.
Mark Conrad >> (All): Bye.

-- SimonLambert - 22 Oct 2007

Topic revision: r2 - 2008-02-13 - KatiaThomaz