Abstract
Saving experimental designs before running an experiment, saving data before analysis and publication, and then publishing negative or inconclusive results in a lightweight way are all encouraged. The proposed lightweight registry of experimental designs and data may be more effort than it is worth, given current tools for timestamping and electronic lab notebooks.
Tenth Anniversary
On the approach of the tenth anniversary of the idea to register experimental designs without disclosing those designs publicly, the author is pulling that registry idea out into the light of a blog with single digit readership by updating the one page proposal, Appendix G from 2012, to account for the “new” internet. The registry is seen as an extremely lightweight method of recording the existence of files. Those files remain with the user and can later be shared privately or publicly at the user's discretion.
The proposal will be placed directly in the Appendix mnp Model's Journal of Negative Results in the main mnp Manual: An Architecture for the Fine-Grained Structure of Everything since the existence of free timestamp servers and well reviewed free electronic lab notebooks makes this proposal less attractive. One minor advantage of the proposal is that a creator can send a number or a line of data rather than a cryptic file to a receiver, though in all cases the original file will eventually need to be sent if verification is important.
For reference, here is the revised proposal for a registry of methods and data.
Lightweight Registry Proposal - Deprecated
Initially intended to be a registry of experimental designs in physics with no requirement to publish the designs themselves, the concept could be applied more widely. Proving that one did a body of work by a certain date is useful for academia in general, to prove prior experimental design but also to prove that a body of data was gathered by a time and maybe to prove drafts were done by a certain time. Registering lab notebooks occasionally at least puts a “seal” on the work, though dates and times between submissions are not “proven” by the contents of the notebook.
Outside academia, copyright in general is an effort to establish authorship and time. Establishing time of ideas for patents and prior art is relevant to some, including academia. The author's imagination is neither unlimited nor fast, so further ideas are welcome.
So proposed here is a fast lightweight method of proving a timeline for data that might be kept private or made public.
Prior Art
The need to prove that something existed at a certain time was handled in the past by mailing it to oneself in a stamped envelope so that the post office would postmark the package (thanks, E). One had one chance to open the package to prove the contents; after that, the contents were out of the bag. The advent of photocopiers meant one could keep a copy and still use the materials before the package was opened.
Notaries could be paid to put stamps on documents and record when that stamp was applied. This was a fairly heavy investment and depended on users keeping the documents unchanged. Without sealing the documents inside a container (see the previous paragraph), this might not be considered reliable.
Current Art
An industry buzzword is RFC 3161 compliant timestamping. The commercial services take information (as suggested here), create a hash of the information and the server's credentials, and send that file back as a timestamp token, which is stored by the creator of the information. There is no need to store the timestamp remotely. Other services store the original document remotely and create the timestamp remotely. Some free timestamp services exist, though finding them and verifying them can take time. And using them can require programming.
Not itself a lightweight solution.
Certainly the author is not introducing a completely new concept.
Putting anything on the internet is considered by many to be permanent. Photo owners who let their account lapse sometimes discover otherwise. Depending on reference frame, portions of the internet may become beyond the event horizon. Time of publication is an issue. Establishing that something was published at a certain time seems problematic. The author wishes material put on the internet DID have a date visible, that search results had a creation or substantive edit date. He has had the experience of reading material and documentation only to discover that it was written years ago about prior versions. Or written for completely different audiences, as when tax information addresses “you” but applies only to business owners.
If a blog post is considered proof that it was created on a certain date, blogging can be used. If the dates are not maintained or can be adjusted, this is not so reliable.
Creating a project on GitHub maintains the commit date (thanks E and 2018 notebook entry) for all to see. The project name and author are publicly known, as are the contents. GitHub might not be amenable to keeping, for free, millions of projects whose single file consists of length, CRC32, and SHA-512. Naming conventions might be established. Or not. Project names need only be unique to their author, so conflicts are not an issue.
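As a rough illustration of how small such a single-file record would be, the following Python sketch computes the length, CRC32, and SHA-512 of a file's bytes and formats them as one line. The function names `fingerprint` and `fingerprint_file` and the one-line layout are assumptions made for this example, not part of any existing service.

```python
import hashlib
import zlib


def fingerprint(data: bytes) -> str:
    """Return 'length crc32 sha512' for the given bytes as one line of text."""
    length = len(data)
    crc = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    sha = hashlib.sha512(data).hexdigest()
    return f"{length} {crc:08x} {sha}"


def fingerprint_file(path: str) -> str:
    """Read a file in binary mode and fingerprint its exact bytes."""
    with open(path, "rb") as handle:
        return fingerprint(handle.read())
```

Any change to the file, even a spell check, changes the line, which is why the exact original file has to be kept.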
NFTs, if the author could ever understand them, probably do not contain a date resistant to spoofing or counterfeiting.
Even heavier or more expensive solutions include:
ISO 9001 general quality control requirements can include keeping track of documents and dates. Costs.
The US Food and Drug Administration (FDA) has requirements for tracking work. Probably very heavyweight, given the profits and human safety issues involved.
Timestamp Servers
Direct timestamp services exist on the internet. Timestamp servers take a cryptographic hash of an existing file and return a cryptographically signed file containing, basically, proof that that hash was submitted at a certain time to a trusted server. Many cost. Some are free. Most require programming.
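The idea can be sketched in a few lines of Python. This is a toy stand-in, not an RFC 3161 client: real services sign their tokens with a public-key certificate so that anyone can verify them, while this sketch uses an HMAC with a made-up secret (`SERVER_KEY`) that only the hypothetical server holds. The names `issue_token` and `check_token` are assumptions for illustration.

```python
import hashlib
import hmac
import json
import time

# Hypothetical secret held only by the toy server. A real timestamp
# authority would use a private key and an X.509 certificate instead.
SERVER_KEY = b"demo-key-not-for-real-use"


def issue_token(digest_hex, issued_at=None):
    """Server side: bind a submitted hash to a time and seal the pair."""
    if issued_at is None:
        issued_at = int(time.time())
    payload = json.dumps({"digest": digest_hex, "time": issued_at}, sort_keys=True)
    seal = hmac.new(SERVER_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "seal": seal}


def check_token(token, data):
    """Check that the seal is intact and that the token matches these bytes."""
    expected = hmac.new(
        SERVER_KEY, token["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, token["seal"]):
        return False  # payload or seal was altered
    claimed = json.loads(token["payload"])["digest"]
    return claimed == hashlib.sha256(data).hexdigest()
```

Note that the server only ever sees the hash, never the file, which is what lets the file stay private.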
Free Timestamp Servers
A list of free timestamp servers can be found at https://gist.github.com/Manouchehri/fd754e402d98430243455713efada710 . The list was last updated six months prior to a January 2022 review. The discussion can reveal changed experiences and new servers found by others. Some servers are limited to 100 per month or 5 per day or 10 per day or 20 in 20 minutes or non-commercial use only. Finding and using a timestamp server requires either knowledge or a program, some of which make creating timestamps invisible.
Electronic Lab Notebooks
Electronic Lab Notebooks exist. Most cost. The idea has existed since the 1950s. Implementations started to be feasible twenty or thirty years ago. Most Electronic Lab Notebooks have improved (or been implemented) in the last ten years. Searching those three words turns up many reviews and sources. Reviews may include 40 products in the list. Many are industry specific. Many are large. Even searches adding the word free turn up reviews of mostly fee based services. For example, SciNote is accepted by the FDA, NIH, and European Commission, and includes inventory tracking, standard operating procedures management, and project management. Most do much more than just provide timestamps for information.
Free Electronic Lab Notebooks
For electronic lab notebooks, two references might be useful. The first is a review article relevant to academic research from Nature Protocols (2022-01-14): Higgins, S.G., Nogiwa-Valdez, A.A. & Stevens, M.M. Considerations for implementing electronic laboratory notebooks in an academic research environment. Nature Protocols (2022). https://doi.org/10.1038/s41596-021-00645-8 It runs ten pages in two columns. The short story: implementing ELNs is hard and requires knowing the lab's needs. ELNs offer advantages of searching, archiving, and sharing but have a learning curve.
A free, open source electronic lab notebook is eLabFTW https://www.elabftw.net . It can be run locally or hosted by organizations centrally or remotely on the web. Installation normally uses Docker, so is moderately complicated or moderately easy depending on one's experience. To run locally, half a day for setup by a moderately savvy user is one estimate. The existence of such notebooks and timestamp services allows the author to put this Lightweight Registry for Experimental Design and Data on hold.
Deciding to use an electronic notebook after researching options and requirements takes some time. Using electronic lab notebooks might be useful for many.
Advantages of a Registry of Methods and Data
Without creating any stigma, having a registry for methods and data might make submitting to a Journal of Negative Results easier. If the methods are already packaged and the data can be packaged for delivery to responsible reviewers, then only a summary of results may be needed for submittal to a JNR.
Introduction to the concept of Journal of Negative Results
Regarding a public Journal of Negative Results, which attempts to create a repository for failed experiments and ideas, there have been many efforts. Motivations for a Journal of Negative Results include the oft-cited 2005 paper by Ioannidis in PLOS Medicine, which has a medical focus and suggests most published findings are false. At least that synopsis gains attention. It also suggests that negative studies in some fields, if published, might appropriately lead to abandonment of the field. Again, note the medical focus. Citation: Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124 August 30, 2005.
Mentioned in https://www.aje.com/en/arc/negative-results-dark-matter-research/ are
- ALL Results http://www.arjournals.com/ojs/ encourages negative results as valuable pieces of information in science, including Nano http://arjournals.com/index.php/Nano and Physics http://arjournals.com/index.php/Phys . The website supports late editions of Internet Explorer 7, shows the current physics journal sample from 2011, and announces the creation of the Phys section on 2012-06-26. Appears moribund, sigh.
- (Mega) publisher PLOS takes, since 2015, inconclusive or null or negative results if the results make a contribution to the field as Positively Negative http://blogs.plos.org/everyone/2015/02/25/positively-negative-new-plos-one-collection-focusing-negative-null-inconclusive-results/
- F1000Research f1000research.com
- further from physics is BMC Psychology http://www.biomedcentral.com/bmcpsychol which has created data notes as a shorthand method of making data available
- PLOS One, Journal of Negative Results in Biomedicine http://jnrbm.biomedcentral.com/
The dark matter article notes that negative findings, in the rare event of being published, are less likely to be cited http://www.plosone.org/article/info:doi/10.1371/journal.pone.0054583 . The author suggests a lack of citation may not be a measure of utility; researchers benefiting from negative results by avoiding an area or even a field may not cite, but still benefit by redirecting their energies. Measuring those intangible benefits is not easy. The article raises the question “Would you take the time to write up negative results if there were a simple template and some credit for your efforts?”
Registry Proposal Details - Deprecated
Now that the reasons to create a registry are seen as transitory, the proposal itself is included here but not fully rounded out.
Data to Store
Perhaps timezone, but don't worry about spoofing. Perhaps URL, but do not worry about spoofing or VPN's.
Data Required
Hope to work without cookies. Hope to work without having the website re-written or spoofed.
Design Parameter
For a designer whose experience includes
- working on the Camp Fire response, which burned an area bigger than the Bay Area,
- using 16K DRAM chips from manufacturers claiming the rare errors were from cosmic rays
- upgrading to hard drives with 10 Megabytes of storage
- graduating from (shared) dial up to ADSL
- moving from ADSL to fiber only recently
the author retains an acute awareness of storage space, storage reliability, bandwidth, minimal resource demands, privacy, user effort, and user learning/knowledge requirements.
One concern, of course, is the user reaction “do I really have to learn something new?” Another, similar to modern reaction to email, “That looks really old.” “Like last year.” “Why do you use green rather than blue? It's ugly.”
Storing the data is the easiest part of this proposal. Keeping it secure is more work. Retrieving it is yet more work. Making the user interface pleasant is work. Making it secure and resistant to denials of service and tampering is a lot of work.
Limitations on Use
To limit denial of service by one or more users creating a lot of entries: set a limit on the number of submissions per day? Use a captcha or something similar to assure human use, though the author finds that irritating.
If we really do not want commercial or vanity use, restrict use to users with .edu addresses.
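A minimal sketch of both limits, assuming submitters are identified by an email address and that five submissions per day is the cap. Both are illustrative choices, not settled parts of the proposal. The names `is_edu`, `DailyLimiter`, and `MAX_PER_DAY` are hypothetical. The sketch keeps counts in memory, so it forces the registry to hold state.

```python
from collections import defaultdict
from datetime import date

MAX_PER_DAY = 5  # hypothetical cap; the proposal leaves the number open


def is_edu(email: str) -> bool:
    """Accept only addresses whose domain ends in .edu (a crude check)."""
    local, sep, domain = email.rpartition("@")
    return bool(local) and sep == "@" and domain.lower().endswith(".edu")


class DailyLimiter:
    """Count submissions per address per calendar day."""

    def __init__(self, max_per_day: int = MAX_PER_DAY):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (email, day) -> submissions so far

    def allow(self, email: str, today: date) -> bool:
        """Return True and record the submission if it is within the limits."""
        if not is_edu(email):
            return False
        key = (email.lower(), today)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True
```

An address check like this does nothing against a determined spoofer; it only discourages casual commercial or vanity use.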
Responsibilities of the User
Primary is to keep an exact copy of the file from which the record / time stamp / identity stamp was created. This has applied to copyright applications since the advent of copyright, so will not be unfamiliar. Still, the author has at times struggled to keep archival copies locally or not so locally and keep them findable. Storing encrypted files on the net is fine. The user still must retain the key and assure the encrypted file remains accessible.
Personal Notes on Keeping Notebook
Keeping directories constant has been virtually impossible; single files are more manageable. The author has found it hard not to go back to electronic records of thoughts and do spell check without changing the substance. If I were really concerned, I'd know where the original was kept and what its name was. As an old time user of computers, I DO have a lot of backup copies. Just finding the version I want is tough and of course the contents COULD be spoofed or changed.
Responsibilities of the Keeper
Keep the information append-only: do not go back to change previous entries, just allow additions. Back up the data in multiple manners, save the encryption key for those backups, have a succession plan, and try to avoid Hollywood scenes of kidnapping or rubber-hose cryptanalysis. Succession plans: does the community take over decisions, and do we worry about privatization of the data? If backups are kept off-line (or on), bit-rot is a concern.
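One common way to make an append-only promise checkable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is one possible approach under that assumption, not a design the proposal commits to. The class name `AppendOnlyLog` and the all-zero `GENESIS` starting value are made up for illustration.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical starting value for the first link


def _link(previous_hash: str, record: str) -> str:
    """Hash the previous link together with the new record."""
    return hashlib.sha256((previous_hash + "\n" + record).encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Entries can be added but any later edit breaks the chain."""

    def __init__(self):
        self._entries = []  # list of (record, chain_hash) pairs

    def append(self, record: str) -> str:
        """Add a record and return its chain hash."""
        previous = self._entries[-1][1] if self._entries else GENESIS
        chain_hash = _link(previous, record)
        self._entries.append((record, chain_hash))
        return chain_hash

    def verify(self) -> bool:
        """Recompute every link; False means some entry was changed."""
        previous = GENESIS
        for record, chain_hash in self._entries:
            if _link(previous, record) != chain_hash:
                return False
            previous = chain_hash
        return True
```

Publishing the latest chain hash from time to time would let outsiders detect a rewritten history, and rerunning `verify` on a backup also catches bit-rot.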
Challenge
The start of bulletin board systems was accompanied by science fiction that worried about nefarious use of encrypted communication. There are so many ways to communicate, in the open and in “private” that I will not worry about that.
Naming
A catchy name is needed for new (or old) concepts hoping for acceptance. Memes welcome.
- Container
  - time capsule
  - cache
  - vault
  - registry
  - notary...
  - store
  - registry
  - repository
- What stored
  - plans
  - myplans
  - experimental time capsule
  - experiment registry
  - my notary proposal (note the favored mnp acronym)
  - notebook repository
  - lab notebook snapshots
  - file
  - methods
- Combined terms
  - methods cache
  - experimental methods
  - journal of pending results
  - registry of experimental design and data (redd)
- Bare terms trying to be memes
  - knox no locks
  - I did it
  - I got it
  - Proof
  - 200 Proof
  - prior art
  - been there
  - done that
  - remember when
  - back then
  - my history
  - keeper of the flame
  - what's your plan
We want to go viral if we want lots of attention and use. More relevant for advertising or other money making ventures. And a dot com name, not a dot org name.
Academia, not so much.
Meditations on the Statistical Physics of Information Storage
The details of the proposal for a Lightweight Registry remind the author of the interesting proof from Introduction to Statistical Physics. That proof suggested that information storage need involve no energy. But retrieval does involve energy. And changing existing storage also requires energy to clear, to read and revise if necessary.
In like manner, a lightweight record keeper need keep no state or extra information. When limitations like number of requests per day or time between requests are placed, the programming energy costs go up, sometimes a lot if security is involved. Keeping cookies is work, and may require permission from the user, depending on reference frame. Again, increasing transaction energy and cost.
The original proposal from 2012 is included here. Since it too is outdated, it is mostly shown in lighter text.
If researchers in a field were to file their methodologies and predictions prior to experiment with a registry, the subsequent results should have more power and respect in that field.
Required submission: file length and one or more checksums (and proof that a human is submitting)
Optional submissions: Topic, Title, Author, Contact Info, Date, Keywords, Text. Any information can be kept “private” for a period of time chosen by the submitter.
The subsequent papers on that experiment would quote a submission number and length and checksums and provide the document that matches. This would allow readers to know the methodologies and predictions at the time of submission to FR Journal (jfr.com is taken).
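The record and the later check described above might look like the following sketch. The field names, the `Submission` class, and the choice of CRC32 plus SHA-256 as the checksums are assumptions made for illustration. Proof of a human submitter and the optional privacy period are not modeled.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Submission:
    # Required: assigned number, file length, and checksums.
    number: int  # hypothetical registry-assigned submission number
    length: int
    crc32: str
    sha256: str
    # Optional descriptive fields, all left empty by default.
    topic: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    keywords: tuple = ()


def make_submission(number: int, data: bytes, **optional) -> Submission:
    """Build the record a submitter would send; the file itself stays home."""
    return Submission(
        number=number,
        length=len(data),
        crc32=format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        sha256=hashlib.sha256(data).hexdigest(),
        **optional,
    )


def matches(submission: Submission, data: bytes) -> bool:
    """Reader side: does the provided document match the quoted record?"""
    return (
        len(data) == submission.length
        and format(zlib.crc32(data) & 0xFFFFFFFF, "08x") == submission.crc32
        and hashlib.sha256(data).hexdigest() == submission.sha256
    )
```

A reader given the paper's quoted submission record and the released document would call `matches` to confirm the two agree.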
The registry would take no view on the reliability of the checksums or the information submitted, only that the submission was made with the data provided. The users would decide how much to trust. For example, if it is subsequently found that a 1M file with a CRC32 and an SHA-256 is easily modified while maintaining length and sums, then the value of the submitted information would go down.
The data would be stored off-line after a (short?) while, rather than being maintained only on-line. Verifying old submissions might cost and be a minor profit source. Or bringing an old submission up for public view for a while might cost.
The Future Results Journal may be more relevant in fields with more and smaller experiments and in fields where variation is greater such as medicine, sociology, economics, biology, environmental science.
The initial idea stems from seeing medical research performed “to significance.” Which has significant negative results.