VIDaaS Project survey: creating research databases

The VIDaaS Project has just launched another survey, this time aimed at researchers (from all disciplines) who have been involved in setting up research databases – particularly relational ones. We hope that the information this provides will help us to assess the potential benefits of an online database service such as ORDS, which we’re currently developing.

We estimate that completing the survey will take 15-20 minutes. All researchers who complete it (and supply a valid email address) will be entered into a prize draw for a £100 Amazon voucher.

The survey URL is:

https://www.survey.bris.ac.uk/oxford/creating_databases/

Please feel free to circulate this information to any colleagues to whom it may be relevant.


VIDaaS/DataFlow workshop on the 2nd March

On Friday 2nd March the VIDaaS Project will be staging a joint workshop with our colleagues from the DataFlow Project at the Saïd Business School in Oxford. The day will run from 10:30am until 5pm, and feature demonstrations of the database-as-a-service software developed by the VIDaaS Project and the DataStage software that forms the centrepiece of the DataFlow Project. Delegates will also get to look at the DataBank data repository system that Oxford is introducing, and hear about the cloud infrastructure that the University has built – partly in order to host the outputs of VIDaaS. There will also be plenty of time to ask questions, discuss developments, and get to know the other delegates.

In the afternoon the workshop will split into two groups – a user-focused break-out, and a smaller technical session for those interested in learning how to install the software in their own institution or contributing to future development and customization.

If you’re interested in attending, please register at http://www.eventbrite.co.uk/event/2804728017. The workshop is completely free, and the venue is immediately opposite Oxford Rail Station, so you won’t even need to walk past any dreaming spires on your way there.


VIDaaS Project Update, 24th January 2012

Having met our Project Steering Group for the final time yesterday, now seems like a good time to give a brief update on where VIDaaS stands, and what we’ll have in place when we go to service in April.

The virtual infrastructure side of the project is going well, with the Oxford private cloud now largely in place apart from some networking kit which has apparently fallen foul of Customs and Excise. Likewise, the work on Identity and Access management, which took a long time to get off the ground, now seems to be progressing nicely. The DaaS part of the deal is taking shape, and we’ve received some useful feedback from our test users regarding the user interface and current functionality. We have a good idea of the costs of the future Online Research Database Service (ORDS), and a sense of the staffing levels required to offer the service within the University.

Encouragingly, we are getting a steady trickle of enquiries about the forthcoming service from researchers currently planning data-based research projects, and there seems to be growing concern about good research data management more broadly, which the ORDS helps to address.

It has become clear during the course of the last nine months that we will not be able to get every feature that our researchers have requested into the ORDS service by its April launch (writing a new database management system from scratch is no small task!) but we do now have a clear sense of what will be in place come the launch, and what functionality will have to wait until we’re in service. We’ll be publishing a ‘roadmap’ in due course to give people an indication of when the more advanced features of the system will be ready to use.

We have already tried accessing the underlying system using Microsoft Access as a front-end interface, and this seems to work just fine – now it’s a question of polishing our native interfaces.

One final thing to note – we are planning on staging our concluding project workshop on Friday 2nd March at the Saïd Business School in Oxford. Expect a more detailed announcement shortly, but, for now, block out that day!


Naming the ‘Online Research Database Service’

Just a quick announcement that the service name for the DaaS at the University of Oxford will be the ‘Online Research Database Service’, or ORDS for short. This was felt to be reasonably descriptive of what we’re offering whilst not treading on the toes of any other systems or services within the University.

So when we refer to the ORDS in this blog or on the VIDaaS website in future, you will know that we are referring to the local service, built upon the DaaS software, that we will be offering our staff and students from April 2012 onwards. Other institutions wishing to adapt the DaaS for their own use will need to consider something similar. Whilst we will be adding ORDS branding to the DaaS user interfaces, we will do so in a way that makes it very simple for other institutions to replace this with graphics and text more to their own tastes.

Maybe in the not-too-distant future we can create a single national service around the DaaS, which would almost certainly enable greater economies of scale, but for the time being we’re taking things one step at a time.


International Data Curation Conference 2011

Maybe I’m just becoming increasingly specialised, but this year’s International Data Curation Conference seemed more varied than ever. Last year’s divide between delegates interested in improving library practices and delegates interested in supporting researchers was less in evidence, with research data management seeming increasingly like the continuum of processes that it should be. Themes this year included institutional and funding council policies, legal risks, rewards for researchers, the ethics of openness, research reproducibility, preserving software and scientific workflows, the costs and benefits of data curation, Freedom of Information requests, tweet preservation in the name of social science, new tools for data management, sharing, and curation, and the visualisation and communication of data to the public at large, doubtlessly along with various other bits and bobs that I didn’t get to hear about due to the usual restrictions of corporeal vestiture.

Ruth McNally explained that ‘Data that doesn’t flow is dead data’, whereas Jeff Heywood reminded us that it is storage that is ‘the itch that researchers really want scratched’; Andrew Charlesworth told us not to ignore the legal problems that can be associated with data, whilst Ellen Collins (Research Information Network) worried us with talk of how Freedom of Information requests can have unintended consequences on researcher behaviour even though the threat of being forced to reveal one’s data should in theory encourage better data management.

Thinking about how all this relates to the VIDaaS Project, as I’m supposed to, a number of things occur to me. Firstly, there is the encouraging sense that the project, and the Database as a Service tool it is creating, should address several of the concerns raised: it will make it much easier to open up data for public inspection, should the data creator wish (or be forced) to do this; it should assist the citation of data and the ability to link datasets to publications; and finally it should open up some possibility of a sort of data reincarnation – it is very straightforward to import old Access and other databases that may have been lying around in a drawer gathering dust for several years, and which may now get the chance to ‘flow’ again. I was also prompted to consider how we could capture richer metadata about the processes by which the data we serve was gathered – something to occupy my mind whilst others enjoy their festive breaks.

Anyone interested in viewing the non-award-winning VIDaaS and DaMaRO posters I exhibited at the conference can find digital representations of them via the links below:

VIDaaS poster

DaMaRO poster


A four country data management action plan

The report “A surfboard for riding the wave: towards a four country action programme on research data” was published recently by the Knowledge Exchange (KE) and builds on the “Riding the wave” report. The KE is formed by partners from Denmark, the Netherlands, Germany and the UK and aims to create a layer of openly available scholarly and scientific content in which research data plays a key role. The vision set out in the document is that of a collaborative infrastructure that supports seamless access, use, re-use and trust of data.

In order to achieve the vision, four key drivers are identified: incentives, training, infrastructure and funding. These four elements are an excellent framework for analysing the RDM challenge, as researchers are put at the core. It’s crucial to incentivize researchers to re-use and share data through recognition, and they need to be equipped with the data skills needed in their research domain.

Other stakeholders with prominent roles include libraries, scientific organizations, funders and journals. Libraries are positioning themselves through the emerging data libraries support services to help researchers access secondary datasets, and to manage and share primary data. This will result in libraries absorbing some of the costs. National and international scientific organizations should issue rules of scientific conduct specific to data to stimulate researchers. Funding agencies need to set data management requirements as part of grant applications. Editorial boards of journals have to press authors to provide access to replication data with the articles.

The report acknowledges the existence of a diverse data infrastructure with two levels: institutional and domain specific. Data management could initially be carried out at the local level (researcher, institute) and the curation at higher levels (domain archives). In spite of this, there are still many “orphaned datasets” without appropriate repositories, and researchers’ workflows tend not to be integrated with institutional services.

The action plan outlines a range of possible actions with long-term objectives: making data sharing part of the academic culture, making data logistics an integral component of scientific professional life, and putting in place an infrastructure that is sound both operationally and financially.

It’s remarkable to see such international collaborative effort in this field; this may help to avoid reinventing the wheel, and provide more coherent frameworks to address the data management challenge.


VIDaaS on tour – Copenhagen.

In mid-October VIDaaS continued its world publicity tour. Following on from its appearance at VMworld Las Vegas in September, it was the turn of VMworld Europe in Copenhagen.

Participating in Europe’s biggest IT conference (with over 7000 delegates) was always going to be a daunting prospect, but the VIDaaS team was well represented with Stuart Lee, Adrian Parks and myself all doing our bit to get the message across.

Adrian presented as part of a Colt session on the hybrid cloud, Stuart fielded questions from other interested parties in an executive briefing, and I sat on a stool on stage in a press briefing event with four business leaders to talk about our cloud experiences and our “Journey to the Cloud”. It must have looked like a bad episode of Blind Date – I wasn’t aware until we started that we were the warm-up for Paul Maritz (the CEO of VMware), so although I’d like to imagine the 100 journalists in the audience were all there to hear about Oxford, the hybrid cloud and VIDaaS, I suspect we were a side show for them. Nevertheless, the combined forces of the Oxford VIDaaS team on tour managed to gain some publicity for the project and what we are doing, including the lead story on Computing Weekly’s website for that day.

More importantly, we formed some useful links with teams in VMware who are looking at similar challenges to the VIDaaS project, and learnt a good deal about the VMware vision for the future – all of which should help the service in the long term.


Cloud Infrastructure news

The ‘VI’ part of the VIDaaS project is now well underway. Partnering with VMware and using loan hardware kindly supplied by Cisco and EMC (specifically, a UCS blade centre and a VNX 5100 SAN), an initial implementation of the Oxford private cloud is now complete. The virtualisation platform is VMware vSphere, with vCloud Director and vShield running on top to provide the cloud abstraction layer. In the cloud layer we are running several prototype VIDaaS VMs, based on our chosen technologies of Debian Linux, PostgreSQL and JBoss.

While development of the DaaS software continues in parallel, the virtual infrastructure team has been investigating best practice for design and implementation of a production private cloud for Oxford. We will shortly have to return the loan equipment on which we have developed the prototype cloud, but the deployment of our live environment is already underway. For this we’ve selected Dell as our hardware provider, and we’ll be using both their blade and SAN technology (the storage element being provided by Compellent Storage Center).

We’ve also been doing some investigative work into how we might move workloads between the private and public cloud. Workloads in VIDaaS are primarily based on what we are calling project nodes, which are essentially Linux VMs running Postgres and JBoss, and it is these nodes that we have been moving between cloud services. For this initial testing, public cloud facilities have been offered by Colt Technology Services. Colt is a leading cloud service provider, certified by VMware through their vCloud Datacenter Services program. Our collaboration with Colt and VMware has been very fruitful and we have successfully migrated several prototype DaaS nodes between the Oxford and Colt clouds.


Researcher Requirements Report published

We are pleased to announce that the VIDaaS Project’s Researcher Requirements Report is now available from our website.

The Report is the product of a requirements gathering process lasting several months, which involved interviewing researchers from a range of disciplines, and conducting a national survey. Our chief aim was to gauge interest in the Database as a Service (which – fortunately for the project – turned out to be considerable), and to establish exactly what people would like to see offered by such a service.

However, the Report also provides an interesting snapshot of academic researchers and the IT staff who support them, and of the projects they work on. For example, it became clear that collaboration is very important to many researchers – and that a substantial proportion don’t currently have access to tools that permit them to share research data with colleagues as easily as they would like.

We asked about attitudes to making research data publicly available – a real hot topic at the moment, as research councils are increasingly requiring this as a condition of funding. While researchers seem to have mixed feelings about data publication (under half were happy with the idea of making their data generally available at the end of their project), many liked the idea of having a straightforward way of putting a particular subset of data on the Web to accompany publications such as journal articles – particularly if the dataset had a persistent URL or DOI that would allow it to be cited.
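To make the idea of publishing ‘a particular subset of data’ a little more concrete, here is a minimal sketch of what selecting and exporting such a subset might look like. It is purely illustrative and not how the DaaS itself works: it uses Python’s built-in sqlite3 module as a stand-in for a research database, and the table, column names and sample values are all hypothetical.

```python
import csv
import io
import sqlite3


def export_subset(conn, query, params=()):
    """Run a query and return the selected rows as CSV text with a header row."""
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names come from the query itself, so the header matches the subset.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def build_demo_db():
    """Create a small in-memory database (hypothetical table and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "id INTEGER PRIMARY KEY, site TEXT, year INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO samples (site, year, value) VALUES (?, ?, ?)",
        [("A", 2010, 1.5), ("B", 2010, 2.5), ("A", 2011, 3.0)],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Only the 2010 rows are released to accompany a (hypothetical) article.
    print(
        export_subset(
            demo,
            "SELECT site, year, value FROM samples WHERE year = ? ORDER BY site",
            (2010,),
        )
    )
```

The point of the sketch is simply that the published subset is defined by a query, so the rest of the database can stay private; assigning a persistent URL or DOI to the exported file would be a separate step that is not shown here.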

Evidence from elsewhere suggests researchers have good reason to be interested in this possibility: a presentation by Kevin Ashley at the recent DCC Roadshow in Oxford reported a study indicating that papers for which accompanying data was available were cited more than twice as often as those with no data available.

We also made some interesting discoveries about favoured software and data formats. Spreadsheets and statistical analysis packages were both common – the latter particularly among social scientists. Relational databases were also widely used, though it was noticeable that IT support staff were more than twice as likely to report relational database use as researchers were, perhaps indicating that academics find this method of managing data most useful when they have ample technical support available.

One slightly surprising finding was the prevalence of use of XML documents, particularly among humanities researchers: almost two thirds of this group make some use of them, and nearly a quarter said this was their chief method of handling structured data. On the other hand, document-oriented databases do not yet seem to have achieved the same level of popularity, with a quarter of survey respondents revealing that they weren’t even sure what these were.

All these desires and preferences (and many more that there isn’t space to talk about here – see the full report for details) were taken into account in compiling the prioritized technical requirements list described in an earlier blog post. This will guide the work done by our technical team over the next few months.


VIDaaS Web Bookmarks

A VIDaaS Diigo Group has been set up to keep track of sites of interest to the project. The bookmarked sites include related projects, conference events, reports, tools and Oxford services, as well as other relevant resources.

A feed with the latest bookmarks has been added to the front page of the VIDaaS website. Those interested can subscribe to the feed, to which new bookmarks will be added regularly.
