David Lee, who heads up the food and trade statistics team in Defra, spoke to us at our Unconference about the need for a coherent data community in Defra.
We've transcribed the video interview, below, for your reading pleasure.
– Hello David, would you like to introduce yourself?
Hi Stefan, my name’s David Lee and I head up the food and trade statistics team in the Great British Food Unit in Defra, which is a fancy name for the food policy team, basically.
We provide statistical support to all aspects of food policy, and also in particular on promotion of overseas trade and exports.
This is the first unconference I’ve ever been to, and it’s been really interesting and stimulating.
One of the things that I’ve kind of got a bee in my bonnet about, within the statisticians group at least, is the fact that it’s really, really difficult to achieve a coherent network of people that understand who we all are, and in particular, what skills we’ve got.
So, I’ve recently been developing my skills in R, which has been a really interesting time for me, and I’ve learnt lots of stuff about data science techniques and machine learning, but trying to find likeminded people in the Department is really hard.
So, seeing so many people, behind me, from the Environment Agency and JNCC, and various other places who are also interested in this stuff has been a really good experience.
– Excellent. So, erm, I mean... obviously the family food survey stats were one of the big open data releases earlier this year, and that was, I guess, a really big boost for the open data programme at the time?
It was certainly an interesting challenge for us as a statistics team; it was certainly a first within the Department and possibly within Whitehall in terms of producing an open dataset based on household responses to a survey, in this case what we spend on food; what we eat – our nutritional intakes.
That was a big challenge for us, and it was a real team effort – amongst not just people in my team, but also people from the data programme, and Ellen Broad at the time in the Extended Ministerial Office, as well as drawing on expertise and examples from external people in other departments.
It was a real learning curve for us, and we’ve still got a way to go in terms of looking at the more recent data – which is more challenging, in terms of making sure it’s safe to be published as open data and can’t be used to reidentify people, which is something we’re working on.
– So this is one of the big challenges that you had earlier in the year as well, which is about how you take something which is personal data, and make it open data, without breaking any kind of laws?
That’s right, we had to write a privacy impact assessment looking at the level of risk, in terms of the specification of data being put together. Now, for really historic data back in the 1970s, it’s quite low risk, because there’s not a lot of other online data sources that can be applied against it.
On the more recent data, obviously nowadays everybody uses Twitter, everybody uses Facebook; you’ve got online registers – it’s a lot more challenging to come up with a data specification that you think is safe enough to publish, that can’t be attacked by people using all these other online sources to do a sort of jigsaw attack against it. That’s something that we still haven’t cracked, but we’re working on it.
– You’re working on it. You’ve already said that you’re developing your own skills in R, and you come from a statistics background but are possibly moving more towards data science – well, you may or may not agree with that point. As a follow-up: where do you see this whole area going for Defra?
So I think within the statisticians group, there’s a challenge there to upskill a bit. I think there’s a bit of a spectrum over whether you call yourself a data scientist, or statistician, or whatever – I mean, there are clearly experts at either end of the spectrum, but statisticians are quite well-placed: we all know a bit about statistics, obviously... we’re good with datasets... some of us know a bit about coding, so we’ve got a good base, but I think it’s a skill set we can build.
I think, if we get access to some of these tools within the Department, we can get better and produce better added-value analysis. Breaking through those barriers and gaining access to those tools is a technical challenge, but I think it’s also about building a critical mass in the Department – what I said before, a coherent community of people with these skills, sharing our knowledge and building our expertise together.
– Yeah... that’s excellent. Well, I hope you enjoy the rest of today. And, yeah – we managed to do this as well with Ken Roy just over there. So that’s been a bit of a challenge.
Yes, I’m looking really good with my Head of Profession sat beside me. So, thanks for that kudos. Cheers.