NSF Data Science Workshop

X-Team #7 a.k.a. The Crabs

When my advisor emailed me asking that I apply to the NSF Data Science Workshop, which required submitting a White Paper on Big Data in my research, and I saw that the deadline was only 5 days away, I was hesitant. It sounded like a great experience (I'm quite the fan of workshops), and the prospect of visiting Seattle for the first time was exciting too! However, after surviving two months of drowned-in-ink revisions to my first publication, I was doubtful that I could write anything of quality in so little time. Of course, I wouldn't be in graduate school, and I definitely wouldn't be a PhD candidate, if I let uncertainty stop me, so I opened a blank document and started typing.

As is often the case with self-doubt, I proved myself wrong. Not only did I manage to finish something I wasn't embarrassed to submit (technical writing may not come easily to me, but I'm not going to settle for submitting a poorly written paper when I can work harder for a polished one), but the paper was good enough that I was invited to attend the workshop! Needless to say, I was thrilled.

At the workshop, the organizers divided us into X teams and then Y teams to discuss synergies, challenges, and solutions to data science problems across our fields. Each X team was given a mascot that appeared on our name badges to help us find each other on the first day; my team, pictured above, was the crabs. Our team included people working in Materials Science, Genomics, Operations Research, Embedded Systems (me!), and more. At first glance, there seemed to be no reason for our grouping, though we were assured there was one, and our first task was to determine which keywords from our papers had brought us together. Hint: all of our projects dealt with data classification!

Our Y teams were meant to identify challenges and solutions, and the overall consensus was that the main challenge was deciphering jargon. Each field has different names for the same concepts (e.g., graphs and networks). One of my Y teammates was particularly bitter toward us Computer Scientists; he felt we weren't transparent enough about our methods, which sparked an interesting conversation. At the end of the session, the most obvious problem was still communication. Each field has nuances and challenges that other fields don't, and people outside the field may not understand them. For instance, when discussing the transparency of code, the teammate who was upset suggested publishing "better" pseudocode and complained that something presented in 20 lines had actually taken him 500 lines to implement. Setting aside factors such as language, skill set, and experience, I pointed out that 500 or even 250 lines of pseudocode would take up a paper's entire page limit. This led to a discussion of journal versus conference publications, and a fellow Computer Scientist pointed out that in our field, by the time a paper makes it through the journal submission process, its information is outdated. I suggested that the frustrated teammate simply email the authors and politely ask for the code, which he hadn't realized he could do. This is just one example of how better communication can improve a situation. Simply having frank discussions (without bringing emotion and ego along) about how we should share and communicate knowledge can do wonders for interdisciplinary fields such as data science.

While I was irritated by the hostility this teammate seemed to have toward Computer Scientists, it was important to hear what he had to say; after all, he is one of the people who may read my papers or collaborate with me on a project in the future. Knowing the weaknesses in how you present information helps you improve your writing and extend your reach. What good is a working algorithm if no one understands it and it gets swept under the rug? People in every field can do a lot to improve communication, but I know that when I sit down to write my next paper, I'll hear that teammate's voice and pay extra attention to my pseudocode.

Data Science, as I've learned since entering graduate school, is a very broad topic. Its principles and concepts can be applied in almost every field, possibly every field: one of my Y team members was actually an artist. As we sat in these groups and talked, we saw a lot of overlap in what we do. That overlap reveals opportunities not only for collaboration but also for supportive relationships. Previously, when I had trouble with my projects, I would go to my lab mates, my advisor, my committee, and so on, but after talking with my teammates, we realized that looking for help outside our own departments can be just as beneficial, if not more so.

Overall, the workshop was a great learning experience. I met excellent researchers, heard from leading minds in the field, and learned things I didn't know before, which is exactly what I would expect and hope for from an NSF workshop. If you're involved with data science, regardless of your field, I definitely recommend communicating with people outside your field, and if there is an NSF Data Science Workshop 2016, I highly recommend attending.

The poster I presented.