The core developer of Bokeh was kind enough to give us some of his time recently in order to shed some additional light, for our readers, on the project he helms. I won't spoil anything by shoehorning any summarized info here; instead, read on to get some insight into both Bokeh and Bryan Van de Ven.
You can find the Bokeh project here. Bryan's Twitter can be found here, and his LinkedIn is here.
Bryan Van de Ven: Sure, my name is Bryan Van de Ven. I currently work for Continuum Analytics, where I have been since it was founded in 2012. I am grateful for the opportunity it has provided to contribute to OSS projects such as Conda and Bokeh. There's a lot of (justified) talk about the "sustainability problem" in Open Source and I'd like to think we are helping to explore robust ways of providing tangible and meaningful support to OSS development.
How would you characterize Bokeh as it relates to other similar projects in the Python stack? For example, would it be correct to call Bokeh a replacement for Matplotlib? An upgrade? More comparable to Seaborn? Something different altogether?
I think "different" is the right word. As an example: until recently Bokeh lacked PNG and SVG export capabilities, which made MPL the go-to for academic publishing needs. Bokeh has recently added PNG and SVG exports, and I think they will cover many use-cases. But there are still probably instances where MPL is a better choice, if you have very precise or specific needs. Beyond that I think it's mostly a matter of taste and which sort of API a person prefers.
Bokeh's "plotting" API is lower level than Seaborn. HoloViews is a recent project for high-level exploratory data analysis in Python that can also generate beautiful visualizations using Bokeh. HoloViews is definitely the "officially endorsed" very high-level API on top of Bokeh (to replace the old bokeh.charts), and so I think the combination of Bokeh+HoloViews is directly comparable to Seaborn in terms of capability. But again, I think the best word is "different". HoloViews' approach is extremely declarative, and whether a person likes or prefers this style is mostly a matter of taste.
This is Bokeh (Source).
What does Bokeh do better than other similar projects?
Basically what I think Bokeh does best is to allow people to create sophisticated data visualizations in browsers while staying where they are already comfortable and productive (i.e., Python or R).
So I think if you are looking to create interactive visualizations in web browsers, including in Jupyter notebooks, then Bokeh (or Bokeh+HoloViews) is a compelling choice. If you're looking to connect the incredible constellation of PyData tools (e.g. NumPy, SciPy, Pandas, sklearn, etc.) to scalable and deployable web "data apps" with a minimum of code and even less mucking with "web tech", then I think Bokeh is the clear choice.
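For readers who have not used Bokeh, here is a minimal sketch of the workflow described above: a few lines of Python that produce an interactive plot in a browser, with no hand-written HTML or JavaScript. It is an illustration added for this article, not something Bryan supplied. The function names build_plot and render_html, the sample data, and the output file name are all made up for the example, and it assumes a reasonably recent Bokeh release is installed.

```python
import math

from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN


def build_plot():
    """Return a small interactive line plot built with the bokeh.plotting API.

    The data (a sine curve) is illustrative only.
    """
    xs = [i * 0.1 for i in range(100)]
    ys = [math.sin(x) for x in xs]

    plot = figure(
        title="sin(x)",
        x_axis_label="x",
        y_axis_label="sin(x)",
        # Pan/zoom/reset tools run in the browser; no JavaScript is written here.
        tools="pan,wheel_zoom,box_zoom,reset",
    )
    plot.line(xs, ys, line_width=2)
    return plot


def render_html(plot):
    """Serialize the plot to a standalone HTML page that loads BokehJS from the CDN."""
    return file_html(plot, CDN, "Bokeh sketch")


if __name__ == "__main__":
    # Hypothetical output file name; open it in any browser to pan and zoom.
    with open("bokeh_sketch.html", "w", encoding="utf-8") as handle:
        handle.write(render_html(build_plot()))
```

Opening the resulting file in a browser should show a plot that can be panned and zoomed. In a Jupyter notebook, the usual route is bokeh.io's output_notebook() and show() rather than writing a file.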
Are there many other developers contributing to Bokeh? Is there opportunity for others to get involved?
The number of "dedicated core devs", i.e. people funded by Continuum or other sources to work directly on Bokeh, goes up and down with time. Right now there are 2-3 people spending a majority of their time on Bokeh, but it's been as high as 7 or 8 in the past.
However, we have tried to make Bokeh extremely welcoming to new contributors, and I am happy to report that more than a few people have tweeted "I just made my first ever OSS contribution and it was to Bokeh!" At present GitHub lists 242 total contributors to Bokeh. Many of those made one-time or small PRs, but those are actually extremely valuable. The difference in effort (or distraction from other tasks) between creating a new PR yourself and reviewing a PR someone else makes can be pretty substantial, so any PR helps lighten the load of the core devs and is always appreciated.
That said, we'd love to have more people become involved with Bokeh in substantial or long-term ways. Part of the onus for making that happen is on us, and we've recently been working to open up our core dev discussions better by moving them to a public Gitter chat channel. So if anyone is interested in getting more involved, please come by gitter.im/bokeh/bokeh-dev and give us a holler.
You instruct a DataCamp course on data visualization with Python. Can you tell us a bit about that? What can people expect from it?
The DataCamp course covers the basics of using the bokeh.plotting API, a selection of more advanced topics, and most importantly, practice creating Bokeh server applications. I think the course is a great way for people to get into using Bokeh, especially if lecture presentation plus exercises matches your learning style. In any case I think it's a good jumping-off point that can help anyone get better oriented to the Bokeh community: where to go for further questions, some context to help use the documentation better, etc.
What is one piece of overlooked advice you would give individuals getting into a "data science" -- or related -- career?
I'm definitely *not* the right person to dispense advice in this area, because I am definitely not a scientist, data or otherwise. I think it's really valuable to introspect your own strengths and weaknesses realistically, and although I ended up in grad school for physics for a time, ultimately I am most productive as a tinkerer and a tool maker, not an explorer. So my meta-advice is to find and follow great voices like Hilary Mason, John Myles White, Lorena Barba and others, and see what they have to say.
Do you have any last words to share with our audience?
Just a (hopefully) handy list of resources people can turn to!
Example Apps: https://demo.bokehplots.com
Mailing List: https://groups.google.com/a/continuum.io/forum/#!forum/bokeh
Gitter Chat: https://gitter.im/bokeh/bokeh
Thanks again for taking the time to speak with us, Bryan!
- Bokeh Cheat Sheet: Data Visualization in Python
- Automated Machine Learning: An Interview with Randy Olson, TPOT Lead Developer
- Introducing Dask for Parallel Programming: An Interview with Project Lead Developer