Data scientists at Microsoft have announced an early public preview of the Team Data Science Process (TDSP) to support secure collaboration within enterprise data science organizations. TDSP helps users structure their data science projects by providing a set of standardized Git repositories, document templates, and utilities that are relevant at each stage of the project lifecycle. Notably, the team has also created a separate repository of utilities to boost data science productivity.
Data Science Utilities
Interactive Data Exploration, Analysis, and Reporting (IDEAR) lets data scientists explore and visualize a data set interactively, while Automated Modelling and Reporting (AMAR) offers baseline model training, model sweeping, and parameter sweeping. The TDSP team gathered feedback from data scientists and, as a result, has added new features and improved existing ones.
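To make the idea of parameter sweeping concrete, here is a minimal grid-sweep sketch in plain Python. It is not AMAR's code: the `sweep` helper, the toy `train_and_score` function, and the `depth`/`lr` parameter names are hypothetical stand-ins. It assumes a real workflow would replace the toy function with actual model training plus a validation score, where higher is better.

```python
from itertools import product


def sweep(train_and_score, grid):
    """Try every combination of values in `grid` (hypothetical helper, not AMAR's API).

    `grid` maps parameter names to lists of candidate values.
    Returns (best_params, best_score, all_results), treating higher scores as better.
    """
    names = list(grid)
    results = []
    # Cartesian product over all candidate values = exhaustive grid sweep.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, train_and_score(**params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


def train_and_score(depth, lr):
    """Toy stand-in for 'train a model and return a validation score'.

    Peaks at depth=4, lr=0.1; a real version would fit and evaluate a model.
    """
    return -(depth - 4) ** 2 - 100 * (lr - 0.1) ** 2


best, score, all_results = sweep(train_and_score, {"depth": [2, 4, 6], "lr": [0.01, 0.1, 0.3]})
print(best, score, len(all_results))
```

An exhaustive grid is the simplest sweep strategy; its cost grows multiplicatively with each added parameter, which is why random or adaptive search is often used for larger spaces.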
R and Python are both used extensively for analytics and data science, and the team has now released IDEAR in Jupyter Notebooks. This lets data scientists who prefer Python visualize their data and use IDEAR's functionality. Users can also upload the notebook to a Jupyter Notebook server and start investigating a data set after setting the working directory in the notebook.
Beyond the Python release, IDEAR in R now automatically extracts date-time components from date-time fields. Data scientists therefore no longer have to write code to extract components such as year, month, and weekday and add them as extra variables for further analysis. IDEAR in R also works with these enhanced data sets, allowing data scientists to visualize and gain insight into how the date and time components relate to the target variables.
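For readers curious what this automation saves, the following is a minimal sketch of the manual step in plain Python (standard library only). It is not IDEAR's implementation: the helpers `add_datetime_components` and `mean_target_by`, the `pickup` and `fare` columns, and the assumption that date-time values arrive as ISO-8601 strings are all hypothetical. The first helper derives year, month, weekday, and hour columns; the second shows how a target variable might then be summarized by one of those components.

```python
from collections import defaultdict
from datetime import datetime

# Fixed English names so output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def add_datetime_components(rows, field):
    """Return copies of `rows` (list of dicts) with components derived from `field`.

    Hypothetical helper: assumes `field` holds ISO-8601 date-time strings.
    """
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row[field])
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[f"{field}_year"] = ts.year
        new_row[f"{field}_month"] = ts.month
        new_row[f"{field}_weekday"] = WEEKDAYS[ts.weekday()]  # Monday == 0
        new_row[f"{field}_hour"] = ts.hour
        out.append(new_row)
    return out


def mean_target_by(rows, component, target):
    """Average `target` for each distinct value of a derived `component` column."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[component]] += row[target]
        counts[row[component]] += 1
    return {key: sums[key] / counts[key] for key in sums}


trips = [
    {"pickup": "2017-03-14T08:30:00", "fare": 12.5},
    {"pickup": "2017-03-14T18:05:00", "fare": 17.5},
    {"pickup": "2017-03-18T23:45:00", "fare": 30.0},
]
enriched = add_datetime_components(trips, "pickup")
print(mean_target_by(enriched, "pickup_weekday", "fare"))
```

Copying each row rather than mutating it keeps the raw data intact, so the derived columns can be regenerated or extended (for example with day of month) without side effects.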
Other new features include slices in pie charts to improve readability. IDEAR in both R and Python can be run directly on the Azure Data Science Virtual Machine (DSVM), and the coding style is now more consistent. For more details, visit TechNet.