Data Science Consulting
Consulting services specializing in text, images, unstructured data, and healthcare/pharmaceuticals.
Certified by Microsoft as an Azure Data Scientist.
Boehringer Ingelheim, National Health Service, Tesco, CV-Library
How can you use data science and machine learning to predict how much your customers will spend?
Machine learning model to predict turnover of employees in an organisation of 3 million workers
I have worked on a number of different projects where a client needed to parse scientific literature and identify occurrences of molecules or proteins.
As an example, take the molecule aspirin. "Aspirin" is still a trademark of Bayer in some countries, so in a paper the same compound could appear as acetylsalicylic acid, 2-acetoxybenzenecarboxylic acid, C9H8O4, or under a database identifier such as DB00945. Some identifiers refer to more than one molecule, while others refer to only one specific form of a molecule.
Another example I have encountered often in clinical papers is the gene ERBB2, which is important in certain types of breast cancer. ERBB2 is also called Erb-B2 Receptor Tyrosine Kinase, HER2, HER-2 and many other names. These names often also refer to the protein expressed by the gene. Many names are similar to common English words, and are not always capitalised in text.
Because of these ambiguities, the task of identifying names of proteins, genes and molecules in scientific literature is fraught with difficulty.
I have developed several tried and tested techniques to disambiguate these terms. Usually I need a number of annotated examples to start with, and I will train a machine learning model to learn from these examples and annotate new publications as they come in.
This can be deployed on the client’s servers and provide daily updates on a dashboard. This allows a client to monitor the literature in real time for publications around a particular molecule, protein or gene, or to spot trends in advance.
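As a minimal sketch of the disambiguation step, a character n-gram classifier can learn to map surface forms to canonical identifiers. The synonym table and training mentions below are invented for illustration; a real project would use a much larger annotated corpus and a more sophisticated model.

```python
# Minimal sketch: linking molecule/gene mentions to canonical identifiers.
# The training mentions below are illustrative, not real project data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny annotated set: surface form -> canonical entity identifier.
mentions = [
    ("aspirin", "DB00945"),
    ("acetylsalicylic acid", "DB00945"),
    ("2-acetoxybenzenecarboxylic acid", "DB00945"),
    ("HER2", "ERBB2"),
    ("Erb-B2 Receptor Tyrosine Kinase", "ERBB2"),
    ("her-2", "ERBB2"),
]
texts, labels = zip(*mentions)

# Character n-grams cope with hyphenation and casing variants
# such as "HER2" vs "her-2".
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

print(model.predict(["her2", "Acetylsalicylic Acid"]))
```

Character n-grams rather than whole words are a deliberate choice here: they let the model generalise across the spelling variants described above.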
A client in the retail industry had a fleet of vehicles delivering produce at different times of day. They used third-party logistics software to plan the delivery schedules; however, one element that was hard to plan was the unloading time of a vehicle when it arrived at the store.
Fortunately, a system was already in place that recorded vehicle ignition events and GPS location, with geofencing to identify the arrival and departure times of delivery vehicles. Past schedules were also available, identifying the quantity and type of product delivered on each drop, which driver was in charge, the time of day, and the type of vehicle used.
Using this trove of logged data, I was able to train a simple regression model to predict the unloading time of any future delivery at the time the schedule is generated.
This allowed the client to save money on driver overtime, disruption caused by late deliveries, and fines due to drivers working longer than their legally permitted hours.
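The approach can be sketched as follows. The feature names and figures are hypothetical assumptions, not the client's actual data; the point is that a simple regression on logged delivery features is enough to produce a usable unloading-time estimate at scheduling time.

```python
# Sketch of an unloading-time regression. All features and figures
# are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical logged features per past delivery.
pallets = rng.integers(1, 30, n)    # quantity of product on the drop
hour = rng.integers(0, 24, n)       # hour of day of arrival
tail_lift = rng.integers(0, 2, n)   # vehicle type flag

# Synthetic target: unloading time in minutes, roughly 2 min per pallet,
# slower with a tail lift and at night, plus noise.
minutes = 10 + 2 * pallets + 5 * tail_lift + 3 * (hour < 6) + rng.normal(0, 2, n)

X = np.column_stack([pallets, hour, tail_lift])
model = LinearRegression().fit(X, minutes)

# Predict unloading time for a planned delivery:
# 12 pallets, arriving at 9am, tail-lift vehicle.
print(model.predict([[12, 9, 1]]))
```

The prediction can then be fed back into the third-party scheduling software as the planned unloading duration for each drop.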
For one customer in the recruitment industry, I found that the web form used for jobseeker signup was very long. By analysing how users interacted with the fields in the form, I was able to establish that users were confused by some fields and lingered on certain areas for a long time.
Users also uploaded their CVs, which contain a great deal of explicit personal information as well as implicit information such as the job type or salary someone is looking for. I was therefore able to train a deep neural network on several years of past signup data to analyse a CV and fill out some of the fields in the signup form automatically.
This allowed a field to be removed, which boosted the conversion rate of the form by 7%, measured by A/B testing.
This is one small example of what can be achieved combining text analysis and conversion optimisation techniques.
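To make the idea concrete, here is a minimal sketch of predicting a single form field (a job category) from CV text. The CVs and categories are invented, and a simple bag-of-words classifier stands in for the deep neural network used in the real project.

```python
# Sketch: predicting one signup-form field from CV text.
# Training CVs and categories are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

cvs = [
    "Registered nurse with 5 years of ward experience",
    "Senior Python developer, Django and REST APIs",
    "HGV driver, clean licence, long-haul deliveries",
    "Staff nurse, A&E department, patient care",
    "Software engineer, Java microservices",
    "Delivery driver, multi-drop routes",
]
categories = ["nursing", "software", "driving",
              "nursing", "software", "driving"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(cvs, categories)

# Pre-fill the "job category" field for a new signup from the CV alone.
print(model.predict(["Java developer with backend experience"]))
```

In production, one such model per form field (or one multi-output model) lets the signup form be pre-filled as soon as the CV is uploaded.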
If you have a long signup form on your website please let me know and I may be able to deploy a machine learning model to improve the user experience and boost your conversions.
When a pharmaceutical company develops a drug, it needs to pass through several phases of trials before it can be approved by regulators.
Before the trial is run, the drug developer writes a document called a protocol. This contains key information such as how long the trial will run, the risk to participants, the kind of treatment being investigated, and so on.
The problem is that each protocol is up to 200 pages long and the structure can vary.
For one pharma company I developed and trained a deep learning tool to predict more than 50 output variables from a clinical trial protocol. This allows pharma companies and regulators to analyse and quantify large numbers of protocols, allowing more accurate cost estimation.
The technique can be extended to other industries where large unstructured or semi-structured documents are the norm.
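As a toy illustration of predicting several output variables from one document, the snippet below fits one classifier per variable over protocol text. The texts and labels are invented, and a bag-of-words model stands in for the deep learning system used in the real project, which covered more than 50 variables.

```python
# Sketch: predicting two protocol-level variables (phase, blinding)
# from protocol text. All texts and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

protocols = [
    "A Phase I open-label dose-escalation study in healthy volunteers",
    "Phase III randomised double-blind placebo-controlled trial",
    "Phase II single-arm open-label study of safety and efficacy",
    "A Phase III double-blind multicentre study versus placebo",
]
labels = [  # [phase, blinding] for each protocol
    ["1", "open-label"],
    ["3", "double-blind"],
    ["2", "open-label"],
    ["3", "double-blind"],
]

# One logistic-regression classifier is fitted per output variable.
model = make_pipeline(
    TfidfVectorizer(),
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(protocols, labels)

print(model.predict(
    ["Phase III randomised double-blind placebo-controlled multicentre trial"]
))
```

The real difficulty, which this sketch glosses over, is that protocols run to 200 pages with varying structure, so the relevant passages must first be located within the document.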
If you have a problem of this nature please get in contact with me and I will be glad to discuss.
"We were extremely pleased with every aspect of this project and with our working relationship."
Fast Data Science has leveraged natural language processing (NLP) and machine learning technology to analyze nearly 1.2 million open-ended responses to a women's rights survey. They've visualized the data, too.
Dec 10, 2020
Based on the data, Fast Data Science has created infographics and presented their findings in accessible language. They're now working on an interactive dashboard that will allow site visitors to deep dive into the content. They've opened the internal staff's eyes to future campaigns.
The client submitted this review online.
Please describe your company and your position there.
Through its vast network of National Alliances, our nonprofit is activating the global movement for reproductive, maternal and newborn health and rights.
I work as the Senior Communications Officer, working closely on the "What Women Want: Demands for Quality Healthcare from Women & Girls" campaign that heard from more than 1 million women and girls in 114 countries about the one thing they want most for their own reproductive and maternal healthcare through an open-ended survey.
For what projects/services did your company hire Fast Data Science?
The "What Women Want: Demands for Quality Healthcare from Women & Girls" campaign resulted in the collection of nearly 1.2 million open-ended survey responses from women and girls around the world, in 18 languages.
Fast Data Science helped us analyze the requests to help us create advocacy agendas around those requests; these advocacy agendas are currently being used to elevate the voices and demands of women to create policy change. Fast Data Science used NLP and machine learning to analyze the survey responses, and to create an interactive dashboard for advocates interested in examining the data.
What were your goals for this project?
We wanted to understand the data we had in our hands, and be able to examine it and use it in a way that would allow our group to elevate the demands of nearly 1.2 million women around the world.
How did you select Fast Data Science?
Fast Data Science had all the qualities we were looking for in our large data project - in-depth understanding of the technology needed to achieve our goals, as well as excellent staff capable of explaining the complex technical aspects of the work in a way that our entire team could understand. And, as the name implies, Fast Data Science was extremely fast, helping us meet our deadlines.
Describe the project in detail.
After connecting with Fast Data Science, we held an initial kickoff conversation where the complexities of our project were laid out. We then shared all of the data and work that we had done on the project with Fast Data Science to run their analysis and begin the work.
Within one month, which included several conversations and refining of our requests as our side understood the power of the technology available to us, we had our data not only analyzed but shared back with our team in plain language and in clear infographics.
We then began a new phase of the project where Fast Data Science converted our massive amount of static data into an interactive dashboard using Python and other tools.
What was the team composition?
I worked most closely with the Data Science Team, with others from my organization's side joining in on phone calls and planning sessions.
Can you share any outcomes from the project that demonstrate progress or success?
Fast Data Science helped us analyze nearly 1.2 million open-ended survey responses, allowing us to clearly see what our data was saying, and is now helping us present that information for an external audience with an interactive online dashboard that will allow users to perform their own deep dives into our data.
This dashboard was a dream project that we were not sure was actually possible to create - Fast Data Science helped us realize this dream and we are very pleased with the work.
How effective was the workflow between your team and theirs?
Fast Data Science was extremely responsive, and available for calls, sharing of information, and even presentations with our global team on a regular basis. We communicated over email and through Zoom, and our communications were always efficient, informative, and enjoyable.
What did you find most impressive about this company?
I had no idea the power of data science technology! What Fast Data Science was able to do for my organization, in such a short amount of time, was unbelievable and opened our eyes to what is possible in the future for our other campaigns. We were also really happy to have found a company that was able to communicate with our team of non-data scientists in a way that was clear and totally understandable.
Are there any areas for improvement?
We were extremely pleased with every aspect of this project and with our working relationship with the Fast Data Science team.