RAx Labs Inc.
The Product
RAx is a web app that helps research scholars speed up information discovery, better understand scholarly articles, and organize their knowledge so they can be more productive.
It is a one-of-a-kind tool in the research productivity landscape, combining insights into researcher behavior and a deep understanding of how qualitative research is produced with need-driven artificial intelligence and innovative design.
Responsibilities
On this product, I wore multiple hats and had to do justice to the responsibilities of each role. Here is a summary of what I did in each one.
Product Architect
- I have been part of every major architecture decision across the organization (Platform team, Web Dev team and AI team).
- I redesigned the system architecture, achieving four nines (99.99%) of availability and improving scalability and fault tolerance.
- I carefully analyzed the entire architecture to identify bottlenecks, single points of failure, and opportunities to improve scalability and availability.
- Devised and implemented auto-scaling strategies and a blue-green deployment strategy for customer-facing services to reduce downtime and improve reliability.
- Migrated 25+ services to an event-driven, serverless architecture (AWS Lambda, DynamoDB, S3, SQS, SNS, ECS, RDS), which cut the monthly AWS bill by 34%.
- Introduced Terraform and Ansible to the team, migrated 13 services to Infrastructure as Code (IaC), and added CI/CD pipelines for 20+ services using GitHub Actions.
- Identified the technical use cases and split the application data across the storage solutions best suited to each: AWS RDS, ElastiCache, DynamoDB, and S3 (Standard and Intelligent-Tiering storage classes).
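To make the four-nines figure concrete, the short sketch below converts an availability target into an allowed-downtime budget. It assumes a 365-day year and a 30-day month. The function name `downtime_budget_minutes` is hypothetical and is only for illustration.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Return the maximum downtime (in minutes) an availability target allows.

    availability is a fraction, e.g. 0.9999 for "four nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    for label, target in [("three nines", 0.999), ("four nines", 0.9999)]:
        yearly = downtime_budget_minutes(target, 365)   # assumed 365-day year
        monthly = downtime_budget_minutes(target, 30)   # assumed 30-day month
        print(f"{label}: {yearly:.1f} min/year, {monthly:.2f} min/month")
```

Four nines therefore leaves roughly 52.6 minutes of downtime per year, or about 4.3 minutes in a 30-day month. That budget is why redundancy and zero-downtime deployments matter at this level.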
Tech Lead
- My first task as lead was to create a strongly product-focused development team. I built a team of 8 engineers from scratch after screening 500+ applications and conducting 100+ rounds of interviews in total.
- As tech lead, I set up a new codebase using modern technologies and established new coding standards.
- I ensure the quality of the implementation by reviewing code, approaches, deployment documents and test cases.
- I conduct monthly one-on-ones with my team to:
  - Get feedback on how we can improve our processes and workflows to help the developers.
  - Give them feedback on how they are doing and ask what we can do to help.
  - Help them set and track their career and learning goals, and mentor them toward achieving those goals.
Technical Project Management
- Technical Requirement Clarification
  - In this phase, all requirements are shared with the engineering team, and every part of each requirement is dissected and clarified.
  - New cases or behaviors are often identified during this phase, and the product manager prioritizes them.
  - The final version of the requirements and scope is frozen, and the team can start technical research.
- Technical Research
  - During this phase, the development team revisits the existing system and identifies which parts need to be changed or created.
  - If any technological barriers or hurdles are identified, the team makes sure the planned version of the project is technically feasible.
  - When the team has multiple members, some run technical feasibility analyses on the uncertain parts while the rest identify test cases.
- Test Case Creation
  - Using the final requirements, a thorough list of test cases is identified and documented.
  - The engineering manager or tech lead reviews these cases to confirm that the requirements have been understood correctly.
- Approach Creation
  - Once the approach is finalized, the development team breaks the work into smaller tasks and allocates them among the developers involved.
  - Internal dependencies between tasks are identified, and the tasks are ordered accordingly.
  - The Technical Project Manager or tech lead reviews the breakdown and task allocation and updates the priorities.
- Task Estimation
  - After task breakdown and allocation, every developer estimates the effort for their tasks in man-hours.
  - From the man-hour estimates and the list of internal dependencies, a development deadline is calculated with a 20% buffer.
  - Once finalized, the deadline is communicated to the TPM and the Product Manager.
- Development
  - The team implements the finalized approach and writes tests for the functions they create.
  - Demos are scheduled during development itself. The TPM and the tech lead give feedback on each demo, and the changes are incorporated.
- Developer Testing
  - Once development is done, the team runs all of the unit and e2e tests.
  - Any failing cases are fixed and the tests are run again.
- Code Review
  - The team starts with peer reviews. Any comments are addressed and the code is reviewed again.
  - Once peer reviews are done, the tech lead reviews the pull request and proposes changes.
  - After the final changes are made, the tests are run and the code is reviewed once more.
- Beta Testing
  - While code review is in progress, a working build of the product is deployed to the staging environment and the product team is asked to test it.
  - Any bugs or feedback are fixed and incorporated, after which the tests are run and the code is reviewed again.
- Ready for Launch
  - Once all comments are fixed and the staging build is approved, the project is parked here until the marketing team is ready.
- Launch & Live Testing
  - When mission-critical services are being deployed, a blue-green deployment strategy and A/B testing are applied, and the services are launched one by one.
  - All deployed services are tested in A/B testing mode, and the entire team checks them for bugs or issues.
  - Any bugs found are prioritized by severity and fixed.
  - If everything works, the changes are rolled out to all users.
- Monitoring and Health Checks
  - After the changes are rolled out, the affected services are monitored for changes in latency, throughput and the number of failed requests. If any of these crosses a threshold, a CloudWatch alarm is triggered and the team is notified.
- Complete
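The gradual rollout in the Launch & Live Testing phase is commonly implemented by hashing each user into a stable bucket and then widening the percentage of users who see the new version. The Python sketch below illustrates that technique under stated assumptions. It is not necessarily how RAx routes traffic, and `in_rollout` and the feature name used with it are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Decide deterministically whether a user sees the new version of a feature.

    Hashing (feature, user_id) gives each user a stable bucket in [0, 100), so the
    same user always gets the same answer and raising `percent` only adds users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket between 0.00 and 99.99.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < percent
```

Because the bucket depends only on the feature name and the user ID, a user who saw the new version at 10% keeps seeing it at 50% and 100%. This keeps A/B comparisons consistent while the rollout widens.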
As a TPM, I was also responsible for mitigating deadline risks and conducting retrospective sessions to gain insights and identify improvements as a team.
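The deadline calculation in the Task Estimation phase uses man-hour estimates, internal dependencies and a 20% buffer. It can be sketched as a critical-path computation. This is an illustrative model, not the team's exact formula. It assumes that independent tasks are worked on in parallel by different developers and that the dependency graph has no cycles. `deadline_hours` and the task names are hypothetical.

```python
from functools import lru_cache


def deadline_hours(estimates: dict[str, float],
                   dependencies: dict[str, list[str]],
                   buffer: float = 0.20) -> float:
    """Return the estimated development time in hours, including a safety buffer.

    estimates maps each task to its man-hour estimate; dependencies maps a task to
    the tasks that must finish before it can start. Assumes no circular dependencies.
    """

    @lru_cache(maxsize=None)
    def finish_time(task: str) -> float:
        # A task starts once its slowest prerequisite has finished.
        start = max((finish_time(dep) for dep in dependencies.get(task, [])), default=0.0)
        return start + estimates[task]

    # The longest chain of dependent tasks (the critical path) sets the schedule.
    critical_path = max(finish_time(task) for task in estimates)
    return critical_path * (1 + buffer)


if __name__ == "__main__":
    estimates = {"schema": 6, "api": 16, "ui": 20, "tests": 8}
    dependencies = {"api": ["schema"], "ui": ["api"], "tests": ["api", "ui"]}
    # Critical path: schema -> api -> ui -> tests = 50 h; with a 20% buffer, 60 h.
    print(f"{deadline_hours(estimates, dependencies):.1f} hours")
```

A single developer cannot work on several independent tasks at once. A more conservative variant would therefore add up each developer's own queue of tasks before applying the buffer.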
Technical details
RAx is a technically complex product with many moving components and plugins. We primarily use the technologies listed below.
Frontend
- NuxtJS
  - Our primary frontend framework (built on VueJS). For a future version, we may move away from it entirely.
- TailwindCSS
  - We have adopted TailwindCSS as our CSS framework.
- TailwindUI
  - TailwindUI is the commercial component library created by the TailwindCSS team.
Chrome Extension
- VueJS
  - The product's Chrome extension is built with VueJS.
Backend
Frameworks
- Serverless Framework
  - We use the Serverless Framework to develop AWS Lambda functions.
- SailsJS
  - The entire backend was originally written in SailsJS, but we have started migrating from SailsJS to the Serverless Framework.
Payment Gateway
- Stripe
  - RAx has a complex payment gateway integration. We use Stripe's Billing API to charge customers for monthly subscriptions, and the integration also supports upfront payments.
Infrastructure
- AWS ECS
  - Used as our container orchestration service to increase utilization and ensure redundancy. Services that cannot be migrated to AWS Lambda run on ECS.
- AWS SQS
  - SQS is primarily used as an event queue, consumed by other services and Lambda functions.
- AWS CloudWatch Metrics
  - CloudWatch metrics are used to trigger alarms, which notify developers in case of failures.
- AWS EC2 Instances
  - Many AI services still use EC2 instances inside ECS clusters.
- AWS EBS Volumes
  - EBS volumes store a large repository of Open Access papers and user-uploaded files. Elasticsearch indexes this data and runs NLP models on it.
- AWS CloudWatch Logs
  - CloudWatch Logs is used as our logging service.
- AWS S3
  - This is our primary blob/object storage solution. We store user files, invoices and our static website on S3.
  - The user-files S3 bucket uses the S3 Intelligent-Tiering storage class.
- AWS CloudFront
  - The landing page is served through a CloudFront distribution, and some user files are also served through CloudFront.
- AWS Auto Scaling Groups
  - RAx uses Auto Scaling groups to maintain throughput under unpredictable workloads.
- AWS ALB
  - ALBs are placed in front of the Auto Scaling groups.
- AWS Lambda
  - AWS Lambda functions are built with the Serverless Framework, and most of the important services have already been migrated to Lambda.
- AWS API Gateway
  - Serves all HTTP and WebSocket requests.
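Some services and Lambda functions consume SQS as an event queue. The Python sketch below shows a minimal consumer of that kind. Python is used for illustration only. The sketch assumes that partial batch responses (`ReportBatchItemFailures`) are enabled on the Lambda function's SQS event source mapping. `process_event` and the `paper_id` field are hypothetical stand-ins for real business logic.

```python
import json


def process_event(payload: dict) -> None:
    """Hypothetical business logic for a single queued event."""
    if "paper_id" not in payload:
        raise ValueError("missing paper_id")
    # ... index the paper, update the user's library, etc. (illustrative only)


def handler(event: dict, context: object) -> dict:
    """SQS-triggered Lambda handler that reports per-message failures.

    Returning the IDs of failed messages (with ReportBatchItemFailures enabled on
    the event source mapping) lets SQS retry only those messages instead of the
    whole batch.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process_event(json.loads(record["body"]))
        except Exception:
            # Any error, including malformed JSON, marks just this message for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message keeps one bad event from forcing the whole batch to be reprocessed. This matters when many producers publish to the same queue.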
Databases
- AWS RDS (MySQL)
  - This is our primary database instance. All application data is stored here.
- AWS DynamoDB
  - AWS DynamoDB is used as a cache store for socket sessions.
- Redis
  - The AI team uses Redis on its dedicated server.
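Socket sessions fit DynamoDB's Time to Live (TTL) feature well. TTL deletes an item after the Unix epoch timestamp (in seconds) stored in a designated Number attribute. The sketch below shows the shape of such a session item and an expiry check. The key names (`connectionId`, `userId`, `expiresAt`) and the 24-hour lifetime are assumptions. The actual write, for example with a boto3 `Table.put_item` call, is omitted so the logic stays self-contained.

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # assumed session lifetime: 24 hours


def make_session_item(connection_id: str, user_id: str,
                      now: Optional[float] = None) -> dict:
    """Build a DynamoDB item for a socket session with a TTL attribute.

    DynamoDB TTL expects the attribute to hold a Unix epoch timestamp in seconds.
    """
    now = time.time() if now is None else now
    return {
        "connectionId": connection_id,                 # hypothetical partition key
        "userId": user_id,
        "expiresAt": int(now) + SESSION_TTL_SECONDS,   # hypothetical TTL attribute
    }


def is_session_live(item: dict, now: Optional[float] = None) -> bool:
    """Treat expired items as missing, since TTL deletion is not immediate."""
    now = time.time() if now is None else now
    return item["expiresAt"] > now
```

DynamoDB removes expired items in the background rather than at the exact expiry time, so reads should still check the timestamp, as `is_session_live` does.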
Challenges
Wearing multiple hats.
- Because my role combines project management and technical leadership, I have to put in a lot of effort to do justice to each responsibility.
- It has been difficult, but it is one of the things I am most proud of.
Dealing with technical debt while maintaining the pace of feature development.
- RAx had a lot of technical debt. The team had focused on the AI side of things, and the user-facing side of the product suffered.
- I had to take the solution from an MVP-like stage to a production-grade system. There were many challenges, and one of them was improving the system while maintaining the pace of feature development.
Defining engineering processes that align with the end-to-end product development lifecycle.
- When I started at RAx, there were no clear guidelines for prioritizing engineering tasks. It took a lot of trial and error with different processes to arrive at a framework that works for the product team.
Building the team with very limited resources.
- When I joined, RAx had limited financial resources, so I had to find excellent engineering talent on a constrained budget.
Training the team to think like product people, not just developers.
- This is not unique to RAx. I have rarely seen developers think like product people or form opinions about design and user experience, yet this is a very important trait in a product development team. It took a while, but it works.
Making sure suggested architecture changes are implemented across multiple teams.
- I was responsible for suggesting and making architecture decisions across multiple teams and services, and coordinating their progress was a challenge.