Research Publication
Title - Optimising Feature Selection: A Comparative Study of mRMR-Boruta/RFE Hybrid Approach



TLDR
Can be found on the home page.
Description
As the first author of an IEEE research publication, I led the development and evaluation of a hybrid feature selection model designed for high-dimensional datasets. My work involved researching various data mining approaches and combining the strengths of multiple frameworks into a hybrid feature selection model, one that can reach high efficiency with a minimal set of features. I combined the filter method mRMR (Minimum Redundancy Maximum Relevance) with wrapper-based techniques such as Boruta and Recursive Feature Elimination (RFE) to enhance both the performance and interpretability of classification models. I implemented the methodology in Python, using libraries such as scikit-learn, XGBoost, BorutaPy, NumPy, and Pandas. The model was benchmarked on real-world datasets from the UCI Machine Learning Repository, where it demonstrated a significant improvement in classification accuracy: from 90.21% with previous models to 95.83% with a minimal feature set using the selected hybrid model. This research was conducted under the guidance of Prof. Dilip Kumar Sharma, Dean (International Relations) at GLA University.
What the research journey was really like
The process wasn’t always smooth; far from it. I had to write a lot of code, debug issues, and continuously refine my implementation to match the expectations Prof. Sharma had for the project. While he often gave me the freedom to manage my time, there were phases when the pressure to deliver on tight deadlines really pushed me to my limits. I clearly remember one night when I stayed up late to finalize and send over an updated analysis.
Just as I was about to sleep, I got a reply, at 4:30 AM, with highlighted feedback and new ideas. I hadn’t even closed my eyes, and he had already found my shortcomings. In a weird way, it kept me going. I’d wake up the next afternoon and dive in with his fresh perspective waiting in my inbox. It was intense, but also strangely motivating. Collaboration was mostly limited to my meetings and email exchanges with the professor, but when I got stuck (more than once), he connected me with a few of his colleagues at our university, who helped me gain much-needed clarity. As for the research paper itself, I wrote nearly all of it, but Prof. Sharma was instrumental in shaping it through multiple rounds of revisions, guiding the structure and sharpening the narrative.
How I came across this research opportunity
During a university hackathon, I was building an ed-tech solution: a fish detection algorithm to help farmers identify anomalies in their catch. In the midst of that, a graduate student named Shobhit Agrawal approached me for help with some machine learning challenges he had been stuck on for weeks. Apparently, he had reached out to several students at the event, but I was the only one who followed up. We met the following week and worked through his entire master’s thesis problem statement and the progress he had made. We then worked together on the data mining problem for the next three weeks, since I was really excited by the problem statement itself. This collaboration became a turning point. It introduced me to Prof. Dilip Kumar Sharma, who was supervising Shobhit’s research. After assisting with that project, I stayed in touch with Prof. Sharma and built a strong working relationship, which eventually led to his guidance on my own independent research and culminated in this IEEE publication.
How I developed and executed the research
Because of my earlier experience with Shobhit’s thesis, I had a foundational understanding of data mining challenges, especially with high-dimensional datasets. For this research, Prof. Sharma gave me the freedom to design the study independently, while we met biweekly to review findings, challenges, and next steps. I began with a deep dive into feature selection techniques, intrigued by the trade-off between model performance and interpretability. I conceptualized a hybrid approach that combined mRMR’s statistical insight with wrapper methods like Boruta and RFE to enhance model accuracy and efficiency. At the time, I had working knowledge of Python and core ML concepts, but this research pushed me to go deeper—learning advanced techniques, evaluating models across UCI datasets, and tuning them for both speed and reliability. The iterative nature of the work taught me not just technical skills, but also how to approach research with structure, patience, and rigor.
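To make the filter stage of such a hybrid concrete, here is a minimal sketch of greedy mRMR selection. It is an illustration under stated assumptions, not the implementation used in the paper: absolute Pearson correlation stands in for the mutual-information scores that mRMR usually relies on, the helper names `pearson` and `mrmr_select` are hypothetical, and the toy features (`a`, `a_copy`, `b`, `noise`) are synthetic rather than drawn from a UCI dataset.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(columns, target, k):
    """Greedily pick up to k feature names by relevance minus redundancy.

    Assumption: |Pearson correlation| is used as a simple proxy for the
    mutual information that mRMR is normally defined with.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]  # first pick: relevance only
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data (hypothetical): the label depends on a and b,
# a_copy is a near-duplicate of a, and noise is unrelated to the label.
rng = random.Random(0)
n = 300
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "a": a,
    "a_copy": [v + rng.gauss(0, 0.05) for v in a],
    "b": b,
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [1 if x + y > 0 else 0 for x, y in zip(a, b)]

chosen = mrmr_select(columns, target, k=2)
print(chosen)
```

The redundancy penalty is what keeps `a` and `a_copy` from both being selected, even though each is individually informative. In a hybrid pipeline, a shortlist like this would then be handed to a wrapper method such as Boruta or RFE, which refines it using a trained classifier.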
Current Research Directions
My focus evolved from my undergraduate work on hybrid feature selection for large datasets, where I discovered that only a handful of features often exert disproportionate influence. This insight reshaped my view of technology's role: rather than pursuing more features and more data, we should identify the critical elements that drive the greatest impact in any system. This orientation, understanding which aspects most meaningfully affect outcomes, naturally drew me toward human- and user-focused thinking. I am currently inclined toward human-computer interaction, where I am exploring how an understanding of human needs, combined with thoughtful design, can enhance human cognitive abilities and mental health.