CV/Resume: Hikaru Tsujimura, PhD
Updated on 3 September 2025
Profile
Current interests (September 2025)
Building on my research in Cognitive Neuroscience—where I studied how human values, decision-making, and actions are internally represented—I have become interested in AI Safety, particularly in understanding how misaligned internal representations in AI systems lead to unexpected values and behaviors. This challenge is likely to persist in next-generation AI systems because:
- AI capabilities are rapidly expanding, diversifying and becoming increasingly complex.
- Digital architectures are not inherently designed to align with human values.
- Forcing alignment can produce unintended behaviors or workarounds.
- No standardized methods exist to reliably interpret internal representations and link them to model outputs.
My goal is to develop principled approaches for interpreting these representations and mitigating alignment risks in AI systems.
Technical Skills
- Python: NumPy, pandas, TensorFlow, PyTorch, scikit-learn, Matplotlib, Seaborn, and more
- LLMs: Llama 2-3.2, Qwen3, DeepSeek, Gemma 2, GPT-OSS, OpenAI API, Quantized Fine-Tuning (QLoRA), Unsloth, TransformerLens
- AI Safety:
- Chain-of-Thought (CoT) Reasoning in LLMs
- Mechanistic Interpretability (Sparse Autoencoders (SAEs), TransformerLens)
- Overconfidence Analysis in LLMs
- Safe Reinforcement Learning (PPO)
- Programming: Bash, JavaScript, R, HPC cluster jobs, MATLAB, SQL (PostgreSQL), Docker, FastAPI, Streamlit
- Analysis:
- Statistical: Regression, ANOVA, Bayesian methods, PCA, Factor Analysis, Mixed Models
- Machine Learning: Supervised & Unsupervised Learning, Variational Inference, Neural ODE, RNN (LSTM, GRU)
- Behavioral: Human-subject Experiments, Large-scale Online Studies, Survey Design (GDPR-compliant)
- Neuroimaging: Sleep-EEG, fMRI, Brain Activity Pattern Modeling
- Experimentation: Large-scale online human-subject studies (GDPR-compliant), behavioral/survey, brain imaging (Sleep-EEG/MRI)
- Writing: Journal papers, Preprints, GitHub Blog
Recent Experience
Postdoctoral Researcher, 2019-Present - Cardiff University
⇨ Machine Learning (ML) Projects
# 1. Group Project 2023 @ Neuromatch Academy Deep Learning course (July 2023)
- Led a team of 7 PhD students and researchers to complete a 3-week ML project.
- Utilized PyTorch, pre-trained networks (e.g. CLIP), and 4.6M human choices from 14K participants.
- Aligned models with these data to visualize internal representations of everyday objects in humans (GitHub; see sketch below).
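Illustrative sketch of this kind of embed-and-visualize pipeline, assuming the Hugging Face transformers CLIP checkpoint and PCA for the 2-D projection; the file names are hypothetical, not the project's data:

```python
# Sketch: embed object images with a pre-trained CLIP model, then project
# the embeddings to 2-D for visualization. Checkpoint, files, and the PCA
# step are illustrative assumptions, not the project's exact pipeline.
import torch
from PIL import Image
from sklearn.decomposition import PCA
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["cup.jpg", "dog.jpg", "hammer.jpg"]          # hypothetical object images
inputs = processor(images=[Image.open(p) for p in paths], return_tensors="pt")

with torch.no_grad():
    emb = model.get_image_features(**inputs)          # (n_images, 512)
emb = emb / emb.norm(dim=-1, keepdim=True)            # unit-normalize for cosine geometry

coords = PCA(n_components=2).fit_transform(emb.numpy())  # 2-D map of object space
```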
# 2. Independent Projects (August 2023 - August 2024)
- Optimized model alignment with individual participants using the above data.
- Utilized pre-trained networks, efficient quantized fine-tuning (QLoRA; sketch below), and LLMs (e.g. Llama 2).
- Discovered participant-specific internal representations (GitHub).
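Illustrative sketch of a QLoRA setup, assuming the Hugging Face transformers/peft/bitsandbytes stack on a CUDA GPU; the checkpoint name and hyperparameters are placeholders rather than this project's exact configuration:

```python
# Sketch: 4-bit quantized base model + LoRA adapters (QLoRA). Checkpoint and
# hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                                # NF4-quantized frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],              # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)                   # only adapter weights train
model.print_trainable_parameters()
```

Quantizing the frozen base weights to 4 bits while training small LoRA adapters keeps per-participant fine-tuning cheap enough to run many individual models.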
# 3. Group Project 2024 @ Neuromatch Academy NeuroAI course (July 2024)
- Led a team of 2 PhD students to complete a 2-week NeuroAI project.
- Used RNN-based models (Neural ODE, LSTM, GRU) for motor tasks (3-Bit Flip-Flop and Random Target tasks; sketch below).
- Achieved strong alignment between model outputs and muscle activity data across varied tasks (GitHub).
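Illustrative sketch of the 3-Bit Flip-Flop task with a GRU in PyTorch; task sizes, pulse probability, and training settings are placeholder assumptions:

```python
# Sketch: the network receives sparse +1/-1 pulses on 3 channels and must
# output, per channel, the sign of the most recent pulse (a memory demand).
import torch
import torch.nn as nn

def make_flipflop_batch(batch=64, steps=100, bits=3, p_pulse=0.05):
    pulses = (torch.rand(batch, steps, bits) < p_pulse).float()
    signs = torch.randint(0, 2, (batch, steps, bits)).float() * 2 - 1
    x = pulses * signs                                # sparse +1/-1 input pulses
    y = torch.zeros_like(x)
    state = torch.zeros(batch, bits)
    for t in range(steps):
        state = torch.where(x[:, t] != 0, x[:, t], state)
        y[:, t] = state                               # target holds last pulse sign
    return x, y

class FlipFlopGRU(nn.Module):
    def __init__(self, bits=3, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(bits, hidden, batch_first=True)
        self.out = nn.Linear(hidden, bits)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

model = FlipFlopGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    x, y = make_flipflop_batch()
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```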
# 4. Collaborative Project 2024-2025 @ University College London (May 2024 - Present)
- Visiting postdoc in Prof. Bradley Love’s lab, extending the previous independent work (#2).
- Utilizing TensorFlow, Variational Inference, and ML models (e.g. Deep Support Vector Machines).
- Training ML models to predict and generate human behaviors and brain activity patterns (fMRI).
# 5. AI Safety Fundamentals - Alignment Course 2024 @ BlueDot (October 2024 - February 2025)
- Gained core knowledge of AI safety topics such as Scalable Oversight, Mechanistic Interpretability, and AI Control.
- Led an individual project developing a novel interpretability method with Sparse Autoencoders (SAEs) and generative models (GitHub; sketch below).
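Illustrative sketch of the SAE component, a sparse overcomplete autoencoder trained to reconstruct transformer activations; the dimensions, L1 weight, and random stand-in activations are placeholder assumptions:

```python
# Sketch: sparse autoencoder over cached model activations. All sizes and
# the L1 penalty weight are illustrative placeholders.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=768 * 8):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)
    def forward(self, acts):
        feats = torch.relu(self.enc(acts))            # sparse, overcomplete features
        recon = self.dec(feats)
        return recon, feats

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(4096, 768)                         # stand-in for cached residual-stream activations
recon, feats = sae(acts)
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
opt.zero_grad(); loss.backward(); opt.step()
```

The L1 term drives most feature activations to zero, so individual features can be inspected as candidate interpretable directions.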
# 6. Supervised Program for Alignment Research (SPAR) 2025 @ Kairos (February 2025 - May 2025)
- Mentored by Kellin Pelrine and Puelma Touzel (Mila) on building more trustworthy, reliable AI systems.
- Fine-tuned Llama 3.2 models (1B/3B/11B) to detect overconfident behavior in model outputs. (Group/Preprint)
- Visualized internal confidence signals with TransformerLens to spot model overconfidence. (Solo/Preprint; sketch below)
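Illustrative sketch of a logit-lens-style confidence readout with TransformerLens; the gpt2 checkpoint and prompt stand in for the models and prompts actually studied:

```python
# Sketch: decode each layer's residual stream through the final LayerNorm
# and unembedding, then track the peak next-token probability per layer as
# an internal "confidence" signal.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")     # illustrative stand-in model
tokens = model.to_tokens("The capital of France is")
logits, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][:, -1]         # residual stream at last position
    layer_logits = model.unembed(model.ln_final(resid))
    conf = layer_logits.softmax(dim=-1).max().item()  # peak next-token probability
    print(f"layer {layer:2d}: max p = {conf:.3f}")
```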
# 7. Machine Learning Institute 2025 (May 2025 - July 2025)
- Implemented and deployed weekly hackathon-style ML/AI projects over 6 weeks.
- Utilized Streamlit, Docker, FastAPI, PostgreSQL, speech/language models (e.g. OpenAI’s Whisper), and the PPO algorithm (sketch below).
- Selected Projects: Digit recognizer (Demo) | GPT model built from scratch (GitHub Repo)
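Illustrative PPO usage sketch; the stable-baselines3 library and CartPole environment are assumptions for illustration, not necessarily this project's stack:

```python
# Sketch: train a PPO (clipped-surrogate policy gradient) agent, then roll
# out the learned policy. Library and environment are illustrative choices.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

obs, _ = env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(int(action))
    if terminated or truncated:
        obs, _ = env.reset()
```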
# 8. Impact Research Groups (IRG) 2025 @ Arcadia (July 2025 - September 2025)
- Selected to conduct research on Chain-of-Thought reasoning in LLMs (DeepSeek-R1, Qwen3, Gemma 2, GPT-OSS).
⇨ Cognitive Science Projects
# 9. Memory & Curiosity Projects @ Motivation and Memory Lab (2019-2022)
- Collaborated with the BBC on online studies (behavioral/survey) with 500 subjects.
- Utilized Python, JavaScript, SQL, Regression, PCA, Factor Analysis, K-means Clustering, and Linear Mixed Models.
- Identified three motivational factors from survey data that predicted BBC users’ news consumption behavior. (Preprint)
# 10. Decision Making Projects @ Cognition and Computational Brain Lab (2022-2023)
- Conducted 28 online behavioral studies (30-45 min long) with 700 subjects.
- Reduced online-study development and analysis time by 99% using Python/R for agile implementation, analysis, and visualization.
- Uncovered impacts of initial practice on later choice preferences, enhancing prediction of user behavior. (Preprint)
Prior Experience
PhD Student, 2013-2019 - University of Manchester
# 11. Sleep & Memory Projects @ Sleep Lab (2013-2019)
- Supervised by Prof. Penny Lewis
- Investigated impacts of sleep on memory and brain activity changes over time (Paper).
- Gained hands-on experience in sleep (EEG) and brain-imaging (fMRI) data analysis, teaching and supervision.
- Learned Python/R and analyses including A/B tests, t-tests, ANOVA, Bayesian statistics, and linear/non-linear model fitting.
Voluntary Research Assistant, 2011-2013 - Goldsmiths, University of London
# 12. Face Processing Projects @ Banissy Lab (2011-2013)
- Supervised by Prof. Michael Banissy
- Investigated various characteristics of face processing (e.g. perceiving physical fitness from static facial images) (Paper1, Paper2)
Education
- PhD in Psychology, University of Manchester, Manchester, UK, 2013-2018
- MSc in Cognitive Neuroscience, University College London, London, UK, 2010-2011
- BA in Psychology, Saint Louis University, St. Louis, MO, USA, 2006-2010