CV/Resume: Hikaru Tsujimura, PhD
Updated on 22 October 2025

Profile
Current interests (October 2025)
Building on my research in Cognitive Neuroscience—where I studied how human values, decisions, and actions are internally represented—I now focus on AI Safety, particularly on understanding how misaligned internal representations in AI systems lead to unintended value preferences and behaviors.
This challenge is likely to persist because:
- AI systems grow rapidly in capability, complexity, and diversity.
- Digital architectures are not inherently designed to align with human values and thinking processes.
- Forced alignment often induces workarounds or deceptive behaviors (e.g. “scheming” AIs).
- We lack standardized methods to reliably interpret internal representations linked to model outputs.
Accordingly, my first goal is to develop principled approaches for interpreting and mitigating representational misalignment in AI systems.
In 2025, frontier models began demonstrating new roles in enhancing human creativity and productivity (e.g. new chess-move discoveries by DeepMind/Goodfire; scientific data interpretation by Leap Lab). That same year, my own study uncovered hidden emotional and logical structures underlying assertive and persuasive model outputs, offering insights into effective human communication. Building on this, my second goal is to build tools that leverage frontier models to amplify human cognitive capacities.
Technical Skills
- Python: PyTorch, Transformers, PEFT, Accelerate, TransformerLens, scikit-learn, NumPy, pandas, Matplotlib, Weights & Biases (wandb), and more
- LLMs: Llama 2-3.2, Qwen3, DeepSeek, Gemma 2, GPT-OSS, DeepCogito, Magistral, OpenAI API (Whisper)
- AI Safety:
  - Chain-of-Thought (CoT) Reasoning in LLMs
  - Mechanistic Interpretability (SAE, TransformerLens)
  - Overconfidence Analysis in LLMs
  - Safe Reinforcement Learning (PPO)
- AI Safety Research Tools:
  - Quantized Fine-Tuning (QLoRA), Unsloth, TransformerLens, Vast.ai, Google Colab, Remote Servers
- Programming: Bash, JavaScript, R, Git, cluster job submission, MATLAB, SQL (PostgreSQL), Docker, FastAPI, Streamlit
- Analysis:
  - Statistical: Regression, ANOVA, Bayesian methods, PCA, Factor Analysis, Mixed Models
  - Machine Learning: Supervised & Unsupervised Learning, Variational Inference, Neural ODE, RNN (LSTM, GRU)
  - Behavioral: Human-subject Experiments, Large-scale Online Studies, Survey Design (GDPR-compliant)
  - Neuroimaging: Sleep-EEG, fMRI, Brain Activity Pattern Modeling
- Experimentation: Large-scale Online Studies, Human Subjects (GDPR-compliant), Behavioral/Survey, Brain Imaging (Sleep-EEG/fMRI)
- Writing: Journal papers, Preprints, GitHub Blog/Codebase
- Communication: Leading Master's/PhD students, Teaching, Conference presentations, Collaboration with scholars and industry partners
Recent Experience
Postdoctoral Researcher, 2019-Present - Cardiff University
⇨ Machine Learning (ML) Projects
# 1. Group Project 2023 @ Neuromatch Academy Deep Learning Course (July 2023)
- Led a team of 7 PhD students and researchers to complete a 3-week ML project.
- Utilized PyTorch, pre-trained networks (e.g. CLIP), and a dataset of 4.6M human choices from 14K participants.
- Aligned models with these data to visualize internal representations of everyday objects in humans (GitHub).
# 2. Independent Projects (August 2023 - August 2024)
- Optimized model alignment with individual participants using the above data.
- Utilized pre-trained networks, efficient quantized fine-tuning (QLoRA), and LLMs (e.g. Llama 2).
- Discovered participant-specific internal representations (GitHub).
# 3. Group Project 2024 @ Neuromatch Academy NeuroAI Course (July 2024)
- Led a team of 2 PhD students to complete a 2-week NeuroAI project.
- Used RNN-based models (Neural ODE, LSTM, GRU) for motor tasks (3-Bit Flip-Flop and Random Target tasks).
- Achieved strong alignment between model outputs and muscle activity data across varied tasks (GitHub).
# 4. Collaborative Project 2024-2025 @ University College London (May 2024 - Present)
- Visiting postdoc in Prof. Bradley Love’s lab, extending the previous independent work (#2).
- Utilizing TensorFlow, Variational Inference, and ML models (e.g. Deep Support Vector Machine).
- Training ML models to predict and generate human behaviors and brain activity patterns (fMRI).
# 5. AI Safety Fundamentals - Alignment Course 2024 @ BlueDot (October 2024 - February 2025)
- Gained core knowledge of AI safety topics such as Scalable Oversight, Mechanistic Interpretability, and AI Control.
- Led an individual project developing a novel interpretability method with SAE and generative models (GitHub).
# 6. Supervised Program for Alignment Research (SPAR) 2025 @ Kairos (February 2025 - May 2025)
- Mentored by Kellin Pelrine and Puelma Touzel (Mila) on building more trustworthy, reliable AI systems.
- Fine-tuned Llama 3.2 models (1B/3B/11B) to detect overconfident behaviors in model outputs (Group/Preprint).
- Visualized internal confidence signals with TransformerLens to spot model overconfidence (Solo/Preprint).
# 7. Machine Learning Institute 2025 (May 2025 - July 2025)
- Implemented and deployed hackathon-style weekly ML/AI projects for 6 weeks.
- Utilized Streamlit, Docker, FastAPI, PostgreSQL, LLMs (e.g. OpenAI’s Whisper model), and the PPO algorithm.
- Selected projects: Digit recognizer (Demo) | GPT model built from scratch (GitHub Repo)
# 8. MATS 2025 Application Project (August 2025)
- Conducted an 18/20-hour Mechanistic Interpretability project (two experiments).
- Employed PyTorch and nnsight, and provided the full project codebase.
- Probed hidden representational structures in Qwen3 to identify implicit mathematical reasoning knowledge (Google Doc).
# 9. Impact Research Groups (IRG) 2025 @ Arcadia (July 2025 - October 2025)
- Conducted research on Chain-of-Thought (CoT) reasoning in LLMs (DeepSeek, Qwen3, Gemma2, GPT-OSS).
- Replicated Emmons et al. (2025), showing CoT faithfulness under misleading prompts when think-mode is off.
- Discovered that this faithful behavior vanished with think-mode on, highlighting the need for a new intervention (Google Slide).
⇨ Cognitive Science Projects
# 10. Memory & Curiosity Projects @ Motivation and Memory Lab (2019-2022)
- Collaborated with the BBC on online studies (behavioral/survey) with 500 subjects.
- Utilized Python, JavaScript, SQL, Regression, PCA, Factor Analysis, K-means Clustering, and Linear Mixed Models.
- Identified three motivational factors from survey data that predicted BBC users’ news consumption behavior (Preprint).
# 11. Decision Making Projects @ Cognition and Computational Brain Lab (2022-2023)
- Conducted 28 online behavioral studies (30-45 min each) with 700 subjects.
- Reduced online study duration by 99% using Python/R for agile implementation, analysis, and visualization.
- Uncovered effects of initial practice on later choice preferences, enhancing user behavior prediction (Preprint).
Prior Experience
PhD Student, 2013-2019 - University of Manchester
# 12. Sleep & Memory Projects @ Sleep Lab (2013-2019)
- Supervised by Prof. Penny Lewis.
- Investigated the impact of sleep on memory and brain activity changes over time (Paper).
- Gained hands-on experience in sleep (EEG) and brain-imaging (fMRI) data analysis, teaching, and supervision.
- Learned Python/R and analyses including A/B tests, t-tests, ANOVA, Bayesian statistics, and linear/non-linear model fitting.
Voluntary Research Assistant, 2011-2013 - Goldsmiths, University of London
# 13. Face Processing Projects @ Banissy Lab (2011-2013)
- Supervised by Prof. Michael Banissy.
- Investigated various characteristics of face processing, e.g. perceiving physical fitness from static facial images (Paper1, Paper2).
Education
- PhD in Psychology, University of Manchester, Manchester, UK, 2013-2018
- MSc in Cognitive Neuroscience, University College London, London, UK, 2010-2011
- BA in Psychology, Saint Louis University, St. Louis, MO, USA, 2006-2010