
Have you ever wondered whether an LLM that reasons more deeply, one rigorously trained for complex problem-solving, would produce better, more semantically rich text embeddings? If a model can reason its way through intricate logic puzzles, surely its internal representation of language, and its understanding of meaning, should be superior.
Prepare for a plot twist. Our latest research, “Do Reasoning Models Enhance Embedding Models”, reveals a fascinating paradox: the enhanced reasoning capabilities of models optimized with Reinforcement Learning with Verifiable Rewards (RLVR) do not consistently translate to superior semantic representations when these models are used as backbones for text embedding models.

This isn’t just an unexpected result; it presents a scientific puzzle: why do powerful reasoning models, after all that specialized training, yield embedding performance statistically indistinguishable from that of their simpler, non-reasoning counterparts?
To crack this puzzle, we needed to look beyond traditional performance metrics, which can often mask deeper internal dynamics. We introduced Hierarchical Representation Similarity Analysis (HRSA), a novel framework that lets us dissect how models represent information at three distinct levels:
- Global structure: the overall arrangement of concepts in the embedding space.
- Local geometry: how each point’s immediate neighborhood is organized.
- Coordinate basis: the axes along which that structure is expressed, and what linear readouts can recover from it.
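To make these three levels concrete, here is a minimal, hedged sketch of metrics one could compute at each level; this is an illustration, not the paper’s implementation. It assumes `X` and `Y` are `(n_samples, dim)` NumPy arrays of embeddings of the same probe texts under two models, and uses linear CKA for global structure, k-nearest-neighbor overlap for local geometry, and an orthogonal Procrustes residual for coordinate-basis drift.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def linear_cka(X, Y):
    """Global structure: linear CKA, invariant to rotation and isotropic scaling."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    hsic = np.linalg.norm(Xc.T @ Yc, "fro") ** 2
    return hsic / (np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro"))

def knn_overlap(X, Y, k=10):
    """Local geometry: mean Jaccard overlap of each point's k-nearest-neighbor set."""
    def knn(Z):
        d = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    a, b = knn(X), knn(Y)
    inter = [len(set(a[i]) & set(b[i])) for i in range(len(X))]
    return float(np.mean([m / (2 * k - m) for m in inter]))  # |A∩B| / |A∪B|

def basis_drift(X, Y):
    """Coordinate basis: relative residual after the best orthogonal map X -> Y.
    A small residual despite low raw similarity signals a reversible basis drift."""
    R, _ = orthogonal_procrustes(X, Y)
    return np.linalg.norm(X @ R - Y, "fro") / np.linalg.norm(Y, "fro")
```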

What HRSA unveiled is a phenomenon we term Manifold Realignment.
Think of the latent manifold (your embedding space) like a meticulously organized library. Each book has its place, and the entire layout makes sense.

In this picture, RLVR is a renovation that leaves the floor plan untouched: it rotates the catalogue’s coordinate system (a reversible basis drift) and reshuffles books within individual shelves (an irreversible local geometry reorganization), while the arrangement of the sections themselves, the library’s global semantic structure, stays intact.
Here’s the kicker: when these RLVR-tuned models are then adapted into embedding models through contrastive learning, something remarkable happens. It’s as if the head librarian (contrastive learning) comes in and makes sure that every version of the library, whether it started from the base model or the RLVR-tuned model, is aligned and intuitive for users. The system re-aligns, effectively undoing the reversible drifts and bringing the embedding spaces back together (the irreversible local geometry reorganization is the one change that persists).
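For concreteness, the “head librarian” here is typically an in-batch contrastive objective such as InfoNCE, the standard recipe for adapting LLM backbones into embedding models (a common setup, not necessarily the paper’s exact one):

```python
import torch
import torch.nn.functional as F

def infonce_loss(queries, passages, temperature=0.05):
    """In-batch InfoNCE: pull each query toward its paired passage and push
    it away from every other passage in the batch."""
    q = F.normalize(queries, dim=-1)
    p = F.normalize(passages, dim=-1)
    logits = q @ p.T / temperature                      # (B, B) cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)
```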
The beauty of Manifold Realignment is that it isn’t just theoretical; we can see it happening. Observe the training dynamics in the figure below:

This chart shows how quickly the models’ representations realign during contrastive learning. At step 0 (the raw LLM backbones), the base and RLVR-tuned representations differ. But as contrastive learning progresses, their representational similarity rapidly increases and then stabilizes. This realignment ensures that, despite the local tweaks made by RLVR, the models converge to highly similar and effective embedding spaces.
As you can see, even though RLVR can induce coordinate basis drift, contrastive learning acts as a powerful aligning force, ensuring the global semantic structure and linear readouts remain robustly consistent across base and reasoning-tuned models.
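Tracing such a curve yourself is straightforward: encode the same probe texts with step-aligned checkpoints of the base-initialized and RLVR-initialized embedding models, then measure their similarity at each step. A hedged sketch, reusing the `linear_cka` helper above; `encode(ckpt, texts)` is a hypothetical user-supplied function returning an `(n, d)` array:

```python
def realignment_curve(base_ckpts, rlvr_ckpts, probe_texts, encode):
    """One similarity value per training step, comparing base vs. RLVR models."""
    return [
        linear_cka(encode(b, probe_texts), encode(r, probe_texts))
        for b, r in zip(base_ckpts, rlvr_ckpts)  # step-aligned checkpoint pairs
    ]
```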
What does this mean for you, the developer or researcher building with LLMs?
This research suggests that RLVR primarily optimizes how models traverse an existing semantic landscape rather than fundamentally redrawing it. That is incredibly important in practice, because it means:
- You don’t need to avoid reasoning-tuned backbones for embedding work: after contrastive fine-tuning, they perform on par with their base counterparts.
- Conversely, RLVR alone won’t buy you better embeddings, so pick a backbone for its reasoning ability, availability, or cost rather than in the hope of a free embedding boost.
Our HRSA framework is itself a significant contribution: it gives interpretability studies a new toolkit for disentangling how different training methods reshape a model’s internal workings, and it brings structure to the previously messy RSA landscape. The insight also opens doors for future training designs, perhaps even achieving RLVR-like effects with geometry-aware regularization during SFT.
In essence, you get the best of both worlds: a smarter, more capable reasoning model whose rich semantic representations remain fully effective for embedding tasks.
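Concretely, using such a model for embeddings looks the same as with any other backbone. A sketch with Hugging Face transformers and mean pooling; the model name is a placeholder, and the released checkpoints (linked below) may prescribe a different pooling strategy:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "your-org/your-embedding-model"  # placeholder; see the HF collection below

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

@torch.no_grad()
def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (B, T, d)
    mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1), 1 for real tokens
    pooled = (hidden * mask).sum(1) / mask.sum(1)    # mean over non-padding tokens
    return torch.nn.functional.normalize(pooled, dim=-1)
```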
Paper: https://www.arxiv.org/abs/2601.21192
Code: https://github.com/HKUST-KnowComp/Reasoning-Embedding
Models and data: https://huggingface.co/collections/lucaswychan/reasoning-embedding