Go assembly syntax reference: https://go.dev/doc/asm

1. Pseudo-registers

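  • SB: static base pointer, the base of global memory. foo(SB) is the address of the symbol foo.
  • FP: frame pointer, used to access the function's arguments, for example:
    • first_arg+0(FP): the first argument
    • second_arg+8(FP): the second argument (on a 64-bit CPU)
  • SP: stack pointer, pointing to the highest address of the local stack frame, so local variables use negative offsets. On architectures that also have a hardware SP register, the name prefix tells the two apart:
    • x-8(SP), y-4(SP): with a symbol name, the pseudo SP
    • -8(SP): without a name, the hardware SP
  • PC: program counter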

2. Functions

Syntax: TEXT symbol(SB), [flags,] $framesize[-argsize]

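  • symbol: the function name
  • SB: the SB pseudo-register
  • flags: optional, defined in textflag.h, for example
    • NOSPLIT: do not insert the stack-split check prologue
    • WRAPPER: this is a wrapper function and should not count as disabling recover
    • NEEDCTXT: the function is a closure and uses its incoming context register
  • framesize: size of the local frame, including space for the arguments passed to the functions it calls
  • argsize: size of the arguments plus results, which live in the caller's frame. It may be omitted for NOSPLIT functions (otherwise it must be given); go vet checks it against the Go prototype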


Go declaration:

func swap(a, b int) (int, int)

Either assembly header works for it:

TEXT ·swap(SB), NOSPLIT, $0-32
TEXT ·swap(SB), NOSPLIT, $0

Here -32 is four 8-byte ints (on a 64-bit CPU): the arguments a and b plus the two results.

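Each of the following signatures also takes 32 bytes of arguments plus results on a 64-bit CPU, so all of them use $0-32:

func swap(a, b int) (int, int)
func swap(a, b, c, d int)
func swap() (a, b, c, d int)
func swap() (a []int, d int)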


3. Assembly examples

3.1. Accessing Go structs from assembly


type reader struct {
    buf [bufSize]byte
    r   int
}


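  • With #include "go_asm.h" (which the go command generates for the package), reader__size is the size of the reader struct.
  • reader_buf and reader_r are the byte offsets of its two fields.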

3.2. MOV copies from left to right (source first, destination second), the same order as the Linux cp command

MOVL    g(CX), AX     // Move g into AX.
MOVL    g_m(AX), BX   // Move g.m into BX.

3.3. Accessing the runtime's g struct, and how timandy/routine does it


3.3.1. 386 and amd64

The Go runtime provides go_tls.h, which defines the get_tls and g macros for reaching g:

#include "go_tls.h"
#include "go_asm.h"

get_tls(CX)
MOVL    g(CX), AX     // Move g into AX.
MOVL    g_m(AX), BX   // Move g.m into BX.

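On 386 and amd64, g is not kept in a general-purpose register but in thread-local storage (TLS), reached through a segment register. get_tls(CX) prepares the register you pass (CX here) for TLS access, and g(CX) addresses the TLS slot holding the current g pointer, so MOVL g(CX), AX leaves the g pointer in AX.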


#include "funcdata.h"
#include "go_asm.h"
#include "go_tls.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-4
    get_tls(CX)
    MOVL    g(CX), AX
    MOVL    AX, ret+0(FP)
    RET

get_tls(CX) is a macro from go_tls.h, not a real function; together with g(CX) it yields the pointer to g.

The amd64 version is the same except that MOVL becomes MOVQ and the frame is $0-8, because the returned pointer is 8 bytes.

3.3.2. arm

arm keeps g in R10, but assembly must write g rather than R10, because the assembler does not recognize the name R10.

#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-4
    MOVW    g, R8
    MOVW    R8, ret+0(FP)
    RET

The g in MOVW g, R8 is R10.

3.3.3. arm64


#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-8
    MOVD    g, R8
    MOVD    R8, ret+0(FP)
    RET

3.3.4. mips/mips64


//go:build mips || mipsle
// +build mips mipsle

#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-4
    MOVW    g, R8
    MOVW    R8, ret+0(FP)
    RET


//go:build mips64 || mips64le
// +build mips64 mips64le

#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-8
    MOVV    g, R8
    MOVV    R8, ret+0(FP)
    RET

3.3.5. ppc64


//go:build ppc64 || ppc64le
// +build ppc64 ppc64le

#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-8
    MOVD    g, R8
    MOVD    R8, ret+0(FP)
    RET

3.3.6. loong64

loong64 keeps g in R22, which differs from MIPS.

//go:build loong64
// +build loong64

#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-8
    MOVV    g, R8
    MOVV    R8, ret+0(FP)
    RET

3.3.7. riscv


#include "funcdata.h"
#include "go_asm.h"
#include "textflag.h"

TEXT ·getgp(SB), NOSPLIT, $0-8
    MOV    g, X10
    MOV    X10, ret+0(FP)
    RET

3.3.8. Summary

The RISC ports all follow the same pattern: a dedicated register holds the g pointer, and assembly refers to that register simply as g.

Implement getgp in assembly, and declare it in g.go as a Go function without a body:

// getgp returns the pointer to the current runtime.g.
func getgp() unsafe.Pointer

3.4. Getting the goroutine id from the g pointer


type g struct {
    goid         int64
    paniconfault *bool
    gopc         *uintptr
    labels       *unsafe.Pointer
}

Knowing the address of g and the offset of goid is enough to read goid:

// getg returns current coroutine struct.
func getg() g {
    gp := getgp()
    if gp == nil {
        panic("Failed to get gp from runtime natively.")
    }
    return g{
        goid:         *(*int64)(add(gp, offsetGoid)),
        paniconfault: (*bool)(add(gp, offsetPaniconfault)),
        gopc:         (*uintptr)(add(gp, offsetGopc)),
        labels:       (*unsafe.Pointer)(add(gp, offsetLabels)),
    }
}

// add advances p by x bytes; getg uses it for pointer arithmetic.
func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
    return unsafe.Pointer(uintptr(p) + x)
}

// Goid returns the current goroutine's unique id.
func Goid() int64 {
    return getg().goid
}

3.4.1. Finding the offset of goid

The real definition of g is in runtime/runtime2.go. It is a huge struct:

type g struct {
    // Stack parameters.
    // stack describes the actual stack memory: [stack.lo, stack.hi).
    // stackguard0 is the stack pointer compared in the Go stack growth prologue.
    // It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
    // stackguard1 is the stack pointer compared in the C stack growth prologue.
    // It is stack.lo+StackGuard on g0 and gsignal stacks.
    // It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
    stack       stack   // offset known to runtime/cgo
    stackguard0 uintptr // offset known to liblink
    stackguard1 uintptr // offset known to liblink

    _panic    *_panic // innermost panic - offset known to liblink
    _defer    *_defer // innermost defer
    m         *m      // current m; offset known to arm liblink
    sched     gobuf
    syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
    syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
    stktopsp  uintptr // expected sp at top of stack, to check in traceback
    // param is a generic pointer parameter field used to pass
    // values in particular contexts where other storage for the
    // parameter would be difficult to find. It is currently used
    // in three ways:
    // 1. When a channel operation wakes up a blocked goroutine, it sets param to
    //    point to the sudog of the completed blocking operation.
    // 2. By gcAssistAlloc1 to signal back to its caller that the goroutine completed
    //    the GC cycle. It is unsafe to do so in any other way, because the goroutine's
    //    stack may have moved in the meantime.
    // 3. By debugCallWrap to pass parameters to a new goroutine because allocating a
    //    closure in the runtime is forbidden.
    param        unsafe.Pointer
    atomicstatus atomic.Uint32
    stackLock    uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
    goid         uint64
    schedlink    guintptr
    waitsince    int64      // approx time when the g become blocked
    waitreason   waitReason // if status==Gwaiting

    preempt       bool // preemption signal, duplicates stackguard0 = stackpreempt
    preemptStop   bool // transition to _Gpreempted on preemption; otherwise, just deschedule
    preemptShrink bool // shrink stack at synchronous safe point

    // asyncSafePoint is set if g is stopped at an asynchronous
    // safe point. This means there are frames on the stack
    // without precise pointer information.
    asyncSafePoint bool

    paniconfault bool // panic (instead of crash) on unexpected fault address
    gcscandone   bool // g has scanned stack; protected by _Gscan bit in status
    throwsplit   bool // must not split stack
    // activeStackChans indicates that there are unlocked channels
    // pointing into this goroutine's stack. If true, stack
    // copying needs to acquire channel locks to protect these
    // areas of the stack.
    activeStackChans bool
    // parkingOnChan indicates that the goroutine is about to
    // park on a chansend or chanrecv. Used to signal an unsafe point
    // for stack shrinking.
    parkingOnChan atomic.Bool

    raceignore    int8  // ignore race detection events
    tracking      bool  // whether we're tracking this G for sched latency statistics
    trackingSeq   uint8 // used to decide whether to track this G
    trackingStamp int64 // timestamp of when the G last started being tracked
    runnableTime  int64 // the amount of time spent runnable, cleared when running, only used when tracking
    lockedm       muintptr
    sig           uint32
    writebuf      []byte
    sigcode0      uintptr
    sigcode1      uintptr
    sigpc         uintptr
    parentGoid    uint64          // goid of goroutine that created this goroutine
    gopc          uintptr         // pc of go statement that created this goroutine
    ancestors     *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
    startpc       uintptr         // pc of goroutine function
    racectx       uintptr
    waiting       *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
    cgoCtxt       []uintptr      // cgo traceback context
    labels        unsafe.Pointer // profiler labels
    timer         *timer         // cached timer for time.Sleep
    selectDone    atomic.Uint32  // are we participating in a select and did someone win the race?

    // goroutineProfiled indicates the status of this goroutine's stack for the
    // current in-progress goroutine profile
    goroutineProfiled goroutineProfileStateHolder

    // Per-G tracer state.
    trace gTraceState

    // Per-G GC state

    // gcAssistBytes is this G's GC assist credit in terms of
    // bytes allocated. If this is positive, then the G has credit
    // to allocate gcAssistBytes bytes without assisting. If this
    // is negative, then the G must correct this by performing
    // scan work. We track this in bytes to make it fast to update
    // and check for debt in the malloc hot path. The assist ratio
    // determines how this corresponds to scan work debt.
    gcAssistBytes int64
}

Although goid is a field of this struct, getting its offset is hard:

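  • g is an unexported struct of the runtime package, so other packages cannot refer to it
  • many fields precede goid, and the layout changes between Go versions, so the offset cannot reliably be worked out by eye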

timandy/routine solves this problem. Here is how it works:

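  1. Define a "slimmed-down" g that keeps only the few fields of interest, such as goid and gopc (goid is copied by value; the others are pointers into the real g):
type g struct {
    goid         int64
    paniconfault *bool
    gopc         *uintptr
    labels       *unsafe.Pointer
}
  2. At initialization, find the type of runtime.g and use reflection to get the offset of goid (and of the other fields):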
var (
    offsetGoid         uintptr
    offsetPaniconfault uintptr
    offsetGopc         uintptr
    offsetLabels       uintptr
)

func init() {
    gt := getgt()
    offsetGoid = offset(gt, "goid")
    offsetPaniconfault = offset(gt, "paniconfault")
    offsetGopc = offset(gt, "gopc")
    offsetLabels = offset(gt, "labels")
}

// getgt returns the type of runtime.g.
func getgt() reflect.Type {
    return typeByString("runtime.g")
}

// offset returns the offset of the specified field.
func offset(t reflect.Type, f string) uintptr {
    field, found := t.FieldByName(f)
    if found {
        return field.Offset
    }
    panic(fmt.Sprintf("No such field '%v' of struct '%v.%v'.", f, t.PkgPath(), t.Name()))
}


// eface The empty interface struct.
type eface struct {
    _type unsafe.Pointer // in the runtime's own eface this is _type *_type
    data  unsafe.Pointer
}

// iface The interface struct.
type iface struct {
    tab  unsafe.Pointer // in the runtime's own iface this is tab *itab
    data unsafe.Pointer
}

// typelinks returns a slice of the sections in each module, and a slice of *rtype offsets in each module. The types in each module are sorted by string.
//go:linkname typelinks reflect.typelinks
func typelinks() (sections []unsafe.Pointer, offset [][]int32)

// resolveTypeOff resolves an *rtype offset from a base type.
//go:linkname resolveTypeOff reflect.resolveTypeOff
func resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer


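  • It uses the go:linkname trick to get around Go's unexported-name restriction and call reflect.typelinks(), which returns, for every module, its section base and the offsets of its types.
  • typeByString() binary-searches each module's type list (sorted by string) for "*runtime.g"; when a pointer type matches, it returns that type, or its element type if the element's String() equals "runtime.g", as a reflect.Type.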


// typeByString returns the type whose 'String' property equals to the given string, or nil if not found.
func typeByString(str string) reflect.Type {
    // The s is search target
    s := str
    if len(str) == 0 || str[0] != '*' {
        s = "*" + s
    }
    // The typ is a struct iface{tab(ptr->reflect.Type), data(ptr->rtype)}
    typ := reflect.TypeOf(0)
    face := (*iface)(unsafe.Pointer(&typ))
    // Find the specified target through binary search algorithm
    sections, offset := typelinks()
    for offsI, offs := range offset {
        section := sections[offsI]
        // We are looking for the first index i where the string becomes >= s.
        // This is a copy of sort.Search, with f(h) replaced by (*typ[h].String() >= s).
        i, j := 0, len(offs)
        for i < j {
            h := i + (j-i)/2 // avoid overflow when computing h
            // i ≤ h < j
            face.data = resolveTypeOff(section, offs[h])
            if !(typ.String() >= s) {
                i = h + 1 // preserves f(i-1) == false
            } else {
                j = h // preserves f(j) == true
            }
        }
        // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
        // Having found the first, linear scan forward to find the last.
        // We could do a second binary search, but the caller is going
        // to do a linear scan anyway.
        if i < len(offs) {
            face.data = resolveTypeOff(section, offs[i])
            if typ.Kind() == reflect.Ptr {
                if typ.String() == str {
                    return typ
                }
                elem := typ.Elem()
                if elem.String() == str {
                    return elem
                }
            }
        }
    }
    return nil
}
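The hand-written loop in typeByString is sort.Search inlined. The same lookup over a plain, already sorted slice of type-name strings might look like this; findType and the sample names are made up for illustration, and string prefixes stand in for the reflect.Ptr and Elem checks:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// findType mimics typeByString on a sorted slice of type names instead of a
// module's typelinks: search for "*"+str, then accept either the pointer
// type itself or its element type.
func findType(names []string, str string) (string, bool) {
	s := str
	if len(str) == 0 || str[0] != '*' {
		s = "*" + s
	}
	// First index whose name is >= s, exactly what the hand-written loop computes.
	i := sort.Search(len(names), func(h int) bool { return names[h] >= s })
	if i < len(names) && strings.HasPrefix(names[i], "*") {
		if names[i] == str {
			return names[i], true // caller asked for the pointer type
		}
		if names[i][1:] == str {
			return str, true // element type of the pointer
		}
	}
	return "", false
}

func main() {
	// Hypothetical, already sorted type strings of one module.
	names := []string{"*bytes.Buffer", "*runtime.g", "*runtime.m", "*sync.Mutex"}
	for _, q := range []string{"runtime.g", "*runtime.m", "runtime.p"} {
		name, ok := findType(names, q)
		fmt.Printf("%-11s -> %q %v\n", q, name, ok)
	}
}
```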

3.5. Goroutine-local storage with timandy/routine


type threadLocalMap struct {
    table []any
}

type thread struct {
    labels                  map[string]string // pprof labels
    magic                   int64             // marker
    id                      int64             // goroutine id
    threadLocals            *threadLocalMap
    inheritableThreadLocals *threadLocalMap
}

If g.labels is nil, a thread struct is created and its pointer is stored in g.labels.
Note: the runtime normally uses g.labels for pprof labels.

threadLocals is a table (table []any) addressed by an integer index: each ThreadLocal variable reads and writes its own slot.

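The timandy/routine API routine.NewThreadLocalWithInitial returns a ThreadLocal interface value, usually assigned to a global variable. The first time a goroutine calls ThreadLocal.Get() on it, the supplier function runs and creates a new instance for that goroutine; later Get() calls from the same goroutine return that instance.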


type uuidptr = *uuid.UUID

type infoPerRoutine struct {
    tracingID uuidptr
}

// A thread local is usually held in a global variable.
var routineLocal = routine.NewThreadLocalWithInitial(func() any {
    return &infoPerRoutine{}
})

func getRoutineLocal() *infoPerRoutine {
    return routineLocal.Get().(*infoPerRoutine)
}
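The mechanism in this section (one table per goroutine, one slot per ThreadLocal, filled lazily by the supplier) can be sketched without the library. Everything here (the threadLocalMap methods, newThreadLocalWithInitial, the unsetMarker sentinel) is hypothetical; the real routine locates the current goroutine's table through g.labels, whereas this sketch has the caller pass it in:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// unsetMarker fills slots that were never written, so a stored nil can be
// told apart from "not initialised yet".
type unsetMarker struct{}

// threadLocalMap mirrors the per-goroutine table []any described above.
type threadLocalMap struct {
	table []any
}

func (m *threadLocalMap) get(index int) any {
	if index < len(m.table) {
		return m.table[index]
	}
	return unsetMarker{}
}

func (m *threadLocalMap) set(index int, v any) {
	for len(m.table) <= index {
		m.table = append(m.table, unsetMarker{})
	}
	m.table[index] = v
}

// nextIndex hands out a distinct slot index to every threadLocal.
var nextIndex atomic.Int32

// threadLocal is a hypothetical, simplified ThreadLocal with an initial-value supplier.
type threadLocal struct {
	index    int
	supplier func() any
}

func newThreadLocalWithInitial(supplier func() any) *threadLocal {
	return &threadLocal{index: int(nextIndex.Add(1) - 1), supplier: supplier}
}

// get returns this variable's value in m, calling supplier on first access.
// The real library would pick m from the current goroutine; here it is passed in.
func (tl *threadLocal) get(m *threadLocalMap) any {
	v := m.get(tl.index)
	if _, missing := v.(unsetMarker); missing {
		v = tl.supplier()
		m.set(tl.index, v)
	}
	return v
}

// infoPerRoutine uses a string tracingID here to stay dependency-free.
type infoPerRoutine struct {
	tracingID string
}

func main() {
	local := newThreadLocalWithInitial(func() any { return &infoPerRoutine{} })

	var m1, m2 threadLocalMap // stand-ins for two goroutines' tables
	local.get(&m1).(*infoPerRoutine).tracingID = "req-1"

	fmt.Println(local.get(&m1).(*infoPerRoutine).tracingID)       // same instance: req-1
	fmt.Println(local.get(&m2).(*infoPerRoutine).tracingID == "") // fresh instance: true
}
```

A sentinel instead of nil marks empty slots, so a supplier that returns nil is not re-run on every get.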


4. arm64 assembly


4.1. Register mapping rules

  1. All basic register names are written as Rn.
  2. Go uses ZR as the zero register and RSP as the stack pointer.
  3. Bn, Hn, Dn, Sn and Qn instructions are written as Fn in floating-point instructions and as Vn in SIMD instructions.

