Recovering symbols from Unity games compiled with IL2CPP (Part 1)

0x00 Background

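Anyone who develops iOS games with Unity knows that building with the IL2CPP option first generates C++ code, which is then compiled into the executable. Games built this way are very hard to reverse engineer, because you have to read ARM assembly. Worse, every string used in the game is stored in a resource file called global-metadata.dat and is only read into memory at run time. This makes static analysis of the game in IDA even harder, so I wrote an IDA plugin that extracts the strings from global-metadata.dat and assigns them to the corresponding variables in IDA.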

0x01 Analyzing the IL2CPP code

First, get hold of the IL2CPP source code. In it we can quickly locate:

//MetadataCache.cpp
void MetadataCache::Initialize()
{
    s_GlobalMetadata = vm::MetadataLoader::LoadMetadataFile ("global-metadata.dat");
    s_GlobalMetadataHeader = (const Il2CppGlobalMetadataHeader*)s_GlobalMetadata;
    assert (s_GlobalMetadataHeader->sanity == 0xFAB11BAF);
    assert (s_GlobalMetadataHeader->version == 21);
    ...
}

Let's see what MetadataLoader::LoadMetadataFile looks like:

//MetadataLoader.cpp
void* MetadataLoader::LoadMetadataFile (const char* fileName)
{
    std::string resourcesDirectory = utils::PathUtils::Combine (Runtime::GetDataDir (), "Metadata");

    std::string resourceFilePath = utils::PathUtils::Combine (resourcesDirectory, fileName);

    int error = 0;
    FileHandle* handle = File::Open (resourceFilePath, File::kFileModeOpen, File::kFileAccessRead, File::kFileShareRead, File::kFileOptionsNone, &error);
    if (error != 0)
        return NULL;

    void* fileBuffer = MemoryMappedFile::Map (handle);

    File::Close (handle, &error);
    if (error != 0)
    {
        MemoryMappedFile::Unmap (fileBuffer);
        fileBuffer = NULL;
        return NULL;
    }

    return fileBuffer;
}

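As you can see, this code does nothing special: it maps global-metadata.dat into memory and returns the base address of the mapping, which the previous function then casts to Il2CppGlobalMetadataHeader*. Having analyzed these two functions, we know how to read global-metadata.dat.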
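The same check can be reproduced offline on a copy of the file. What follows is only a minimal sketch, not the plugin's code: HeaderPrefix and LooksLikeGlobalMetadata are hypothetical names, and the sketch assumes that sanity and version are the first two 32-bit fields of Il2CppGlobalMetadataHeader and that the host is little-endian like the file.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of the first 8 bytes of global-metadata.dat.
// Assumption: sanity and version are the first two fields of
// Il2CppGlobalMetadataHeader, each 32 bits wide.
struct HeaderPrefix
{
    uint32_t sanity;
    int32_t version;
};

// Copies the header prefix into *out and returns true when it carries the
// magic value and version that MetadataCache::Initialize asserts on.
bool LooksLikeGlobalMetadata (const std::vector<uint8_t>& data, HeaderPrefix* out)
{
    assert (out != nullptr);
    if (data.size () < sizeof (HeaderPrefix))
        return false;

    // memcpy instead of a pointer cast, so alignment does not matter.
    std::memcpy (out, data.data (), sizeof (HeaderPrefix));
    return out->sanity == 0xFAB11BAF && out->version == 21;
}
```

A tool would read the whole file into the vector first and stop early if this returns false, since a different version may use a different header layout.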

Next, we need to see how to read out all the strings. Still in MetadataCache.cpp, there is a function called GetStringLiteralFromIndex:

//MetadataCache.cpp
Il2CppString* MetadataCache::GetStringLiteralFromIndex (StringLiteralIndex index)
{
    if (index == kStringLiteralIndexInvalid)
        return NULL;

    assert(index >= 0 && static_cast<uint32_t>(index) < s_GlobalMetadataHeader->stringLiteralCount / sizeof (Il2CppStringLiteral) && "Invalid string literal index ");

    if (s_StringLiteralTable[index])
        return s_StringLiteralTable[index];

    const Il2CppStringLiteral* stringLiteral = (const Il2CppStringLiteral*)((const char*)s_GlobalMetadata + s_GlobalMetadataHeader->stringLiteralOffset) + index;
    s_StringLiteralTable[index] = String::NewLen ((const char*)s_GlobalMetadata + s_GlobalMetadataHeader->stringLiteralDataOffset + stringLiteral->dataIndex, stringLiteral->length);

    return s_StringLiteralTable[index];
}

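Calling this function in a loop yields every string. The remaining problem is that the order in which the strings are stored does not match the order in which they are defined, so we have to find out the order in which they are defined and assigned.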
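The lookup in GetStringLiteralFromIndex can be replayed over a raw copy of the file. The sketch below is an illustration under assumptions rather than the actual implementation: StringLiteralEntry and ReadAllStringLiterals are hypothetical names, the field order and widths of Il2CppStringLiteral are assumed, and the three offsets are expected to have been read from the header by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed layout of Il2CppStringLiteral: the field names come from
// GetStringLiteralFromIndex, their order and 32-bit width are assumptions.
struct StringLiteralEntry
{
    uint32_t length;
    int32_t dataIndex;
};

// tableOffset, tableBytes and dataOffset stand for the header fields
// stringLiteralOffset, stringLiteralCount and stringLiteralDataOffset.
// stringLiteralCount is treated as a size in bytes, because the assert in
// GetStringLiteralFromIndex divides it by sizeof (Il2CppStringLiteral).
std::vector<std::string> ReadAllStringLiterals (const std::vector<uint8_t>& blob, uint32_t tableOffset, uint32_t tableBytes, uint32_t dataOffset)
{
    assert (static_cast<uint64_t>(tableOffset) + tableBytes <= blob.size ());

    std::vector<std::string> result;
    const std::size_t count = tableBytes / sizeof (StringLiteralEntry);
    for (std::size_t i = 0; i < count; i++)
    {
        StringLiteralEntry entry;
        std::memcpy (&entry, blob.data () + tableOffset + i * sizeof (entry), sizeof (entry));

        // Entries that would read past the end of the file become empty
        // strings, so that positions in the result still match the table.
        const uint64_t begin = static_cast<uint64_t>(dataOffset) + static_cast<uint32_t>(entry.dataIndex);
        if (entry.dataIndex < 0 || begin + entry.length > blob.size ())
        {
            result.push_back (std::string ());
            continue;
        }
        result.push_back (std::string (reinterpret_cast<const char*>(blob.data ()) + begin, entry.length));
    }
    return result;
}
```

The result is indexed exactly like the string literal table in the file, which is the storage order, not the definition order.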

In Il2CppMetadataUsage.cpp we can see this definition:

//Il2CppMetadataUsage.cpp
extern void** const g_MetadataUsages[7877] = 
{
    (void**)&Contraction_t1673853792_0_0_0_var,
    (void**)&Level2Map_t3322505726_0_0_0_var,
    (void**)&String_t_0_0_0_var,
    (void**)&TypedReference_t1025199857_0_0_0_var,
    (void**)&ArgIterator_t2628088752_0_0_0_var,
    (void**)&Void_t1841601450_0_0_0_var,
    ...
    ...
    ...
    (void**)&_stringLiteral2004437333,
    (void**)&_stringLiteral3025533088,
    (void**)&_stringLiteral3687436746,
    (void**)&_stringLiteral2779811765,
    (void**)&_stringLiteral273729679,
};

Our goal is to find out how these pointers are initialized.

Take a close look at Bulk_Assembly-CSharp_0.cpp, the file that holds our game logic, and see what the functions we defined look like. Picking one at random:

//Bulk_Assembly-CSharp_0.cpp
// System.Void EnemyAttack::Awake()
extern const MethodInfo* GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430_MethodInfo_var;
extern const MethodInfo* Component_GetComponent_TisAnimator_t69676727_m475627522_MethodInfo_var;
extern Il2CppCodeGenString* _stringLiteral1875862075;
extern const uint32_t EnemyAttack_Awake_m1734153864_MetadataUsageId;
extern "C"  void EnemyAttack_Awake_m1734153864 (EnemyAttack_t2992602076 * __this, const MethodInfo* method)
{
    static bool s_Il2CppMethodIntialized;
    if (!s_Il2CppMethodIntialized)
    {
        il2cpp_codegen_initialize_method (EnemyAttack_Awake_m1734153864_MetadataUsageId);
        s_Il2CppMethodIntialized = true;
    }
    {
        GameObject_t1756533147 * L_0 = GameObject_FindGameObjectWithTag_m829057129(NULL /*static, unused*/, _stringLiteral1875862075, /*hidden argument*/NULL);
        __this->set_player_5(L_0);
        GameObject_t1756533147 * L_1 = __this->get_player_5();
        NullCheck(L_1);
        PlayerHealth_t2894595013 * L_2 = GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430(L_1, /*hidden argument*/GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430_MethodInfo_var);
        __this->set_playerHealth_6(L_2);
        Animator_t69676727 * L_3 = Component_GetComponent_TisAnimator_t69676727_m475627522(__this, /*hidden argument*/Component_GetComponent_TisAnimator_t69676727_m475627522_MethodInfo_var);
        __this->set_anim_4(L_3);
        return;
    }
}

As you can see, the first time this function is called, il2cpp_codegen_initialize_method is invoked. Let's look at that function:

//il2cpp-codegen.h
inline void il2cpp_codegen_initialize_method (uint32_t index)
{
    il2cpp::vm::MetadataCache::InitializeMethodMetadata (index);
}

Following it, we arrive at InitializeMethodMetadata:

//MetadataCache.cpp
void MetadataCache::InitializeMethodMetadata (uint32_t index)
{
    assert(s_GlobalMetadataHeader->metadataUsageListsCount >= 0 && index <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsageListsCount));

    const Il2CppMetadataUsageList* metadataUsageLists = MetadataOffset<const Il2CppMetadataUsageList*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsageListsOffset, index);

    uint32_t start = metadataUsageLists->start;
    uint32_t count = metadataUsageLists->count;

    for (uint32_t i = 0; i < count; i++)
    {
        uint32_t offset = start + i;
        assert(s_GlobalMetadataHeader->metadataUsagePairsCount >= 0 && offset <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsagePairsCount));
        const Il2CppMetadataUsagePair* metadataUsagePairs = MetadataOffset<const Il2CppMetadataUsagePair*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsagePairsOffset, offset);
        uint32_t destinationIndex = metadataUsagePairs->destinationIndex;
        uint32_t encodedSourceIndex = metadataUsagePairs->encodedSourceIndex;

        Il2CppMetadataUsage usage = GetEncodedIndexType (encodedSourceIndex);
        uint32_t decodedIndex = GetDecodedMethodIndex (encodedSourceIndex);
        switch (usage)
        {
        case kIl2CppMetadataUsageTypeInfo:
            *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetTypeInfoFromTypeIndex (decodedIndex);
            break;
        case kIl2CppMetadataUsageIl2CppType:
            *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = const_cast<Il2CppType*>(GetIl2CppTypeFromIndex (decodedIndex));
            break;
        case kIl2CppMetadataUsageMethodDef:
        case kIl2CppMetadataUsageMethodRef:
            *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = const_cast<MethodInfo*>(GetMethodInfoFromIndex (encodedSourceIndex));
            break;
        case kIl2CppMetadataUsageFieldInfo:
            *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetFieldInfoFromIndex(decodedIndex);
            break;
        case kIl2CppMetadataUsageStringLiteral:
            *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetStringLiteralFromIndex (decodedIndex);
            break;
        default:
            NOT_IMPLEMENTED (MetadataCache::InitializeMethodMetadata);
            break;
        }
    }
}
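The function above relies on GetEncodedIndexType and GetDecodedMethodIndex to split encodedSourceIndex into a usage kind and an index. Their bodies are not shown here, so the following is a sketch under assumptions: it supposes that the kind is stored in the top 3 bits and the index in the low 29 bits, and that the enum values are as listed. All names in the block are hypothetical stand-ins, and the constants should be checked against the headers of the IL2CPP version at hand.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: these values mirror the Il2CppMetadataUsage enum of the
// IL2CPP version discussed here.
enum MetadataUsageKind : uint32_t
{
    kUsageInvalid = 0,
    kUsageTypeInfo = 1,
    kUsageIl2CppType = 2,
    kUsageMethodDef = 3,
    kUsageFieldInfo = 4,
    kUsageStringLiteral = 5,
    kUsageMethodRef = 6
};

// Assumption: the usage kind is stored in the top 3 bits.
inline MetadataUsageKind DecodeUsageKind (uint32_t encodedSourceIndex)
{
    return static_cast<MetadataUsageKind>((encodedSourceIndex & 0xE0000000u) >> 29);
}

// Assumption: the index is stored in the remaining low 29 bits.
inline uint32_t DecodeIndex (uint32_t encodedSourceIndex)
{
    return encodedSourceIndex & 0x1FFFFFFFu;
}

// Convenience wrapper for entries already known to be string literals:
// returns the index to pass to GetStringLiteralFromIndex.
inline uint32_t StringLiteralIndexOf (uint32_t encodedSourceIndex)
{
    assert (DecodeUsageKind (encodedSourceIndex) == kUsageStringLiteral);
    return DecodeIndex (encodedSourceIndex);
}
```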

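This is exactly the function we were looking for: given a metadata usage id, it determines whether each entry is a MethodDef, a StringLiteral, and so on, and stores the result at the corresponding index of the array. Let's see what s_Il2CppMetadataRegistration->metadataUsages is. A search turns up this definition in Il2CppMetadataRegistration.cpp: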

extern const Il2CppMetadataRegistration g_MetadataRegistration = 
{
    2174,
    s_Il2CppGenericTypes,
    616,
    g_Il2CppGenericInstTable,
    5277,
    s_Il2CppGenericMethodFunctions,
    9825,
    g_Il2CppTypeTable,
    5615,
    g_Il2CppMethodSpecTable,
    10674,
    g_FieldOffsetTable,
    2479,
    g_Il2CppTypeDefinitionSizesTable,
    9900,
    g_MetadataUsages,
};

Then, in Il2CppCodeRegistration.cpp, we find where this struct is used:

static void s_Il2CppCodegenRegistration()
{
    il2cpp_codegen_register (&g_CodeRegistration, &g_MetadataRegistration, &s_Il2CppCodeGenOptions);
}

Tracing il2cpp_codegen_register, we find it in il2cpp-codegen.h:

inline void il2cpp_codegen_register (const Il2CppCodeRegistration* const codeRegistration, const Il2CppMetadataRegistration* const metadataRegistration, const Il2CppCodeGenOptions* const codeGenOptions)
{
    il2cpp::vm::MetadataCache::Register (codeRegistration, metadataRegistration, codeGenOptions);
}

Continuing, we find this definition in MetadataCache.cpp:

void MetadataCache::Register (const Il2CppCodeRegistration* const codeRegistration, const Il2CppMetadataRegistration* const metadataRegistration, const Il2CppCodeGenOptions* const codeGenOptions)
{
    s_Il2CppCodeRegistration = codeRegistration;
    s_Il2CppMetadataRegistration = metadataRegistration;
    s_Il2CppCodeGenOptions = codeGenOptions;

    for (int32_t j = 0; j < metadataRegistration->genericClassesCount; j++)
        if (metadataRegistration->genericClasses[j]->typeDefinitionIndex != kTypeIndexInvalid)
            metadata::GenericMetadata::RegisterGenericClass (metadataRegistration->genericClasses[j]);

    for (int32_t i = 0; i < metadataRegistration->genericInstsCount; i++)
        s_GenericInstSet.insert (metadataRegistration->genericInsts[i]);
}

Now we can see that s_Il2CppMetadataRegistration is simply metadataRegistration, so s_Il2CppMetadataRegistration->metadataUsages is in fact g_MetadataUsages.

Putting these clues together: every StringLiteral is assigned by MetadataCache::InitializeMethodMetadata. We only need to modify this function slightly and call it repeatedly to recover all the StringLiterals in order.
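One possible way to do this offline is to skip the per-method lookup and walk the whole metadataUsagePairs table once, keeping only the string literal entries. The sketch below illustrates that variation and is not necessarily what the plugin does. UsagePair and CollectStringLiteralSlots are hypothetical names, the field order of Il2CppMetadataUsagePair is assumed, and so is the encoding (kind in the top 3 bits, with 5 meaning string literal).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Field names come from InitializeMethodMetadata; their order is assumed.
struct UsagePair
{
    uint32_t destinationIndex;
    uint32_t encodedSourceIndex;
};

// Assumption: 5 is the value of kIl2CppMetadataUsageStringLiteral.
static const uint32_t kStringLiteralKind = 5;

// Maps each g_MetadataUsages slot that receives a string literal to the
// index of that literal in the string literal table.
std::map<uint32_t, uint32_t> CollectStringLiteralSlots (const std::vector<UsagePair>& pairs, uint32_t usagesCount)
{
    std::map<uint32_t, uint32_t> slotToLiteral;
    for (std::size_t i = 0; i < pairs.size (); i++)
    {
        // Assumption: usage kind in the top 3 bits of the encoded index.
        const uint32_t kind = pairs[i].encodedSourceIndex >> 29;
        if (kind != kStringLiteralKind)
            continue;

        assert (pairs[i].destinationIndex < usagesCount);
        // The same slot can appear in several usage lists; it always
        // resolves to the same literal, so overwriting is harmless.
        slotToLiteral[pairs[i].destinationIndex] = pairs[i].encodedSourceIndex & 0x1FFFFFFFu;
    }
    return slotToLiteral;
}
```

Because std::map keeps its keys sorted, iterating the result visits the slots in the order in which they appear in g_MetadataUsages.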

0x02 IDA plugin

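We now have a table of StringLiterals recovered in order. All that remains is to map it onto the global variables that represent StringLiterals in IDA, so the goal now is to find those StringLiteral variables in IDA.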

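Recalling the shape of g_MetadataUsages from the previous section, all the StringLiterals sit at the very end of the array. In other words, if we find the last entry of the array in IDA, we have located the last StringLiteral. After comparing the two, I'll go straight to the conclusion: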

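First go to the __const segment and scroll down until you see a long run of QWORDXXX entries; this is the start of g_MetadataUsages. From there it is easy: keep scrolling until the run of QWORDXXX entries ends, which is the end of g_MetadataUsages. Run my IDA plugin from this position and it will automatically rename all the QWORDs.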

  1. Data just before the start of g_MetadataUsages:
    [screenshot: before metadatausages]
  2. The start of g_MetadataUsages:
    [screenshot: metadata start]
  3. The end of g_MetadataUsages:
    [screenshot: metadata end]
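Once the start of the array is known, two small pieces of arithmetic are needed before anything can be renamed: the address of each slot, and a name that IDA will accept. The sketch below covers only those two steps and deliberately calls no IDA SDK function. SlotAddress and MakeSymbolName are hypothetical helpers, and the "StringLiteral_" prefix, the character filter and the length cap are arbitrary choices for illustration, not the plugin's actual naming scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Address of slot `index` in a pointer array starting at `base`.
// pointerSize is 8 for the 64-bit binaries discussed here (QWORD entries).
inline uint64_t SlotAddress (uint64_t base, uint32_t index, uint32_t pointerSize)
{
    assert (pointerSize == 4 || pointerSize == 8);
    return base + static_cast<uint64_t>(index) * pointerSize;
}

// Turns an arbitrary string literal into a conservative symbol name:
// ASCII letters and digits are kept, every other byte becomes '_'.
std::string MakeSymbolName (const std::string& literal, std::size_t maxLength)
{
    std::string name = "StringLiteral_";
    for (std::size_t i = 0; i < literal.size () && name.size () < maxLength; i++)
    {
        const char c = literal[i];
        const bool keep = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        name += keep ? c : '_';
    }
    return name;
}
```

Different literals can collapse to the same name after filtering, and IDA requires names to be unique, so a real tool would probably append something distinguishing, such as the slot index.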

0x03 Implementation

See "Recovering symbols from Unity games compiled with IL2CPP (Part 2)".

0x04 Some settings

To set Additional Include Directories and Additional Library Directories in Visual Studio, these are the two options to change:
C/C++ -> General -> Additional Include Directories
Linker -> General -> Additional Library Directories