0x00 Background
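Anyone who builds iOS games with Unity knows the IL2CPP pipeline. When a Unity game is built with the IL2CPP option, Unity first generates C++ code and then compiles that C++ into the executable. iOS games built this way are very hard to reverse engineer, because you have to read ARM assembly. Worse, every string literal the game uses is stored in a resource file called global-metadata.dat and is only loaded into memory at runtime. That makes static analysis in IDA even harder, so I wrote an IDA plugin that extracts the strings from global-metadata.dat and assigns them to the corresponding variables in IDA.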
0x01 Analyzing the IL2CPP code
Once we have the IL2CPP source code, we can quickly locate the following:
//MetadataCache.cpp
void MetadataCache::Initialize()
{
    s_GlobalMetadata = vm::MetadataLoader::LoadMetadataFile ("global-metadata.dat");
    s_GlobalMetadataHeader = (const Il2CppGlobalMetadataHeader*)s_GlobalMetadata;
    assert (s_GlobalMetadataHeader->sanity == 0xFAB11BAF);
    assert (s_GlobalMetadataHeader->version == 21);
    ...
}
Let's look at what MetadataLoader::LoadMetadataFile does:
//MetadataLoader.cpp
void* MetadataLoader::LoadMetadataFile (const char* fileName)
{
    std::string resourcesDirectory = utils::PathUtils::Combine (Runtime::GetDataDir (), "Metadata");
    std::string resourceFilePath = utils::PathUtils::Combine (resourcesDirectory, fileName);
    int error = 0;
    FileHandle* handle = File::Open (resourceFilePath, File::kFileModeOpen, File::kFileAccessRead, File::kFileShareRead, File::kFileOptionsNone, &error);
    if (error != 0)
        return NULL;
    void* fileBuffer = MemoryMappedFile::Map (handle);
    File::Close (handle, &error);
    if (error != 0)
    {
        MemoryMappedFile::Unmap (fileBuffer);
        fileBuffer = NULL;
        return NULL;
    }
    return fileBuffer;
}
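This function does nothing more than memory-map global-metadata.dat and return the base address of the mapping. MetadataCache::Initialize then casts that address to Il2CppGlobalMetadataHeader*. From these two functions we now know how to read global-metadata.dat: the file starts with the header, and the header's offset fields point into the rest of the file.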
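To make this concrete, here is a minimal sketch (not the author's plugin code) that reads the start of the header from a byte buffer holding global-metadata.dat and repeats the two checks from MetadataCache::Initialize. It assumes the file is little-endian and that, for version 21, the header begins with six 32-bit fields in the order sanity, version, stringLiteralOffset, stringLiteralCount, stringLiteralDataOffset, stringLiteralDataCount. MetadataHeaderPrefix, ReadU32LE and ParseHeaderPrefix are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a little-endian uint32 at byte offset `off` of the file buffer.
static uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t off) {
    return uint32_t(buf[off]) | (uint32_t(buf[off + 1]) << 8) |
           (uint32_t(buf[off + 2]) << 16) | (uint32_t(buf[off + 3]) << 24);
}

// Hypothetical mirror of the first six fields of Il2CppGlobalMetadataHeader
// (assumed layout for metadata version 21; other versions may differ).
struct MetadataHeaderPrefix {
    uint32_t sanity;
    uint32_t version;
    uint32_t stringLiteralOffset;     // start of the Il2CppStringLiteral table
    uint32_t stringLiteralCount;      // size of that table in bytes
    uint32_t stringLiteralDataOffset; // start of the raw string bytes
    uint32_t stringLiteralDataCount;
};

// Fills `out` and performs the same two checks as MetadataCache::Initialize().
// Returns false if the buffer is too short or either check fails.
bool ParseHeaderPrefix(const std::vector<uint8_t>& file, MetadataHeaderPrefix& out) {
    if (file.size() < 6 * sizeof(uint32_t)) return false;
    out.sanity                  = ReadU32LE(file, 0);
    out.version                 = ReadU32LE(file, 4);
    out.stringLiteralOffset     = ReadU32LE(file, 8);
    out.stringLiteralCount      = ReadU32LE(file, 12);
    out.stringLiteralDataOffset = ReadU32LE(file, 16);
    out.stringLiteralDataCount  = ReadU32LE(file, 20);
    return out.sanity == 0xFAB11BAFu && out.version == 21;
}
```

Loading the file into the std::vector (for example with std::ifstream in binary mode) is left out to keep the example self-contained.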
Next, we need to see how to read out all the strings. Also in MetadataCache.cpp, there is a function called GetStringLiteralFromIndex:
//MetadataCache.cpp
Il2CppString* MetadataCache::GetStringLiteralFromIndex (StringLiteralIndex index)
{
    if (index == kStringLiteralIndexInvalid)
        return NULL;
    assert(index >= 0 && static_cast<uint32_t>(index) < s_GlobalMetadataHeader->stringLiteralCount / sizeof (Il2CppStringLiteral) && "Invalid string literal index ");
    if (s_StringLiteralTable[index])
        return s_StringLiteralTable[index];
    const Il2CppStringLiteral* stringLiteral = (const Il2CppStringLiteral*)((const char*)s_GlobalMetadata + s_GlobalMetadataHeader->stringLiteralOffset) + index;
    s_StringLiteralTable[index] = String::NewLen ((const char*)s_GlobalMetadata + s_GlobalMetadataHeader->stringLiteralDataOffset + stringLiteral->dataIndex, stringLiteral->length);
    return s_StringLiteralTable[index];
}
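Calling this function in a loop, for every index from 0 up to stringLiteralCount / sizeof (Il2CppStringLiteral), retrieves every string. The remaining problem is that the order of the strings in this table does not match the order in which they are defined in the generated code, so we still need to find out in what order they are defined and assigned.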
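A sketch of that loop, working directly on the file bytes instead of calling into the runtime. It assumes each Il2CppStringLiteral entry is 8 bytes laid out as { uint32 length; int32 dataIndex; }, and that the three offsets come from the header fields stringLiteralOffset, stringLiteralCount and stringLiteralDataOffset. The assert in GetStringLiteralFromIndex divides stringLiteralCount by sizeof (Il2CppStringLiteral), so that field holds a byte size rather than an entry count; the sketch performs the same division. ExtractStringLiterals is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Bounds-checked little-endian uint32 read.
static uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t off) {
    if (off + 4 > buf.size()) throw std::out_of_range("read past end of metadata");
    return uint32_t(buf[off]) | (uint32_t(buf[off + 1]) << 8) |
           (uint32_t(buf[off + 2]) << 16) | (uint32_t(buf[off + 3]) << 24);
}

// Assumed size of one Il2CppStringLiteral entry: { uint32 length; int32 dataIndex; }.
constexpr std::size_t kStringLiteralEntrySize = 8;

// Equivalent to calling GetStringLiteralFromIndex for every valid index, but
// returning the raw UTF-8 bytes instead of creating Il2CppString objects.
std::vector<std::string> ExtractStringLiterals(const std::vector<uint8_t>& file,
                                               uint32_t tableOffset,
                                               uint32_t tableBytes,
                                               uint32_t dataOffset) {
    const std::size_t count = tableBytes / kStringLiteralEntrySize;
    std::vector<std::string> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t entry = tableOffset + i * kStringLiteralEntrySize;
        const uint32_t length    = ReadU32LE(file, entry);
        const uint32_t dataIndex = ReadU32LE(file, entry + 4);
        const std::size_t begin  = std::size_t(dataOffset) + dataIndex;
        if (begin + length > file.size())
            throw std::out_of_range("string literal data past end of metadata");
        result.emplace_back(reinterpret_cast<const char*>(file.data() + begin), length);
    }
    return result;
}
```

The position of each string in the returned vector is its StringLiteralIndex, which is table order; as noted above, that is not yet the order in which the code uses the literals.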
In Il2CppMetadataUsage.cpp we can see a definition like this:
//Il2CppMetadataUsage.cpp
extern void** const g_MetadataUsages[7877] =
{
    (void**)&Contraction_t1673853792_0_0_0_var,
    (void**)&Level2Map_t3322505726_0_0_0_var,
    (void**)&String_t_0_0_0_var,
    (void**)&TypedReference_t1025199857_0_0_0_var,
    (void**)&ArgIterator_t2628088752_0_0_0_var,
    (void**)&Void_t1841601450_0_0_0_var,
    ...
    ...
    ...
    (void**)&_stringLiteral2004437333,
    (void**)&_stringLiteral3025533088,
    (void**)&_stringLiteral3687436746,
    (void**)&_stringLiteral2779811765,
    (void**)&_stringLiteral273729679,
};
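Our goal is to find out how these pointers get initialized.
Take a closer look at Bulk_Assembly-CSharp_0.cpp, the file that contains the C++ generated from our own game-logic scripts, and see what the functions we defined look like there. Picking one at random: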
//Bulk_Assembly-CSharp_0.cpp
// System.Void EnemyAttack::Awake()
extern const MethodInfo* GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430_MethodInfo_var;
extern const MethodInfo* Component_GetComponent_TisAnimator_t69676727_m475627522_MethodInfo_var;
extern Il2CppCodeGenString* _stringLiteral1875862075;
extern const uint32_t EnemyAttack_Awake_m1734153864_MetadataUsageId;
extern "C" void EnemyAttack_Awake_m1734153864 (EnemyAttack_t2992602076 * __this, const MethodInfo* method)
{
    static bool s_Il2CppMethodIntialized;
    if (!s_Il2CppMethodIntialized)
    {
        il2cpp_codegen_initialize_method (EnemyAttack_Awake_m1734153864_MetadataUsageId);
        s_Il2CppMethodIntialized = true;
    }
    {
        GameObject_t1756533147 * L_0 = GameObject_FindGameObjectWithTag_m829057129(NULL /*static, unused*/, _stringLiteral1875862075, /*hidden argument*/NULL);
        __this->set_player_5(L_0);
        GameObject_t1756533147 * L_1 = __this->get_player_5();
        NullCheck(L_1);
        PlayerHealth_t2894595013 * L_2 = GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430(L_1, /*hidden argument*/GameObject_GetComponent_TisPlayerHealth_t2894595013_m1131592430_MethodInfo_var);
        __this->set_playerHealth_6(L_2);
        Animator_t69676727 * L_3 = Component_GetComponent_TisAnimator_t69676727_m475627522(__this, /*hidden argument*/Component_GetComponent_TisAnimator_t69676727_m475627522_MethodInfo_var);
        __this->set_anim_4(L_3);
        return;
    }
}
The first time this function is called, it calls il2cpp_codegen_initialize_method with its own MetadataUsageId. Let's look at that function:
//il2cpp-codegen.h
inline void il2cpp_codegen_initialize_method (uint32_t index)
{
    il2cpp::vm::MetadataCache::InitializeMethodMetadata (index);
}
Following it into InitializeMethodMetadata:
//MetadataCache.cpp
void MetadataCache::InitializeMethodMetadata (uint32_t index)
{
    assert(s_GlobalMetadataHeader->metadataUsageListsCount >= 0 && index <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsageListsCount));
    const Il2CppMetadataUsageList* metadataUsageLists = MetadataOffset<const Il2CppMetadataUsageList*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsageListsOffset, index);
    uint32_t start = metadataUsageLists->start;
    uint32_t count = metadataUsageLists->count;
    for (uint32_t i = 0; i < count; i++)
    {
        uint32_t offset = start + i;
        assert(s_GlobalMetadataHeader->metadataUsagePairsCount >= 0 && offset <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsagePairsCount));
        const Il2CppMetadataUsagePair* metadataUsagePairs = MetadataOffset<const Il2CppMetadataUsagePair*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsagePairsOffset, offset);
        uint32_t destinationIndex = metadataUsagePairs->destinationIndex;
        uint32_t encodedSourceIndex = metadataUsagePairs->encodedSourceIndex;
        Il2CppMetadataUsage usage = GetEncodedIndexType (encodedSourceIndex);
        uint32_t decodedIndex = GetDecodedMethodIndex (encodedSourceIndex);
        switch (usage)
        {
            case kIl2CppMetadataUsageTypeInfo:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetTypeInfoFromTypeIndex (decodedIndex);
                break;
            case kIl2CppMetadataUsageIl2CppType:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = const_cast<Il2CppType*>(GetIl2CppTypeFromIndex (decodedIndex));
                break;
            case kIl2CppMetadataUsageMethodDef:
            case kIl2CppMetadataUsageMethodRef:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = const_cast<MethodInfo*>(GetMethodInfoFromIndex (encodedSourceIndex));
                break;
            case kIl2CppMetadataUsageFieldInfo:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetFieldInfoFromIndex(decodedIndex);
                break;
            case kIl2CppMetadataUsageStringLiteral:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetStringLiteralFromIndex (decodedIndex);
                break;
            default:
                NOT_IMPLEMENTED (MetadataCache::InitializeMethodMetadata);
                break;
        }
    }
}
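The switch depends on GetEncodedIndexType and GetDecodedMethodIndex to split encodedSourceIndex into a usage type and a table index. A minimal sketch of that decoding, assuming the version-21 scheme in which the top 3 bits carry the Il2CppMetadataUsage value and the low 29 bits carry the index, with the enum values listed in the code. UsageType, DecodeUsageType and DecodeIndex are hypothetical names. Note that the MethodDef/MethodRef case passes the undecoded encodedSourceIndex to GetMethodInfoFromIndex; the other cases use the decoded index.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror the Il2CppMetadataUsage enum of metadata version 21.
enum class UsageType : uint32_t {
    Invalid = 0,
    TypeInfo = 1,
    Il2CppType = 2,
    MethodDef = 3,
    FieldInfo = 4,
    StringLiteral = 5,
    MethodRef = 6,
};

// Assumed encoding: bits 29..31 hold the usage type.
inline UsageType DecodeUsageType(uint32_t encodedSourceIndex) {
    return static_cast<UsageType>((encodedSourceIndex & 0xE0000000u) >> 29);
}

// Assumed encoding: bits 0..28 hold the index into the matching table.
inline uint32_t DecodeIndex(uint32_t encodedSourceIndex) {
    return encodedSourceIndex & 0x1FFFFFFFu;
}
```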
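This is exactly the function we were looking for. Given a method's metadata usage ID, it walks that method's usage list, determines whether each entry is a TypeInfo, a MethodDef, a StringLiteral, and so on, and writes the resolved value through the corresponding slot of the metadataUsages array. Now let's see what s_Il2CppMetadataRegistration->metadataUsages actually is. Searching around, we find this definition in Il2CppMetadataRegistration.cpp: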
extern const Il2CppMetadataRegistration g_MetadataRegistration =
{
    2174,                   // genericClassesCount
    s_Il2CppGenericTypes,   // genericClasses
    616,                    // genericInstsCount
    g_Il2CppGenericInstTable, // genericInsts
    5277,
    s_Il2CppGenericMethodFunctions,
    9825,
    g_Il2CppTypeTable,
    5615,
    g_Il2CppMethodSpecTable,
    10674,
    g_FieldOffsetTable,
    2479,
    g_Il2CppTypeDefinitionSizesTable,
    9900,                   // metadataUsagesCount
    g_MetadataUsages,       // metadataUsages
};
Then, in Il2CppCodeRegistration.cpp, we find where this struct is used:
static void s_Il2CppCodegenRegistration()
{
    il2cpp_codegen_register (&g_CodeRegistration, &g_MetadataRegistration, &s_Il2CppCodeGenOptions);
}
Following il2cpp_codegen_register, we find it in il2cpp-codegen.h:
inline void il2cpp_codegen_register (const Il2CppCodeRegistration* const codeRegistration, const Il2CppMetadataRegistration* const metadataRegistration, const Il2CppCodeGenOptions* const codeGenOptions)
{
    il2cpp::vm::MetadataCache::Register (codeRegistration, metadataRegistration, codeGenOptions);
}
Continuing, we find this definition in MetadataCache.cpp:
void MetadataCache::Register (const Il2CppCodeRegistration* const codeRegistration, const Il2CppMetadataRegistration* const metadataRegistration, const Il2CppCodeGenOptions* const codeGenOptions)
{
    s_Il2CppCodeRegistration = codeRegistration;
    s_Il2CppMetadataRegistration = metadataRegistration;
    s_Il2CppCodeGenOptions = codeGenOptions;
    for (int32_t j = 0; j < metadataRegistration->genericClassesCount; j++)
        if (metadataRegistration->genericClasses[j]->typeDefinitionIndex != kTypeIndexInvalid)
            metadata::GenericMetadata::RegisterGenericClass (metadataRegistration->genericClasses[j]);
    for (int32_t i = 0; i < metadataRegistration->genericInstsCount; i++)
        s_GenericInstSet.insert (metadataRegistration->genericInsts[i]);
}
So s_Il2CppMetadataRegistration is simply the metadataRegistration argument, which is &g_MetadataRegistration. Therefore s_Il2CppMetadataRegistration->metadataUsages is in fact g_MetadataUsages.
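Putting these clues together: every StringLiteral in g_MetadataUsages is assigned by MetadataCache::InitializeMethodMetadata. If we modify this function slightly and call it repeatedly, once for each metadata usage list, we can recover all the StringLiterals in order.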
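The "modify and call repeatedly" step can be sketched offline, without the Unity runtime. This sketch assumes the Il2CppMetadataUsageList entries ({start, count}) and Il2CppMetadataUsagePair entries ({destinationIndex, encodedSourceIndex}) have already been read from the metadataUsageListsOffset and metadataUsagePairsOffset regions of global-metadata.dat, that the string literals are indexed by StringLiteralIndex, and that the encoding is 3 high bits of type and 29 low bits of index with StringLiteral = 5. All type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory copies of the two metadata tables.
struct UsageList { uint32_t start; uint32_t count; };                         // Il2CppMetadataUsageList
struct UsagePair { uint32_t destinationIndex; uint32_t encodedSourceIndex; }; // Il2CppMetadataUsagePair

constexpr uint32_t kStringLiteralUsage = 5;  // assumed value of kIl2CppMetadataUsageStringLiteral

// Replays InitializeMethodMetadata for every usage list, but instead of writing
// through g_MetadataUsages it records destinationIndex -> string literal text.
std::map<uint32_t, std::string> RecoverStringUsages(const std::vector<UsageList>& lists,
                                                    const std::vector<UsagePair>& pairs,
                                                    const std::vector<std::string>& literals) {
    std::map<uint32_t, std::string> byDestination;
    for (const UsageList& list : lists) {
        for (uint32_t i = 0; i < list.count; ++i) {
            const UsagePair& pair = pairs.at(list.start + i);  // throws std::out_of_range if malformed
            const uint32_t type  = (pair.encodedSourceIndex & 0xE0000000u) >> 29;
            const uint32_t index = pair.encodedSourceIndex & 0x1FFFFFFFu;
            if (type != kStringLiteralUsage) continue;         // TypeInfo, MethodDef, ... are skipped
            byDestination[pair.destinationIndex] = literals.at(index);
        }
    }
    return byDestination;
}
```

The result maps each slot of g_MetadataUsages to its literal text. A std::map keeps the entries sorted by slot index, so iterating over it walks g_MetadataUsages from front to back.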
0x02 The IDA plugin
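We now have a table of StringLiterals recovered in order. All that remains is to map it onto the global variables in IDA that represent the StringLiterals, so the next goal is to find those StringLiteral variables in IDA.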
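Recall the shape of g_MetadataUsages from the previous section: all the StringLiterals sit at the very end of the array. In other words, if we find the last entry of the array in IDA, we have located the last StringLiteral. After comparing the binary with the generated code, I'll skip the details and go straight to the conclusion: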
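First go to the __const segment and scroll down until you see a long run of QWORDXXX entries; this is the start of g_MetadataUsages. From there it is easy: keep scrolling until the run of QWORDXXX entries ends, which is the end of g_MetadataUsages. Run my IDA plugin from that position and it renames all the QWORDs automatically.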
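Because the entries of g_MetadataUsages are consecutive pointers, once the last slot is found every other slot's address follows by simple arithmetic. A sketch of that step, assuming 8-byte pointers (arm64), that lastEntryAddr is the address of the final QWORD slot, that totalEntries is the array length (presumably the 9900 passed alongside g_MetadataUsages in g_MetadataRegistration), and a hypothetical str_ naming scheme. Applying the names inside IDA, through IDAPython or the IDA SDK, is not shown because it depends on the plugin API and IDA version.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical naming scheme: "str_" followed by at most 32 characters of the
// literal, with anything that is not [A-Za-z0-9_] replaced by '_'.
std::string MakeIdaName(const std::string& literal) {
    std::string name = "str_";
    for (unsigned char c : literal) {
        if (name.size() >= 4 + 32) break;
        name += (std::isalnum(c) || c == '_') ? static_cast<char>(c) : '_';
    }
    return name;
}

// Computes (slot address, name) for every recovered string-literal slot.
// Assumes pointer-sized slots and that lastEntryAddr is the LAST slot of g_MetadataUsages.
std::vector<std::pair<uint64_t, std::string>> PlanRenames(
        uint64_t lastEntryAddr, uint32_t totalEntries,
        const std::map<uint32_t, std::string>& byDestination,
        uint64_t pointerSize = 8) {
    std::vector<std::pair<uint64_t, std::string>> plan;
    for (const auto& entry : byDestination) {
        const uint32_t dest = entry.first;
        if (dest >= totalEntries)
            throw std::out_of_range("destinationIndex beyond g_MetadataUsages");
        // Slot `dest` lies (totalEntries - 1 - dest) pointers before the last slot.
        const uint64_t slot = lastEntryAddr - pointerSize * (totalEntries - 1 - dest);
        plan.emplace_back(slot, MakeIdaName(entry.second));
    }
    return plan;
}
```

IDA requires unique names, so literals that sanitize to the same name, or that appear in several slots, would need a numeric suffix before being applied; the sketch leaves that to the caller.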
[Screenshot: the data immediately before g_MetadataUsages]
[Screenshot: the start of g_MetadataUsages]
[Screenshot: the end of g_MetadataUsages]
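0x03 Implementation
See the follow-up post, 还原使用IL2CPP编译的unity游戏的symbol(二) (Restoring the symbols of Unity games built with IL2CPP, Part 2).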
0x04 Project settings
In Visual Studio, Additional Include Directories and Additional Library Directories are set under these two options:
C/C++ -> General -> Additional Include Directories
Linker -> General -> Additional Library Directories