Removing the BOM from a String in C#

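What is a BOM?
BOM: Byte Order Mark
The UTF-8 BOM is also called the UTF-8 signature. It has no byte-order role for UTF-8 itself, because UTF-8 has no byte-order ambiguity; the mark exists because UTF-16 and UTF-32 need it.
The BOM signature tells an editor which encoding the file uses, which makes detection easier. The editor does not display the BOM, but it is still part of the content and is emitted as output, where it can show up as something like an extra blank line.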
EF BB BF UTF-8
FF FE UTF-16 (also used for UCS-2), little endian
FE FF UTF-16 (also used for UCS-2), big endian
FF FE 00 00 UTF-32 aka UCS-4, little endian
00 00 FE FF UTF-32 aka UCS-4, big endian
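
The table above can be turned into a small lookup. The following is only a sketch: the class name BomSniffer and the method DetectBom are made up for illustration and do not come from any library. It assumes the buffer starts at the very first byte of the data. Note that the UTF-32 little-endian mark begins with the same two bytes as the UTF-16 little-endian mark, so the longer pattern has to be checked first.

```csharp
using System;

public static class BomSniffer
{
    // Hypothetical helper: returns a label for the BOM found at the start
    // of the buffer, or null when no known BOM is present.
    public static string DetectBom(byte[] buffer)
    {
        if (buffer == null) return null;
        if (StartsWith(buffer, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
        if (StartsWith(buffer, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(buffer, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(buffer, 0xFF, 0xFE)) return "UTF-16 LE";
        if (StartsWith(buffer, 0xFE, 0xFF)) return "UTF-16 BE";
        return null;
    }

    // True when the buffer begins with exactly the given bytes.
    private static bool StartsWith(byte[] buffer, params byte[] prefix)
    {
        if (buffer.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (buffer[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A two-byte buffer FF FE is reported as UTF-16 LE, while FF FE 00 00 is reported as UTF-32 LE. A UTF-16 LE text whose first character is U+0000 would be misread as UTF-32 LE, which is a known ambiguity of BOM sniffing in general.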

This is about removing the BOM from strings (byte arrays) read from somewhere else, not from strings written directly in source code.

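// Requires: using System; using System.Text;
public static String GetUTF8String(Byte[] buffer){
    if(buffer==null){return null;}
    // Fewer than 3 bytes cannot contain the UTF-8 BOM, so decode as-is.
    if(buffer.Length<3){return Encoding.UTF8.GetString(buffer);}
    Byte[] bomBuffer=new Byte[]{0xEF,0xBB,0xBF};
    // If any of the first three bytes differs from the BOM, decode the whole buffer.
    if(buffer[0]!=bomBuffer[0]){return Encoding.UTF8.GetString(buffer);}
    if(buffer[1]!=bomBuffer[1]){return Encoding.UTF8.GetString(buffer);}
    if(buffer[2]!=bomBuffer[2]){return Encoding.UTF8.GetString(buffer);}
    // BOM present: skip its 3 bytes so U+FEFF does not end up in the string.
    Encoding UTF8EncodingWithoutBOM=new UTF8Encoding(false);
    return UTF8EncodingWithoutBOM.GetString(buffer,3,buffer.Length-3);
}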
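
As an alternative to comparing bytes by hand, the decoding can be delegated to StreamReader, which consumes a leading BOM instead of returning it as a character. This is a sketch of a different technique rather than the method above; the class name BomFreeDecoder and the method Decode are hypothetical. It assumes the data is UTF-8 unless a BOM says otherwise, in which case StreamReader switches to the detected encoding.

```csharp
using System.IO;
using System.Text;

public static class BomFreeDecoder
{
    // Hypothetical helper: decodes the buffer and drops a leading BOM.
    public static string Decode(byte[] buffer)
    {
        if (buffer == null) return null;
        using (var stream = new MemoryStream(buffer))
        // The third argument enables encoding detection from the BOM;
        // Encoding.UTF8 is the fallback when no BOM is present.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The trade-off is an extra MemoryStream and StreamReader per call in exchange for not hard-coding the BOM bytes.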

Usage

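public static void Main(){
    // HttpRequestUtil.Get and targetURL are defined elsewhere in the project.
    Byte[] byteJson = HttpRequestUtil.Get(targetURL);
    // Decode the response body, dropping a leading UTF-8 BOM if there is one.
    String stringJson = GetUTF8String(byteJson);
    Console.WriteLine(stringJson);
}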